NLP Content Store
The NLP Content Store contains more than 350 million biomedical documents.
Answer key questions such as:
Access to Linguamatics ClinicalTrials.gov for text mining enables researchers to assess clinical trial inclusion/exclusion criteria for patient selection, trial site evaluation and study design as well as to discover competitive intelligence around companies, diseases, targets and novel drugs.
As well as the standard ontologies, the FAERS index includes its own domain-specific ontologies containing classes which can be used to filter or display FAERS documents by the contents of their structured fields (sections within the documents).
FDA Drug Labels provide a rich source of detailed intelligence on marketed drug products, including mechanism of action, pharmacology, safety/toxicity data, adverse events, contra-indications, and information on preclinical and clinical study outcomes.
The Content Store GEO index contains enriched versions of each series, with ontology mapping that makes it possible to search for genes, organisms, chemicals, numerical information and more via synonyms and common names.
PubMed is an excellent source of biomedical research knowledge, covering decades of articles published in academic journals across biochemistry, medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care.
Access to NIH grants can facilitate the development of new collaborations and provide information on the most recent research challenges, through the rapid discovery and recommendation of researchers, key opinion leaders, current expertise, and resources.
Curated at Johns Hopkins University, OMIM has data on over 12,000 genes and 5,000 phenotypes, and provides a powerful resource for mining genotype-phenotype relationships for target identification, personalized medicine and pharmacogenomics. Use cases for OMIM data include early discovery projects, to search for novel mechanisms and protein targets in disease areas, and clinical projects, to look at patient stratification or diagnostic gene variant annotations.
As well as the standard ontologies, the Patents index includes its own domain-specific ontologies providing concepts specific to patent classification using Cooperative Patent Classification (CPC).
The Patent Solution allows users to generate powerful and bespoke queries for patent search and analysis, for patent landscapes, white space analyses, freedom-to-operate searches, research methodologies, competitive intelligence and state-of-the-art reviews for confident decision making.
PubMed Central provides a valuable source of biomedical research knowledge; in particular, access to the full-text papers can facilitate extraction of specific methods, assays, or details of healthcare costs, patient outcomes, and other in-depth information.
bioRxiv is a free online archive and distribution service for unpublished preprints in the life sciences. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals. By posting on bioRxiv, authors explicitly consent to text mining of their work.
medRxiv is a free online archive and distribution service for complete but unpublished preprints in medical, clinical and related health sciences. medRxiv provides free and unrestricted access to all the articles posted on the server for both human readers and machine analysis.
The partnership with Springer Nature offers full-text article access to 600 Springer Nature life sciences journals from 1997 to 2020. This includes 76 Nature-branded journals, with all content being updated as new articles are published.
Linguamatics biomedical terminologies enable identification, extraction and normalization of over a million concepts, covering a wide variety of life science domains: diseases, genes, proteins, biomarkers, gene variants & mutations, phenotypes, drugs, adverse events, biological processes, organs, tissues and cells.
Healthcare terminologies are integrated into the Linguamatics platform, covering key medical domains and categories. These are recognized using a combination of standard ontologies, pattern-based approaches and linguistic rules, so that the context around any patient variable can be taken into account (e.g. a family history of disease). They are often used alongside the biomedical terminologies to maximize the amount of information that can be extracted from medical records.
Healthcare terminologies are valuable for identifying key patient data from a variety of medical records, including patient problem lists, disease history, vital signs (blood pressure, heart rate, pulse, respiratory rate and temperature) and demographics (gender and age). Lifestyle factors such as smoking, drug use, alcohol consumption, exercise, diet and sexual activity can also be analyzed.
Chemical entities can be found using ChEBI, MeSH and the NCI Thesaurus. In addition, the Linguamatics ChemAxon add-on identifies known and novel chemical structures within documents: by name, structure, substructure or similarity.
Linguamatics provides a pattern ontology that enables the identification and extraction of many different pharmaceutical company chemical identifiers (such as LY-170053, SQ 34676 and ICI 204,219).
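As a rough illustration of pattern-based identifier matching, the Python sketch below uses a single regular expression to pick out codes shaped like the examples above. It is not the Linguamatics pattern ontology: the pattern, the name COMPANY_CODE and the function find_company_codes are assumptions made for this example, and a production pattern set would need to cover many more company formats.

```python
import re

# Hypothetical pattern (an assumption for illustration only):
# 2-4 capital letters, an optional space or hyphen, then 3-6 digits,
# optionally followed by a comma and a further group of three digits.
COMPANY_CODE = re.compile(r"\b[A-Z]{2,4}[- ]?\d{3,6}(?:,\d{3})?\b")


def find_company_codes(text):
    """Return every substring of `text` that matches the code pattern."""
    return COMPANY_CODE.findall(text)


if __name__ == "__main__":
    sample = "The compound (LY-170053) was compared with SQ 34676."
    print(find_company_codes(sample))
```

A single regular expression like this is deliberately permissive, so it will also match unrelated strings of the same shape (for example some postal or product codes); filtering by context would be needed in practice.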
Linguamatics provides pattern ontologies that identify numerical data, such as times, dates, numerics, and units of measurement. These allow for the identification of concepts that can be expressed in many ways, extend search by annotating novel textual descriptions of key concepts or concept types, and normalize results to greatly simplify downstream analysis.
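To make the idea of normalization concrete, the following Python sketch finds mass measurements written with different units and converts them all to milligrams. It only illustrates the general technique: the unit table TO_MG, the pattern DOSE and the function normalize_doses are hypothetical names invented for this example and do not reflect how the Linguamatics pattern ontologies are implemented.

```python
import re

# Hypothetical unit table (assumption): conversion factors to milligrams.
TO_MG = {"g": 1000.0, "mg": 1.0, "mcg": 0.001, "ug": 0.001}

# A number (integer or decimal), optional whitespace, then a known unit.
DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(mcg|mg|ug|g)\b")


def normalize_doses(text):
    """Return (original mention, value in mg) for each measurement found."""
    return [
        (m.group(0), float(m.group(1)) * TO_MG[m.group(2)])
        for m in DOSE.finditer(text)
    ]


if __name__ == "__main__":
    for mention, mg in normalize_doses("0.5 g daily, then 250 mg, then 500 mcg"):
        print(mention, "is", mg, "mg")
```

Once every mention carries a value in the same unit, downstream steps such as sorting, filtering by threshold or aggregating become simple numeric operations.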
Information on organizations can be extracted and categorized by sector, type and geographic location. Searching by sector allows named pharmaceutical companies, universities or government agencies to be extracted. Organization types are also available, using linguistic rules and patterns to automatically detect whether an entity is a corporation, division, hospital or institute. Organizations can also be identified by geographical location (region, country, state or city). In addition, pattern ontologies allow for the identification of telephone numbers, names of people, and email addresses.
Linguamatics supports bespoke or custom vocabularies. These can be imported from academic or commercial sources. In-house vocabularies can also be employed, for example: a dictionary of employees from an organizational chart, or a controlled vocabulary for an internal drug development project.
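The general idea of a custom vocabulary can be sketched in Python as a mapping from preferred terms to synonyms that is matched against text. This is a minimal sketch under stated assumptions, not the Linguamatics import format or API: the vocabulary contents, build_matcher and annotate are all hypothetical and exist only for this example.

```python
import re

# Hypothetical in-house vocabulary (assumption): preferred term -> synonyms.
VOCAB = {
    "Project Alpha": ["project alpha", "PA-001"],
    "Jane Doe": ["jane doe", "j. doe"],
}


def build_matcher(vocab):
    """Compile one case-insensitive pattern covering every synonym."""
    lookup = {syn.lower(): pref for pref, syns in vocab.items() for syn in syns}
    # Longest synonyms first, so longer phrases win over shorter ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def annotate(text, vocab):
    """Return (matched text, preferred term) for each vocabulary hit."""
    pattern, lookup = build_matcher(vocab)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(annotate("J. Doe leads PA-001, also called Project Alpha.", VOCAB))
```

Mapping every synonym back to one preferred term is what lets differently worded mentions be counted and searched as the same concept.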
Linguamatics incorporates data from the sources in the Content Store to provide source-specific dictionaries. These include patent classification codes, listings of product names in FDA Drug Labels and specific FAERS terms.