NLP Content Store
The NLP Content Store contains more than 350 million biomedical documents.
Answer key questions such as:
PubMed is an excellent source of biomedical research knowledge, spanning decades of published articles from academic journals in biochemistry, medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care.
PubMed Central provides a valuable source of biomedical research knowledge; in particular, access to the full-text papers can facilitate extraction of specific methods, assays, or details of healthcare costs, patient outcomes, and other in-depth information.
The collaboration with Insightmeme and Pharmaspectra grants access to an industry-leading scientific conference database. With nearly 2 million abstracts uploaded annually, 55% of which are from major conferences and available before the event, users can stay ahead of the science and identify new KOLs and rising stars before the competition.
bioRxiv is a free online archive and distribution service for unpublished preprints in the life sciences. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals. By posting on bioRxiv, authors explicitly consent to text mining of their work.
medRxiv is a free online archive and distribution service for complete but unpublished preprints in medical, clinical and related health sciences. medRxiv provides free and unrestricted access to all the articles posted on the server for both human readers and machine analysis.
Easy access to reference drug label sources allows pharmaceutical regulatory, safety and medical affairs teams to quickly verify critical information about drug indications, dosages, and safety warnings. This ensures that they can provide accurate and timely responses to stakeholders (including healthcare professionals, patients, and regulatory bodies), enhancing patient safety and compliance. Additionally, it streamlines the process of updating and maintaining drug information, which is crucial for effective communication and decision-making within the industry. The Linguamatics Content Store provides access to a suite of drug labels from key reference regulatory bodies, including the FDA and EMA, as well as national regulators in the UK, France, Canada and Spain.
As well as the standard ontologies, the FAERS index includes its own domain-specific ontologies containing classes which can be used to filter or display FAERS documents by the contents of their structured fields (sections within the documents).
Access to Linguamatics ClinicalTrials.gov for text mining enables researchers to assess clinical trial inclusion/exclusion criteria for patient selection, trial site evaluation and study design as well as to discover competitive intelligence around companies, diseases, targets and novel drugs.
As well as the standard ontologies, the Patents index includes its own domain-specific ontologies providing concepts specific to patent classification using Cooperative Patent Classification (CPC).
The Patent Solution allows users to build powerful, bespoke queries for patent search and analysis. These support patent landscapes, white space analyses, freedom-to-operate searches, research methodologies, competitive intelligence and state-of-the-art reviews for confident decision making.
Access to NIH grants can facilitate the development of new collaborations and provide information on the most recent research challenges, through the rapid discovery and recommendation of researchers, key opinion leaders, current expertise, and resources.
The Content Store GEO index contains enriched versions of each series, with ontology mapping that makes it possible to search for genes, organisms, chemicals, numerical information and more by synonym or common name.
Linguamatics biomedical terminologies enable identification, extraction and normalization of over a million concepts, covering a wide variety of life science domains: diseases, genes, proteins, biomarkers, gene variants & mutations, phenotypes, drugs, adverse events, biological processes, organs, tissues and cells.
Healthcare terminologies are integrated into the Linguamatics platform, covering key medical domains and categories. These are recognized using a combination of standard ontologies, pattern-based approaches and linguistic rules to enable the context around any patient variable to be taken into account (e.g. a family history of disease). They are often used alongside the biomedical terminologies to maximize the amount of information that can be extracted from medical records.
Healthcare terminologies are valuable for identifying key patient data from a variety of medical records, including patient problem lists, disease history, vital signs (blood pressure, heart rate, pulse, respiratory rate and temperature) and demographics such as gender and age. Lifestyle factors such as smoking, drug use, alcohol consumption, exercise, diet and sexual activity can also be analyzed.
Chemical entities can be found using ChEBI, MeSH and the NCI Thesaurus. In addition, the Linguamatics ChemAxon add-on identifies known and novel chemical structures within documents: by name, structure, substructure or similarity.
Linguamatics provides a pattern ontology that enables the identification and extraction of many different pharmaceutical company chemical identifiers (such as LY-170053, SQ 34676 and ICI 204,219).
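To illustrate the general idea of pattern-based identifier recognition, here is a minimal Python sketch. It is not the Linguamatics pattern ontology: the regular expression, the `COMPANY_ID` name and the `find_company_ids` function are assumptions made for this example, and a production pattern set would cover far more formats.

```python
import re

# Illustrative pattern only (an assumption, not the product's rules):
# 2-4 capital letters, an optional hyphen or space, 2-6 digits,
# then optional comma-grouped digits such as the ",219" in "ICI 204,219".
COMPANY_ID = re.compile(r"\b[A-Z]{2,4}[- ]?\d{2,6}(?:,\d{3})*\b")


def find_company_ids(text: str) -> list[str]:
    """Return every substring that looks like a company compound code."""
    return COMPANY_ID.findall(text)


sample = "Compounds LY-170053, SQ 34676 and ICI 204,219 were tested."
print(find_company_ids(sample))
```

A single expression like this will inevitably produce false positives (for example, postcodes or catalogue numbers), which is one reason real systems combine patterns with context rules.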
Linguamatics provides pattern ontologies that identify numerical data, such as times, dates, numerics, and units of measurement. These allow for the identification of concepts that can be expressed in many ways, extend search by annotating novel textual descriptions of key concepts or concept types and normalize results to greatly simplify downstream analysis.
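The normalization step described above can be sketched in a few lines of Python. This is a hedged illustration rather than the product's behaviour: the unit table `TO_MICROGRAMS`, the `DOSE` pattern and the `normalize_doses` function are hypothetical names, and only three mass units are handled.

```python
import re

# Hypothetical unit table: conversion factors to micrograms.
TO_MICROGRAMS = {"g": 1_000_000, "mg": 1_000, "ug": 1}

# A number (integer or decimal), optional whitespace, then a known unit.
DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|ug|g)\b")


def normalize_doses(text: str) -> list[float]:
    """Find mass expressions and return each value converted to micrograms."""
    return [float(value) * TO_MICROGRAMS[unit] for value, unit in DOSE.findall(text)]


print(normalize_doses("Patients received 5 mg, 0.5 g or 250 ug daily."))
```

Converting every match to one base unit is what makes differently written values directly comparable in downstream analysis.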
Information on organizations can be extracted and categorized by sector, type and geographic location. Searching by sector allows named pharmaceutical companies, universities or government agencies to be extracted. Organization types are also available, using linguistic rules and patterns to automatically detect whether an entity is a corporation, division, hospital or institute. Organizations can also be identified by geographical location (region, country, state or city). In addition, pattern ontologies allow for the identification of telephone numbers, names of people, and email addresses.
Linguamatics supports bespoke or custom vocabularies. These can be imported from academic or commercial sources. In-house vocabularies can also be employed, for example: a dictionary of employees from an organizational chart, or a controlled vocabulary for an internal drug development project.
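As a rough picture of how an in-house vocabulary can drive extraction, the sketch below maps synonyms found in text back to a preferred term. Everything in it is assumed for illustration: the `VOCAB` contents, the project names and the `build_matcher` and `tag` functions are hypothetical and do not reflect the Linguamatics vocabulary format or import mechanism.

```python
import re

# Hypothetical in-house vocabulary: preferred term -> known synonyms.
VOCAB = {
    "Project Alpha": ["project alpha", "PA-001", "alpha program"],
    "Project Beta": ["project beta", "PB-002"],
}


def build_matcher(vocab):
    """Compile one case-insensitive regex plus a synonym -> preferred-term lookup."""
    lookup = {syn.lower(): pref for pref, syns in vocab.items() for syn in syns}
    # Longest synonyms first so longer phrases win over shorter overlapping ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag(text, vocab=VOCAB):
    """Return (matched text, preferred term) pairs in order of appearance."""
    pattern, lookup = build_matcher(vocab)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


print(tag("The Alpha Program (PA-001) will share assay data with PB-002."))
```

Normalizing each hit to a preferred term means that results for the same concept group together regardless of how the source document phrased it.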
Linguamatics incorporates data from the sources in the Content Store to provide source-specific dictionaries. These include patent classification codes, listings of product names in FDA Drug Labels and specific FAERS terms.