
Mining Adverse Event Data with NLP at Eli Lilly

15th May 2017
Jane Reed

The combined value of NLP and Machine Learning – a concrete example

With the rising costs of de novo drug discovery and an increasing focus on rare diseases, there is continuous innovation in methods and solutions for finding new uses for existing drugs. I was interested to hear of a novel approach for this, published recently by Eric Su and Todd Sanger at Eli Lilly. In their paper, “Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov”, the authors describe the combined use of Natural Language Processing (NLP) and Machine Learning (ML) to extract potential new uses of existing drugs.

It’s striking how often, over the past few weeks and months, I’ve been asked about the interplay between NLP, Artificial Intelligence (AI), and ML. Everyone seems to want to understand the real potential of these tools (rather than the hype being shouted from the rooftops) to change healthcare, research, and many other areas of our lives over the next decade.

So, let’s look more closely at this concrete example of the combined value of NLP and ML. The innovative step was to exclude trials for a specific indication, such as cancer, and then, among the remaining trials, find those with Serious Adverse Events (SAEs) classified as cancers. The researchers then checked whether the placebo arm had more cancer-related SAEs than the treatment arm; if it did, they hypothesized that the treatment has an anti-cancer effect.
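The screening rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the `Trial` and `Arm` record layout is hypothetical, the keyword-based `is_cancer_term` check is a crude stand-in (a real pipeline would more likely map terms to a controlled vocabulary such as MedDRA), and comparing per-patient rates rather than raw counts is an assumed choice made because arm sizes can differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: the post does not describe the authors' data model.
@dataclass
class Arm:
    label: str            # drug name, or "placebo"
    is_placebo: bool
    n_patients: int
    sae_terms: List[str]  # one entry per patient reporting an SAE (simplified)

@dataclass
class Trial:
    nct_id: str
    indication: str
    arms: List[Arm]

# Hypothetical keyword list standing in for a proper terminology lookup.
CANCER_KEYWORDS = ("cancer", "carcinoma", "neoplasm", "tumour", "tumor",
                   "lymphoma", "leukaemia", "leukemia", "melanoma", "sarcoma")

def is_cancer_term(term: str) -> bool:
    t = term.lower()
    return any(k in t for k in CANCER_KEYWORDS)

def cancer_rate(arm: Arm) -> float:
    """Fraction of patients in the arm with a cancer-related SAE."""
    return sum(is_cancer_term(t) for t in arm.sae_terms) / arm.n_patients

def candidate_signals(
    trials: Iterable[Trial], excluded_indication: str = "cancer"
) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (nct_id, drug, treatment_rate, placebo_rate) wherever the placebo
    arm has a higher rate of cancer-related SAEs than a treatment arm."""
    for trial in trials:
        # Step 1: ignore trials that target the indication itself.
        if excluded_indication in trial.indication.lower():
            continue
        placebo = next((a for a in trial.arms if a.is_placebo), None)
        if placebo is None:
            continue  # no placebo comparator, nothing to contrast
        p_rate = cancer_rate(placebo)
        for arm in trial.arms:
            if arm.is_placebo:
                continue
            t_rate = cancer_rate(arm)
            # Step 2: more cancer SAEs on placebo than on treatment -> hypothesis.
            if p_rate > t_rate:
                yield trial.nct_id, arm.label, t_rate, p_rate

# Only the gastric cancer counts quoted in this post are encoded for NCT00549757;
# the second trial is an invented example of an excluded (cancer) indication.
trials = [
    Trial("NCT00549757", "cardiovascular and renal disease in type 2 diabetes", [
        Arm("Aliskiren", False, 4272, ["gastric cancer"]),
        Arm("placebo", True, 4285, ["gastric cancer"] * 8),
    ]),
    Trial("NCT-HYPOTHETICAL", "non-small cell lung cancer", [
        Arm("drug_x", False, 100, []),
        Arm("placebo", True, 100, ["lung cancer progression"]),
    ]),
]
signals = list(candidate_signals(trials))
for s in signals:
    print(s)
```

Only the Aliskiren trial is flagged: the lung cancer trial is skipped at step 1 because its indication is the very disease being screened for.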

For example, NCT00549757 was a clinical trial that tested Aliskiren for cardiovascular and renal disease in patients with type 2 diabetes. The SAE data from this study showed that only 1 of 4,272 patients in the Aliskiren arm reported gastric cancer, compared with 8 of 4,285 patients in the placebo arm: a significant difference.
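One way to check a difference like this is Fisher’s exact test on the 2×2 table of patients with and without gastric cancer in each arm. The post does not say which statistic the authors used, so treat this as an illustrative check rather than their analysis; the function names are hypothetical. The one-sided test asks how likely 1 or fewer cases in the Aliskiren arm would be if the 9 cases had been split between the arms purely by chance.

```python
from math import comb

def hypergeom_pmf(k: int, total: int, events: int, arm_size: int) -> float:
    """P(X = k): probability that exactly k of `events` cases land in an arm of
    `arm_size` patients drawn from `total` patients, if the drug has no effect."""
    return comb(events, k) * comb(total - events, arm_size - k) / comb(total, arm_size)

def fisher_one_sided_lower(a: int, n1: int, c: int, n2: int) -> float:
    """One-sided Fisher exact p-value: probability of seeing `a` or fewer events
    in the treatment arm (n1 patients), given c events in the placebo arm (n2)."""
    total, events = n1 + n2, a + c
    return sum(hypergeom_pmf(k, total, events, n1) for k in range(a + 1))

a, n1 = 1, 4272  # Aliskiren arm: patients with gastric cancer SAEs, arm size
c, n2 = 8, 4285  # placebo arm: patients with gastric cancer SAEs, arm size
relative_risk = (a / n1) / (c / n2)
p_value = fisher_one_sided_lower(a, n1, c, n2)
print(f"relative risk = {relative_risk:.3f}")
print(f"one-sided Fisher exact p = {p_value:.4f}")
```

With these counts the relative risk is about 0.13 and the one-sided p-value is roughly 0.02. A single p-value like this does not account for the many drug-event pairs screened across thousands of trials, which is one reason such findings are treated as hypotheses rather than conclusions.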

Extracting insights from thousands of unstructured ClinicalTrials.gov records with I2E to feed ML predictions

The team used I2E, Linguamatics’ text mining solution for the life sciences, to extract all the information needed from over 2,500 trials. This included every SAE reported for randomized trials in ClinicalTrials.gov, along with study arm information (treatment, placebo, number of patients), indication, trial description, and more. They then used PolyAnalyst (Megaputer), which provides access to a selection of machine learning algorithms, to calculate ranking statistics for the treatment-indication associations.
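The post does not describe which ranking statistics PolyAnalyst computed, so the following is only a hypothetical sketch of the kind of aggregation involved. Per-trial counts for each drug and SAE term (the `ArmCounts` record is an assumed format, not I2E’s actual output) are pooled across trials and ranked by the ratio of placebo to treatment event rates, with 0.5 added to each event count so that zero counts do not break the division.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ArmCounts:
    """Hypothetical per-trial record for one drug and one SAE term:
    patients with the event and patients in each arm."""
    nct_id: str
    drug: str
    sae_term: str
    treat_events: int
    treat_n: int
    placebo_events: int
    placebo_n: int

def rank_associations(records: List[ArmCounts]) -> List[Tuple[str, str, float, int]]:
    """Pool counts per (drug, SAE term) across trials and rank by the pooled
    placebo-to-treatment rate ratio. Returns (drug, term, ratio, n_trials)."""
    pooled: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0, 0, 0, 0])
    for r in records:
        acc = pooled[(r.drug, r.sae_term.lower())]
        acc[0] += r.treat_events
        acc[1] += r.treat_n
        acc[2] += r.placebo_events
        acc[3] += r.placebo_n
        acc[4] += 1  # number of trials contributing to this pair
    ranked = []
    for (drug, term), (te, tn, pe, pn, k) in pooled.items():
        # 0.5 added to event counts as a simple continuity correction.
        ratio = ((pe + 0.5) / pn) / ((te + 0.5) / tn)
        ranked.append((drug, term, ratio, k))
    # Highest ratio first: events much more frequent on placebo than on the drug.
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked

records = [
    ArmCounts("NCT00549757", "Aliskiren", "Gastric cancer", 1, 4272, 8, 4285),
    # Toy record with invented counts, included only to show the ranking.
    ArmCounts("NCT-HYPOTHETICAL", "drug_x", "gastric cancer", 3, 500, 2, 500),
]
for drug, term, ratio, n_trials in rank_associations(records):
    print(f"{drug:10s} {term:15s} ratio={ratio:.2f} trials={n_trials}")
```

Pooling raw counts like this ignores differences between trials; a stratified estimator such as the Mantel-Haenszel rate ratio would be a more careful choice when a drug appears in several studies.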

In the paper, the authors describe a number of drugs that this workflow identified as candidates for repurposing against specific cancers. These include Telmisartan for colon cancer, Phylloquinone (vitamin K1) as a cancer preventative, and Aliskiren for gastric cancer. Further literature and laboratory investigations would of course be needed; however, this approach can rapidly generate drug-repurposing hypotheses from fast-growing sources of clinical data, such as the FDA Adverse Event Reporting System (FAERS), patient forums, electronic health records (EHRs), and more.
