Introduction: To reduce the continuously increasing costs of drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity. Methods: Gene annotations were built by text mining Medline abstracts to retrieve co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, which represent the keywords over-represented in the set of genes regulated after compound administration. To determine whether keyword fingerprints can be used to assess compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants. Results: Analysis of the keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by that compound. By contrast, the random sets produced a flat, nonspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences between the keyword fingerprints of these three compounds reflect their known distinct modes of action. Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy that agrees with the known effects of MPy reported in the literature. Conclusion: Compound keyword fingerprinting based on information retrieved from the literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action. © 2007 Future Medicine Ltd.
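The keyword fingerprint described in the abstract above rests on testing whether a keyword is over-represented among the regulated genes. The abstract does not name the statistical test, so the following is only a minimal sketch that assumes a one-sided hypergeometric test. The gene identifiers, keyword annotations and universe size are hypothetical toy values, not data from the study.

```python
from math import comb


def hypergeom_tail(k, universe, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes among
    `drawn` genes sampled without replacement from `universe` genes,
    of which `annotated` carry the keyword."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(universe - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, drawn)


def keyword_fingerprint(regulated, annotations, universe_size):
    """Map each keyword to an over-representation p-value.

    Assumes every annotated gene belongs to the same gene universe
    as the regulated set (an assumption of this sketch)."""
    regulated = set(regulated)
    fingerprint = {}
    for keyword, genes in annotations.items():
        overlap = len(regulated & genes)
        fingerprint[keyword] = hypergeom_tail(
            overlap, universe_size, len(genes), len(regulated)
        )
    return fingerprint


# Hypothetical toy data: 20 genes in total, 5 of them regulated.
annotations = {
    "peroxisome proliferation": {"g1", "g2", "g3", "g4"},
    "fibrosis": {"g5", "g6", "g7", "g8", "g9", "g10"},
}
regulated_genes = ["g1", "g2", "g3", "g11", "g12"]

fingerprint = keyword_fingerprint(regulated_genes, annotations, universe_size=20)
for keyword, p_value in sorted(fingerprint.items(), key=lambda kv: kv[1]):
    print(f"{keyword}: p = {p_value:.4f}")
```

In this toy case the keyword shared by three of the five regulated genes receives a small p-value, while the keyword with no overlap receives a p-value of 1. A randomly drawn gene set would be expected to give uniformly unremarkable p-values, in line with the flat profile reported for the random sets. A real analysis would also need a multiple-testing correction across keywords.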
From the article: Sub-chronic toxicity studies of 163 non-genotoxic chemicals were evaluated in order to predict the tumour outcome of 24-month rat carcinogenicity studies obtained from the EFSA and ToxRef databases. One hundred eleven of the 148 chemicals that did not induce putative preneoplastic lesions in the sub-chronic study also did not induce tumours in the carcinogenicity study (True Negatives). Cellular hypertrophy appeared to be an unreliable predictor of carcinogenicity. The negative predictivity, the proportion of compounds without any putative preneoplastic lesion in the sub-chronic studies that were also negative in the carcinogenicity studies, was 75%, whereas the sensitivity, the ability of the sub-chronic study to predict a positive carcinogenicity outcome, was only 5%. The specificity, the ability of the sub-chronic study to correctly identify non-carcinogens, was 90%. When the chemicals that induced tumours generally considered not relevant for humans (33 of the 37 False Negatives) are classified as True Negatives, the negative predictivity amounts to 97%. Overall, the results of this retrospective study support the concept that chemicals showing no histopathological risk factors for neoplasia in a sub-chronic study in rats may be considered non-carcinogenic and do not require further testing in a carcinogenicity study.
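The percentages in the abstract above can be retraced from the counts it reports. The sketch below recomputes the negative predictivity before and after reclassification, then checks which split of the remaining chemicals matches the reported sensitivity and specificity. It assumes that the 163 - 148 = 15 chemicals not counted among the lesion-free group all induced putative preneoplastic lesions. The abstract does not give the split of those 15 into true and false positives, so that part is an inference rather than reported data.

```python
def negative_predictivity(true_neg, false_neg):
    """Share of lesion-free chemicals that were also tumour-free."""
    return true_neg / (true_neg + false_neg)


def sensitivity(true_pos, false_neg):
    """Share of carcinogens flagged by the sub-chronic study."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-carcinogens correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


# Counts stated in the abstract.
total_chemicals = 163
lesion_free = 148
true_neg = 111
false_neg = lesion_free - true_neg      # 37 False Negatives
not_human_relevant = 33                 # False Negatives reclassified

npv = negative_predictivity(true_neg, false_neg)
npv_reclassified = negative_predictivity(
    true_neg + not_human_relevant, false_neg - not_human_relevant
)

# Assumption of this sketch: the remaining chemicals all showed lesions.
with_lesions = total_chemicals - lesion_free

# Find every true-positive count consistent with the reported 5% and 90%.
consistent_true_pos = [
    tp
    for tp in range(with_lesions + 1)
    if round(100 * sensitivity(tp, false_neg)) == 5
    and round(100 * specificity(true_neg, with_lesions - tp)) == 90
]

print(f"negative predictivity: {npv:.0%}")
print(f"after reclassification: {npv_reclassified:.0%}")
print(f"true-positive counts consistent with 5% / 90%: {consistent_true_pos}")
```

Under that assumption the reported figures are mutually consistent. The very low sensitivity follows from the large number of False Negatives relative to the few chemicals that showed lesions.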
To study the ways in which compounds can induce adverse effects, toxicologists have been constructing Adverse Outcome Pathways (AOPs). An AOP can be considered a pragmatic tool to capture and visualize the mechanisms underlying different types of toxicity inflicted by any kind of stressor; it describes the interactions between key entities that lead to the adverse outcome across multiple levels of biological organization. The construction or optimization of an AOP is a labor-intensive process, which currently depends on the manual search, collection, review and synthesis of the available scientific literature. This process could, however, be largely facilitated by using Natural Language Processing (NLP) to extract information from the scientific literature in a systematic, objective and rapid manner, leading to greater accuracy and reproducibility. Researchers could then invest their expertise in the substantive assessment of AOPs, replacing time spent on evidence gathering with a critical review of the data extracted by NLP. As case examples, we selected two frequent adversities observed in the liver: cholestasis and steatosis, denoting the accumulation of bile and lipid, respectively. We used deep learning language models to recognize entities of interest in text and establish causal relationships between them. We demonstrate how an NLP pipeline combining Named Entity Recognition and a simple rules-based relationship extraction model helps not only to screen the literature for compounds related to liver adversities, but also to extract mechanistic information on how such adversities develop, from the molecular to the organismal level. Finally, we offer some perspectives opened by recent progress in Large Language Models and how these could be used in the future. We propose that this work makes two main contributions: 1) a proof-of-concept that NLP can support the extraction of information from text for modern toxicology and 2) a template open-source model for the recognition of toxicological entities and the extraction of their relationships. All resources are openly accessible via GitHub (https://github.com/ontox-project/en-tox).
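The two-stage idea in the abstract above, entity recognition followed by rule-based relationship extraction, can be illustrated with a deliberately simplified sketch. It does not use the authors' model or the en-tox resources. A small dictionary lookup stands in for the deep learning entity recognizer, and the lexicon, trigger words, labels and example sentences are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon standing in for a trained entity recognizer.
LEXICON = {
    "COMPOUND": ["cyclosporin a", "valproic acid"],
    "PHENOTYPE": ["cholestasis", "steatosis"],
}

# Hypothetical causal trigger phrases for the rule-based step.
TRIGGERS = ["induces", "induced", "causes", "caused", "leads to"]


def find_entities(sentence):
    """Return (start, end, label, text) for each lexicon match."""
    lowered = sentence.lower()  # offsets stay aligned for ASCII text
    entities = []
    for label, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, lowered):
                start, end = match.span()
                entities.append((start, end, label, sentence[start:end]))
    return sorted(entities)


def extract_relations(sentence):
    """Link a compound to a later phenotype when a trigger lies between."""
    entities = find_entities(sentence)
    relations = []
    for _, end1, label1, text1 in entities:
        for start2, _, label2, text2 in entities:
            if label1 != "COMPOUND" or label2 != "PHENOTYPE" or end1 > start2:
                continue
            between = sentence[end1:start2].lower()
            if any(re.search(r"\b" + re.escape(t) + r"\b", between)
                   for t in TRIGGERS):
                relations.append((text1, "causes", text2))
    return relations


# Invented example sentences, not taken from the literature corpus.
examples = [
    "Valproic acid induces steatosis in rat liver.",
    "Steatosis was absent after valproic acid treatment.",
]
for sentence in examples:
    print(sentence, "->", extract_relations(sentence))
```

The rule only fires when a compound precedes a phenotype with a trigger phrase in between, so the second sentence yields no relation. A rule this simple ignores negation, passive constructions and relations that cross sentence boundaries, which is where a trained recognizer and richer extraction rules would be needed.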