Biomedical Natural Language Processing (BioNLP)

Biomedical Natural Language Processing (BioNLP), also known as biomedical text mining, is a multidisciplinary research field on the edge of computer science, natural language processing, bioinformatics, medical & health informatics and computational linguistics. It refers to text mining approaches and methods applied to biomedical literature.

BioNLP main applications

  • The identification of biological entities (named entity recognition) such us genes, transcripts, proteines, chemical compounds, and drugs.[1],[2],[3]
  • Identification of gene-gene or gene-drug interactions from published clinical trials and biomedical literature.
  • Text Clustering

NLP libraries and frameworks

The main clinical trial registries are:



  • [1] Lou Y et al. (2017). A Transition-based Joint Model for Disease Named Entity Recognition and Normalization. Bioinformatics; btx172. PubMed
  • [2] Saha S et al. (2015). Named entity recognition and classification in biomedical text using classifier ensemble. Int J Data Min Bioinform.; 11(4):365-91. PubMed
  • [3] M Krallinger, F Leitner, O Rabal, M Vazquez, J Oyarzabal and A Valencia, Overview of the chemical compound and drug name recognition (CHEMDNER) task. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop vol. 2.