New article on chemical named entity recognition
2015-01-20
We are pleased to announce the publication of a new article on chemical named entity recognition in the Journal of Cheminformatics:
Batista-Navarro, R., Rak, R. and Ananiadou, S. (2015). Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics. Journal of Cheminformatics, 7(Suppl 1), S6.
The paper reports on a new conditional random fields-based chemical entity recogniser, whose performance is optimised through the incorporation of a number of customisations, such as specialised pre-processing analytics, chemistry knowledge-rich features and post-processing rules. The recogniser achieves state-of-the-art performance and outperforms two popular chemical NER tools. The suite of tools that form the recogniser has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo.
Full abstract
Background
The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a conditional random fields-based chemical entity recogniser. In order to optimise its performance, we introduced customisations in various aspects of our solution. These include the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules.
Results
Our evaluation shows that optimal performance is obtained when our customisations are integrated into the chemical entity recogniser. When its performance is compared with that of state-of-the-art methods, under comparable experimental settings, our solution achieves competitive advantage. We also show that our recogniser that uses a model trained on the CHEMDNER corpus is suitable for recognising names in a wide range of corpora, consistently outperforming two popular chemical NER tools.
Conclusion
The contributions resulting from this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology that has demonstrated performance at a level competitive with, if not superior to, that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks.
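Illustrative sketch
For readers unfamiliar with this kind of pipeline, the three customisation points named in the abstract (pre-processing, knowledge-rich features, post-processing rules) can be pictured with the small Python sketch below. It is not the authors' implementation and does not use Argo: the tokeniser, the suffix list, the feature names and the bracket heuristic are all hypothetical choices made for illustration only. The feature dictionaries are of the general kind that CRF toolkits accept as per-token input.

```python
import re

# Hypothetical list of chemistry-like suffixes; not taken from the paper.
CHEM_SUFFIXES = ("ane", "ene", "yne", "ol", "ide", "ate", "ine", "one")


def tokenize(text):
    """Pre-processing: split into letter runs, digit runs and single symbols,
    so that names such as '2-propanol' are broken at hyphens and brackets."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)


def token_features(tokens, i):
    """Feature extraction: build a feature dict for the token at position i."""
    tok = tokens[i]
    lower = tok.lower()
    # First matching suffix, ignoring very short tokens to limit false hits.
    suffix = next(
        (s for s in CHEM_SUFFIXES
         if lower.endswith(s) and len(lower) > len(s) + 2),
        "",
    )
    return {
        "lower": lower,
        "is_digit": tok.isdigit(),
        "has_upper": any(c.isupper() for c in tok),
        "chem_suffix": suffix,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def balance_parentheses(mention):
    """Post-processing heuristic: strip one unmatched bracket at the edge
    of a predicted mention, leaving balanced mentions untouched."""
    opens, closes = mention.count("("), mention.count(")")
    if opens > closes and mention.startswith("("):
        return mention[1:]
    if closes > opens and mention.endswith(")"):
        return mention[:-1]
    return mention


if __name__ == "__main__":
    toks = tokenize("2-propanol (IPA)")
    print(toks)
    print(token_features(toks, 2))
    print(balance_parentheses("methyl)"))
```

In a full system, the feature dictionaries for each sentence would be passed to a sequence labeller for training and tagging, and heuristics such as the bracket rule would be applied to the predicted mentions afterwards.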








