The Intute Project
The Intute project, co-funded by JISC (Joint Information Systems Committee) and AHRC (Arts and Humanities Research Council), is a collaboration between NaCTeM, MIMAS and the Intute Repository Search Project. The aim is to develop an intelligent semantic search service using NaCTeM's text mining tools, giving users the benefit of searching within an enhanced subset of the Intute repository, a collection of academic and technical reports under the domain heading of either Bio-medical Science or Social Science.
In particular, the Intute project pursues four directions for improving the current search capability of Intute Repository Search:
- Enhancing the metadata using text mining technologies;
- Applying the technique(s) of text clustering/classification in the search system;
- Developing improved technique(s) for query expansion; and
- Introducing personalisation into the search system.
Duration: May 1st, 2008 ~ April 30th, 2009
Principal Investigator: Dr. Sophia Ananiadou
Project Team (NaCTeM): Scott Piao and Brian Rea
Project Timetable
Project Flowchart
Project Documentation (Progress Reports & Presentations)
Progress of Project
1) Tools have been developed for indexing documents based on metadata (provided by UKOLN) and on additional metadata generated by processing full texts. In particular, the Genia POS tagger and the Termine term extractor are integrated into the indexing package to extract terms from abstracts and PDF full-text documents (where available via the metadata) for indexing purposes. A sample index of over 197,000 documents, including about 3,500 full texts, has been created.
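As a rough illustration of the indexing idea described above, the sketch below merges supplied metadata terms with terms extracted from text into a single inverted index. It is not the project's indexing package: the function name `build_index`, the dictionary keys `metadata_terms` and `extracted_terms`, and the sample documents are all hypothetical, and the extracted terms are assumed to have been produced already by a separate term extraction step.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased term to the set of document ids that contain it.

    Each document is a dict with the hypothetical keys 'id',
    'metadata_terms' (terms taken from the supplied metadata) and
    'extracted_terms' (terms assumed to come from a term extractor).
    """
    index = defaultdict(set)
    for doc in documents:
        terms = list(doc.get("metadata_terms", [])) + list(doc.get("extracted_terms", []))
        for term in terms:
            index[term.lower()].add(doc["id"])
    return index


# Illustrative sample data only; not taken from the Intute repository.
docs = [
    {"id": "d1", "metadata_terms": ["Social Science"],
     "extracted_terms": ["text mining", "query expansion"]},
    {"id": "d2", "metadata_terms": ["Bio-medical Science"],
     "extracted_terms": ["text mining"]},
]
index = build_index(docs)
```

Looking up `index["text mining"]` then returns the ids of both sample documents, regardless of whether the term came from the metadata or from the extracted terms.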
2) A demonstrator semantic document search package has been developed, in which advanced document searching functions are implemented, such as real-time clustering of retrieved documents using the Carrot2 package, term-based searching for similar and topic-sharing documents, and complex query building. In addition, the visualisation package Aduna has been integrated to show the relationships between topics graphically.
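One simple way to picture term-based searching for similar documents is to rank documents by the overlap between their term sets. The sketch below uses Jaccard similarity for this; that measure is an assumption chosen for illustration, not necessarily what the demonstrator uses, and the names `jaccard`, `similar_documents` and the sample term sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_documents(query_id, doc_terms, top_n=5):
    """Rank the other documents by term overlap with the query document."""
    query_terms = doc_terms[query_id]
    scored = [
        (other, jaccard(query_terms, terms))
        for other, terms in doc_terms.items()
        if other != query_id
    ]
    # Keep only documents sharing at least one term with the query document.
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Illustrative term sets only; not taken from the demo index.
doc_terms = {
    "d1": {"text mining", "query expansion", "clustering"},
    "d2": {"text mining", "clustering"},
    "d3": {"protein interaction"},
}
ranked = similar_documents("d1", doc_terms)
```

Here `d2` shares two of the three distinct terms with `d1` and is ranked first, while `d3` shares none and is dropped from the result.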
NaCTeM IRS Demo Site
Here is a video clip demonstrating the main functions of the NaCTeM IRS search demo site.
Click any of the screenshots below to access the demo site.
Figure 1: Simple search and cluster page
Figure 2: Full document information page
Figure 3: Document cluster visualisation page
Figure 4: Complex query builder page