The Intute Project
The Intute project, co-funded by JISC (Joint Information Systems Committee) and AHRC (Arts and Humanities Research Council), is a collaboration between NaCTeM, MIMAS and the Intute Repository Search Project. Its aim is to develop an intelligent semantic search service using NaCTeM's text mining tools, allowing users to search an enhanced subset of the Intute repository, a collection of academic and technical reports classified under Biomedical Science or Social Science.
In particular, the Intute project pursues four directions for improving the current search capability of Intute Repository Search:
- Enhancing the metadata using text mining technologies;
- Applying text clustering and classification techniques in the search system;
- Developing improved techniques for query expansion; and
- Incorporating personalisation into the search system.
Duration: May 1st, 2008 to April 30th, 2009
Principal Investigator: Dr. Sophia Ananiadou
Project Team (NaCTeM): Scott Piao and Brian Rea
Project Timetable
Project Flowchart
Project Documentation (Progress Reports & Presentations)
Progress of Project
1) Tools have been developed for indexing documents based on the metadata provided by UKOLN and on additional metadata generated by processing full texts. In particular, the GENIA POS tagger and the TerMine term extractor have been integrated into the indexing package to extract terms from abstracts and, where available via the metadata, from PDF full-text documents. A sample index of over 197,000 documents, including about 3,500 full texts, has been created.
2) A demonstrator semantic document search package has been developed. It implements advanced document searching functions such as real-time clustering of retrieved documents using the Carrot2 package, term-based searching for similar and topic-sharing documents, and complex query building. In addition, the visualisation package Aduna has been integrated to show the relationships between topics graphically.
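The idea behind term-based indexing and similar-document search can be illustrated with a minimal Python sketch. It is not the project's implementation: it does not call the GENIA tagger, TerMine, or Carrot2, and it assumes the terms have already been extracted. The field names `metadata_terms` and `extracted_terms`, the sample documents, and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict


def doc_terms(doc):
    """Return the lower-cased set of all terms attached to one document."""
    return {t.lower() for t in doc["metadata_terms"] + doc["extracted_terms"]}


def build_index(docs):
    """Map each term to the set of document ids that contain it.

    `docs` maps a document id to a dict with two hypothetical fields:
    'metadata_terms' (terms supplied with the record) and
    'extracted_terms' (terms produced by a term extractor).
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc_terms(doc):
            index[term].add(doc_id)
    return index


def similar_documents(doc_id, docs, index):
    """Rank other documents by Jaccard overlap of their term sets."""
    query_terms = doc_terms(docs[doc_id])
    # Candidates are documents sharing at least one term with the query document.
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    candidates.discard(doc_id)

    scored = []
    for other in candidates:
        other_terms = doc_terms(docs[other])
        score = len(query_terms & other_terms) / len(query_terms | other_terms)
        scored.append((other, score))
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Illustrative sample records (invented for this sketch).
DOCS = {
    "d1": {"metadata_terms": ["text mining"],
           "extracted_terms": ["term extraction", "semantic search"]},
    "d2": {"metadata_terms": ["semantic search"],
           "extracted_terms": ["text mining", "query expansion"]},
    "d3": {"metadata_terms": ["social science"],
           "extracted_terms": ["survey data"]},
}
INDEX = build_index(DOCS)
```

Calling `similar_documents("d1", DOCS, INDEX)` ranks `d2` as the only related document, since `d3` shares no terms with `d1`. A production system would typically weight terms rather than treat them as a plain set.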
NaCTeM IRS Demo Site
Here is a video clip demonstrating the main functions of the NaCTeM IRS search demo site.
Click any of the screenshots below to access the demo site.
Figure 1: Simple search and cluster page
Figure 2: Full document information page
Figure 3: Document cluster visualisation page
Figure 4: Complex query builder page