The Intute Project
The Intute project, co-funded by JISC (Joint Information Systems Committee) and AHRC (Arts and Humanities Research Council), is a collaboration between NaCTeM, MIMAS and the Intute Repository Search Project. Its aim is to develop an intelligent semantic search service using NaCTeM's text mining tools, allowing users to search an enhanced subset of the Intute repository: a collection of academic and technical reports classified under Bio-medical Science or Social Science.
In particular, the Intute project pursues four directions for improving the current search capability of Intute Repository Search:
- Enhancing the metadata using text mining technologies;
- Applying text clustering and classification techniques in the search system;
- Developing improved query expansion techniques; and
- Incorporating personalisation into the search system.
Duration: May 1st, 2008 to April 30th, 2009
Principal Investigator: Dr. Sophia Ananiadou
Project Team (NaCTeM): Scott Piao and Brian Rea
Project Timetable
Project Flowchart
Project Documentation (Progress Reports & Presentations)
Progress of Project
1) Tools have been developed for indexing documents based on the metadata provided by UKOLN and on additional metadata generated by processing full texts. In particular, the Genia POS tagger and the Termine term extractor have been integrated into the indexing package to extract terms from abstracts and PDF full-text documents (where available via the metadata) for indexing purposes. A sample index of over 197,000 documents, including about 3,500 full texts, has been created.
2) A demonstrator semantic document search package has been developed, which implements advanced document searching functions such as real-time clustering of retrieved documents using the Carrot2 package, term-based searching for similar and topic-sharing documents, and complex query building. In addition, the visualisation package Aduna has been integrated to show the relationships between topics graphically.
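To illustrate the general idea behind term-based indexing and the search for similar, topic-sharing documents, here is a minimal Python sketch. It is not the project's implementation: `extract_terms` is a hypothetical, naive stand-in for a real term extractor (it does not call the Genia tagger or Termine), the sample documents are invented, and Jaccard overlap of term sets is just one plausible similarity measure chosen for simplicity.

```python
from collections import defaultdict

# Assumption: a tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "for", "to", "on", "with", "from"}


def extract_terms(text):
    """Hypothetical stand-in for a term extractor.

    Returns lowercase single tokens and adjacent token pairs,
    skipping any candidate that involves a stopword.
    """
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    terms = set()
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        terms.add(tok)
        if i + 1 < len(tokens) and tokens[i + 1] not in STOPWORDS:
            terms.add(tok + " " + tokens[i + 1])
    return terms


def build_index(documents):
    """Build an inverted index (term -> document ids) from title and abstract metadata."""
    index = defaultdict(set)
    doc_terms = {}
    for doc_id, doc in documents.items():
        text = doc["title"] + " " + doc.get("abstract", "")
        terms = extract_terms(text)
        doc_terms[doc_id] = terms
        for term in terms:
            index[term].add(doc_id)
    return index, doc_terms


def similar_documents(doc_id, index, doc_terms):
    """Rank documents sharing at least one term with doc_id by Jaccard overlap."""
    source = doc_terms[doc_id]
    candidates = set()
    for term in source:
        candidates |= index[term]
    candidates.discard(doc_id)
    scored = []
    for other in candidates:
        shared = source & doc_terms[other]
        union = source | doc_terms[other]
        scored.append((len(shared) / len(union), other))
    # Highest score first; ties broken by document id for a stable order.
    return [other for score, other in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Invented sample records, standing in for repository metadata.
documents = {
    "d1": {"title": "Text mining for biomedical literature",
           "abstract": "Term extraction from abstracts."},
    "d2": {"title": "Biomedical term extraction",
           "abstract": "Text mining of full texts."},
    "d3": {"title": "Survey methods in social science", "abstract": ""},
}

index, doc_terms = build_index(documents)
print(similar_documents("d1", index, doc_terms))
```

In a real system the extracted terms would come from a linguistically informed extractor, and the index would typically be held in a dedicated search engine rather than in memory.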
NaCTeM IRS Demo Site
Here is a video clip demonstrating the main functions of the NaCTeM IRS search demo site.
Click any of the screenshots below to access the demo site.
Figure 1: Simple search and cluster page
Figure 2: Full document information page
Figure 3: Document cluster visualisation page
Figure 4: Complex query builder page