BBC News Browser Pilot Project
The aim of this pilot project is to analyse, structure and visualise BBC news available on the Web in response to a user's query, using advanced text mining techniques. The major outcomes will include a web demonstrator of two concept clustering tools, as well as presentations to identified groups of potential users within New Media & Technology, News and BBC Monitoring, and to the BBC Research/Technology Group.
Duration: July - December 2007
Principal Investigator: Sophia Ananiadou
Research Associate: Brian Rea
Due to restrictions on the use of the news data, we unfortunately cannot make the tool available for general use. Instead, we have created a video demonstration to show you the key functionality and benefits of the project output.
Application 1: Concept Discovery and Retrieval
The proposed system will linguistically process and analyse the terminology within all of the news articles provided by the BBC, in order to discover the most important concepts and the relations between them. The interface allows a user to enter a query across the document collection and automatically calculates a list of concepts specific to that query, ranked by perceived importance. For example, in a biomedical collection of documents, a query for documents relating to "myocardial infarction" returns a ranked set of results that includes 'myocardial infarction', 'coronary artery', 'risk factor', 'artery disease', 'acute coronary syndrome', 'heart disease', 'heart failure', 'ventricular tachycardia', 'blood pressure' and 'unstable angina'.
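The page does not say how concepts are ranked. As one illustration, the sketch below uses C-value, a published automatic term recognition measure that scores multiword candidates by frequency while discounting occurrences nested inside longer candidates. Treat it as an assumed, swapped-in technique rather than the project's actual method: candidate terms are simply two- or three-word runs between stopwords and punctuation, the query filter is a plain substring match, and the stopword list and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a fuller system would use part-of-speech
# patterns (e.g. adjective/noun sequences) to find candidate terms.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is",
             "was", "with", "for", "by", "at", "from", "as", "are", "be"}


def _ngrams(chunk, max_len):
    """Yield all contiguous 2..max_len word sequences from one chunk."""
    for n in range(2, max_len + 1):
        for i in range(len(chunk) - n + 1):
            yield tuple(chunk[i:i + n])


def candidate_terms(text, max_len=3):
    """Yield candidate terms that cross no stopword or punctuation mark."""
    for segment in re.split(r"[^a-z\s]+", text.lower()):
        chunk = []
        for tok in segment.split():
            if tok in STOPWORDS:
                yield from _ngrams(chunk, max_len)
                chunk = []
            else:
                chunk.append(tok)
        yield from _ngrams(chunk, max_len)


def _nested_in(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(docs, max_len=3):
    """Score multiword candidates with C-value; returns (term, score), best first."""
    freq = Counter()
    for doc in docs:
        freq.update(candidate_terms(doc, max_len))
    scores = {}
    for term, f in freq.items():
        # Longer candidates that contain this one (quadratic; fine for a sketch).
        nests = [g for g in freq if len(g) > len(term) and _nested_in(g, term)]
        if nests:
            f -= sum(freq[g] for g in nests) / len(nests)
        scores[" ".join(term)] = math.log2(len(term)) * f
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def concepts_for_query(docs, query, top_k=10):
    """Rank concepts within the documents that mention the query string."""
    hits = [d for d in docs if query.lower() in d.lower()]
    return c_value(hits)[:top_k]
```

The nesting discount is what stops a fragment such as "acute coronary" from outscoring the full term "acute coronary syndrome" it usually appears in.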
The basic method combines advanced indexing of these concepts with the standard keyword-based approaches of more common search engines. This allows more complete retrieval from document collections without the user having to know the key terminology and its variants ahead of time. It also lets the user drill down into the results, with each step becoming more focussed on a particular goal and irrelevant documents being discarded. Finally, as the articles are all stored within the system during processing, it is possible to offer multiple visualisations of the documents, ranging from raw text or styled HTML to annotated and enhanced versions that highlight key concepts and provide links to related material.
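One way to picture combining a concept index with ordinary keyword search is a pair of inverted indexes. This is an illustration under assumptions, not the system's implementation: the mapping from each concept to its surface variants (here "heart attack" for "myocardial infarction") is assumed to come from an earlier term-recognition step, variant matching is naive substring search, and the class and method names are hypothetical. Drill-down is modelled by passing the previous result set to search() so that each new query can only narrow it.

```python
import re
from collections import defaultdict


class HybridIndex:
    """Toy keyword + concept index (hypothetical design, for illustration)."""

    def __init__(self, concept_variants):
        # canonical concept -> its surface forms, including the concept itself
        self.variants = {c.lower(): [v.lower() for v in vs] + [c.lower()]
                         for c, vs in concept_variants.items()}
        self.keyword_index = defaultdict(set)  # word -> doc ids
        self.concept_index = defaultdict(set)  # canonical concept -> doc ids
        self.docs = {}                         # stored text, for later display

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        lowered = text.lower()
        for word in re.findall(r"[a-z]+", lowered):
            self.keyword_index[word].add(doc_id)
        for concept, forms in self.variants.items():
            if any(form in lowered for form in forms):
                self.concept_index[concept].add(doc_id)

    def _concept_of(self, query):
        """Return the canonical concept whose variants include the query, if any."""
        for concept, forms in self.variants.items():
            if query in forms:
                return concept
        return None

    def search(self, query, within=None):
        q = query.lower().strip()
        words = re.findall(r"[a-z]+", q)
        # Keyword route: documents containing every query word (a new set).
        hits = (set.intersection(*(self.keyword_index.get(w, set()) for w in words))
                if words else set())
        # Concept route: documents mentioning any variant of the same concept.
        concept = self._concept_of(q)
        if concept is not None:
            hits |= self.concept_index[concept]
        # Drill-down: restrict to the previous step's results when given.
        return hits if within is None else hits & within
```

In this sketch a search for "heart attack" also finds articles that only say "myocardial infarction", which is the kind of more complete retrieval described above.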
Application 2: Concept Visualisation
This application takes the results of the concept discovery process and visualises them, with the aim of creating user-oriented knowledge maps. Knowledge maps are generated by recognising clusters of articles and categorising them automatically based on concept (terminological) processing. The user selects a collection of online news and specifies a set of query terms, and topic maps are created automatically. The figure below exemplifies a topic map that has been generated from news articles: the target information is extracted from a small number of articles concerning terrorism, and the map suggests the documents (yellow dots) that relate to the topics.
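The page does not specify the clustering algorithm. As an assumed stand-in, the sketch below groups articles by the cosine similarity of their concept counts using single-link clustering (any pair at or above a hypothetical threshold joins the same group) and names each cluster after its most frequent concept, a crude form of automatic categorisation. The input format, a mapping from article id to the concepts found in it, is also an assumption.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_articles(article_concepts, threshold=0.5):
    """Single-link clustering; returns sorted (label, member ids) pairs."""
    ids = list(article_concepts)
    vectors = {i: Counter(article_concepts[i]) for i in ids}
    parent = {i: i for i in ids}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair of sufficiently similar articles.
    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)

    labelled = []
    for members in groups.values():
        total = Counter()
        for m in members:
            total.update(vectors[m])
        # Most frequent concept; alphabetical order breaks ties.
        label = max(sorted(total), key=lambda t: total[t])
        labelled.append((label, sorted(members)))
    return sorted(labelled)
```

Single-link keeps the sketch short; it can chain loosely related articles together, so a production system might prefer average-link or k-means.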
The basic method includes categorisation and mapping of concepts in order to enhance information presentation. The system integrates automatic term recognition, concept clustering, information retrieval and visualisation. Its main objective is to facilitate knowledge presentation and discovery from news stories by identifying concept similarities and visualising them automatically. Additionally, to accelerate information discovery, we propose a visualisation method for generating similarity-based knowledge maps. This method is based on real-time, terminology-based knowledge clustering and categorisation, and it allows users to explore the generated knowledge maps graphically and in real time. The technique can be applied to compare news stories, whether current or archived and/or from different channels, to reveal differences in perspective.
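As a rough illustration of similarity-based knowledge maps and of comparing sources, the sketch below links two concepts when the sets of articles mentioning them overlap strongly (Jaccard similarity at or above a hypothetical threshold), then lists the links that appear in one channel's map but not in the other's. Jaccard similarity, the threshold and the function names are assumptions chosen for brevity; drawing the graph and the real-time aspects are left out.

```python
from itertools import combinations


def knowledge_map(article_concepts, min_similarity=0.5):
    """Weighted concept-concept edges from article co-occurrence (Jaccard)."""
    occurs = {}  # concept -> ids of articles that mention it
    for doc_id, concepts in article_concepts.items():
        for c in set(concepts):
            occurs.setdefault(c, set()).add(doc_id)
    edges = {}
    for a, b in combinations(sorted(occurs), 2):
        sim = len(occurs[a] & occurs[b]) / len(occurs[a] | occurs[b])
        if sim >= min_similarity:
            edges[(a, b)] = sim
    return edges


def map_difference(map_a, map_b):
    """Edges found only in the first map, and only in the second."""
    return sorted(map_a.keys() - map_b.keys()), sorted(map_b.keys() - map_a.keys())
```

Comparing edges rather than raw concept lists highlights how two sources connect the same concepts differently, for example one channel tying a flood story to government and aid and another to climate.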
Deliverables:
- An exploration of how the two applications can be used effectively with news articles, with an interim report on each.
- A web-based demonstrator of both applications and their results.