NaCTeM

Seminar - Dr. Ann Copestake

Speaker: Dr. Ann Copestake (University of Cambridge)
Title: Robust Semantic Processing for Information Extraction
Date: 2pm, 13th January, 2006
Location: Lecture Theatre E7, Renold Building (building 8 on the campus map)
Abstract: Natural language processing techniques have different strengths and weaknesses. Shallow processing may be very fast and robust, but extracts limited information. Deep processors can produce detailed semantic representations, but are relatively slow and brittle and require much more knowledge. Various approaches to building combined systems have been tried, for instance invoking deep processing only on regions of text that shallow processors have identified as interesting. But different processors typically assume very different representations, which makes it difficult to combine them flexibly.

We have developed a common semantic representation language (Robust Minimal Recursion Semantics: RMRS) for deep and shallow processing. Shallow processors provide a representation which is compatible with deep processing, but relatively underspecified. In the Deep Thought project and subsequent work, we have demonstrated that various systems (including part-of-speech taggers, noun phrase chunkers, named entity recognisers, robust parsers and deep parsers) can be adapted to output RMRSs. Information extraction systems can be built which utilise RMRS markup as a base. In a current project, SciBorg, we are further developing this approach and applying it to Chemistry.

We treat the different processing stages as providing levels of standoff annotation with respect to the scientific text. Our aim is to provide infrastructure that can be used by Chemistry researchers to support a variety of tasks, including enhanced search, information extraction and ontology expansion.
