Seminar - Sreeram Balakrishnan
Speaker: | Sreeram Balakrishnan (Manager, Unstructured Information Management Architecture group, IBM India Research Lab) |
Title: | Factoid Extraction from the Web |
Date: | 12:30, Thursday 26th May |
Location: | Room F10, MSS Building |
Abstract: | The World Wide Web has grown into an information mesh, with the most important facts being reported through websites. While information is plentiful, its form is heavily unstructured, making it difficult to deploy an automated information retrieval system that could extract useful factoids. We present a new method capable of extracting relevant factoids from unstructured Web data (hypertext). A factoid is a news item that might be of interest with respect to a particular category, such as a change in leadership; in our case, the categories are motivated by corporate or market changes that can be used for market intelligence purposes. We associate a factoid with a snippet of natural language text. Factoid extraction, for a given category, is formulated as a two-class classification problem. Feature abstraction using named entity annotations is used to ameliorate the data sparsity problem. We present a method for learning a category-specific classifier from a set of pure, hand-labelled positive instances and noisy positive instances generated by smartly querying the Web. The system is evaluated on two particular factoid categories: corporate leadership changes and mergers & acquisitions. The experiments yield promising empirical results. Time permitting, I would also like to discuss IBM's open-source text analytics platform UIMA. |
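The feature abstraction idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: the function names, example snippets, and entity annotations below are hypothetical, and the annotations are written by hand where a real system would presumably use a named entity annotator. The sketch only shows why replacing entity mentions with their types can reduce sparsity: two snippets that differ only in the names involved end up with identical features.

```python
from collections import Counter


def abstract_entities(snippet, annotations):
    """Replace each annotated entity mention with its entity type.

    `annotations` is a list of (surface string, entity type) pairs.
    These are hand-written here (an assumption for illustration).
    """
    # Replace longer mentions first so shorter ones cannot split them.
    for surface, etype in sorted(annotations, key=lambda a: -len(a[0])):
        snippet = snippet.replace(surface, etype)
    return snippet


def bag_of_words(snippet):
    """Lower-cased unigram counts over whitespace-separated tokens."""
    return Counter(snippet.lower().split())


# Two hypothetical leadership-change snippets with different names.
s1 = abstract_entities(
    "Acme Corp named Jane Doe as chief executive",
    [("Acme Corp", "ORG"), ("Jane Doe", "PERSON")],
)
s2 = abstract_entities(
    "Globex named John Smith as chief executive",
    [("Globex", "ORG"), ("John Smith", "PERSON")],
)

# After abstraction, both snippets yield the same feature vector.
same_features = bag_of_words(s1) == bag_of_words(s2)
```

Features of this kind could then be passed to any two-class classifier; the choice of classifier is not specified in the abstract.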