Seminar — Louise Corti
Speaker: | Louise Corti, Associate Director, UK Data Archive |
Title: | Automated indexing of survey questionnaires and interviews |
Date: | Friday 25th January at 11:00 a.m. |
Location: | MIB Lecture Theatre (MLG.001), Manchester Interdisciplinary Biocentre |
Abstract: | I will talk about social science use cases for text mining, focusing on the automated indexing of survey questionnaires and transcribed interview data. The technologies and tools of the ASSERT project at Manchester could be adapted and developed to suit the extraction of terms and concepts from these very specific kinds of documents. There are three main aims for a potentially shared project:
1. The first is to understand better how researchers and data processors manually assign keyword terms to both individual survey questions and in-depth interview data. In this way each text is summarised as one or more concepts. At the UKDA, index terms assigned from a social science thesaurus (HASSET) are used to help users locate datasets of interest through its online catalogue of more than 4000 collections. Currently there is no evidence to establish how reliable the manual classification process is, although it is guided by a set of organisational cataloguing rules. The process remains somewhat subjective and, being manual, is extremely labour intensive. No quality control is in place at the UKDA to check the reliability or robustness of the terms assigned. The systems from ASSERT could be refined to deal with term extraction and summarisation of these data collections.
2. The second and more practical outcome of a project would be a user-friendly front-end tool to assist in the laborious task of manually assigning (extracting) keywords or concepts for survey questions and qualitative texts. This would likely be Java based and would slot into the data processing workflow of the UK Data Archive. That is, the tools would be completely integrated into the process, so that the largely non-technical data processor (certainly not a Unix user) would be able to run the automated text mining tools via a GUI and then check and edit the results manually.
3. Finally, there is exploratory work, begun by the UK Data Archive and still to be completed, on applying named entity recognition tools to qualitative interview data and using the results to create basic automated anonymisation tools. While other NLP toolsets were used for that work, a joint project could investigate how the ASSERT tools might be adapted to look at coreference in spoken interview texts. |