Integrated Social History Environment for Research (ISHER) – Digging into Social Unrest
ISHER is one of the fourteen projects that won the second Digging Into Data Challenge, a competition to promote innovative humanities and social science research using large-scale data analysis. 67 international teams competed in the challenge.
Social historians and other researchers rely on text data for their research. These data are increasingly available in electronic form, but the available exploratory tools are inadequate for discovering information and answering questions: research questions currently take considerable manual effort to answer, or remain unanswered or only partly answered. To mitigate this, we will develop an integrated environment built on sophisticated text mining tools.
In particular, we will develop a digital humanities toolkit to facilitate basic knowledge discovery in social history research. Our text mining-based search system will supply a powerful new transformational research tool for the exploration and discovery of patterns and facts in primary historical sources originating from the digitised historical newspaper archives of the New York Times (NYT) and the National Library of the Netherlands (KB). It will provide social historians and social scientists with the means to detect and associate events, trends, people, organisations, and other entities of specific interest to social historians, related to social unrest.
ISHER aims to enhance search over digitised resources for social history. The enhancement comes from using text mining to extract rich semantic metadata, which supports the indexing, clustering and classification of collections. This in turn enables semantic search while reducing the manual effort that these activities currently require.
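As a rough illustration of how extracted semantic metadata can support search, the sketch below indexes documents by (type, text) annotations and retrieves those that match every requested criterion. It is a toy example, not the ISHER implementation: the document ids, annotation types and function names are hypothetical, and the annotations are written by hand, whereas a real system would obtain them from text mining components.

```python
from collections import defaultdict

# Hypothetical, hand-written annotations standing in for text mining output.
DOCS = {
    "doc1": [("EVENT", "strike"), ("LOCATION", "Rotterdam")],
    "doc2": [("EVENT", "protest"), ("LOCATION", "New York")],
    "doc3": [("EVENT", "strike"), ("LOCATION", "New York")],
}


def build_index(docs):
    """Map each (annotation type, lower-cased text) pair to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, annotations in docs.items():
        for ann_type, text in annotations:
            index[(ann_type, text.lower())].add(doc_id)
    return index


def search(index, *criteria):
    """Return the ids of documents that match every (type, text) criterion."""
    matches = [index.get((ann_type, text.lower()), set()) for ann_type, text in criteria]
    return set.intersection(*matches) if matches else set()


INDEX = build_index(DOCS)
print(sorted(search(INDEX, ("EVENT", "strike"), ("LOCATION", "New York"))))  # ['doc3']
```

Searching on typed annotations rather than raw keywords is what allows a query such as "strikes in New York" to exclude documents that merely mention both words in unrelated contexts.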
Interoperability of text mining tools is a key objective and an organising principle for the software architecture of our project. IBM’s Unstructured Information Management Architecture (UIMA) forms the basis of our interoperable text mining platform, U-Compare, which has over 50 text mining components in its library. Because U-Compare is extensible, it can accommodate ISHER’s requirements by also incorporating text mining tools from third parties.
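The general idea behind this kind of interoperability can be sketched as follows: independently written components can be chained when they all read from and write to one shared annotation structure. The sketch does not use the UIMA or U-Compare APIs; the Annotation and Document classes, the two components and the keyword list are all hypothetical and serve only to illustrate the principle.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    begin: int  # character offset where the span starts
    end: int    # character offset where the span ends (exclusive)
    type: str   # annotation type name, e.g. "Token"


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(doc):
    """Add one Token annotation per run of word characters."""
    for match in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation(match.start(), match.end(), "Token"))


def keyword_tagger(doc):
    """Add an UnrestKeyword annotation over each Token that matches a small keyword list."""
    keywords = {"strike", "riot", "protest"}
    for ann in list(doc.annotations):
        if ann.type == "Token" and doc.text[ann.begin:ann.end].lower() in keywords:
            doc.annotations.append(Annotation(ann.begin, ann.end, "UnrestKeyword"))


def run_pipeline(doc, components):
    """Apply each component in order; every component works on the same Document."""
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(Document("Workers began a strike."), [tokenizer, keyword_tagger])
print([doc.text[a.begin:a.end] for a in doc.annotations if a.type == "UnrestKeyword"])
```

Because the tagger depends only on the shared Token annotations and not on the tokenizer itself, either component could be swapped for a third-party one that follows the same conventions.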
Anticipated Outputs and Outcomes
The output of the project will be an integrated social history environment for research (ISHER), which will also be reusable for other types of humanities research. The outcome for social historians will be a transformation in their work: enriching digital archives with semantic metadata produced by text mining will enable users to investigate collections through advanced semantic search, in ways that were not previously possible.
The project started in January 2012 and is funded by JISC until July 2013.
Principal Investigator: Prof. Sophia Ananiadou
Co-Investigator: Mr. John McNaught
Software Engineer: Jacob Carter
ISHER-NYT demo - search environment for New York Times articles from 1987 to 2007, based on entities and events.
Thompson, P., Nawaz, R., Korkontzelos, I., Black, W.J., McNaught, J. and Ananiadou, S. (In Press). News Search Using Discourse Analytics. In Proceedings of the Digital Heritage 2013 International Congress.
Ananiadou, S., Thompson, P. and Nawaz, R. (2013). Enhancing Search: Events and their Discourse Context. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 7817, pages 318-334, Springer.
Batista-Navarro, R. T. B., Kontonatsios, G., Mihăilă, C., Thompson, P., Rak, R., Nawaz, R., Korkontzelos, I. and Ananiadou, S. (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 7816, pages 559-571, Springer.
Kontonatsios, G., Korkontzelos, I., Kolluru, B., Thompson, P. and Ananiadou, S. (2013). Deploying and Sharing U-Compare Workflows as Web Services. Journal of Biomedical Semantics, 4:7.
Kontonatsios, G., Korkontzelos, I. and Ananiadou, S. (2012). Developing Multilingual Text Mining Workflows in UIMA and U-Compare. In Proceedings of the 17th International Conference on Applications of Natural Language Processing to Information Systems, pp. 82-93, Springer.
Kontonatsios, G., Korkontzelos, I., Kolluru, B. and Ananiadou, S. (2011). Adding Text Mining Workflows as Web Services to the BioCatalogue. In Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences (SWAT4LS).
Zervanou, K., Korkontzelos, I., van den Bosch, A. and Ananiadou, S. (2011). Enrichment and Structuring of Archival Description Metadata. In Proceedings of the 5th ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 44-53.