NaCTeM

OSSMETER

Description

OSSMETER aims to extend the state of the art in the automated analysis and measurement of Open Source Software, and to develop a platform that supports decision makers in discovering, comparing, assessing and monitoring the health, quality, impact and activity of open-source software.

To achieve this, OSSMETER will compute trustworthy quality indicators by performing advanced analysis and integration of information from diverse sources including the project metadata, source code repositories, communication channels and bug tracking systems of Open Source Software projects.

OSSMETER does not aim at building another OSS forge but instead at providing a meta-platform for analysing existing Open Source Software projects that are developed in existing Open Source Software forges and foundations such as SourceForge, Google Code, GitHub, Eclipse, Mozilla and Apache.

OSSMETER is a 30-month small or medium-scale focused research project (STREP) funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 318736 (OSSMETER). It started in October 2012.

NaCTeM's role in OSSMETER

NaCTeM is leading workpackage 4, which concerns the extraction of quality metrics related to the communication channels and bug tracking facilities of Open Source Software projects, using Natural Language Processing and text mining techniques.

Text mining objectives

The objective of workpackage 4 is to contribute to the overall measurement and evaluation of the quality of user support, and of the level of user satisfaction over time, in Open Source Software projects. This is carried out by analysing discussion threads in Open Source Software online forums via:

  1. classification of Open Source Software online discussion threads into sets of questions and their answers
  2. identification of contents (problems, solutions, complaints, feedback)
  3. identification of opinions (positive, negative) in threads

Methods to achieve this objective will be based on supervised text mining techniques that automatically identify questions and answers in threads, and then analyse the types of threads (e.g. problems, solutions, complaints) based on the extracted questions and answers. Opinion mining techniques for classifying sentiment in threads will combine supervised methods using statistical, linguistic and pragmatic features with resources such as WordNet and Wiktionary. Text mining analysis of online threads at several levels will result in rich multi-layer, feature-based annotations over the input texts, enabling indexing, flexible interrogation, manipulation and re-use in subsequent OSSMETER processes.
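To make the idea of supervised question/answer identification more concrete, the following is a minimal sketch, not the OSSMETER implementation. It assumes a multinomial Naive Bayes classifier (an illustrative stand-in for whichever supervised method the project uses), bag-of-words features plus one hypothetical pragmatic cue (`__HAS_QMARK__`, the presence of a question mark), and a small hypothetical training set of forum messages.

```python
import math
import re
from collections import Counter, defaultdict


def features(message: str) -> list[str]:
    """Lower-cased word tokens plus one hypothetical surface cue ('?' present)."""
    tokens = re.findall(r"[a-z']+", message.lower())
    if "?" in message:
        tokens.append("__HAS_QMARK__")  # hypothetical pragmatic feature
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for msg, label in zip(messages, labels):
            feats = features(msg)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, message: str) -> str:
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + v
            for f in features(message):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: forum messages labelled as question or answer.
TRAIN = [
    ("How do I configure the build on Windows?", "question"),
    ("Why does the installer crash at startup?", "question"),
    ("Is there a way to disable logging?", "question"),
    ("You can set the path in the config file.", "answer"),
    ("Upgrade to version 2.1, it fixes the crash.", "answer"),
    ("Try running the installer as administrator.", "answer"),
]

model = NaiveBayes().fit([m for m, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("How do I disable logging?"))
print(model.predict("Set the path in the config file."))
```

In a fuller pipeline along the lines described above, the predicted labels would become one annotation layer over each message, on which thread-type and opinion classifiers could then build; richer linguistic features and lexical resources such as WordNet would replace the single surface cue used here.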

OSSMETER website: http://ossmeter.eu

OSSMETER LinkedIn group: http://linkedin.com/groups/OSSMETER-6531488

OSSMETER on Twitter: http://linkedin.com/groups/OSSMETER-6531488

Project team

Principal Investigator: Prof. Sophia Ananiadou
Researchers: Dr. Ioannis Korkontzelos, Mr. Paul Thompson
Software Engineers: Jacob Carter, Andrew Rowley

Related publications

Internationally refereed conference proceedings

Almeida, B., Ananiadou, S., Bagnato, A., Barbero, A. B., Di Rocco, J., Di Ruscio, D., Kolovos, D., Korkontzelos, I., Hansen, S., Maló, P., Matragkas, N., Paige, R. and Vinju, J. (2015). OSSMETER: Automated Measurement and Analysis of Open Source Software. Project showcase at STAF 2015 - Software Technologies: Applications and Foundations

Miwa, M., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2014). Comparable Study of Event Extraction in Newswire and Biomedical Domains. In Proceedings of COLING 2014

Kontonatsios, G., Mihaila, C., Korkontzelos, I., Thompson, P. and Ananiadou, S. (2014). A hybrid approach to compiling bilingual dictionaries of medical terms from parallel corpora. In: Statistical Language and Speech Processing, Second International Conference, SLSP 2014, pages 57-69, Springer

Kontonatsios, G., Korkontzelos, I., Tsujii, J. and Ananiadou, S. (2014). Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1701-1712, Association for Computational Linguistics

Kontonatsios, G., Korkontzelos, I., Tsujii, J. and Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, Gothenburg, Sweden, pp. 111-116, Association for Computational Linguistics

Kontonatsios, G., Thompson, P., Batista-Navarro, R. T. B., Mihaila, C., Korkontzelos, I. and Ananiadou, S. (2013). Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Sofia, Bulgaria, pp. 43-48

Korkontzelos, I. and Ananiadou, S. (2014). Locating Requests among Open Source Software Communication Messages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, pp. 1347-1354, European Language Resources Association (ELRA)

Internationally refereed workshop proceedings

Williams, J., Matragkas, N., Kolovos, D., Korkontzelos, I., Ananiadou, S. and Paige, R. (2014). Software Analytics for MDE Communities. In Proceedings of the Open Source Software for Model Driven Engineering Workshop (OSS4MDE’14).

Mihaila, C., Kontonatsios, G., Batista-Navarro, R. T. B., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2013). Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Association for Computational Linguistics, Sofia, Bulgaria, pp. 79-88 (LAW Challenge Award)

Korkontzelos, I., Zesch, T., Zanzotto, F. M. and Biemann, C. (2013). SemEval-2013 Task 5: Evaluating Phrasal Semantics. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, Georgia, USA.

Kontonatsios, G., Korkontzelos, I., Ananiadou, S. and Tsujii, J. (2013). Using a Random Forest Classifier to recognise translations of biomedical terms across languages. In Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Association for Computational Linguistics, Sofia, Bulgaria.

Journal papers

Korkontzelos, I., Piliouras, D., Dowsey, A. and Ananiadou, S. (to appear). Boosting Drug Named Entity Recognition using an Aggregate Classifier. Artificial Intelligence in Medicine, Special Issue.

Mu, T., Goulermas, J. Y., Korkontzelos, I. and Ananiadou, S. (In Press). Descriptive Clustering via Discriminant Learning in a Coembedded Space of Multi-level Similarities. In: Journal of the Association for Information Science and Technology

Book chapters

Mihaila, C., Batista-Navarro, R. T. B., Alnazzawi, N., Kontonatsios, G., Korkontzelos, I., Rak, R., Thompson, P. and Ananiadou, S. (In Press). Mining the biomedical literature. In: Health Care Analytics, CRC Press

Korkontzelos, I. and Ananiadou, S. (2014). Term Extraction. In: Oxford Handbook of Computational Linguistics (2nd Ed.)

Korkontzelos, I. (2014). Mining Big Textual Data. In: Kudyba, S. (Ed.), Big Data, Mining and Analytics: Key Components to Strategic Decisions, CRC Press/Taylor & Francis Group