Call for Papers

Workshop date: 12th July 2012

Workshop presentations and papers are now available to download.

The detection of discourse structure in scientific documents is important for a number of tasks, including biocuration, text summarization, error correction, information extraction and the creation of enriched formats for scientific publishing. Many parallel efforts currently exist to detect a range of discourse elements at different levels of granularity and for different purposes. The discourse elements detected include statements of facts, claims and hypotheses, descriptions of methods and protocols, and the differentiation between new and existing work. In medical texts, efforts are underway to automatically identify prescription and treatment guidelines and patient characteristics, and to annotate research data. Ambitious long-term goals include modeling argumentation and rhetorical structure and, more recently, narrative structure, by recognizing 'motifs' inspired by folktale analysis.

A rich variety of feature classes is used to identify discourse elements, including verb tense, mood and voice, semantic verb class, speculative language or negation, various classes of stance markers, text-structural components, and the location of references. These features are motivated by linguistic inquiry into the detection of subjectivity, opinion, entailment and inference, as well as author stance, author disagreement, motif and focus.

Several workshops have focused on detecting some of these features in scientific text: speculation and negation in the 2010 workshop on Negation and Speculation in Natural Language Processing and in the BioNLP'09 Shared Task, and hedging in the CoNLL-2010 Shared Task, "Learning to detect hedges and their scope in natural language text". Other efforts with a clear focus on scientific discourse annotation include STIL2011 and Force11, the Future of Research Communications and e-Science. Several large-scale corpora have also been produced in this field, such as BioScope, which is annotated for negation and speculation, and the GENIA Event corpus.

The goal of the 2012 workshop Detecting Structure in Scholarly Discourse is to discuss and compare the techniques and principles applied in these various approaches, to consider ways in which they can complement each other, and to initiate collaborations to develop standards for annotating appropriate levels of discourse, with enhanced accuracy and usefulness.

We invite submissions of long papers spanning the range from theory to application, including research on, and the practice of, manual and automated annotation systems. We are particularly interested in discussing questions such as the following:

  • What correlations can be demonstrated among document structure, argumentation and rhetorical functions?
  • What are the text-linguistic and philosophical motivations underpinning current efforts to identify discourse structure? Are the assumptions made by current text processing tools supported by discourse-linguistic research? Are there unused opportunities for fruitful cross-fertilization?
  • Can we port parallel efforts from neighboring fields, such as motifs in folktale research, to annotate and detect narrative structures?
  • Which discourse annotation schemes are the most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains and different genres (research papers, reviews, patents, etc.)?
  • How can we compare annotations, and how can we decide which features, approaches or techniques work best? What are the most topical use cases? How can we evaluate performance and what are the most appropriate tasks?
  • What corpora are currently available for comparing and contrasting discourse annotation, and how can we improve and expand them?
  • How applicable are these efforts to improving methods of publishing, detecting and correcting authors' errors at the discourse level, or summarizing scholarly text? How close are we to implementing them at production scale?