2014 Dataset

Document Collection

The data set for Task 3 consists of a set of medical documents provided by the Khresmoi project. The collection covers a broad range of medical topics and does not contain any patient information. The documents come from several online sources, including websites certified by the Health On the Net organization, as well as well-known medical sites and databases (e.g. Genetics Home Reference, ClinicalTrials.gov, Diagnosia).


The topics are built from discharge summaries provided by Task 2. From the main disorder diagnosed in each discharge summary, medical professionals generated topics containing the following fields:
  • Title: text of the query
  • Description: longer description of what the query means
  • Narrative: expected content of the relevant documents
  • Profile: main information on the patient (age, gender, condition)
  • Discharge_summary: ID of the matching discharge summary
The training set contains 5 queries and the matching relevance assessments.
The test set contains 50 queries.
All queries are translated by professionals into German, French and Czech for Task 3b.
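
As an illustration of the topic fields listed above, the following is a minimal sketch of how a topic could be represented and loaded in Python. It assumes the topics are distributed as XML with one element per field; the element names (`topic`, `title`, `description`, `narrative`, `profile`, `discharge_summary`), the `Topic` class, the `parse_topics` function and the sample values are all hypothetical and are not taken from the actual dataset, so they should be adjusted to the files as distributed.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Topic:
    """One query topic, mirroring the five fields described above."""
    title: str              # text of the query
    description: str        # longer description of what the query means
    narrative: str          # expected content of the relevant documents
    profile: str            # main information on the patient
    discharge_summary: str  # ID of the matching discharge summary


# ASSUMPTION: hypothetical element names; the real topic files may differ.
FIELD_TAGS = {
    "title": "title",
    "description": "description",
    "narrative": "narrative",
    "profile": "profile",
    "discharge_summary": "discharge_summary",
}


def parse_topics(xml_text: str) -> List[Topic]:
    """Parse every <topic> element into a Topic; missing fields become ''."""
    root = ET.fromstring(xml_text)
    topics = []
    for node in root.iter("topic"):
        values = {
            field: (node.findtext(tag) or "").strip()
            for field, tag in FIELD_TAGS.items()
        }
        topics.append(Topic(**values))
    return topics


# Placeholder content for illustration only, not real task data.
SAMPLE = """
<topics>
  <topic>
    <title>placeholder query text</title>
    <description>placeholder description</description>
    <narrative>placeholder narrative</narrative>
    <profile>placeholder profile</profile>
    <discharge_summary>placeholder-id</discharge_summary>
  </topic>
</topics>
"""

if __name__ == "__main__":
    for topic in parse_topics(SAMPLE):
        print(topic.title, "|", topic.discharge_summary)
```

Keeping the tag names in a single mapping means only `FIELD_TAGS` needs to change if the distributed files use different element names.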

Discharge Summaries (optional)

Registered lab participants are free to obtain access to the discharge summaries from Task 2. Obtaining the Task 2 dataset is not mandatory for participating in Task 3, but it can be used as an external resource if desired.

Obtaining Task 3 Dataset

Guidelines for obtaining access to the Task 3 dataset are given here.