2014 Dataset

Task 2 Dataset

To support the continuum of care, our goal is to develop annotated data, resources, and methods that make clinical documents easier for patients to understand. We will extend Task 1 from 2013 by focusing this year's task on Disease/Disorder Template Filling. For this task, participants will be provided an empty template for each disease/disorder mention; each template consists of the mention's Unified Medical Language System (UMLS) concept unique identifier (CUI) and mention boundaries. Participants are required to fill in values for each of 10 attributes. Each attribute has two slot types: a normalized category (normalization) and the lexical cue from the sentence that indicates the normalized value (cue). Task 2a will evaluate participants’ ability to predict each normalization slot value; Task 2b will evaluate participants’ ability to predict the cue slot value for each disease/disorder template.

There are 10 different attribute types: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. Normalization values for nine of the attributes come from a list of possible values, such as “yes, no” for Negation Indicator. Normalization values for the tenth attribute, Body Location, come from the UMLS as concept unique identifiers (CUIs). The definition of each attribute type can be found in Table 1.


Table 1. Disease/Disorder Attribute Types with definitions and norm and cue slot values.   



*Default Slot Values; 
**CEM = Clinical Element Models, the original source of many of the attributes (http://www.clinicalelement.com)

The training dataset will contain templates in a “|” delimited format with: a) the disorder CUI assigned to the template as well as the character boundary of the named entity, and b) the default values for each of the 10 attributes of the disease/disorder. Each template will have the following format:

DD_DocName|DD_Spans|DD_CUI|Norm_NI|Cue_NI|Norm_SC|Cue_SC|Norm_UI|Cue_UI|Norm_CC|Cue_CC|Norm_SV|Cue_SV|Norm_CO|Cue_CO|Norm_GC|Cue_GC|Norm_BL|Cue_BL|Norm_DT|Norm_TE|Cue_TE
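As an illustration of how a row in this format could be read, the following is a minimal Python sketch. The field names are copied from the format line above; the function name `parse_template` is hypothetical and not part of any task tooling, and the sample row is the default template from the example on this page, where the asterisk marks a default value in the example and may not appear in the distributed files.

```python
# Field order copied from the template format line on this page (22 fields).
FIELDS = [
    "DD_DocName", "DD_Spans", "DD_CUI",
    "Norm_NI", "Cue_NI", "Norm_SC", "Cue_SC", "Norm_UI", "Cue_UI",
    "Norm_CC", "Cue_CC", "Norm_SV", "Cue_SV", "Norm_CO", "Cue_CO",
    "Norm_GC", "Cue_GC", "Norm_BL", "Cue_BL", "Norm_DT",
    "Norm_TE", "Cue_TE",
]


def parse_template(line):
    """Split one '|'-delimited template row into a field-name -> value dict.

    Hypothetical helper for illustration; it only checks the field count.
    """
    values = [value.strip() for value in line.strip().split("|")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Default template from the example on this page.
default_row = (
    "09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|"
    "*no|*NULL|*false|*NULL|*unmarked|*NULL|*false|*NULL|*false|*NULL|"
    "*NULL|*NULL|*Unknown|*None|*NULL"
)
record = parse_template(default_row)
print(record["DD_DocName"], record["DD_Spans"], record["DD_CUI"])
```

Keeping the field list in one place makes the count check cheap, so a row with a missing or extra delimiter fails early instead of silently shifting values into the wrong slots.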

The default values for the Normalization slots are shown in Table 2. The default value for the Cue slot is NULL. In the test set, the default values will be provided for each attribute in the template. See Table 2 for the disease/disorder attribute types, example sentences, and their Normalization and Cue slot values.


Table 2. Attribute types with example sentences and their norm and cue slot values.


The ShARe/CLEF eHealth 2013 Task 1 corpus and Disease/Disorder Template annotations will serve as the training set (n=300 documents of four clinical report types). The test set is an unseen evaluation set (n=133 discharge summaries). Participants are required to participate in Task 2a and have the option to participate in Task 2b.

Task 2a and 2b Example: For the following sentence, “The patient has an extensive thyroid history.”, participants are provided the following disease/disorder template with defaults:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|*false|*NULL|*false|*NULL|*NULL|*NULL|*Unknown|*None|*NULL


Task 2a) Assign Normalization values to the ten attributes. Participants will keep or update the Normalization values. For the example sentence, the template changes as follows under Task 2a:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|severe|*NULL|*false|*NULL|C0040132|*NULL|Before|*None|*NULL
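One way to see which slots were updated relative to the defaults is to compare the two rows position by position. The sketch below is illustrative only: `changed_positions` is a hypothetical helper, positions are zero-based indices into the “|”-delimited row, and the two rows are the default and Task 2a templates from this example.

```python
def changed_positions(default_row, filled_row):
    """Return the zero-based field positions where two template rows differ."""
    default_values = [value.strip() for value in default_row.split("|")]
    filled_values = [value.strip() for value in filled_row.split("|")]
    if len(default_values) != len(filled_values):
        raise ValueError("rows have different numbers of fields")
    return [
        index
        for index, (default, filled) in enumerate(zip(default_values, filled_values))
        if default != filled
    ]


# Default template and Task 2a template from the example on this page.
default_row = (
    "09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|"
    "*no|*NULL|*false|*NULL|*unmarked|*NULL|*false|*NULL|*false|*NULL|"
    "*NULL|*NULL|*Unknown|*None|*NULL"
)
task2a_row = (
    "09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|"
    "*no|*NULL|*false|*NULL|*unmarked|*NULL|severe|*NULL|*false|*NULL|"
    "C0040132|*NULL|Before|*None|*NULL"
)

changed = changed_positions(default_row, task2a_row)
print(changed)  # three positions differ: 13, 17 and 19
```

In this example only three values differ from the defaults; every other slot is carried over unchanged.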

Task 2b) Assign Cue values to the nine attributes with cues. Participants will keep or update the Cue values. For the example sentence, the template changes as follows under Task 2b:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|severe|20-28|*false|*NULL|C0040132|30-36|Before|*None|*NULL
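To relate the cue spans back to the text, the sketch below extracts the characters a span covers. It rests on an assumption that holds for this illustrative sentence only: the offsets are read as 1-based, inclusive, and relative to the start of the sentence. In the distributed corpus, offsets presumably refer to positions in the full document, and the exact convention should be checked against the data. The helper name `span_text` is hypothetical.

```python
def span_text(text, span):
    """Return the substring covered by a 'start-end' span.

    Assumption for this example only: offsets are 1-based and inclusive.
    """
    start, end = (int(part) for part in span.split("-"))
    return text[start - 1:end]


sentence = "The patient has an extensive thyroid history."

severity_cue = span_text(sentence, "20-28")
body_location_cue = span_text(sentence, "30-36")
print(severity_cue)       # extensive
print(body_location_cue)  # thyroid
```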

Please note that the Cue span for the Subject Class value “patient” is not annotated in ShARe, since “patient” is the attribute default.


Obtaining Task 2 Dataset

See the Task 2 Getting Started Page for full details on how to obtain the Task 2 dataset.