Getting Started

Obtaining Task 2 Dataset

The dataset will be distributed through the Physionet website. The steps for accessing the ShARe dataset for this year's Task 2 can be found below.

1. Register for CLEF eHealth 2014:
2. Obtain a human subjects training certificate. If you do not have a certificate, you can take the CITI training course or the NIH training course.
Note: First-time users need to create an account in order to take the courses. Expect a couple of hours' work to complete the certification. Please save an electronic copy of the certificate; it will be needed in subsequent steps to obtain the data.
3. Go to the Physionet site:
4. Click on the link for “creating a PhysioNetWorks account” (near the middle of the page) and follow the instructions.
5. Go to this site and accept the terms of the DUA: 
You will receive an email telling you to fill in your information on the DUA and email it back with your human subjects training certificate.
Important: Fill out the DUA using the word “ShARe/CLEF” in the description of the project and mail it back (pasted into the email) with your human subjects certificate attached.
    General research area for which the data will be used: CLEF (plus perhaps something more descriptive)
6. Once you are approved, the organizers will add you to the PhysioNetWorks ShARe/CLEF eHealth 2014 account as a reviewer. We will send you an email informing you that you can go to the PhysioNetWorks website and click on the authorized users link to access the data (it will ask you to log in with your PhysioNetWorks account):

Note: If you participated in CLEF eHealth 2013 and obtained permissions, you can skip Steps 2-5; you will be given access to the 2014 dataset after successfully completing the Step 1 registration.

Please note that all individuals working on the data need to individually obtain a human subjects training certificate, apply for a Physionet account, and sign their own DUA on the Physionet site.

To register for the task on the CLEF site, it is sufficient to register one participant per participating group, but for access to the Task 2 data, each participating individual needs his or her own access permission from PhysioNet.


Timeline

(Small) Example data set release: Dec 9 2013
(Full) Training data set release: Jan 10 2014
Test data set release: April 23 2014
Test data set submissions due: May 1 2014
Online working notes (internal review) due: June 3 2014
Online working notes (camera ready for CLEF) due: June 7 2014

Information and Discussion Forum

General information and discussions during the task will be organised through the following Google group:

Reviewing Task 2 Dataset and Annotations

We are providing a GUI interface for calculating outcome measures and for visualizing system annotations against reference standard annotations. Use of the Evaluation Workbench is completely optional. Because the Evaluation Workbench is still under development, we would appreciate your feedback and questions if you choose to use it.

A. Memory issues. You need to allocate extra heap space when you run the Workbench with all the files, or you will get an "out of memory" error. To do so, open a terminal (or shell) program, go to the directory containing the startup.parameters file, and type:

java -Xms512m -Xmx1024m -jar Eval*.jar 
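If 1 GB is still not enough when loading the full dataset, the maximum heap can be raised further. A sketch (the directory path is a placeholder, and the 2 GB figure is an illustrative assumption, not a requirement):

```shell
# -Xms sets the initial Java heap size; -Xmx sets the maximum.
# If the "out of memory" error persists, raise -Xmx, e.g. to 2 GB:
cd /path/to/workbench-distribution   # directory containing startup.parameters (adjust)
java -Xms512m -Xmx2048m -jar Eval*.jar
```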

B. Startup properties file and GUI. The Evaluation Workbench relies on a parameter file called "startup.parameters". Since the Workbench is a tool for comparing two sets of annotations, the properties refer to the first (or gold standard) and second (or system) annotator. The following properties will need to be set using the Startup properties GUI before selecting “Initialize” to start the Workbench:

WorkbenchDirectory: Full pathname of the directory where the executable (.jar) file is located. For example,
WorkbenchDirectory=/Users/wendyc/Desktop/EvaluationWorkbenchFolderDistribution_2014ShARECLEF

TextInputDirectory: Directory containing the clinical reports (each document is a single text file in the directory). For example,
TextInputDirectory=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/corpus

AnnotationInputDirectoryFirstAnnotator / AnnotationInputDirectorySecondAnnotator: Directories containing the two sets of annotations (gold standard annotations first, system annotations second). If you do not have system annotations and just want to view the gold standard annotations, point both input directories to the gold standard annotations. For example,

AnnotationInputDirectoryFirstAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles

AnnotationInputDirectorySecondAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles

Knowtator Schema File: File containing the Protégé ontology representing the ShARe schema. For example,

Knowtator Schema File=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/SHARe_Jan18_2012_base.pont

Classification Labels: Labels for classes, attributes, and relations between classes in the ShARe schema. Use the default:

Classification Labels=DefaultClassificationProperties

or specify an explicit list:

Classification Labels=associatedcode,associatedCode,distal_or_proximal_normalization,negation_indicator_normalization,severity_normalization,course_normalization,subject_normalization_CU,Strength number,Strength unit,Strength,Dosage,Frequency number,Frequencyunit,Frequency,Duration,Route,Form,Attributes_medication,disease_disorder,Disease_Disorder,severity,negation_indicator,LABEL,degree_of,subject_class,TIMEX3,uncertainty_indicator_class,subject
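Putting the properties from section B together, a complete startup.parameters file might look like the following sketch (key spellings follow the examples above; all paths are illustrative and must be adjusted to your own machine and operating system):

```ini
WorkbenchDirectory=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF
TextInputDirectory=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/corpus
AnnotationInputDirectoryFirstAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles
AnnotationInputDirectorySecondAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles
Knowtator Schema File=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/SHARe_Jan18_2012_base.pont
Classification Labels=DefaultClassificationProperties
```

Pointing both annotator directories at the gold standard, as here, lets you browse the reference annotations before you have system output.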

**Please remember to set pathnames appropriate for your operating system. MacOS/Unix pathnames are of the form "/applications/EvaluationWorkbench/…", whereas Windows paths are of the form "c:\\Program Files\\Evaluation Workbench\\…" (escape characters included). After setting paths appropriately for your computer and operating system, you can start the Workbench by going to the distribution directory and double-clicking the EvaluationWorkbench.jar icon.**

Select “Save” once you have set these parameters in the GUI, then “Initialize” to start the Evaluation Workbench.

C. Short tutorial on the Evaluation Workbench. A 5-minute video is available here:

  • To open the workbench, double click on the EvaluationWorkbench.jar file and follow the steps to set the parameters described in B.
  • To view the 2014 ShARe/CLEF template annotations, select “Utilities” from the tool bar, then select “Convert annotations to pipe-delimited format (CLEF 2)”
  • To navigate the Workbench, most operations will involve holding down the CTRL key until the mouse is moved to a desired position; once the desired position is reached, release the CTRL key. 
  • You can view the 2014 ShARe/CLEF template for a given annotation by holding down the CTRL key and hovering over an annotation.
  • The Workbench displays information in several panes
    • Statistics pane: rows are classifications (e.g., Disorder CUI); columns display a contingency table of counts and several outcome measures (e.g., F-measure). The intersecting cell is the outcome measure for that particular classification. When a cell is highlighted, the reports generating that value are shown in the Reports pane. When you move the mouse over a report in the Reports pane, that report appears in the Document pane.
    • The Document pane displays annotations for the selected document. The parameter button with label "Display=" selects whether to view a single annotation set at a time (gold or system), or to view both at once. Pink annotations are those that occur in only one source, and so indicate a false negative error (if it appears in the gold but not the system annotation set) or false positive (if it appears in the system but not the gold set). Highlighting an annotation in the document pane updates the statistics pane to reflect statistics for that classification. It also shows the attributes and relationships for that annotation (not relevant for this dataset but in other datasets you may have attributes like Negation status or relationships like Location of).
    • The Detail panel on the lower right side displays relevant parameters, report names, attribute, and relation information. The parameters include "Annotator" (whether the currently selected annotator is Gold or System), "Display" (whether you are viewing gold annotations, system annotations, or both), MatchMode (whether matches must be exact or any-character overlap) and MouseCtrl (whether the ctrl key must be held down to activate selections).
  • You can store the evaluation measures to a file by selecting File->StoreOutcomeMeasures, and entering a selected file name.
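To make the outcome measures and MatchMode concrete, here is a small sketch (not the Workbench's actual code) of how precision, recall, and F-measure fall out of comparing gold and system annotation spans under exact versus any-character-overlap matching:

```python
# Illustrative only: scoring two sets of character-offset annotation spans
# the way the Statistics pane does, under the two MatchMode settings.

def overlaps(a, b):
    """True if spans (start, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def score(gold, system, exact=True):
    """Return (precision, recall, F-measure) for two lists of spans."""
    match = (lambda g, s: g == s) if exact else overlaps
    tp = sum(1 for s in system if any(match(g, s) for g in gold))
    fp = len(system) - tp
    fn = sum(1 for g in gold if not any(match(g, s) for s in system))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = [(0, 5), (10, 15), (20, 25)]
system = [(0, 5), (11, 14)]              # one exact match, one partial overlap
print(score(gold, system, exact=True))   # exact match: 1 TP, 1 FP, 2 FN
print(score(gold, system, exact=False))  # overlap match: 2 TP, 0 FP, 1 FN
```

Note how the partially overlapping span (11, 14) counts as an error in exact mode but as a true positive in overlap mode, which is why the two MatchMode settings can give quite different outcome measures for the same annotation sets.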


To participate in an electronic dialogue about use of the Workbench, please sign up for the evaluation-workbench Google group: