HED Python tools¶
The HED (Hierarchical Event Descriptor) scripts and notebooks assume that the Python HedTools have been installed. The HedTools package is not yet available on PyPI, so you will need to install it directly from GitHub using:
pip install git+https://github.com/hed-standard/hed-python/@master
There are several types of Jupyter notebooks and other HED support tools:
Jupyter notebooks for HED in BIDS - aids for HED annotation in BIDS.
Jupyter notebooks for data curation - aids for summarizing and reorganizing event data.
Calling HED tools - specific useful functions/classes.
Jupyter notebooks for HED in BIDS¶
The following notebooks are specifically designed to support HED annotation for BIDS datasets.
Summarize BIDS event files¶
Sometimes event files include unexpected or incorrect codes. It is a good idea to find out what is actually in the dataset event files and whether the information is consistent before starting the annotation process.
The bids_summarize_events.ipynb finds the dataset event files and outputs the column names and number of events for each event file. You can visually inspect the output to make sure that the event file column names are consistent across the dataset. The script also summarizes the unique values that appear in different event file columns across the dataset.
To use this notebook, substitute the specifics of your BIDS dataset for the following variables:
Variables to set in the bids_summarize_events.ipynb Jupyter notebook.
Variable | Purpose |
---|---|
bids_root_path | Full path to root directory of dataset. |
exclude_dirs | List of directories to exclude when constructing the list of event files. |
entities | Tuple of entity names used to construct unique keys representing filenames. |
name_indices | Indices used to construct unique keys representing event filenames. |
skip_columns | List of column names in the |
For large datasets, be sure to skip columns such as onset and sample, since the summary produces counts of the number of times each unique value appears somewhere in dataset event files.
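The kind of summary described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the function names summarize_columns and read_events are hypothetical, and the two in-memory "event files" stand in for a real dataset.

```python
import csv
from collections import Counter, defaultdict


def summarize_columns(rows, skip_columns=()):
    """Count how often each unique value appears in each column.

    rows is any iterable of dicts, for example the output of read_events.
    """
    counts = defaultdict(Counter)
    for row in rows:
        for column, value in row.items():
            if column not in skip_columns:
                counts[column][value] += 1
    return counts


def read_events(path):
    """Read a tab-separated events file into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fp:
        return list(csv.DictReader(fp, delimiter="\t"))


# Two tiny in-memory event files (hypothetical data, not from a real dataset).
file_a = [{"onset": "0.5", "trial_type": "go"}, {"onset": "1.5", "trial_type": "stop"}]
file_b = [{"onset": "0.7", "trial_type": "go"}]

summary = summarize_columns(file_a + file_b, skip_columns=("onset", "sample"))
print(dict(summary["trial_type"]))
```

Skipping onset here keeps the summary small: nearly every onset value is unique, so counting them would add one entry per event without revealing anything about the event codes.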
Extract a JSON sidecar template¶
The usual strategy for producing machine-actionable event annotation using HED in BIDS is to create a single events.json sidecar file in the BIDS dataset root directory. Ideally, this sidecar will contain all the annotations needed for users to understand and analyze the data. See the BIDS annotation quickstart for additional information on this strategy and an online version of the tools.
The Create a JSON template section provides a step-by-step tutorial for using the online tool that creates a template based on the information in a single events.tsv file. For most datasets, this is sufficient. In contrast, the bids_generate_sidecar.ipynb notebook bases the extracted template on the entire dataset.
To use this notebook, substitute the specifics of your BIDS dataset for the following variables:
Variables to set in the bids_extract_sidecar.ipynb Jupyter notebook.
Variable | Purpose |
---|---|
bids_root_path | Full path to root directory of dataset. |
exclude_dirs | List of directories to exclude when constructing the list of event files. |
entities | Tuple of entity names used to construct unique keys representing filenames. |
skip_columns | List of column names in the |
value_columns | List of column names in the |
For large datasets, be sure to skip columns such as onset and sample, since the summary produces counts of the number of times each unique value appears somewhere in dataset event files.
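To make the idea of a sidecar template concrete, here is a minimal sketch that builds one from the unique values found in each column. It is an assumption-laden illustration rather than the notebook's implementation: the function name make_sidecar_template is hypothetical, and the placeholder strings simply follow the "Description for ..." and "Label/..." pattern used in the spreadsheet example in this document.

```python
import json


def make_sidecar_template(column_values, value_columns=()):
    """Build a BIDS-style sidecar template as a plain dict.

    column_values maps a column name to the unique values found in that column.
    Columns listed in value_columns get a single annotation with a # placeholder
    instead of one annotation per unique value.
    """
    template = {}
    for column, values in column_values.items():
        if column in value_columns:
            template[column] = {
                "Description": f"Description for {column}",
                "HED": "Label/#",
            }
        else:
            ordered = sorted(values)
            template[column] = {
                "Description": f"Description for {column}",
                "Levels": {v: f"Description for {v}" for v in ordered},
                "HED": {v: f"Label/{v}" for v in ordered},
            }
    return template


# Hypothetical column summary: event_type is categorical, stim_file is a value column.
template = make_sidecar_template(
    {"event_type": {"show_face", "left_press"}, "stim_file": {"a.png", "b.png"}},
    value_columns=("stim_file",),
)
print(json.dumps(template, indent=4))
```

The split between categorical and value columns is the main design decision: a column such as stim_file can have hundreds of distinct values, so annotating it once with a placeholder is more practical than annotating each value.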
JSON sidecar to spreadsheet¶
If you have a BIDS JSON event sidecar or a sidecar template, you may find it more convenient to view and edit the HED annotations in a spreadsheet rather than working with the JSON file directly, as explained in the Spreadsheet templates tutorial.
The bids_sidecar_to_spreadsheet.ipynb notebook demonstrates how to extract the pertinent HED annotation to a 4-column spreadsheet (Pandas dataframe) corresponding to the HED content of a JSON sidecar. A spreadsheet representation is useful for quickly reviewing and editing HED annotations. You can easily merge the edited information back into the BIDS JSON events sidecar.
Here is an example of the spreadsheet that is produced by converting a JSON sidecar template to a spreadsheet template that is ready to edit. You should only change the values in the description and the HED columns.
Example 4-column spreadsheet template for HED annotation.
column_name | column_value | description | HED |
---|---|---|---|
event_type | setup_right_sym | Description for setup_right_sym | Label/setup_right_sym |
event_type | show_face | Description for show_face | Label/show_face |
event_type | left_press | Description for left_press | Label/left_press |
event_type | show_circle | Description for show_circle | Label/show_circle |
stim_file | n/a | Description for stim_file | Label/# |
To use this notebook, you will need to provide the path to the JSON sidecar and a path to save the spreadsheet if you want to save it. If you don’t wish to save the spreadsheet, assign spreadsheet_filename to be None.
The bids_merge_sidecar.ipynb notebook shows the complete process, from extracting the initial sidecar, to converting to a spreadsheet and then merging in another sidecar.
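The round trip between a sidecar and a 4-column spreadsheet can be sketched with plain dictionaries. This is a simplified illustration, not the HedTools implementation: sidecar_to_rows and merge_rows are hypothetical names, rows are represented as dicts rather than a Pandas dataframe, and the sketch assumes that no categorical value is literally n/a.

```python
def sidecar_to_rows(sidecar):
    """Flatten the HED content of a sidecar dict into 4-column rows."""
    rows = []
    for column, info in sidecar.items():
        hed = info.get("HED")
        if isinstance(hed, dict):
            # Categorical column: one row per annotated value.
            levels = info.get("Levels", {})
            for value, tags in hed.items():
                rows.append({"column_name": column, "column_value": value,
                             "description": levels.get(value, "n/a"), "HED": tags})
        elif isinstance(hed, str):
            # Value column: a single row whose column_value is n/a.
            rows.append({"column_name": column, "column_value": "n/a",
                         "description": info.get("Description", "n/a"), "HED": hed})
    return rows


def merge_rows(sidecar, rows):
    """Copy edited description and HED cells from the rows back into the sidecar."""
    for row in rows:
        info = sidecar[row["column_name"]]
        if row["column_value"] == "n/a":
            info["Description"] = row["description"]
            info["HED"] = row["HED"]
        else:
            info.setdefault("Levels", {})[row["column_value"]] = row["description"]
            info["HED"][row["column_value"]] = row["HED"]
    return sidecar


sidecar = {
    "event_type": {
        "Levels": {"show_face": "Description for show_face"},
        "HED": {"show_face": "Label/show_face"},
    },
    "stim_file": {"Description": "Description for stim_file", "HED": "Label/#"},
}
rows = sidecar_to_rows(sidecar)
rows[0]["HED"] = "Sensory-event, Visual-presentation"  # simulate a spreadsheet edit
merged = merge_rows(sidecar, rows)
```

Only the description and HED cells are written back, which mirrors the advice above to leave the column_name and column_value cells unchanged.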
Validate HED in a BIDS dataset¶
Validating HED annotations as you develop them makes the annotation process easier and faster to debug. The HED validation guide discusses various HED validation issues and how to fix them.
The bids_validate_dataset.ipynb Jupyter notebook validates HED in a BIDS dataset using the validate method of BidsDataset. The method first gathers all the relevant JSON sidecars for each event file and validates the sidecars. It then validates the individual events.tsv files based on applicable sidecars. The script requires you to set the check_for_warnings flag and the root path to your BIDS dataset.
Note: This validation pertains to event files and HED annotation only. It does not do a full BIDS validation.
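The step of gathering the sidecars that apply to an event file can be illustrated with a small path-based sketch. It assumes the usual BIDS inheritance idea (a sidecar applies if it sits in the same directory as the event file or in one above it, and every entity in its name matches the event file). The helper names are hypothetical and this is not how BidsDataset is implemented.

```python
from pathlib import PurePosixPath


def parse_entities(name):
    """Split a BIDS-style file name into its entity dict and its suffix."""
    parts = name.split(".", 1)[0].split("_")
    entities = dict(p.split("-", 1) for p in parts[:-1] if "-" in p)
    return entities, parts[-1]


def sidecar_applies(sidecar_name, events_name):
    """A sidecar applies if the suffixes match and all of its entities match."""
    side_entities, side_suffix = parse_entities(sidecar_name)
    ev_entities, ev_suffix = parse_entities(events_name)
    return side_suffix == ev_suffix and all(
        ev_entities.get(name) == value for name, value in side_entities.items())


def applicable_sidecars(sidecar_paths, events_path):
    """Return the applicable sidecars, ordered from the dataset root downward."""
    events = PurePosixPath(events_path)
    found = []
    for path in map(PurePosixPath, sidecar_paths):
        in_ancestor = path.parent == events.parent or path.parent in events.parents
        if in_ancestor and sidecar_applies(path.name, events.name):
            found.append(path)
    return [str(p) for p in sorted(found, key=lambda p: len(p.parts))]


# Hypothetical dataset layout used only for illustration.
sidecars = [
    "ds/task-target_events.json",
    "ds/task-rest_events.json",
    "ds/sub-001/sub-001_task-target_events.json",
    "ds/sub-002/sub-002_task-target_events.json",
]
events = "ds/sub-001/ses-3/eeg/sub-001_ses-3_task-target_run-01_events.tsv"
result = applicable_sidecars(sidecars, events)
```

Ordering the result from the root downward reflects the common convention that more specific sidecars, located deeper in the tree, take precedence over more general ones.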
The bids_validate_dataset_with_libraries.ipynb Jupyter notebook validates HED in a BIDS dataset using the validate method of BidsDataset. The example uses three schemas and also illustrates how to manually override the schema specified in dataset_description.json with schemas from other places. This is very useful for testing new schemas that are under development.
Jupyter notebooks for data curation¶
All data curation notebooks and other examples can now be found in the hed-examples repository.
Consistency of BIDS event files¶
Some neuroimaging modalities, such as EEG, typically contain event information encoded in the data recording files, and the BIDS events.tsv files are generated post hoc.
In general, the following things should be checked before data is released:
The BIDS events.tsv files have the same number of events as the data recording, and the onset times of corresponding events agree.
The associated information contained in the data recording and event files is consistent.
The relevant metadata is present in both versions of the data.
The example data curation scripts discussed in this section assume that two versions of each BIDS event file are present: events.tsv and a corresponding events_temp.tsv file. The example datasets that are used for these tutorials assume that the recordings are in EEG.set format. We used the runEeglabEventsToFiles MATLAB script to dump the events stored in the data.
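The first of the checks listed above (matching event counts and agreeing onsets) can be sketched as follows. This is an illustrative helper under stated assumptions, not one of the curation scripts: the names read_onsets and compare_onsets and the 1 ms default tolerance are hypothetical choices, and both files are assumed to have an onset column in seconds.

```python
import csv


def read_onsets(path):
    """Read the onset column of a tab-separated events file as floats."""
    with open(path, newline="", encoding="utf-8") as fp:
        return [float(row["onset"]) for row in csv.DictReader(fp, delimiter="\t")]


def compare_onsets(onsets_a, onsets_b, tolerance=0.001):
    """Return a list of problems; an empty list means the onsets agree."""
    problems = []
    if len(onsets_a) != len(onsets_b):
        problems.append(f"event counts differ: {len(onsets_a)} vs {len(onsets_b)}")
    # zip stops at the shorter list, so only corresponding events are compared.
    for index, (a, b) in enumerate(zip(onsets_a, onsets_b)):
        if abs(a - b) > tolerance:
            problems.append(f"event {index}: onsets differ ({a} vs {b})")
    return problems


# In-memory onsets stand in for events.tsv and events_temp.tsv.
same = compare_onsets([0.5, 1.5], [0.5, 1.5])
short = compare_onsets([0.5, 1.5], [0.5])
shifted = compare_onsets([0.5, 1.5], [0.5, 1.6])
```

In practice the two lists would come from read_onsets applied to an events.tsv file and its events_temp.tsv counterpart.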
Calling HED tools¶
This section shows examples of useful processing functions provided in HedTools:
Getting a list of files¶
Many situations require the selection of files in a directory tree based on specified criteria. The get_file_list function allows you to pick out files with a specified filename prefix, a specified filename suffix, and specified extensions.
The following example returns a list of full paths of the files whose names end in _events.tsv or _events.json that are not in any code or derivatives directories in the bids_root_path directory tree. The search starts in the directory root bids_root_path:
Get a list of specified files in a specified directory tree.
from hed.util import get_file_list
file_list = get_file_list(bids_root_path, extensions=[".json", ".tsv"], name_suffix="_events",
                          name_prefix="", exclude_dirs=["code", "derivatives"])
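For readers who want to see the selection logic spelled out, the following standard-library sketch mimics the behavior described above. It is not the HedTools implementation of get_file_list: find_files is a hypothetical stand-in, and details such as how the real function treats directory names or sorts its results are assumptions.

```python
import os
import tempfile


def find_files(root, extensions=None, name_prefix="", name_suffix="", exclude_dirs=()):
    """Return sorted full paths of files under root that match the criteria."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if extensions and ext not in extensions:
                continue
            if stem.startswith(name_prefix) and stem.endswith(name_suffix):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Build a small throwaway directory tree to demonstrate the filtering.
with tempfile.TemporaryDirectory() as root:
    for rel in ("sub-001/sub-001_task-a_events.tsv", "sub-001/sub-001_task-a_eeg.set",
                "task-a_events.json", "code/old_events.tsv"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = [os.path.relpath(p, root)
             for p in find_files(root, extensions=[".json", ".tsv"],
                                 name_suffix="_events",
                                 exclude_dirs=["code", "derivatives"])]
```

Here found contains the event file and the sidecar, but not the .set recording (wrong extension and suffix) or the file under code (excluded directory).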
Dictionaries of filenames¶
The HED tools provide both generic and BIDS-specific classes for dictionaries of filenames.
Many of the HED data processing tools make extensive use of these dictionaries.
BIDS-specific dictionaries of files¶
Files in BIDS have unique names that indicate not only what the file represents, but also where that file is located within the BIDS dataset directory tree.
BIDS file names and keys¶
A BIDS file name consists of an underbar-separated list of entities, each specified as a name-value pair, followed by a suffix indicating the data modality. For example, the file name sub-001_ses-3_task-target_run-01_events.tsv has entities subject (sub), session (ses), task (task), and run (run). The suffix is events, indicating that the file contains events. The extension .tsv gives the data format.
Modality is not the same as data format, since some modalities allow multiple formats. For example, sub-001_ses-3_task-target_run-01_eeg.set and sub-001_ses-3_task-target_run-01_eeg.edf are both acceptable representations of EEG files, but the data is in different formats.
The BIDS file dictionaries represented by the class BidsFileDictionary and its extension BidsTabularDictionary use a set combination of entities as the file key. For a file name sub-001_ses-3_task-target_run-01_events.tsv, the tuple ('sub', 'task') gives a key of sub-001_task-target, while the tuple ('sub', 'ses', 'run') gives a key of sub-001_ses-3_run-01.
The use of dictionaries of file names with such keys makes it easier to associate related files in the BIDS naming structure. Notice that specifying entities ('sub', 'ses', 'run') gives the key sub-001_ses-3_run-01 for all three files: sub-001_ses-3_task-target_run-01_events.tsv, sub-001_ses-3_task-target_run-01_eeg.set, and sub-001_ses-3_task-target_run-01_eeg.edf. Thus, the expected usage is to create a dictionary of files of one modality.
Create a key-file dictionary for files ending in _eeg.set in the bids_root_path directory tree.
from hed.tools import BidsFileDictionary
from hed.util import get_file_list
# Select the EEG .set files, skipping the code and derivatives directories.
file_list = get_file_list(bids_root_path, extensions=[".set"], name_suffix="_eeg",
                          exclude_dirs=["code", "derivatives"])
# Key each file by its subject, session, and run entities.
file_dict = BidsFileDictionary(file_list, entities=('sub', 'ses', 'run'))
In this example, the get_file_list function filters the files of the appropriate type, while the BidsFileDictionary creates a dictionary with keys such as sub-001_ses-3_run-01 and values that are BidsFile objects. BidsFile can hold the file name of any BIDS file and keeps a parsed version of the file name.
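The way an entity tuple turns a file name into a key can be shown in a few lines. This is a minimal sketch of the idea, not the BidsFile parsing code; entity_key is a hypothetical helper.

```python
def entity_key(filename, entities):
    """Build a key such as sub-001_ses-3_run-01 from the selected entities."""
    stem = filename.split(".", 1)[0]
    # Map each entity name to its value; the suffix has no hyphen and is skipped.
    pairs = dict(part.split("-", 1) for part in stem.split("_") if "-" in part)
    return "_".join(f"{name}-{pairs[name]}" for name in entities if name in pairs)


events_name = "sub-001_ses-3_task-target_run-01_events.tsv"
eeg_name = "sub-001_ses-3_task-target_run-01_eeg.set"

print(entity_key(events_name, ("sub", "task")))        # sub-001_task-target
print(entity_key(events_name, ("sub", "ses", "run")))  # sub-001_ses-3_run-01
print(entity_key(eeg_name, ("sub", "ses", "run")))     # sub-001_ses-3_run-01
```

The last two calls return the same key for files of different modalities, which is why a dictionary should normally hold files of a single modality.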
A generic dictionary of filenames¶
Create a key-file dictionary for files ending in events.json in bids_root_path directory tree.
from hed.tools import FileDictionary
from hed.util import get_file_list
file_list = get_file_list(bids_root_path, extensions=[".json"], name_suffix="_events",
                          exclude_dirs=["code", "derivatives"])
# name_indices is a tuple of entity positions within the file name, for example (0, 1, 3).
file_dict = FileDictionary(file_list, name_indices=name_indices)
Keys are calculated from the filename using a name_indices tuple, which indicates the positions of the name-value entity pairs in the BIDS file name to use. The BIDS filename sub-001_ses-3_task-target_run-01_events.tsv has four name-value entity pairs (sub-001, ses-3, task-target, and run-01) separated by underbars. The tuple (0, 2) gives a key of sub-001_task-target, while the tuple (0, 3) gives a key of sub-001_run-01. Neither of these choices uniquely identifies the file. The tuple (0, 1, 3) gives a unique key of sub-001_ses-3_run-01. The tuple (0, 1, 2, 3) also works, giving sub-001_ses-3_task-target_run-01.
If you choose the name_indices incorrectly, the keys for the event files will not be unique, and the notebook will throw a HedFileError. If this happens, modify your name_indices key choice to include more entity pairs.
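The positional keys and the uniqueness check can be sketched as follows. The helper names index_key and build_file_dict are hypothetical, and the sketch raises a plain ValueError where the HED tools are described as raising HedFileError.

```python
def index_key(filename, name_indices):
    """Join the underbar-separated pieces of the file name at the given positions."""
    pieces = filename.split(".", 1)[0].split("_")
    return "_".join(pieces[i] for i in name_indices)


def build_file_dict(filenames, name_indices):
    """Map each key to its file name, refusing duplicate keys."""
    file_dict = {}
    for filename in filenames:
        key = index_key(filename, name_indices)
        if key in file_dict:
            raise ValueError(f"Key {key} is not unique: {file_dict[key]} and {filename}")
        file_dict[key] = filename
    return file_dict


names = [
    "sub-001_ses-3_task-target_run-01_events.tsv",
    "sub-001_ses-3_task-target_run-02_events.tsv",
]
print(index_key(names[0], (0, 2)))     # sub-001_task-target
print(index_key(names[0], (0, 1, 3)))  # sub-001_ses-3_run-01
unique_dict = build_file_dict(names, (0, 1, 3))
```

Calling build_file_dict(names, (0, 2)) on the same two files would raise the error, because both runs collapse to the key sub-001_task-target.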
The Jupyter notebook go_nogo_01_initial_summary.ipynb illustrates using this dictionary in a larger context.
For example, to compare the events stored in a recording file and the events in the events.tsv file associated with that recording, we might dump the recording events in files with the same name, but ending in events_temp.tsv. The FileDictionary class allows us to create a keyed dictionary for each of these event files.
Logging processing steps¶
Often event data files require considerable processing to assure internal consistency and compliance with the BIDS specification. Once this processing is done and the files have been transformed, it can be difficult to understand the relationship between the transformed files and the original data.
The HedLogger allows you to document processing steps associated with the dataset by identifying key, as illustrated in the following log file excerpt:
Example output from HED logger.
sub-001_run-01
Reordered BIDS columns as ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
Dropped BIDS skip columns ['trial_type', 'value', 'response_time', 'stim_file', 'HED']
Reordered EEG columns as ['sample_offset', 'event_code', 'cond_code', 'type', 'latency', 'urevent', 'usertags']
Dropped EEG skip columns ['urevent', 'usertags', 'type']
Concatenated the BIDS and EEG event files for processing
Dropped the sample_offset and latency columns
Saved as _events_temp1.tsv
sub-002_run-01
Reordered BIDS columns as ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
Dropped BIDS skip columns ['trial_type', 'value', 'response_time', 'stim_file', 'HED']
Reordered EEG columns as ['sample_offset', 'event_code', 'cond_code', 'type', 'latency', 'urevent', 'usertags']
Dropped EEG skip columns ['urevent', 'usertags', 'type']
Concatenated the BIDS and EEG event files for processing
. . .
Each of the lines following a key represents a print message to the logger.
The most common use for a logger is to create a file dictionary using make_file_dict and then to log each processing step using the file’s key. This allows a processing step to be applied to all the relevant files in the dataset. After all the processing is complete, the print_log method outputs the logged messages by key, thus showing all the processing steps that have been applied to each file, as shown in the previous example.
Using the HED logger.
from hed.tools import HedLogger
status = HedLogger()
# key identifies the file being processed, for example "sub-001_run-01".
status.add(key, "Concatenated the BIDS and EEG event files")
# ... after processing is complete, output or save the log
status.print_log()
The HedLogger is used throughout the processing notebooks in this repository.
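The idea of logging by key is simple enough to sketch without HedTools. The class below is a hypothetical stand-in, not the HedLogger source: SimpleKeyLogger and format_log are illustrative names, and the tab-indented output format is an assumption.

```python
from collections import defaultdict


class SimpleKeyLogger:
    """Collect messages per key and report them grouped by key."""

    def __init__(self):
        self.log = defaultdict(list)

    def add(self, key, message):
        """Record one processing step for the file identified by key."""
        self.log[key].append(message)

    def format_log(self):
        """Return the log as text, one key per block with indented messages."""
        lines = []
        for key in sorted(self.log):
            lines.append(key)
            lines.extend(f"\t{message}" for message in self.log[key])
        return "\n".join(lines)


# Apply each processing step to every file, logging it under the file's key.
file_keys = ["sub-002_run-01", "sub-001_run-01"]
logger = SimpleKeyLogger()
for step in ("Dropped the sample_offset and latency columns", "Saved as _events_temp1.tsv"):
    for key in file_keys:
        logger.add(key, step)
print(logger.format_log())
```

Although the steps are applied file by file within each processing stage, the report groups them by key, so the full history of each file reads as one contiguous block.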