The hypline dataset layout
Every hypline command takes a single argument — the dataset root — and finds its inputs and writes its outputs by following a fixed directory convention. Understanding that convention is the key to using hypline: you never pass file paths, you organize your files where hypline expects them.
This page describes the layout once. The reference pages assume it.
The root tree
A hypline dataset extends the BIDS standard with a few extra areas. A complete tree looks like this:
<dataset-root>/
├── participants.tsv # required: dyad ↔ subject mapping
├── sub-031/ses-1/func/ # raw BIDS (events files live here)
├── derivatives/
│   ├── fmriprep/sub-031/ses-1/func/ # fMRIPrep outputs (preprocessed BOLD)
│   └── hypline/sub-031/ses-1/func/ # hypline imaging derivatives (denoised BOLD)
├── stimuli/dyad-030/ses-1/audio/ # stimulus audio, transcripts
├── features/dyad-030/ses-1/phonemic/ # generated features
├── confounds/dyad-030/ses-1/phonemic/ # generated confounds
└── nuisance/sub-031/ses-1/physio-v1/ # optional, user-supplied nuisance regressors
- sub-031/, derivatives/fmriprep/ are standard BIDS areas. You provide these: your raw recordings and your fMRIPrep run.
- derivatives/hypline/ is a BIDS derivatives tree hypline fills with its imaging derivatives (currently the denoise output). It mirrors fMRIPrep's sub-XX/[ses-YY/]func/ shape and carries its own dataset_description.json.
- stimuli/, features/, confounds/ are hypline additions. Hypline creates and fills these as you run commands. They are keyed by dyad (dyad-030/), not subject; see Subject vs. dyad below.
- nuisance/ is optional and you fill it: run-level regressors (e.g. physiological recordings) for denoise to regress out alongside fMRIPrep's confounds.
- participants.tsv is a standard BIDS table at the dataset root, required to map subjects to dyads; see Subject vs. dyad.
Sessions are optional
Examples here use a ses-1/ level under each subject
(sub-031/ses-1/func/) to match the tutorial dataset. Datasets without
sessions omit the level entirely (sub-031/func/). Hypline handles both.
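The directory convention above can be sketched as a small path builder. This is only an illustration of the layout, not hypline's own code: `area_dir` and its parameters are hypothetical names, and `/data/study` is a made-up dataset root.

```python
from pathlib import Path
from typing import Optional


def area_dir(root: str, area: Optional[str], identity: str,
             leaf: str, ses: Optional[str] = None) -> Path:
    """Return <root>/[<area>/]<identity>/[<ses>/]<leaf> (hypothetical helper)."""
    # Raw BIDS data sits directly under the root, so `area` may be None.
    # Datasets without sessions simply omit the ses level.
    parts = [p for p in (area, identity, ses, leaf) if p]
    return Path(root).joinpath(*parts)


with_ses = area_dir("/data/study", "features", "dyad-030", "phonemic", ses="ses-1")
no_ses = area_dir("/data/study", "derivatives/fmriprep", "sub-031", "func")
print(with_ses.as_posix())
print(no_ses.as_posix())
```

The same function covers sub-keyed and dyad-keyed areas because only the identity directory differs between them.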
How files are named
Hypline follows BIDS filename conventions: a filename is a chain of
entity-value pairs joined by _, ending in a suffix and extension.
sub-031_task-conv_run-1_space-T1w_desc-preproc_bold.nii.gz
\____________________________________________/ \__/ \_____/
                   entities                   suffix  ext
The identity entities at the front name which recording a file belongs to.
A file leads with exactly one of sub or dyad (never both), followed by
the BOLD-identity entities ses, task, run. A sub-keyed file belongs to
one brain; a dyad-keyed file belongs to one shared conversation. Generated
files mirror the identity entities of the source they came from.
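To make the naming rule concrete, here is a minimal sketch of splitting a filename into entities, suffix, and extension, and checking the "exactly one of sub or dyad" rule. `parse_name` is a hypothetical helper written for this page; hypline's actual parser may differ.

```python
from typing import Dict, Tuple


def parse_name(filename: str) -> Tuple[Dict[str, str], str, str]:
    """Split a BIDS-style filename into (entities, suffix, extension)."""
    # Everything after the first dot is the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    # The last "_"-separated chunk is the suffix; the rest are key-value pairs.
    *pairs, suffix = stem.split("_")
    entities: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep:
            raise ValueError(f"not an entity-value pair: {pair!r}")
        entities[key] = value
    # A file is keyed by one brain (sub) or one conversation (dyad), never both.
    if ("sub" in entities) == ("dyad" in entities):
        raise ValueError("expected exactly one of 'sub' or 'dyad'")
    return entities, suffix, dot + ext


entities, suffix, ext = parse_name(
    "sub-031_task-conv_run-1_space-T1w_desc-preproc_bold.nii.gz"
)
print(entities["sub"], suffix, ext)  # 031 bold .nii.gz
```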
Subject vs. dyad
Hypline is a hyperscanning pipeline: two partners hold one conversation while both are scanned. An artifact is keyed by what it is derived from:
- sub-keyed (derived from one brain): raw BOLD, derivatives/fmriprep/, derivatives/hypline/ (denoised), nuisance/.
- dyad-keyed (derived from the shared conversation between two partners): stimuli/, features/, confounds/. One conversation → one dyad → one set of stimuli/features/confounds, later consumed by each partner's per-subject encoding model. A dyad-030 audio file is the dyad's shared recording, not either partner's.
Because the two worlds use different identity entities, hypline bridges them
through participants.tsv — a standard BIDS table at the dataset root with
the required participant_id column plus a custom dyad_id column:
participant_id	dyad_id
sub-031	dyad-030
sub-032	dyad-030
This is the single source of truth for which subjects make up which dyad — here
subjects 031 and 032 are partners in dyad-030 (a real study has many such
pairs). It is read lazily: a purely sub-keyed workflow (e.g. denoise
alone) never needs it, but any step that joins a dyad-keyed stimulus artifact to
a sub-keyed BOLD requires it and errors if it is missing.
Use real tabs
participants.tsv — and every .tsv hypline reads (events.tsv, custom
nuisance/ tables) — must be separated by actual tab characters, not
spaces. Hypline splits on tabs, so a space-separated row collapses into one
column and fails with a misleading "missing column" error.
So a dyad-030 feature file does not match a BOLD file by sharing sub —
the two carry different identity entities. The join goes through
participants.tsv: a subject's encoding model looks up its dyad, then reads that
dyad's features.
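The lookup described above can be sketched with Python's standard `csv` module. The helper names `load_dyad_map` and `partners` are hypothetical, and the error message is illustrative rather than hypline's own. The sketch also shows why a space-separated file fails: the header collapses into a single column, so the required columns appear to be missing.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_dyad_map(handle: TextIO) -> Dict[str, str]:
    """Map participant_id -> dyad_id from a tab-separated participants table."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"participant_id", "dyad_id"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    return {row["participant_id"]: row["dyad_id"] for row in reader}


def partners(dyad_map: Dict[str, str], dyad_id: str) -> List[str]:
    """Return the subjects that make up one dyad, sorted by ID."""
    return sorted(sub for sub, dyad in dyad_map.items() if dyad == dyad_id)


table = io.StringIO(
    "participant_id\tdyad_id\nsub-031\tdyad-030\nsub-032\tdyad-030\n"
)
dyad_map = load_dyad_map(table)
print(dyad_map["sub-031"])             # dyad-030
print(partners(dyad_map, "dyad-030"))  # ['sub-031', 'sub-032']
```

A subject's encoding step would first resolve `dyad_map["sub-031"]`, then look for feature files under that dyad's directory.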
Category entities
Each hypline-generated derivative carries exactly one category entity naming what kind of derivative it is:
| Entity | Area | Example |
|---|---|---|
| feat-<kind> | features/ | feat-phonemic, feat-semantic, feat-spectral, feat-syntactic |
| conf-<kind> | confounds/ | conf-phonemic, conf-semantic |
| nuis-<kind> | nuisance/ | nuis-physio |
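The <kind> matches the subdirectory the file lives in: a phonemic feature
(feat-phonemic) lives under features/dyad-030/ses-1/phonemic/. Variants tagged
with a desc label append that label to the subdirectory name; see Variants
with desc.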
Stimuli carry no category entity. Their kind is a trailing filename suffix
(_audio, _transcript) instead — e.g. dyad-030_ses-1_task-conv_run-1_audio.wav
under stimuli/dyad-030/ses-1/audio/.
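As a rough illustration of the two rules above, the following sketch maps a filename to its area and kind. `classify` is a hypothetical helper, and the feature filename in the usage lines is invented for illustration: this page does not specify where the category entity sits in a real filename or which suffix feature files use.

```python
from typing import Tuple

# Category entity -> area, as listed in the table above.
CATEGORY_AREAS = {"feat": "features", "conf": "confounds", "nuis": "nuisance"}
# Stimuli have no category entity; their kind is the filename suffix.
STIMULUS_SUFFIXES = {"audio", "transcript"}


def classify(filename: str) -> Tuple[str, str]:
    """Return (area, kind) for a filename, following the rules above."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    found = [key for key in CATEGORY_AREAS if key in entities]
    if len(found) > 1:
        raise ValueError("a file carries at most one category entity")
    if found:
        return CATEGORY_AREAS[found[0]], entities[found[0]]
    if suffix in STIMULUS_SUFFIXES:
        return "stimuli", suffix
    raise ValueError(f"no category entity or stimulus suffix in {filename!r}")


print(classify("dyad-030_ses-1_task-conv_run-1_audio.wav"))
# Hypothetical feature filename, for illustration only:
print(classify("dyad-030_ses-1_task-conv_run-1_feat-phonemic_features.tsv"))
```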
Variants with desc
Some commands accept a --desc label that tags an output as one variant among
several. Variants live in their own subdirectory so they stay physically
separate:
confounds/dyad-030/ses-1/
├── phonemic-onset/ # conf-phonemic_desc-onset — speech-onset indicator
└── phonemic-rate/ # conf-phonemic_desc-rate — speech rate per TR
This lets you keep several derivations of the same source side by side and pick between them later by name.
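Judging from the tree above, a variant's subdirectory is the kind joined to the desc label with a hyphen, while the filename carries the two as separate entities. A minimal sketch of that naming, using hypothetical helper names:

```python
from typing import Optional


def variant_subdir(kind: str, desc: Optional[str] = None) -> str:
    """Subdirectory name: '<kind>' or '<kind>-<desc>' for a variant."""
    return kind if desc is None else f"{kind}-{desc}"


def variant_entities(category: str, kind: str, desc: Optional[str] = None) -> str:
    """Entity chain fragment: '<category>-<kind>[_desc-<desc>]'."""
    chain = f"{category}-{kind}"
    return chain if desc is None else f"{chain}_desc-{desc}"


print(variant_subdir("phonemic", "onset"))            # phonemic-onset
print(variant_entities("conf", "phonemic", "onset"))  # conf-phonemic_desc-onset
```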
Selecting subjects and runs
Because commands discover files by convention, you select what to process with
options rather than paths — an identity option plus --data-filters for runs and
conditions, both interpreted against the entities described above. The identity
option follows the area the command writes: dyad-keyed stimulus commands
(transcribe, featuregen, confoundgen) take --dyad-ids, while the
sub-keyed denoise takes --sub-ids. A third shared option, --force,
overwrites existing outputs (by default hypline skips files it has already
generated, so reruns are cheap).
For how to combine these, see Filter to specific runs or
conditions; for what --data-filters can match, see
Segments and metadata.
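The skip-by-default behaviour of --force can be pictured with a short sketch. This assumes "already generated" simply means the output file exists on disk; how hypline actually decides is not described on this page, and `outputs_to_generate` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def outputs_to_generate(planned: Iterable[Path], force: bool = False) -> List[Path]:
    """Keep every planned output when forced; otherwise only the missing ones."""
    return [path for path in planned if force or not path.exists()]


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "run-1.tsv"
    done.write_text("already generated\n")
    todo = Path(tmp) / "run-2.tsv"
    print(len(outputs_to_generate([done, todo])))              # 1
    print(len(outputs_to_generate([done, todo], force=True)))  # 2
```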
Why this design
Centralizing discovery in one convention means commands compose cleanly: each reads what earlier steps wrote, with no configuration file wiring inputs to outputs. It also keeps your dataset self-describing — the directory tree itself records what has been generated and from what.