ursa.recovery.timing

Derive a recording’s ended_at from per-modality segment metadata.

When the catalog’s RecordingRow.duration is wrong or missing (e.g. the 198 legacy rows whose manifests were reconstructed with ended_at = max(R2 last_modified) = upload time, not session end), this module re-derives the true session end from the segment files the rigs wrote at recording time.

Seven signal sources are supported: six data-stream probes, which read actual recorded sample timestamps, and a universal worker-report fallback:

  • EEG (eeg_*/eeg_*_timestamps.bin): per-chunk (int64 offset, float64 unix_ts) pairs packed by data-engine/eeg/recorder.py. The last 16-byte struct gives the timestamp of the most recent data chunk written before the recorder stopped.

  • Camera (camera_*/camera_timestamps_NNNN_<first_frame_ts>.csv OR camera_*/segment_NNNN_<first_frame_ts>.csv): per-frame frame, pts_ns, wall_clock, epoch_ns rows written by data-engine/camera/timestamps.py. The last row’s epoch_ns gives the most recent frame’s wall-clock. Two filename conventions are accepted because the legacy rig writers emit segment_*.csv while newer writers emit camera_timestamps_*.csv — same column shape. The filter still requires one of those two prefixes (not just .csv) so a stray debug dump can’t be lex-max-picked and silently downgrade the recording.

  • Microphone (mic_*/mic_NNNN_<unix_ts>.wav): the filename encodes the first-frame unix timestamp; the RIFF data chunk size, sample rate, channel count, and bits-per-sample together give the duration. End = start + duration.

  • Screen (screen_*/screen_timestamps_NNNN_<first_frame_ts>.csv): identical frame, pts_ns, wall_clock, epoch_ns columns as camera — the last row’s epoch_ns is the most recent captured frame.

  • Samsung watch (samsungwatch_*/<sensor>_NNNN_<start_ts>.csv for imu / ppg / eda / heart_rate / battery): each sensor stream’s first column is a nanosecond timestamp; the max last-row timestamp across sensors is the last sample received.

  • Environment (environment_*/environment_NNNN_<start_ts>.jsonl): newline-delimited {"timestamp": <ns>, "type": ...} poll records; the max timestamp is the last poll.

  • Worker report (<worker>/worker_report*.json): a universal fallback across every worker type (notes, pupillabs, location, keyboard, mouse, battery, etc.). Each worker writes a report on clean shutdown carrying an explicit stopped_at_utc ISO-8601 timestamp. Probed for every worker dir alongside the data-stream probe, so a recording whose eeg_*_timestamps.bin is 0-bytes (failed recorder) can still derive ended_at from a sibling worker that shut down cleanly.
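The EEG and microphone probes described above come down to small pieces of arithmetic, sketched below. This is an illustrative sketch, not the module's implementation: the function names are hypothetical, the little-endian byte order of the 16-byte record is an assumption (the source states only the field types), the WAV header fields are assumed to be already parsed, and returning timezone-aware UTC datetimes is a choice made for the example.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed layout: little-endian (int64 offset, float64 unix_ts), 16 bytes per record.
EEG_RECORD = struct.Struct("<qd")


def eeg_end_from_bytes(blob: bytes) -> Optional[datetime]:
    """Return the timestamp of the last complete 16-byte record, or None."""
    # Ignore a torn trailing write that is shorter than one record.
    usable = len(blob) - (len(blob) % EEG_RECORD.size)
    if usable == 0:
        return None  # 0-byte (or sub-record) file: absent signal
    _offset, unix_ts = EEG_RECORD.unpack_from(blob, usable - EEG_RECORD.size)
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc)


def mic_end(
    start_unix_ts: float,
    data_bytes: int,
    sample_rate: int,
    channels: int,
    bits_per_sample: int,
) -> Optional[datetime]:
    """End = start + duration, with duration taken from the RIFF data chunk size."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    if bytes_per_second <= 0:
        return None  # header fields unusable: treat as no signal
    duration = timedelta(seconds=data_bytes / bytes_per_second)
    return datetime.fromtimestamp(start_unix_ts, tz=timezone.utc) + duration
```

For example, a 48 kHz stereo 16-bit file with a 1,920,000-byte data chunk lasts 10 seconds, so its end is the filename timestamp plus 10 s.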

:func:derive_recording_end_time returns a :class:ModalitySignals holding the per-source candidates; the recording’s true ended_at is derived via :meth:ModalitySignals.max_end, which prefers the latest data-stream end (eeg / camera / mic / screen / samsungwatch / environment) when any such signal exists, falling back to worker_report only when none fired (recordings made up solely of worker types that still lack a sample-timestamp probe).

Conservative by design: parsers return None when no parseable signal exists (no _timestamps.bin present, no camera_timestamps_/segment_ CSV, no valid WAV header, no worker_report*.json with a parseable stopped_at_utc); this is the absent-signal case. Exceptions raised by read or parse failures are caught by :func:derive_recording_end_time and appended to :attr:ModalitySignals.errors; this is the failed-parse case. A None end-time with an empty errors list therefore means the signal was absent, not that a parse failed. A single broken segment file downgrades only its own recording to indeterminate rather than aborting a multi-recording rebuild.
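The distinction between an absent signal and a failed parse can be sketched as follows. Signals and run_probe are hypothetical stand-ins written for this example (the real class has more fields, and the source does not show how the real function structures its try/except or formats its error strings).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Signals:
    """Hypothetical, reduced stand-in for ModalitySignals."""
    eeg: Optional[datetime] = None
    camera: Optional[datetime] = None
    errors: list[str] = field(default_factory=list)


def run_probe(
    signals: Signals, name: str, probe: Callable[[], Optional[datetime]]
) -> Signals:
    """Run one probe; record a failure instead of letting it propagate."""
    try:
        # A probe returning None means "no signal here", which is not an error.
        setattr(signals, name, probe())
    except Exception as exc:  # read or parse failure
        signals.errors.append(f"{name}: {type(exc).__name__}: {exc}")
    return signals


def broken_camera_probe() -> Optional[datetime]:
    raise ValueError("bad epoch_ns in last row")


signals = Signals()
run_probe(signals, "eeg", lambda: None)          # absent signal
run_probe(signals, "camera", broken_camera_probe)  # failed parse
```

After both calls, signals.eeg and signals.camera are both None, but only the camera failure appears in signals.errors, so a caller can tell the two cases apart.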

Module Contents

Classes

ModalitySignals

End-time candidates per signal source for one recording.

Functions

derive_recording_end_time

Probe every modality worker under recordings/<rec_id>/ and return per-modality end-time candidates.

Data

API

ursa.recovery.timing.CSV_TAIL_BYTES

8192

ursa.recovery.timing.WAV_HEADER_BYTES

256
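The source does not say how these constants are used; their names suggest bounded reads, so that a probe fetches only the last 8192 bytes of a timestamp CSV or the first 256 bytes of a WAV file rather than the whole object. A minimal sketch of the CSV case under that assumption follows. The function name is hypothetical, and dropping an unterminated final row is a choice made for this example, not documented behavior.

```python
import io
from datetime import datetime, timedelta, timezone
from typing import BinaryIO, Optional

CSV_TAIL_BYTES = 8192  # value documented above; its use here is assumed
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_epoch_ns(fh: BinaryIO, size: int) -> Optional[datetime]:
    """Parse epoch_ns from the last complete row within the file's tail."""
    fh.seek(max(0, size - CSV_TAIL_BYTES))
    rows = fh.read().split(b"\n")
    # Whatever follows the final newline is either empty or a torn row.
    rows.pop()
    for raw in reversed(rows):
        cols = raw.decode("utf-8", errors="replace").strip().split(",")
        # Columns: frame, pts_ns, wall_clock, epoch_ns
        if len(cols) >= 4 and cols[3].isdigit():
            # Integer arithmetic keeps microsecond precision exact.
            return EPOCH + timedelta(microseconds=int(cols[3]) // 1000)
    return None  # header only, or nothing parseable: absent signal
```

Scanning rows in reverse means a partial first line at the start of the tail window is only consulted if no later row parses.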

class ursa.recovery.timing.ModalitySignals

End-time candidates per signal source for one recording.

Populated by :func:derive_recording_end_time. Use :meth:max_end to collapse the modality candidates into a single ended_at value; None indicates no signal source produced a parseable result (the row is indeterminate).

eeg: datetime.datetime | None

None

camera: datetime.datetime | None

None

mic: datetime.datetime | None

None

screen: datetime.datetime | None

None

samsungwatch: datetime.datetime | None

None

environment: datetime.datetime | None

None

worker_report: datetime.datetime | None

None

errors: list[str]

‘field(…)’

max_end() → datetime.datetime | None

Best ended_at estimate, or None if no signal produced a candidate (the row is indeterminate).

Prefers the latest data-stream end (eeg / camera / mic / screen / samsungwatch / environment), each read from actual recorded sample timestamps. worker_report (stopped_at_utc) is wall-clock at worker process teardown, not the last sample — workers routinely linger hours (even into the next day) past the last data, so including it in the max inflated durations (e.g. a 5h session reported as 30h, or short single-modality sessions reported as a uniform ~8h once the recorder’s idle timeout fires). It is therefore used only as a fallback when no data-stream signal exists — i.e. for worker types that still lack a sample-timestamp probe (notes / pupillabs / keyboard / mouse / location / battery).
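The selection rule described above can be sketched as a small dataclass. SignalsSketch is a hypothetical stand-in that mirrors the documented fields; the real method's body is not shown in this reference, so treat this as one plausible reading of the rule rather than the actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Sources read from recorded sample timestamps, per the description above.
DATA_STREAMS = ("eeg", "camera", "mic", "screen", "samsungwatch", "environment")


@dataclass
class SignalsSketch:
    """Hypothetical stand-in mirroring the documented ModalitySignals fields."""
    eeg: Optional[datetime] = None
    camera: Optional[datetime] = None
    mic: Optional[datetime] = None
    screen: Optional[datetime] = None
    samsungwatch: Optional[datetime] = None
    environment: Optional[datetime] = None
    worker_report: Optional[datetime] = None
    errors: list[str] = field(default_factory=list)

    def max_end(self) -> Optional[datetime]:
        present = [
            value
            for value in (getattr(self, name) for name in DATA_STREAMS)
            if value is not None
        ]
        if present:
            # worker_report is deliberately excluded from the max.
            return max(present)
        # Fallback only; may itself be None (indeterminate row).
        return self.worker_report


start = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
session = SignalsSketch(
    eeg=start + timedelta(hours=5),
    camera=start + timedelta(hours=4, minutes=58),
    worker_report=start + timedelta(hours=30),  # worker lingered after last data
)
```

Here session.max_end() returns the EEG end at start + 5h; the worker report 30 hours after start is ignored because a data-stream signal exists.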

ursa.recovery.timing.derive_recording_end_time(store: ursa.store.ObjectStore, rec_id: str) → ursa.recovery.timing.ModalitySignals

Probe every modality worker under recordings/<rec_id>/ and return per-modality end-time candidates.

:meth:ModalitySignals.max_end collapses them into a single ended_at value. A single broken segment file degrades only its own worker: the error is recorded in ModalitySignals.errors and the walk continues. Callers should log errors regardless of the max_end() outcome so future parser regressions don’t hide behind a successful sibling modality.
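A caller following that advice might look like the sketch below. resolve_ended_at and SignalsLike are hypothetical names introduced for illustration; the function accepts any object exposing errors and max_end(), such as the value returned by derive_recording_end_time, so the example runs without the ursa package.

```python
import logging
from datetime import datetime
from typing import Optional, Protocol


class SignalsLike(Protocol):
    """Structural type for anything shaped like ModalitySignals (hypothetical)."""
    errors: list[str]

    def max_end(self) -> Optional[datetime]: ...


def resolve_ended_at(
    rec_id: str, signals: SignalsLike, log: logging.Logger
) -> Optional[datetime]:
    """Log every probe error first, then collapse the candidates."""
    # Logged unconditionally: a healthy sibling modality must not mask these.
    for err in signals.errors:
        log.warning("recording %s: probe error: %s", rec_id, err)
    ended_at = signals.max_end()
    if ended_at is None:
        log.info("recording %s: ended_at indeterminate", rec_id)
    return ended_at
```

Logging before calling max_end() keeps the error report independent of whether an end time was found.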