ursa.register.payload

Manifest → catalog-row payload mapping (ENG-1129).

Pure utilities that turn a data-engine manifest.json plus a listing of relative file paths under recordings/<rec_id>/ into a :class:RegisterPayload of catalog rows ready for register_recording + register_modality to insert.

No I/O: the caller is responsible for producing the file listing — by walking a local rec_*/ directory (rig-side ingest, ENG-1130) or by listing the R2 prefix (backfill, ENG-1096). This keeps the manifest-interpretation logic in one place and avoids two paths drifting.
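As an illustration of the caller's side of this contract, the listing for a local recording directory could be produced roughly as follows. This is a minimal sketch, not part of this module: the helper name list_rel_paths is hypothetical, and it assumes the listing should use forward-slash separators relative to the recording root.

```python
from pathlib import Path


def list_rel_paths(rec_dir: Path) -> list[str]:
    """Return every file under rec_dir as a POSIX-style path relative to it.

    Hypothetical caller-side helper; the module itself performs no I/O.
    """
    return sorted(
        p.relative_to(rec_dir).as_posix()
        for p in rec_dir.rglob("*")
        if p.is_file()  # directories are not part of the listing
    )
```

The result can then be passed as rel_paths. Root-level files such as manifest.json may stay in the listing, since they are skipped during modality discovery.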

Format inference and modality-name collapse logic are extracted from examples/mvp_demo_register.py (ENG-1063) so the demo, the orchestrator, and backfill all agree.

Module Contents

Classes

ModalitySpec

One discovered modality within a recording.

RegisterPayload

A recording + its modalities, ready to insert.

Functions

discover_modalities_from_listing

Group a flat file listing into per-modality :class:ModalitySpec objects.

infer_format

Pick a :class:StorageFormat for a worker subdir.

manifest_to_register_payload

Build a :class:RegisterPayload from a manifest + file listing.

API

class ursa.register.payload.ModalitySpec

One discovered modality within a recording.

Attributes

  • name: Short modality name (e.g. "camera"), after optional collapse from the worker_subdir’s first underscore segment.

  • worker_subdir: Canonical R2 subdir (e.g. "camera_0"). This is the actual path segment, not the collapsed name; raw_storage_uri is built from this, not from :attr:name.

  • format: Inferred :class:StorageFormat. See :func:infer_format.

  • raw_storage_uri: Full r2:// URI pointing at the modality prefix. Resolved via :func:ursa.layout.raw_modality_uri against the active profile.

  • segment_count: Number of files under this worker_subdir in the supplied listing.

name: str

worker_subdir: str

format: ursa.catalog.StorageFormat

raw_storage_uri: str

segment_count: int

class ursa.register.payload.RegisterPayload

A recording + its modalities, ready to insert.

Not a Pydantic model: this only crosses process-local boundaries (manifest util → orchestrator → DataInterface.register_*). The contained :class:RecordingRow / :class:ModalityRow are Pydantic-validated at construction; wrapping them again would just double-validate.

recording: ursa.catalog.RecordingRow

modalities: list[ursa.catalog.ModalityRow]

ursa.register.payload.discover_modalities_from_listing(recording_id: str, rel_paths: collections.abc.Iterable[str], *, collapse_modalities: bool = True, profile: str | None = None) → list[ursa.register.payload.ModalitySpec]

Group a flat file listing into per-modality :class:ModalitySpec objects.

Parameters

  • recording_id: Used to construct raw_storage_uri for each modality.

  • rel_paths: File paths relative to recordings/<recording_id>/, typically "camera_0/camera_0000_*.mkv". Files at the recording root (no / in the rel path; e.g. manifest.json) are skipped silently.

  • collapse_modalities: When True (the default), collapse worker-subdir names to their first underscore segment. Set to False to register with the full worker_subdir name as the modality.

  • profile: Optional profile override threaded to :func:raw_modality_uri. When None, the URSA_PROFILE env var is consulted.
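The grouping and name-collapse rules described above can be sketched in isolation. This is an illustrative approximation, not the module's implementation: the function names group_by_worker_subdir and collapse_name are hypothetical, and the sketch omits URI construction and format inference entirely.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_by_worker_subdir(rel_paths: Iterable[str]) -> dict[str, list[str]]:
    """Group relative paths by their first path segment (the worker subdir)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rel in rel_paths:
        subdir, sep, rest = rel.partition("/")
        if not sep:
            # No "/" means a root-level file such as manifest.json: skip it.
            continue
        groups[subdir].append(rest)
    return dict(groups)


def collapse_name(worker_subdir: str, collapse: bool = True) -> str:
    """Return the first underscore segment when collapsing, else the full name."""
    return worker_subdir.split("_", 1)[0] if collapse else worker_subdir
```

Under these assumptions, "camera_0/camera_0000_a.mkv" lands in the "camera_0" group, whose modality name is "camera" with collapsing enabled and "camera_0" without.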

ursa.register.payload.infer_format(worker_subdir: str, files: collections.abc.Iterable[str]) → ursa.catalog.StorageFormat

Pick a :class:StorageFormat for a worker subdir.

The worker-name prefix wins when it matches a known modality (e.g. camera_0 → RAW_VIDEO); otherwise the most common file extension under the subdir decides. Falls back to RAW_BINARY when nothing matches: it is better to register with a coarse format and refine later than to refuse to register.

Parameters

  • worker_subdir: The canonical worker subdir name (e.g. "camera_0").

  • files: Relative paths within the worker subdir, or full paths containing it. Only the basenames’ extensions are inspected.
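The precedence described for infer_format (prefix first, then most common extension, then RAW_BINARY) can be sketched as below. Everything here is a stand-in for illustration: Format is a two-member substitute for ursa.catalog.StorageFormat, and the PREFIX_FORMATS and EXTENSION_FORMATS tables are assumed, not the module's real lookup tables.

```python
from collections import Counter
from collections.abc import Iterable
from enum import Enum
from pathlib import PurePosixPath


class Format(Enum):
    """Stand-in for StorageFormat; only the two members named in the docs."""
    RAW_VIDEO = "raw_video"
    RAW_BINARY = "raw_binary"


# Assumed lookup tables, for illustration only.
PREFIX_FORMATS = {"camera": Format.RAW_VIDEO}
EXTENSION_FORMATS = {".mkv": Format.RAW_VIDEO, ".mp4": Format.RAW_VIDEO}


def infer_format_sketch(worker_subdir: str, files: Iterable[str]) -> Format:
    # 1. A known worker-name prefix wins outright.
    prefix = worker_subdir.split("_", 1)[0]
    if prefix in PREFIX_FORMATS:
        return PREFIX_FORMATS[prefix]
    # 2. Otherwise the most common basename extension decides.
    exts = Counter(PurePosixPath(f).suffix.lower() for f in files)
    exts.pop("", None)  # ignore files without an extension
    if exts:
        most_common_ext, _count = exts.most_common(1)[0]
        return EXTENSION_FORMATS.get(most_common_ext, Format.RAW_BINARY)
    # 3. Nothing matched: coarse fallback rather than refusing to register.
    return Format.RAW_BINARY
```

Because only the suffix of each basename is read, the sketch accepts both "a.mkv" and "depth_0/a.mkv" style paths.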

ursa.register.payload.manifest_to_register_payload(manifest: dict[str, Any], rel_paths: collections.abc.Iterable[str], *, participant_id: str, profile: str | None = None, collapse_modalities: bool = True) → ursa.register.payload.RegisterPayload

Build a :class:RegisterPayload from a manifest + file listing.

The manifest is the data-engine manifest.json contents as a dict; rel_paths is the listing under recordings/<rec_id>/.

Required manifest fields:

  • recording_id: informational; stored as manifest_recording_id metadata for auditability.

  • recording_hash: content-addressed digest from data-engine/uploader/hashing.py:compute_recording_hash. There is no fallback: a manifest without this field is rejected with KeyError. (B4 from round 2: a silent fallback to recording_id could orphan a row when a re-augment produces different content under the same recording_id.)

  • started_at, ended_at: ISO-8601 strings.

Optional manifest fields:

  • participant: display-name string; copied to metadata for auditability, but the catalog participant_ids always comes from the explicit participant_id= kwarg.

  • files: list of {worker_id, rel_path, size, sha256} entries (added by data-engine’s augment_manifest). Currently unused by this util; the orchestrator’s hash-skip path consumes it directly from the manifest dict.

Parameters

  • manifest: The manifest dict.

  • rel_paths: File paths relative to recordings/<rec_id>/ (output of store.list or find rec_*/).

  • participant_id: The catalog ID for the participant. Required: the schema currently enforces participant_ids non-empty. Null-participant support for legacy recordings is tracked in ENG-1099.

  • profile: Optional profile override for URI construction.

  • collapse_modalities: See :func:discover_modalities_from_listing.

Raises

  • KeyError: If the manifest is missing recording_id / recording_hash / started_at / ended_at.

  • ValueError: If no modality directories are discovered under recordings/<rec_id>/ (likely a bad listing or a non-rig prefix).
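The required-field contract can be pictured with a small validation sketch. It is an assumption-laden illustration rather than the function's actual code: read_required_fields and REQUIRED_FIELDS are hypothetical names, and parsing timestamps with datetime.fromisoformat is a choice made for this example only.

```python
from datetime import datetime
from typing import Any

# Fields the documentation lists as required in the manifest.
REQUIRED_FIELDS = ("recording_id", "recording_hash", "started_at", "ended_at")


def read_required_fields(manifest: dict[str, Any]) -> dict[str, Any]:
    """Fail fast with KeyError instead of silently falling back."""
    missing = [key for key in REQUIRED_FIELDS if key not in manifest]
    if missing:
        raise KeyError(f"manifest missing required field(s): {', '.join(missing)}")
    return {
        "recording_id": manifest["recording_id"],
        "recording_hash": manifest["recording_hash"],
        # Note: before Python 3.11, fromisoformat rejects a trailing "Z";
        # use an explicit offset such as "+00:00" in that case.
        "started_at": datetime.fromisoformat(manifest["started_at"]),
        "ended_at": datetime.fromisoformat(manifest["ended_at"]),
    }
```

Raising on a missing recording_hash, rather than substituting recording_id, mirrors the no-fallback rule stated in the required-fields list.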