ursa.register.payload
Manifest → catalog-row payload mapping (ENG-1129).
Pure utilities that turn a data-engine manifest.json plus a listing
of relative file paths under recordings/<rec_id>/ into a
RegisterPayload of catalog rows ready for register_recording and
register_modality to insert.
No I/O: the caller is responsible for producing the file listing — by
walking a local rec_*/ directory (rig-side ingest, ENG-1130) or by
listing the R2 prefix (backfill, ENG-1096). This keeps the
manifest-interpretation logic in one place and avoids two paths
drifting.
Format inference and modality-name collapse logic are extracted from
examples/mvp_demo_register.py (ENG-1063) so the demo, the
orchestrator, and backfill all agree.
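Because the module does no I/O, the caller has to build the listing itself. For the local-directory case, a minimal sketch could look like the following. The helper name list_rel_paths is hypothetical and not part of ursa; it only assumes the listing should be POSIX-style paths relative to the recording directory.

```python
from pathlib import Path


def list_rel_paths(rec_dir: Path) -> list[str]:
    """Return every file under rec_dir as a POSIX-style path relative to it.

    Hypothetical caller-side helper, not part of ursa.
    """
    return sorted(
        p.relative_to(rec_dir).as_posix()
        for p in rec_dir.rglob("*")
        if p.is_file()  # directories themselves are not listed
    )
```

The result includes root-level files such as manifest.json. Per the discover_modalities_from_listing documentation, such entries are skipped silently, so they do not need filtering beforehand.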
Module Contents
Classes
ModalitySpec | One discovered modality within a recording.
RegisterPayload | A recording + its modalities, ready to insert.
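Functions
discover_modalities_from_listing | Group a flat file listing into per-modality ModalitySpecs.
infer_format | Pick a StorageFormat for a worker subdir.
manifest_to_register_payload | Build a RegisterPayload from a manifest + file listing.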
API
- class ursa.register.payload.ModalitySpec
One discovered modality within a recording.
Attributes
name: str
    Short modality name (e.g. "camera"), after optional collapse from the worker_subdir's first underscore segment.
worker_subdir: str
    Canonical R2 subdir (e.g. "camera_0"): the actual path segment, not the collapsed name. raw_storage_uri is built from this, not from name.
format: ursa.catalog.StorageFormat
    Inferred StorageFormat. See infer_format.
raw_storage_uri: str
    Full r2:// URI pointing at the modality prefix. Resolved via ursa.layout.raw_modality_uri against the active profile.
segment_count: int
    Number of files under this worker_subdir in the supplied listing.
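The format attribute is filled in by infer_format. Going by that function's description on this page (worker-name prefix first, then the most common file extension, then RAW_BINARY), the rule might look roughly like the sketch below. The Format enum is a stand-in for ursa.catalog.StorageFormat, its member values are made up, and both lookup tables are hypothetical apart from the camera → RAW_VIDEO pairing the docs mention. Matching the prefix on the first underscore segment is also an assumption.

```python
from collections import Counter
from collections.abc import Iterable
from enum import Enum
from pathlib import PurePosixPath


class Format(Enum):
    """Stand-in for ursa.catalog.StorageFormat; values are illustrative."""

    RAW_VIDEO = "raw_video"
    RAW_BINARY = "raw_binary"


# Only the camera -> RAW_VIDEO pairing comes from the docs.
PREFIX_FORMATS = {"camera": Format.RAW_VIDEO}
# Hypothetical extension table; the real mapping is not documented here.
EXTENSION_FORMATS = {".mkv": Format.RAW_VIDEO}


def infer_format_sketch(worker_subdir: str, files: Iterable[str]) -> Format:
    # 1. The worker-name prefix wins when it is a known modality.
    prefix = worker_subdir.split("_", 1)[0]
    if prefix in PREFIX_FORMATS:
        return PREFIX_FORMATS[prefix]

    # 2. Otherwise the most common extension among the basenames decides.
    extensions = Counter(PurePosixPath(f).suffix.lower() for f in files)
    extensions.pop("", None)  # ignore files without an extension
    if extensions:
        most_common_ext, _count = extensions.most_common(1)[0]
        return EXTENSION_FORMATS.get(most_common_ext, Format.RAW_BINARY)

    # 3. Nothing matched: register with the coarse fallback.
    return Format.RAW_BINARY
```

Returning a coarse fallback instead of raising mirrors the stated design choice: registering with an imprecise format is preferred over refusing to register.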
- class ursa.register.payload.RegisterPayload
A recording + its modalities, ready to insert.
Not a Pydantic model: this only crosses process-local boundaries (manifest util → orchestrator → DataInterface.register_*). The contained RecordingRow and ModalityRow objects are Pydantic-validated at construction; wrapping them again would just double-validate.
recording: ursa.catalog.RecordingRow
modalities: list[ursa.catalog.ModalityRow]
- ursa.register.payload.discover_modalities_from_listing(recording_id: str, rel_paths: collections.abc.Iterable[str], *, collapse_modalities: bool = True, profile: str | None = None) → list[ursa.register.payload.ModalitySpec]
Group a flat file listing into per-modality ModalitySpecs.
Parameters
recording_id
    Used to construct raw_storage_uri for each modality.
rel_paths
    File paths relative to recordings/<recording_id>/, typically "camera_0/camera_0000_*.mkv". Files at the recording root (no / in the rel path; e.g. manifest.json) are skipped silently.
collapse_modalities
    When True (the default), collapse worker-subdir names to their first underscore segment. Set to False to register with the full worker_subdir name as the modality.
profile
    Optional profile override threaded to raw_modality_uri. When None, the URSA_PROFILE env var is consulted.
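The grouping and collapse behaviour described for rel_paths and collapse_modalities can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, the imu_1 entries are made-up sample data, and the sketch does not cover URI construction or what happens when two subdirs collapse to the same name.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_by_worker_subdir(rel_paths: Iterable[str]) -> dict[str, list[str]]:
    """Group relative paths by their first path segment (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rel in rel_paths:
        subdir, sep, _rest = rel.partition("/")
        if not sep:
            continue  # no "/" in the path: a root-level file, skipped silently
        groups[subdir].append(rel)
    return dict(groups)


def collapse_name(worker_subdir: str) -> str:
    """Collapse a worker subdir to its first underscore segment."""
    return worker_subdir.split("_", 1)[0]


listing = [
    "manifest.json",
    "camera_0/camera_0000_a.mkv",
    "camera_0/camera_0000_b.mkv",
    "imu_1/imu_0000.bin",  # made-up sample entry
]
groups = group_by_worker_subdir(listing)
# Map each worker_subdir to (collapsed name, segment count).
summary = {
    subdir: (collapse_name(subdir), len(files))
    for subdir, files in groups.items()
}
```

Here summary maps camera_0 to ("camera", 2) and imu_1 to ("imu", 1), which corresponds to the name and segment_count attributes of ModalitySpec while worker_subdir keeps the uncollapsed path segment.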
- ursa.register.payload.infer_format(worker_subdir: str, files: collections.abc.Iterable[str]) → ursa.catalog.StorageFormat
Pick a StorageFormat for a worker subdir.
The worker-name prefix wins when it matches a known modality (e.g. camera_0 → RAW_VIDEO); otherwise the most common file extension under the subdir decides. Falls back to RAW_BINARY when nothing matches: better to register with a coarse format and refine later than to refuse to register.
Parameters
worker_subdir
    The canonical worker subdir name (e.g. "camera_0").
files
    Relative paths within the worker subdir, or full paths containing it. Only the basenames' extensions are inspected.
- ursa.register.payload.manifest_to_register_payload(manifest: dict[str, Any], rel_paths: collections.abc.Iterable[str], *, participant_id: str, profile: str | None = None, collapse_modalities: bool = True) → ursa.register.payload.RegisterPayload
Build a RegisterPayload from a manifest + file listing.
The manifest is the data-engine manifest.json contents as a dict; rel_paths is the listing under recordings/<rec_id>/.
Required manifest fields:
- recording_id: informational; recorded as manifest_recording_id in metadata for auditability.
- recording_hash: content-addressed digest from data-engine/uploader/hashing.py:compute_recording_hash. No fallback: a manifest without this field is rejected with KeyError (B4 from round 2: a silent fallback to recording_id could orphan a row when a re-augment produces different content under the same recording_id).
- started_at, ended_at: ISO-8601 strings.
Optional manifest fields:
- participant: display-name string; copied to metadata for auditability, but the catalog participant_ids always comes from the explicit participant_id= kwarg.
- files: list of {worker_id, rel_path, size, sha256} entries (added by data-engine's augment_manifest). Currently unused by this util; the orchestrator's hash-skip path consumes it directly from the manifest dict.
Parameters
manifest
    The manifest dict.
rel_paths
    File paths relative to recordings/<rec_id>/ (output of store.list or find rec_*/).
participant_id
    The catalog ID for the participant. Required: the schema currently requires participant_ids to be non-empty. Null-participant support for legacy recordings is tracked in ENG-1099.
profile
    Optional profile override for URI construction.
collapse_modalities
    See discover_modalities_from_listing.
Raises
KeyError
    If the manifest is missing recording_id, recording_hash, started_at, or ended_at.
ValueError
    If no modality directories are discovered under recordings/<rec_id>/ (likely a bad listing or a non-rig prefix).
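The required-field contract can be illustrated with a small standalone check. This is a sketch under assumptions, not the function's actual code: read_required_fields is a hypothetical name, and parsing the timestamps with datetime.fromisoformat is an illustrative choice, since the docs only state that they are ISO-8601 strings.

```python
from datetime import datetime
from typing import Any

# Fields the docs list as required; a missing recording_hash has no fallback.
REQUIRED_FIELDS = ("recording_id", "recording_hash", "started_at", "ended_at")


def read_required_fields(manifest: dict[str, Any]) -> dict[str, Any]:
    """Extract required manifest fields, raising KeyError if any are absent."""
    missing = [key for key in REQUIRED_FIELDS if key not in manifest]
    if missing:
        raise KeyError(f"manifest missing required field(s): {', '.join(missing)}")
    return {
        "recording_id": manifest["recording_id"],
        "recording_hash": manifest["recording_hash"],
        # fromisoformat accepts offsets like +00:00 on all supported Python
        # versions; a trailing "Z" is only accepted from Python 3.11 onward.
        "started_at": datetime.fromisoformat(manifest["started_at"]),
        "ended_at": datetime.fromisoformat(manifest["ended_at"]),
    }


example_manifest = {
    "recording_id": "rec_0001",  # made-up sample values
    "recording_hash": "abc123",
    "started_at": "2024-01-01T00:00:00+00:00",
    "ended_at": "2024-01-01T00:05:00+00:00",
}
fields = read_required_fields(example_manifest)
```

Failing fast on a missing recording_hash, instead of substituting recording_id, matches the rationale given under Required manifest fields.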