ursa.layout
Two-store model (architecture v0.4)
Ursa is a two-store database backed by two R2 buckets:
- Raw store: constellation-data (this module’s RAW_BUCKET). Cold/infrequent-access tier. Holds whole-file objects exactly as data-engine writes them, with no chunk indexing or temporal structure. Designed to be read exactly once, by Virgo’s ingestion node, and rarely accessed afterwards. A lifecycle policy can archive or delete raw segment files after a configurable retention window once Virgo has produced the processed artifact (tracked by ENG-1085).
- Processed store: constellation-assets (this module’s ASSETS_BUCKET). Hot tier. Populated by Virgo’s ingestion node with Zarr arrays for regular continuous streams, Lance tables for irregular events and for the catalog itself, and MP4 files plus Lance frame indices for video.
Two buckets, two layouts:
constellation-data: raw recordings, written by data-engine rigs.
Ursa treats this bucket as read-only. A Cloudflare bucket lock is applied
to the recordings/ prefix so rigs cannot modify already-uploaded objects.
Key structure (per data-engine PR #49):
    recordings/<recording_id>/<worker_subdir>/<file>   raw segment files
    manifests/<recording_id>/manifest.json             upload commit marker
    _status/<hostname>.json                            uploader heartbeat
    nodes/<node_id>.json                               node registry
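The four root prefixes of constellation-data are disjoint. The first path segment of a key is therefore enough to tell what kind of object it names. The sketch below assumes exactly the four prefixes listed above. classify_raw_key and RAW_KEY_KINDS are hypothetical names, not part of ursa.layout.

```python
# Hypothetical helper, not part of ursa.layout. Assumes the four root
# prefixes listed above are the only ones in constellation-data.
RAW_KEY_KINDS = {
    "recordings": "raw segment file",
    "manifests": "upload commit marker",
    "_status": "uploader heartbeat",
    "nodes": "node registry entry",
}


def classify_raw_key(key: str) -> str:
    """Return the object kind for a constellation-data key, or raise ValueError."""
    root, sep, _rest = key.partition("/")
    if not sep or root not in RAW_KEY_KINDS:
        raise ValueError(f"unrecognised constellation-data key: {key!r}")
    return RAW_KEY_KINDS[root]
```

An unknown root, such as a stray object uploaded at the bucket root, raises ValueError instead of being silently misfiled.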
constellation-assets: Ursa-managed objects, organised by repo header so each package owns a distinct prefix and permissions can be scoped independently:
    virgo/<recording_hash>/<modality>.<ext>        Virgo canonical outputs
    ursa/catalog/<table>.lance                     Lance catalog tables
    orion/checkpoints/<checkpoint_id>/             Orion model checkpoints
    orion/benchmark-suites/<name>/<version>.json   Benchmark suite configs
    orion/benchmark-results/<result_id>.json       Benchmark evaluation results
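Every package owns exactly one top-level prefix. Any key in constellation-assets can therefore be attributed to its owner from its first segment, which is also the boundary a scoped credential would match on. This is a minimal sketch, assuming virgo/, ursa/, and orion/ are the only repo headers. owning_package is a hypothetical helper.

```python
# Hypothetical helper, not part of ursa.layout. Assumes these three repo
# headers are the only top-level prefixes in constellation-assets.
REPO_HEADERS = frozenset({"virgo", "ursa", "orion"})


def owning_package(key: str) -> str:
    """Return the package whose prefix contains this constellation-assets key."""
    header, sep, _rest = key.partition("/")
    if not sep or header not in REPO_HEADERS:
        raise ValueError(f"key {key!r} is outside every package prefix")
    return header
```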
Per-modality lifecycle (architecture v0.4)
Each ursa.catalog.ModalityRow carries two URI fields plus an ingestion_status enum:
- raw_storage_uri: the immutable cold-bucket pointer (r2://constellation-data/recordings/...). Set at registration and preserved forever, even after Virgo’s ingestion node has produced the processed artifact, so re-ingestion is always possible.
- storage_uri: the current authoritative location. While ingestion_status="raw" it mirrors raw_storage_uri. Once Virgo’s ingestion node runs, the row is upserted with ingestion_status="processed", storage_uri is swapped to the constellation-assets key, and format, domain_intervals, and channel_spec are populated.
recording_hash is the join key throughout — no re-keying during the
raw → processed transition.
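The raw → processed transition can be modelled as an immutable record that is copied with a new storage_uri and status. This is a sketch only. ModalityRowSketch is a hypothetical stand-in for ursa.catalog.ModalityRow that keeps just the lifecycle fields and omits format, domain_intervals, and channel_spec.

```python
from dataclasses import dataclass, replace
from typing import Literal


@dataclass(frozen=True)
class ModalityRowSketch:
    """Hypothetical stand-in for ursa.catalog.ModalityRow (lifecycle fields only)."""
    recording_hash: str
    raw_storage_uri: str
    storage_uri: str
    ingestion_status: Literal["raw", "processed"] = "raw"


def register_raw(recording_hash: str, raw_uri: str) -> ModalityRowSketch:
    # At registration, storage_uri mirrors the immutable raw pointer.
    return ModalityRowSketch(recording_hash, raw_uri, raw_uri, "raw")


def mark_processed(row: ModalityRowSketch, processed_uri: str) -> ModalityRowSketch:
    # The "upsert": swap storage_uri and status. recording_hash and
    # raw_storage_uri are carried over untouched, so no re-keying happens.
    return replace(row, storage_uri=processed_uri, ingestion_status="processed")
```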
R2 storage layout conventions for Ursa.
All code that constructs or interprets R2 object keys (ingestion, query, lifecycle) MUST use the functions here. Routing every caller through one canonical module keeps the key structure from diverging across callers.
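Module Contents
Functions
| Function | Summary |
| --- | --- |
| virgo_recording_prefix | Prefix for all Virgo-processed objects belonging to one recording. |
| virgo_modality_key | Key for one modality’s Virgo-processed object in constellation-assets. |
| virgo_modality_uri | Full r2:// URI for a Virgo-processed modality object. |
| catalog_prefix | Prefix for all Lance catalog tables in constellation-assets. |
| catalog_table_key | Key for a named Lance catalog table. |
| catalog_table_uri | Full r2:// URI for a Lance catalog table. |
| orion_checkpoint_prefix | Prefix for all objects belonging to one Orion model checkpoint. |
| orion_checkpoint_uri | Full r2:// URI for an Orion checkpoint prefix. |
| orion_checkpoint_data_hash_key | Key for the data-hash manifest inside a checkpoint. |
| orion_benchmark_suite_key | Key for a versioned benchmark suite configuration object. |
| orion_benchmark_suite_uri | Full r2:// URI for a benchmark suite configuration object. |
| orion_benchmark_result_key | Key for a benchmark evaluation result object. |
| orion_benchmark_result_uri | Full r2:// URI for a benchmark evaluation result. |
| raw_recording_prefix | Prefix for all raw segment files belonging to one recording. |
| raw_modality_prefix | Prefix for one modality’s raw segment files. |
| raw_modality_uri | Full r2:// URI for a raw modality prefix in constellation-data. |
| raw_commit_marker_key | Key for the upload commit marker written by the uploader after a complete session. |
| raw_status_key | Key for a rig’s uploader status heartbeat file. |
| raw_node_key | Key for a node’s registry entry in constellation-data. |
| validate_storage_uri | Reject a storage_uri that doesn’t match its StorageFormat tier. |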
Data
API
- ursa.layout.__all__
['RAW_BUCKET', 'ASSETS_BUCKET', 'FORMAT_EXT', 'CANONICAL_FORMATS', 'virgo_recording_prefix', 'virgo_…
- ursa.layout.RAW_BUCKET
'constellation-data'
- ursa.layout.ASSETS_BUCKET
'constellation-assets'
- ursa.layout.FORMAT_EXT: dict[ursa.catalog.schemas.StorageFormat, str]
None
- ursa.layout.CANONICAL_FORMATS: frozenset[ursa.catalog.schemas.StorageFormat]
'frozenset(…)'
- ursa.layout.virgo_recording_prefix(recording_hash: str) → str
Prefix for all Virgo-processed objects belonging to one recording.
Used by ingestion to list all canonical objects for a recording (e.g. before lifecycle GC runs). Not used for individual object writes; call virgo_modality_key for those.
Example:
virgo/abc123def456/
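Below is a minimal sketch of the prefix and of the listing filter it exists for, inferred from the example above; the real function may also validate its input. canonical_objects is a hypothetical helper that stands in for a bucket listing call.

```python
def virgo_recording_prefix(recording_hash: str) -> str:
    # Body inferred from the documented example, not copied from the module.
    return f"virgo/{recording_hash}/"


def canonical_objects(keys: list[str], recording_hash: str) -> list[str]:
    """Hypothetical: keep only the keys under one recording's Virgo prefix."""
    prefix = virgo_recording_prefix(recording_hash)
    return [k for k in keys if k.startswith(prefix)]
```

The trailing slash matters. Without it, a prefix match for abc123def456 would also pick up keys under a longer hash such as abc123def456x/.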
- ursa.layout.virgo_modality_key(recording_hash: str, modality: str, fmt: ursa.catalog.schemas.StorageFormat) → str
Key for one modality’s Virgo-processed object in constellation-assets.
Only accepts canonical formats (ZARR, LANCE, MP4_INDEX, PARQUET). Raises ValueError for RAW_* formats; those belong in constellation-data and must be addressed via raw_modality_uri.
Example:
virgo/abc123def456/eeg.zarr
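The canonical-only rule can be sketched as a membership check against CANONICAL_FORMATS before the key is assembled. The StorageFormat enum below is a hypothetical stand-in for ursa.catalog.schemas.StorageFormat. RAW_SEGMENTS represents the RAW_* members, whose real names are not listed here. Only the zarr extension is confirmed by the example above, so the other FORMAT_EXT values are placeholders.

```python
from enum import Enum


class StorageFormat(str, Enum):
    """Hypothetical stand-in for ursa.catalog.schemas.StorageFormat."""
    ZARR = "zarr"
    LANCE = "lance"
    MP4_INDEX = "mp4_index"
    PARQUET = "parquet"
    RAW_SEGMENTS = "raw_segments"  # placeholder for the RAW_* members


# Assumed extensions: only "zarr" matches a documented example.
FORMAT_EXT = {
    StorageFormat.ZARR: "zarr",
    StorageFormat.LANCE: "lance",
    StorageFormat.MP4_INDEX: "mp4",
    StorageFormat.PARQUET: "parquet",
}
CANONICAL_FORMATS = frozenset(FORMAT_EXT)


def virgo_modality_key(recording_hash: str, modality: str, fmt: StorageFormat) -> str:
    # Reject raw formats up front: they live in constellation-data instead.
    if fmt not in CANONICAL_FORMATS:
        raise ValueError(f"{fmt.name} is not canonical; use raw_modality_uri")
    return f"virgo/{recording_hash}/{modality}.{FORMAT_EXT[fmt]}"
```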
- ursa.layout.virgo_modality_uri(recording_hash: str, modality: str, fmt: ursa.catalog.schemas.StorageFormat) → str
Full r2:// URI for a Virgo-processed modality object.
Example:
r2://constellation-assets/virgo/abc123def456/eeg.zarr
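In their examples, the *_uri functions on this page all follow the same pattern: the r2:// scheme, then the bucket, then the key. The sketch below builds such a URI and splits one back into (bucket, key) with the standard library's urllib.parse.urlsplit. assets_uri and split_r2_uri are hypothetical helpers, not ursa.layout API.

```python
from urllib.parse import urlsplit

ASSETS_BUCKET = "constellation-assets"


def assets_uri(key: str) -> str:
    # Hypothetical: scheme, bucket, then key, as in the documented examples.
    return f"r2://{ASSETS_BUCKET}/{key}"


def split_r2_uri(uri: str) -> tuple[str, str]:
    """Hypothetical: split an r2:// URI back into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "r2" or not parts.netloc:
        raise ValueError(f"not an r2:// URI: {uri!r}")
    # The path starts with "/"; strip only that leading slash.
    return parts.netloc, parts.path.lstrip("/")
```

Prefix URIs such as checkpoint or raw modality URIs keep their trailing slash through the round trip, because only the leading slash of the path is stripped.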
- ursa.layout.catalog_prefix() → str
Prefix for all Lance catalog tables in constellation-assets.
Tables live under ursa/catalog/. The ursa/ repo header scopes permissions for the Ursa package, mirroring how virgo/ scopes Virgo and orion/ scopes Orion.
Example:
ursa/catalog/
- ursa.layout.catalog_table_key(table_name: str) → str
Key for a named Lance catalog table.
Prefer a TABLE_* constant over a bare string so that renaming a table is a single-file edit.
Example:
ursa/catalog/recordings.lance
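Below is a sketch of the key builder, inferred from the examples: the prefix, then the table name, then .lance. Callers pass a TABLE_* constant, so renaming a table means editing one constant rather than every call site. The function bodies here are assumptions, not the module's source.

```python
TABLE_RECORDINGS = "recordings"
TABLE_MODALITIES = "modalities"


def catalog_prefix() -> str:
    # Inferred from the documented example.
    return "ursa/catalog/"


def catalog_table_key(table_name: str) -> str:
    # Inferred: prefix + table name + ".lance".
    return f"{catalog_prefix()}{table_name}.lance"
```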
- ursa.layout.catalog_table_uri(table_name: str) → str
Full r2:// URI for a Lance catalog table.
Example:
r2://constellation-assets/ursa/catalog/recordings.lance
- ursa.layout.TABLE_PARTICIPANTS
'participants'
- ursa.layout.TABLE_RECORDINGS
'recordings'
- ursa.layout.TABLE_MODALITIES
'modalities'
- ursa.layout.TABLE_EVENTS
'events'
- ursa.layout.TABLE_EMBEDDINGS
'embeddings'
- ursa.layout.TABLE_VIRGO_ASSETS
'virgo_assets'
- ursa.layout.TABLE_CHECKPOINTS
'checkpoints'
- ursa.layout.TABLE_BENCHMARK_SUITES
'benchmark_suites'
- ursa.layout.TABLE_BENCHMARK_RESULTS
'benchmark_results'
- ursa.layout.ALL_CATALOG_TABLES: tuple[str, ...]
()
- ursa.layout.orion_checkpoint_prefix(checkpoint_id: str) → str
Prefix for all objects belonging to one Orion model checkpoint.
CheckpointRow.storage_uri should be set to this prefix. The data-hash manifest lives at {orion_checkpoint_prefix(id)}data_hashes/manifest.json; use orion_checkpoint_data_hash_key to construct that path rather than concatenating strings.
Example:
orion/checkpoints/ckpt-abc123/
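The advice to avoid string concatenation amounts to deriving the manifest key from the prefix function, so the two can never drift apart. Below is a sketch under that assumption, with bodies inferred from the documented examples.

```python
def orion_checkpoint_prefix(checkpoint_id: str) -> str:
    # Inferred from the documented example.
    return f"orion/checkpoints/{checkpoint_id}/"


def orion_checkpoint_data_hash_key(checkpoint_id: str) -> str:
    # Built on top of the prefix, so both keys move together if the
    # checkpoint layout ever changes.
    return f"{orion_checkpoint_prefix(checkpoint_id)}data_hashes/manifest.json"
```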
- ursa.layout.orion_checkpoint_uri(checkpoint_id: str) → str
Full r2:// URI for an Orion checkpoint prefix.
Example:
r2://constellation-assets/orion/checkpoints/ckpt-abc123/
- ursa.layout.orion_checkpoint_data_hash_key(checkpoint_id: str) → str
Key for the data-hash manifest inside a checkpoint.
This is the file Orion writes to list every recording consumed during the training run; it is used for train/test overlap detection.
Example:
orion/checkpoints/ckpt-abc123/data_hashes/manifest.json
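Overlap detection then reduces to a set intersection between the recordings a checkpoint trained on and a held-out set. The manifest's schema is not documented on this page. The sketch below assumes a JSON object with a recording_hashes list, and both helper names are hypothetical.

```python
import json


def training_recordings(manifest_json: str) -> set[str]:
    # ASSUMPTION: the manifest is a JSON object with a "recording_hashes"
    # list. The real schema is not documented here.
    return set(json.loads(manifest_json)["recording_hashes"])


def overlapping(manifest_json: str, held_out: set[str]) -> set[str]:
    """Hypothetical: recordings used in training that also appear in a held-out set."""
    return training_recordings(manifest_json) & held_out
```

A non-empty result means the checkpoint has seen data from the held-out set.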
- ursa.layout.orion_benchmark_suite_key(suite_name: str, suite_version: int) → str
Key for a versioned benchmark suite configuration object.
BenchmarkSuiteRow.storage_uri should point at this key. The object contains the held-out query spec and metric definitions.
Example:
orion/benchmark-suites/cognitive_load_eval/1.json
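The version is part of the key. Publishing a new suite version therefore writes a new object instead of overwriting an old one, and earlier versions stay addressable. Below is a sketch with the body inferred from the example above.

```python
def orion_benchmark_suite_key(suite_name: str, suite_version: int) -> str:
    # Inferred from the documented example. The int version renders as a
    # bare number ("1"), so each version maps to its own object.
    return f"orion/benchmark-suites/{suite_name}/{suite_version}.json"
```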
- ursa.layout.orion_benchmark_suite_uri(suite_name: str, suite_version: int) → str
Full r2:// URI for a benchmark suite configuration object.
Example:
r2://constellation-assets/orion/benchmark-suites/cognitive_load_eval/1.json
- ursa.layout.orion_benchmark_result_key(result_id: str) → str
Key for a benchmark evaluation result object.
BenchmarkResultRow.storage_uri should point at this key.
Example:
orion/benchmark-results/result-deadbeef.json
- ursa.layout.orion_benchmark_result_uri(result_id: str) → str
Full r2:// URI for a benchmark evaluation result.
Example:
r2://constellation-assets/orion/benchmark-results/result-deadbeef.json
- ursa.layout.raw_recording_prefix(recording_id: str) → str
Prefix for all raw segment files belonging to one recording.
Matches the key layout introduced in data-engine PR #49: recordings/<recording_id>/. Note that manifests/ is a sibling prefix at the bucket root, not nested under recordings/.
Example:
recordings/rec_20260507_143022_a7f3/
- ursa.layout.raw_modality_prefix(recording_id: str, worker_subdir: str) → str
Prefix for one modality’s raw segment files.
worker_subdir is the per-worker directory data-engine creates, e.g. camera_front_cam or eeg_default.
Example:
recordings/rec_20260507_143022_a7f3/camera_front_cam/
- ursa.layout.raw_modality_uri(recording_id: str, worker_subdir: str) → str
Full r2:// URI for a raw modality prefix in constellation-data.
The URI points at the prefix (trailing /) because raw modalities consist of multiple segment files. The ingestion step (ENG-888) resolves individual objects within the prefix when building ModalityRow entries.
Example:
r2://constellation-data/recordings/rec_20260507_.../camera_front_cam/
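The three raw helpers nest: the modality prefix extends the recording prefix, and the URI adds the bucket. Because the URI names a prefix rather than an object, ingestion has to list the segment files beneath it. The sketch below infers the bodies from the examples above. segment_keys is a hypothetical stand-in for the ENG-888 resolution step. It works on an in-memory listing instead of a real R2 list call.

```python
RAW_BUCKET = "constellation-data"


def raw_recording_prefix(recording_id: str) -> str:
    return f"recordings/{recording_id}/"


def raw_modality_prefix(recording_id: str, worker_subdir: str) -> str:
    # Nested under the recording prefix.
    return f"{raw_recording_prefix(recording_id)}{worker_subdir}/"


def raw_modality_uri(recording_id: str, worker_subdir: str) -> str:
    # Points at a prefix (trailing "/"), not at a single object.
    return f"r2://{RAW_BUCKET}/{raw_modality_prefix(recording_id, worker_subdir)}"


def segment_keys(listing: list[str], recording_id: str, worker_subdir: str) -> list[str]:
    """Hypothetical: pick one modality's segment files out of a bucket listing."""
    prefix = raw_modality_prefix(recording_id, worker_subdir)
    return sorted(k for k in listing if k.startswith(prefix) and k != prefix)
```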
- ursa.layout.raw_commit_marker_key(recording_id: str) → str
Key for the upload commit marker written by the uploader after a complete session.
This is NOT under recordings/; the manifests/ prefix sits at the bucket root alongside recordings/, _status/, and nodes/.
Example:
manifests/rec_20260507_143022_a7f3/manifest.json
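The uploader writes the marker only after a complete session, so its presence is a natural readiness signal for ingestion. This is a minimal sketch, assuming a set of existing keys is available. committed_recordings is a hypothetical helper.

```python
def raw_commit_marker_key(recording_id: str) -> str:
    # Sibling of recordings/ at the bucket root, not nested under it.
    return f"manifests/{recording_id}/manifest.json"


def committed_recordings(existing_keys: set[str], recording_ids: list[str]) -> list[str]:
    """Hypothetical: keep only recordings whose upload commit marker exists."""
    return [rid for rid in recording_ids if raw_commit_marker_key(rid) in existing_keys]
```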
- ursa.layout.raw_status_key(hostname: str) → str
Key for a rig’s uploader status heartbeat file.
Example:
_status/green-mantis.json
- ursa.layout.raw_node_key(node_id: str) → str
Key for a node’s registry entry in constellation-data.
Example:
nodes/green-mantis.json
- ursa.layout._VALID_URI_SCHEMES: tuple[str, ...]
('r2', 's3', 'gcs', 'file')
- ursa.layout.validate_storage_uri(uri: str, fmt: ursa.catalog.schemas.StorageFormat) → None
Reject a storage_uri that doesn’t match its StorageFormat tier.
Phase 1a (M2) callers register raw modalities (RAW_*) under constellation-data and canonical modalities (ZARR, LANCE, MP4_INDEX, PARQUET) under constellation-assets. This helper enforces that contract before any catalog row is written.
Test-profile bucket suffixes (-test) are not yet recognised; see ENG-1071 (https://linear.app/constellationlab/issue/ENG-1071).
Raises
ValueError: if a syntactically valid URI points at the wrong bucket for its format. Malformed input would be caught upstream by the typed Pydantic URI_PATTERN regex; this validator handles the semantic mismatch.
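Below is a sketch of the tier check, not the module's implementation. It compares the bucket only for r2:// URIs; how s3://, gcs://, and file:// URIs are treated is not documented here. It uses a hypothetical StorageFormat stand-in. It also matches bucket names exactly, so in this sketch a -test suffix would be rejected.

```python
from enum import Enum
from urllib.parse import urlsplit


class StorageFormat(str, Enum):
    """Hypothetical stand-in; RAW_SEGMENTS represents the RAW_* members."""
    ZARR = "zarr"
    LANCE = "lance"
    MP4_INDEX = "mp4_index"
    PARQUET = "parquet"
    RAW_SEGMENTS = "raw_segments"


_VALID_URI_SCHEMES = ("r2", "s3", "gcs", "file")


def validate_storage_uri(uri: str, fmt: StorageFormat) -> None:
    parts = urlsplit(uri)
    if parts.scheme not in _VALID_URI_SCHEMES:
        raise ValueError(f"unsupported URI scheme in {uri!r}")
    if parts.scheme != "r2":
        return  # ASSUMPTION: bucket tiers are only checked for r2:// URIs
    # RAW_* formats belong in the cold bucket; everything else in the hot one.
    expected = "constellation-data" if fmt.name.startswith("RAW_") else "constellation-assets"
    if parts.netloc != expected:
        raise ValueError(f"{fmt.name} data must live in {expected}, got {parts.netloc!r}")
```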