ursa.query

Top-level read API for Ursa (ENG-889 + ENG-1081).

Phase 1a (architecture v0.4 §7) ships a raw-store query: callers filter the catalog by participant, modality, recording hash, and metadata equality, and get back RecordingResult objects pointing at whole-file raw modality URIs. No Zarr, no temporal slicing, no temporaldata integration; those arrive after Virgo’s M2 ingestion node lands and are tracked by ENG-1082.

The advanced kwargs (time_range, pipeline_version, time_filters, metadata_filters, derived) are part of the long-term §3.5 surface. They are accepted at the type-system level so the public signature stays stable. Passing any of them while a matched modality has ingestion_status="raw" raises NotImplementedError with a pointer to ENG-1082. This follows the same precedent as Catalog.delete (ENG-1069).

The return type diverges from the “list[temporaldata.Data]” wording in §3.5. Phase 1a forbids temporaldata integration outright, and ModalityRow on a raw recording carries no domain (domain_intervals is null until ingestion). Wrapping a half-populated temporaldata.Data would be type-lying, so we return RecordingResult carrying the catalog projection instead. ENG-899 (M3) replaces the modalities values with temporaldata subclasses once the processed store exists, and the existing wrapper composes naturally with that.

Module Contents

Classes

RecordingResult

One recording matched by ursa.query.

QuerySpec

Pydantic spec for ursa.query.

Functions

query

Filter the Ursa catalog and return matching recordings.

_reject_kwargs_with_spec

If spec is supplied, every filter kwarg must be unset.

_metadata_matches

Equality on every key in expected; missing keys do not match.

_collect_modalities

Fetch the modality rows for recordings, grouped by recording_hash.

_enforce_processed_gate

Raise if any matched modality is still raw and the spec asked for processed-only behavior. The first offending (modality, recording, kwarg) is reported so the message points at a concrete row.

_processed_kwargs_set

Names of the processed-only kwargs that are non-None / non-empty.

Data

API

ursa.query.__all__

['QuerySpec', 'RecordingResult', 'query']

ursa.query._BACKFILL_TICKET

'ENG-1082'

ursa.query._PROCESSED_KWARGS

('time_range', 'pipeline_version', 'time_filters', 'metadata_filters', 'derived')

class ursa.query.RecordingResult(/, **data: typing.Any)

Bases: pydantic.BaseModel

One recording matched by ursa.query.

Phase 1a carries catalog projections only: no array bytes, no temporaldata.Data. ModalityRow.raw_storage_uri is the cold-bucket pointer that ursa.get (ENG-890) reads from. ENG-899 (M3) replaces the modalities values with array-bearing temporaldata subclasses once the processed store exists.

Frozen for accidental-mutation safety; not hashable (the dict/list fields prevent it). Use recording_hash as the cache key.
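The model is frozen but carries dict and list fields, so instances cannot go into a set or serve as dict keys. Below is a minimal sketch of the recommended caching pattern. It uses a stdlib frozen dataclass as a hypothetical stand-in for RecordingResult, and a pydantic model may raise different exception types:

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for RecordingResult: frozen, with a dict-typed field.
@dataclass(frozen=True)
class FakeRecordingResult:
    recording_hash: str
    metadata: dict = field(default_factory=dict)


r = FakeRecordingResult(recording_hash="rec-001", metadata={"site": "A"})

# Frozen: attribute assignment is rejected.
mutation_blocked = False
try:
    r.recording_hash = "rec-002"
except AttributeError:  # dataclasses.FrozenInstanceError subclasses AttributeError
    mutation_blocked = True

# Not hashable: the generated __hash__ tries to hash the dict field.
hash_failed = False
try:
    hash(r)
except TypeError:
    hash_failed = True

# Key the cache by recording_hash instead of by the result object itself.
cache: dict[str, FakeRecordingResult] = {r.recording_hash: r}
```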

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

'ConfigDict(…)'

recording_hash: ursa.catalog.schemas.CatalogID

None

participant_ids: list[ursa.catalog.schemas.CatalogID]

None

start_time: ursa.catalog.schemas.UTCDatetime

None

duration: datetime.timedelta

None

device_info: ursa.catalog.schemas.MetadataDict

None

metadata: ursa.catalog.schemas.MetadataDict

None

modalities: dict[ursa.catalog.schemas.ModalityName, ursa.catalog.schemas.ModalityRow]

None

class ursa.query.QuerySpec(/, **data: typing.Any)

Bases: pydantic.BaseModel

Pydantic spec for ursa.query.

Implemented in Phase 1a over the raw catalog: participants, modalities, recording_hash, metadata.

Accepted at the signature level in Phase 1a, raising NotImplementedError (referencing ENG-1082) when a matched modality has ingestion_status="raw": time_range, pipeline_version, time_filters, metadata_filters, derived.

The complex-list fields use list[Any] as a deliberate forward-compat escape hatch; ENG-1082 will replace each with a typed model (AroundEvent | TimeWindow, Filter, DerivedSelector).

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

'ConfigDict(…)'

participants: list[ursa.catalog.schemas.CatalogID] | None

None

modalities: list[ursa.catalog.schemas.ModalityName] | None

None

recording_hash: ursa.catalog.schemas.CatalogID | None

None

metadata: ursa.catalog.schemas.MetadataDict | None

None

time_range: tuple[ursa.catalog.schemas.UTCDatetime, ursa.catalog.schemas.UTCDatetime] | None

None

pipeline_version: str | None

None

time_filters: list[Any]

'Field(…)'

metadata_filters: list[Any]

'Field(…)'

derived: list[Any]

'Field(…)'

_validate() → typing_extensions.Self

has_processed_kwarg() → bool

True iff any kwarg requiring the processed store is set.

ursa.query.query(spec: ursa.query.QuerySpec | None = None, *, catalog: ursa.catalog.Catalog, participants: list[ursa.catalog.schemas.CatalogID] | None = None, modalities: list[ursa.catalog.schemas.ModalityName] | None = None, recording_hash: ursa.catalog.schemas.CatalogID | None = None, metadata: ursa.catalog.schemas.MetadataDict | None = None, time_range: tuple[ursa.catalog.schemas.UTCDatetime, ursa.catalog.schemas.UTCDatetime] | None = None, pipeline_version: str | None = None, time_filters: list[Any] | None = None, metadata_filters: list[Any] | None = None, derived: list[Any] | None = None, limit: int | None = None) → list[ursa.query.RecordingResult]

Filter the Ursa catalog and return matching recordings.

Two surfaces:

  1. Common case. Pass kwargs directly: ursa.query(catalog=cat, participants=[...], modalities=[...]). The kwargs auto-validate against QuerySpec.

  2. Complex case. Build a QuerySpec and pass it as the first positional arg: ursa.query(spec, catalog=cat).

Phase 1a (architecture v0.4) returns RecordingResult objects carrying catalog projections only, with no array bytes. Use ursa.get (ENG-890) to read the raw modality file. The processed-only kwargs (time_range, pipeline_version, time_filters, metadata_filters, derived) raise NotImplementedError when a matched modality has ingestion_status="raw". The green path ships with ENG-1082 once Virgo M2’s ingestion node populates the processed store.
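Both surfaces reduce to a single spec before any catalog access. Below is a minimal sketch of that dispatch. It assumes "unset" means None, and it uses a hypothetical two-field MiniSpec dataclass and resolve_spec helper in place of QuerySpec and the real internals:

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical, trimmed-down stand-in for QuerySpec (two filter fields only).
@dataclass(frozen=True)
class MiniSpec:
    participants: Optional[list[str]] = None
    modalities: Optional[list[str]] = None


def resolve_spec(spec: Optional[MiniSpec] = None, **filter_kwargs: Any) -> MiniSpec:
    """Return the spec to run: the positional one, or one built from kwargs."""
    if spec is not None:
        # Strict rule: a pre-built spec cannot be mixed with filter kwargs.
        set_names = sorted(k for k, v in filter_kwargs.items() if v is not None)
        if set_names:
            raise TypeError(f"pass either a spec or filter kwargs, not both: {set_names}")
        return spec
    # Common case: constructing the spec from kwargs is what validates them.
    return MiniSpec(**filter_kwargs)
```

Being strict at this point avoids having to choose between override and union semantics for a merged spec.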

Parameters

spec
    Pre-built QuerySpec. If given, all filter kwargs must be unset (catalog and limit may still be passed); otherwise TypeError is raised.
catalog
    Required. Use Catalog.local(...) for tests and scripting, or Catalog.from_store(get_store(...)) for R2-backed catalogs.
limit
    Caps the returned list. It is not a scan cap: when metadata= is also set, we scan the full recordings table and truncate at the end. ENG-1088 lifts this once Lance MapType pushdown lands (ENG-1066).

Returns

list[RecordingResult]
    Empty list if nothing matches. An empty query (no filters, no limit) returns every recording in the catalog. Gating that on production-scale catalogs is the caller’s responsibility.
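The metadata= filter runs after rows are read, so limit can only cap the result, not the scan. The small in-memory sketch below contrasts the two readings of limit. Its helpers and the three-row table are hypothetical:

```python
from typing import Any, Optional

Row = dict[str, Any]


def matches(row: Row, expected: Optional[dict[str, Any]]) -> bool:
    """Metadata equality on every expected key; missing keys do not match."""
    if expected is None:
        return True
    meta = row["metadata"]
    return all(k in meta and meta[k] == v for k, v in expected.items())


def limit_as_result_cap(table: list[Row], expected: Optional[dict[str, Any]], limit: Optional[int]) -> list[Row]:
    # Documented behavior: scan every row, filter, then truncate the result.
    hits = [row for row in table if matches(row, expected)]
    return hits[:limit]  # a None limit slices to the whole list


def limit_as_scan_cap(table: list[Row], expected: Optional[dict[str, Any]], limit: Optional[int]) -> list[Row]:
    # Counter-example: capping the scan first can silently drop matches.
    return [row for row in table[:limit] if matches(row, expected)]


table: list[Row] = [
    {"recording_hash": "r1", "metadata": {"task": "rest"}},
    {"recording_hash": "r2", "metadata": {"task": "reach"}},
    {"recording_hash": "r3", "metadata": {"task": "reach"}},
]

print([r["recording_hash"] for r in limit_as_result_cap(table, {"task": "reach"}, 2)])  # ['r2', 'r3']
print([r["recording_hash"] for r in limit_as_scan_cap(table, {"task": "reach"}, 2)])    # ['r2']
```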

ursa.query._FILTER_KWARG_NAMES

('participants', 'modalities', 'recording_hash', 'metadata', 'time_range', 'pipeline_version', 'time…

ursa.query._reject_kwargs_with_spec(**kwargs: Any) → None

If spec is supplied, every filter kwarg must be unset.

Mixing the two would force us to merge a spec with kwargs, and the merge semantics (override vs. union) are unobvious enough that being strict is cheaper than picking a rule.

ursa.query._metadata_matches(actual: collections.abc.Mapping[str, Any], expected: collections.abc.Mapping[str, Any]) → bool

Equality on every key in expected; missing keys do not match.
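The "missing keys do not match" rule matters when an expected value is None, because a naive actual.get(key) == want would accept a missing key. The minimal sketch below uses a sentinel instead. The name metadata_matches is illustrative and is not the private helper itself:

```python
from collections.abc import Mapping
from typing import Any

_MISSING = object()  # sentinel distinct from every real metadata value, including None


def metadata_matches(actual: Mapping[str, Any], expected: Mapping[str, Any]) -> bool:
    """True iff every key in `expected` is present in `actual` with an equal value."""
    for key, want in expected.items():
        got = actual.get(key, _MISSING)
        if got is _MISSING or got != want:
            return False
    return True
```

With this rule, metadata_matches({}, {"notes": None}) is False, while metadata_matches({"notes": None}, {"notes": None}) is True. An empty expected mapping matches every recording.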

ursa.query._collect_modalities(catalog: ursa.catalog.Catalog, recordings: list[ursa.catalog.schemas.RecordingRow], modality_filter: list[ursa.catalog.schemas.ModalityName] | None) → dict[ursa.catalog.schemas.CatalogID, list[ursa.catalog.schemas.ModalityRow]]

Fetch the modality rows for recordings, grouped by recording_hash.

Makes a single list_modalities call (no batching in Phase 1a; ENG-1089 covers growth). Empty input short-circuits, avoiding an empty IN-list edge case.
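The grouping step can be sketched in plain Python, assuming the rows from the single list_modalities call are already in memory. FakeModalityRow and group_by_recording are hypothetical stand-ins. This sketch applies modality_filter in Python, and where the real helper applies it (in the catalog call or afterwards) is not specified:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeModalityRow:  # hypothetical stand-in for ModalityRow
    recording_hash: str
    modality: str
    ingestion_status: str = "raw"


def group_by_recording(
    recording_hashes: list[str],
    rows: list[FakeModalityRow],
    modality_filter: Optional[list[str]] = None,
) -> dict[str, list[FakeModalityRow]]:
    if not recording_hashes:
        # Short-circuit: never issue a lookup with an empty IN (...) list.
        return {}
    wanted = set(recording_hashes)
    grouped: dict[str, list[FakeModalityRow]] = defaultdict(list)
    for row in rows:  # stands in for the single list_modalities result
        if row.recording_hash not in wanted:
            continue
        if modality_filter is not None and row.modality not in modality_filter:
            continue
        grouped[row.recording_hash].append(row)
    return dict(grouped)
```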

ursa.query._enforce_processed_gate(spec: ursa.query.QuerySpec, by_recording: dict[ursa.catalog.schemas.CatalogID, list[ursa.catalog.schemas.ModalityRow]]) → None

Raise if any matched modality is still raw and the spec asked for processed-only behavior. The first offending (modality, recording, kwarg) is reported so the message points at a concrete row.
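Below is a minimal sketch of the gate. It assumes the gate receives the already-computed list of set processed-only kwargs (the real helper takes the spec), and it lets any ingestion_status other than "raw" pass. The stand-in row type and the message wording are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeModalityRow:  # hypothetical stand-in for ModalityRow
    modality: str
    ingestion_status: str


BACKFILL_TICKET = "ENG-1082"  # mirrors _BACKFILL_TICKET


def enforce_processed_gate(
    set_kwargs: list[str],
    by_recording: dict[str, list[FakeModalityRow]],
) -> None:
    """Raise on the first raw modality if any processed-only kwarg was set."""
    if not set_kwargs:
        return  # plain catalog filters never need the processed store
    for recording_hash, rows in by_recording.items():
        for row in rows:
            if row.ingestion_status == "raw":
                # Report one concrete (kwarg, modality, recording) triple.
                raise NotImplementedError(
                    f"{set_kwargs[0]!r} needs the processed store, but modality "
                    f"{row.modality!r} of recording {recording_hash!r} is still raw; "
                    f"see {BACKFILL_TICKET}"
                )
```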

ursa.query._processed_kwargs_set(spec: ursa.query.QuerySpec) → list[str]

Names of the processed-only kwargs that are non-None / non-empty.

Truthiness uniformly handles both shapes that _PROCESSED_KWARGS covers: None-or-value scalars (time_range, pipeline_version) and list defaults (time_filters, metadata_filters, derived).
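The sketch below shows the truthiness check over a hypothetical MiniSpec dataclass that carries only the processed-only fields. Its has_processed_kwarg is one plausible wrapper and is not necessarily how the real method is implemented:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

PROCESSED_KWARGS = ("time_range", "pipeline_version", "time_filters", "metadata_filters", "derived")


@dataclass
class MiniSpec:  # hypothetical stand-in carrying only the processed-only fields
    time_range: Optional[tuple[datetime, datetime]] = None
    pipeline_version: Optional[str] = None
    time_filters: list[Any] = field(default_factory=list)
    metadata_filters: list[Any] = field(default_factory=list)
    derived: list[Any] = field(default_factory=list)


def processed_kwargs_set(spec: MiniSpec) -> list[str]:
    # One truthiness test covers None scalars and empty-list defaults alike.
    return [name for name in PROCESSED_KWARGS if getattr(spec, name)]


def has_processed_kwarg(spec: MiniSpec) -> bool:
    return bool(processed_kwargs_set(spec))
```

One consequence of relying on truthiness is that an empty string for pipeline_version counts as unset.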