ursa.query¶
Top-level read API for Ursa (ENG-889 + ENG-1081).
Phase 1a (architecture v0.4 §7) ships a raw-store query: callers filter the
catalog by participant, modality, recording hash, and metadata equality, and
get back :class:RecordingResult objects pointing at whole-file raw modality
URIs. No Zarr, no temporal slicing, no temporaldata integration — those
arrive after Virgo’s M2 ingestion node lands and are tracked by ENG-1082.
The advanced kwargs (time_range, pipeline_version, time_filters,
metadata_filters, derived) are part of the long-term §3.5 surface and
are accepted at the type-system level so the public signature stays stable.
Passing any of them while a matched modality has ingestion_status="raw"
raises :class:NotImplementedError with a pointer to ENG-1082, following the
precedent set by :meth:Catalog.delete (ENG-1069).
The return type diverges from §3.5’s “list[temporaldata.Data]” wording.
Phase 1a forbids temporaldata integration outright, and ModalityRow
on a raw recording carries no domain (domain_intervals is null until
ingestion). Wrapping a half-populated temporaldata.Data would be
type-lying; instead we return :class:RecordingResult carrying the catalog
projection. ENG-899 (M3) replaces the modalities values with
temporaldata subclasses once the processed store exists, and the
existing wrapper composes naturally with that.
Module Contents¶
Classes¶
RecordingResult | One recording matched by :func:ursa.query.
QuerySpec | Pydantic spec for :func:ursa.query.
Functions¶
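query | Filter the Ursa catalog and return matching recordings.
_reject_kwargs_with_spec | If spec is supplied, every filter kwarg must be unset.
_metadata_matches | Equality on every key in expected; missing keys do not match.
_collect_modalities | Fetch the modality rows for recordings, grouped by recording_hash.
_enforce_processed_gate | Raise if any matched modality is still raw and the spec asked for processed-only behavior. The first offending (modality, recording, kwarg) is reported so the message points at a concrete row.
_processed_kwargs_set | Names of the processed-only kwargs that are non-None / non-empty.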
Data¶
API¶
- ursa.query.__all__¶
[‘QuerySpec’, ‘RecordingResult’, ‘query’]
- ursa.query._BACKFILL_TICKET¶
‘ENG-1082’
- ursa.query._PROCESSED_KWARGS¶
(‘time_range’, ‘pipeline_version’, ‘time_filters’, ‘metadata_filters’, ‘derived’)
- class ursa.query.RecordingResult(/, **data: typing.Any)[source]¶
Bases: pydantic.BaseModel
One recording matched by :func:ursa.query.
Phase 1a carries catalog projections only: no array bytes, no temporaldata.Data. ModalityRow.raw_storage_uri is the cold-bucket pointer ENG-890 (ursa.get) reads from. ENG-899 (M3) replaces the modalities values with array-bearing temporaldata subclasses once the processed store exists.
Frozen for accidental-mutation safety; not hashable (the dict/list fields prevent it). Use recording_hash as the cache key.
Initialization
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config¶
‘ConfigDict(…)’
- recording_hash: ursa.catalog.schemas.CatalogID¶
None
- participant_ids: list[ursa.catalog.schemas.CatalogID]¶
None
- start_time: ursa.catalog.schemas.UTCDatetime¶
None
- duration: datetime.timedelta¶
None
- device_info: ursa.catalog.schemas.MetadataDict¶
None
- metadata: ursa.catalog.schemas.MetadataDict¶
None
- modalities: dict[ursa.catalog.schemas.ModalityName, ursa.catalog.schemas.ModalityRow]¶
None
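The "frozen but not hashable" note on RecordingResult can be illustrated with a stdlib stand-in. This is a sketch only: it uses a frozen dataclass rather than the real pydantic model, and ResultSketch and its field values are hypothetical. It shows why a dict-valued field blocks hashing and how keying a cache by recording_hash avoids the problem.

```python
from dataclasses import FrozenInstanceError, dataclass, field


# Hypothetical stand-in for RecordingResult: a frozen dataclass, not the
# real pydantic model. Only two of the documented fields are mirrored.
@dataclass(frozen=True)
class ResultSketch:
    recording_hash: str
    metadata: dict = field(default_factory=dict)


result = ResultSketch(recording_hash="rec-001", metadata={"site": "lab-a"})

# Frozen: attribute assignment is rejected.
try:
    result.recording_hash = "rec-002"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Not hashable: the generated __hash__ hashes every field, and a dict
# field makes that fail with TypeError.
try:
    hash(result)
    hashable = True
except TypeError:
    hashable = False

# Key caches by recording_hash instead of by the result object itself.
cache = {result.recording_hash: result}
```

Freezing blocks attribute reassignment but does not make the dict contents immutable, which is one reason hashing the whole object would be unsafe even if it were allowed.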
- class ursa.query.QuerySpec(/, **data: typing.Any)[source]¶
Bases: pydantic.BaseModel
Pydantic spec for :func:ursa.query.
Phase 1a, implemented over the raw catalog: participants, modalities, recording_hash, metadata.
Phase 1a, accepted at the signature level, raising :class:NotImplementedError (referencing ENG-1082) when a matched modality has ingestion_status="raw": time_range, pipeline_version, time_filters, metadata_filters, derived.
The complex-list fields use list[Any] as a deliberate forward-compat escape hatch; ENG-1082 will replace each with a typed model (AroundEvent | TimeWindow, Filter, DerivedSelector).
Initialization
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config¶
‘ConfigDict(…)’
- participants: list[ursa.catalog.schemas.CatalogID] | None¶
None
- modalities: list[ursa.catalog.schemas.ModalityName] | None¶
None
- recording_hash: ursa.catalog.schemas.CatalogID | None¶
None
- metadata: ursa.catalog.schemas.MetadataDict | None¶
None
- time_range: tuple[ursa.catalog.schemas.UTCDatetime, ursa.catalog.schemas.UTCDatetime] | None¶
None
- pipeline_version: str | None¶
None
- time_filters: list[Any]¶
‘Field(…)’
- metadata_filters: list[Any]¶
‘Field(…)’
- derived: list[Any]¶
‘Field(…)’
- ursa.query.query(spec: ursa.query.QuerySpec | None = None, *, catalog: ursa.catalog.Catalog, participants: list[ursa.catalog.schemas.CatalogID] | None = None, modalities: list[ursa.catalog.schemas.ModalityName] | None = None, recording_hash: ursa.catalog.schemas.CatalogID | None = None, metadata: ursa.catalog.schemas.MetadataDict | None = None, time_range: tuple[ursa.catalog.schemas.UTCDatetime, ursa.catalog.schemas.UTCDatetime] | None = None, pipeline_version: str | None = None, time_filters: list[Any] | None = None, metadata_filters: list[Any] | None = None, derived: list[Any] | None = None, limit: int | None = None) list[ursa.query.RecordingResult][source]¶
Filter the Ursa catalog and return matching recordings.
Two surfaces:
- Common case: pass kwargs directly, as in ursa.query(catalog=cat, participants=[...], modalities=[...]). The kwargs auto-validate against :class:QuerySpec.
- Complex case: build a :class:QuerySpec and pass it as the first positional arg, as in ursa.query(spec, catalog=cat).
Phase 1a (architecture v0.4) returns :class:RecordingResult carrying catalog projections only, with no array bytes. Use :func:ursa.get (ENG-890) to read the raw modality file. The processed-only kwargs (time_range, pipeline_version, time_filters, metadata_filters, derived) raise :class:NotImplementedError when a matched modality has ingestion_status="raw"; the green path ships with ENG-1082 once Virgo M2’s ingestion node populates the processed store.
Parameters
spec
Pre-built :class:QuerySpec. If given, all filter kwargs must be unset (catalog and limit may still be passed); otherwise :class:TypeError is raised.
catalog
Required. Use Catalog.local(...) for tests / scripting or Catalog.from_store(get_store(...)) for R2-backed catalogs.
limit
Caps the returned list. Not a scan cap: when metadata= is also set we scan the full recordings table and truncate at the end. ENG-1088 lifts this once Lance MapType pushdown lands (ENG-1066).
Returns
list[RecordingResult]
Empty list if nothing matches. An empty query (no filters, no limit) returns every recording in the catalog; gating that on production-scale catalogs is the caller’s responsibility.
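The distinction between capping the returned list and capping the scan can be sketched as follows. This is a minimal in-memory illustration, not the Ursa implementation: scan_then_truncate, the row dictionaries, and their values are all hypothetical, and the real query reads from the catalog rather than from a Python list.

```python
from collections.abc import Mapping
from typing import Any, Optional


def scan_then_truncate(
    rows: list[Mapping[str, Any]],
    metadata: Mapping[str, Any],
    limit: Optional[int] = None,
) -> list[Mapping[str, Any]]:
    """Check every row against ``metadata``, then cap the result at ``limit``."""
    matched = [
        row
        for row in rows
        if all(
            key in row["metadata"] and row["metadata"][key] == value
            for key, value in metadata.items()
        )
    ]
    # The cap is applied only after the full scan, so it bounds the output
    # size but not the amount of work done.
    return matched if limit is None else matched[:limit]


# Hypothetical recordings table.
rows = [
    {"recording_hash": "rec-001", "metadata": {"site": "lab-a"}},
    {"recording_hash": "rec-002", "metadata": {"site": "lab-b"}},
    {"recording_hash": "rec-003", "metadata": {"site": "lab-a"}},
    {"recording_hash": "rec-004", "metadata": {"site": "lab-a"}},
]
capped = scan_then_truncate(rows, {"site": "lab-a"}, limit=2)
```

Here all four rows are inspected even though only two are returned, which is the cost the limit parameter description warns about.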
- ursa.query._FILTER_KWARG_NAMES¶
(‘participants’, ‘modalities’, ‘recording_hash’, ‘metadata’, ‘time_range’, ‘pipeline_version’, ‘time…
- ursa.query._reject_kwargs_with_spec(**kwargs: Any) None[source]¶
If spec is supplied, every filter kwarg must be unset.
Mixing the two would force us to merge a spec with kwargs, and the merge semantics (override vs. union) are unobvious enough that being strict is cheaper than picking a rule.
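A strict either-or check of this kind might look like the sketch below. It assumes "unset" means None, which matches the None defaults in the query signature; the function name and the error message wording are hypothetical.

```python
from typing import Any


def reject_kwargs_with_spec(**kwargs: Any) -> None:
    """Raise TypeError if any filter kwarg is set alongside a spec.

    Assumption: a kwarg counts as "unset" when it is None.
    """
    offending = sorted(name for name, value in kwargs.items() if value is not None)
    if offending:
        # Hypothetical message; the real wording is not shown in this page.
        raise TypeError(
            "pass either a spec or filter kwargs, not both; got: "
            + ", ".join(offending)
        )


# All kwargs unset: accepted silently.
reject_kwargs_with_spec(participants=None, modalities=None)
```

Naming the offending kwargs in the message lets the caller see which arguments to move into the spec.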
- ursa.query._metadata_matches(actual: collections.abc.Mapping[str, Any], expected: collections.abc.Mapping[str, Any]) bool[source]¶
Equality on every key in expected; missing keys do not match.
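The matching rule stated above can be sketched in a few lines. This is an illustration of the described semantics under the same signature, not the module's actual body.

```python
from collections.abc import Mapping
from typing import Any


def metadata_matches(actual: Mapping[str, Any], expected: Mapping[str, Any]) -> bool:
    """True when every key in ``expected`` is present in ``actual`` with an equal value."""
    # ``key in actual`` is checked explicitly so that an expected value of
    # None does not accidentally match a missing key, as ``actual.get(key)``
    # would allow.
    return all(key in actual and actual[key] == value for key, value in expected.items())
```

Extra keys in actual are ignored, and an empty expected mapping matches everything.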
- ursa.query._collect_modalities(catalog: ursa.catalog.Catalog, recordings: list[ursa.catalog.schemas.RecordingRow], modality_filter: list[ursa.catalog.schemas.ModalityName] | None) dict[ursa.catalog.schemas.CatalogID, list[ursa.catalog.schemas.ModalityRow]][source]¶
Fetch the modality rows for recordings, grouped by recording_hash.
Single list_modalities call (no batching in Phase 1a; ENG-1089 covers growth). Empty input short-circuits to avoid an empty-IN-list edge case.
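The grouping step and the empty-input short-circuit can be sketched as below. Because the signature of list_modalities is not shown here, the sketch takes already-fetched rows as input. ModalityRowSketch, group_by_recording, and the sample values are hypothetical, and omitting recordings with no rows from the result is a choice of this sketch, not a documented behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModalityRowSketch:
    """Hypothetical stand-in for ModalityRow with only the fields used here."""

    recording_hash: str
    modality: str


def group_by_recording(
    rows: list[ModalityRowSketch],
    recording_hashes: list[str],
    modality_filter: Optional[list[str]] = None,
) -> dict[str, list[ModalityRowSketch]]:
    # Short-circuit: with no recordings there is nothing to look up, and a
    # real backend would otherwise be handed an empty IN-list.
    if not recording_hashes:
        return {}
    wanted = set(recording_hashes)
    grouped: dict[str, list[ModalityRowSketch]] = defaultdict(list)
    for row in rows:
        if row.recording_hash not in wanted:
            continue
        if modality_filter is not None and row.modality not in modality_filter:
            continue
        grouped[row.recording_hash].append(row)
    return dict(grouped)


rows = [
    ModalityRowSketch("rec-001", "eeg"),
    ModalityRowSketch("rec-001", "video"),
    ModalityRowSketch("rec-002", "eeg"),
]
grouped = group_by_recording(rows, ["rec-001", "rec-002"], modality_filter=["eeg"])
```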
- ursa.query._enforce_processed_gate(spec: ursa.query.QuerySpec, by_recording: dict[ursa.catalog.schemas.CatalogID, list[ursa.catalog.schemas.ModalityRow]]) None[source]¶
Raise if any matched modality is still raw and the spec asked for processed-only behavior. The first offending (modality, recording, kwarg) is reported so the message points at a concrete row.
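A gate of this shape might be sketched as follows. The names, the message wording, and the "processed" status value are assumptions made for illustration; only ingestion_status="raw" and the ENG-1082 ticket come from this page. The sketch takes the list of processed-only kwarg names directly instead of a full spec.

```python
from dataclasses import dataclass

BACKFILL_TICKET = "ENG-1082"  # value shown for _BACKFILL_TICKET above


@dataclass(frozen=True)
class ModalityRowSketch:
    """Hypothetical stand-in for ModalityRow with only the fields used here."""

    modality: str
    ingestion_status: str


def enforce_processed_gate(
    processed_kwargs: list[str],
    by_recording: dict[str, list[ModalityRowSketch]],
) -> None:
    # Nothing processed-only was requested, so raw rows are fine.
    if not processed_kwargs:
        return
    for recording_hash, rows in by_recording.items():
        for row in rows:
            if row.ingestion_status == "raw":
                # Report the first offending (modality, recording, kwarg)
                # so the message points at a concrete row.
                raise NotImplementedError(
                    f"{processed_kwargs[0]!r} needs processed data, but modality "
                    f"{row.modality!r} of recording {recording_hash!r} is still "
                    f"raw; see {BACKFILL_TICKET}."
                )


# "processed" is an assumed non-raw status value.
enforce_processed_gate(
    ["time_range"], {"rec-001": [ModalityRowSketch("eeg", "processed")]}
)
```

Stopping at the first offender keeps the error short; a variant could collect every raw row at the cost of a longer message.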
- ursa.query._processed_kwargs_set(spec: ursa.query.QuerySpec) list[str][source]¶
Names of the processed-only kwargs that are non-None / non-empty.
Truthiness uniformly handles both shapes _PROCESSED_KWARGS covers: None-or-value scalars (time_range, pipeline_version) and list defaults (time_filters, metadata_filters, derived).
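The truthiness check can be sketched with a stand-in spec. SpecSketch is a hypothetical dataclass, not the real QuerySpec, and its empty-list defaults are an assumption based on the Field(…) defaults listed above. The tuple of names is copied from _PROCESSED_KWARGS.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

PROCESSED_KWARGS = (
    "time_range",
    "pipeline_version",
    "time_filters",
    "metadata_filters",
    "derived",
)


@dataclass
class SpecSketch:
    """Hypothetical stand-in for QuerySpec, processed-only fields only."""

    time_range: Optional[tuple] = None
    pipeline_version: Optional[str] = None
    time_filters: list[Any] = field(default_factory=list)
    metadata_filters: list[Any] = field(default_factory=list)
    derived: list[Any] = field(default_factory=list)


def processed_kwargs_set(spec: SpecSketch) -> list[str]:
    # One truthiness test covers both shapes: None scalars and empty lists
    # are falsy, anything the caller actually supplied is truthy.
    return [name for name in PROCESSED_KWARGS if getattr(spec, name)]


names = processed_kwargs_set(SpecSketch(pipeline_version="v2", derived=["x"]))
```

One consequence of relying on truthiness is that an empty string for pipeline_version is treated the same as None.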