Ursa
Database / data-access layer for Constellation’s research stack.
Ursa is Constellation’s data-access layer for multimodal recordings: EEG, video, eye tracking, biometrics, questionnaires, keyboard/mouse/screen captures, and more. It serves two requests for everything else in the stack, "what recordings exist?" and "give me the bytes", so that Virgo and Orion never touch raw object storage directly.
The data model is intentionally minimal: Participant → Recording (recording_hash) → Modality + Events + flexible metadata. No sessions, no trials, no stimuli.
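As a rough illustration of that hierarchy, the sketch below models it with plain dataclasses. Only `recording_hash` and `storage_uri` appear in this page; every other class name, field name, and the `r2://` URI are hypothetical and do not describe the actual Lance schemas.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # A timestamped occurrence within a recording (field names are illustrative).
    name: str
    timestamp_s: float


@dataclass
class Modality:
    # One stream of a recording, e.g. "eeg" or "video", pointing at its bytes.
    name: str
    storage_uri: str


@dataclass
class Recording:
    # Identified by recording_hash; owns modalities, events, and free-form metadata.
    recording_hash: str
    modalities: list[Modality] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Participant:
    # Top of the hierarchy: a participant has zero or more recordings.
    participant_id: str
    recordings: list[Recording] = field(default_factory=list)


# Hypothetical example values, including the URI.
rec = Recording(
    recording_hash="abc123",
    modalities=[Modality("eeg", "r2://example-bucket/abc123/eeg.zarr")],
    events=[Event("blink", 1.25)],
    metadata={"device": "example-headset"},
)
participant = Participant("p001", [rec])
```

Note that nothing sits between Participant and Recording: there is no session, trial, or stimulus level, so any such grouping would have to live in the flexible metadata.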
Concretely, Ursa is three things stacked together:
- a Lance metadata catalog: a small set of tables describing every participant, recording, modality, and event;
- an R2 object store holding the actual modality bytes, addressed by absolute URI from each catalog row; and
- format-aware readers that turn those bytes into Python objects: Zarr for dense streams, Lance for sparse/variable events, MP4 for video.
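One way to picture the third layer is a small dispatch from a storage URI to a reader family. The sketch below assumes the format can be inferred from the URI's file suffix, which this page does not state; `READER_BY_SUFFIX`, `pick_reader`, and the example URIs are hypothetical names, not part of Ursa.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from a URI's file suffix to the reader families named above.
READER_BY_SUFFIX = {
    ".zarr": "zarr",    # dense streams
    ".lance": "lance",  # sparse / variable events
    ".mp4": "mp4",      # video
}


def pick_reader(storage_uri: str) -> str:
    """Return the reader family for an absolute storage URI."""
    # Only the path component matters here; scheme and bucket are ignored.
    path = PurePosixPath(urlparse(storage_uri).path)
    try:
        return READER_BY_SUFFIX[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no reader registered for {storage_uri!r}") from None
```

A real implementation could equally record the format in the catalog row rather than inferring it from the URI.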
How a read flows through the layers:
    Researcher / Virgo / Orion
            │  query()   (filter catalog rows)
            ▼
    ursa.DataInterface ──────────────────► Lance catalog
            │  materialize()       ◄─────  participants · recordings ·
            │  stream()                    modalities · events · …
            │  download()
            ▼
    Format-aware readers ──read by storage_uri──► R2 object store
    (Zarr · Lance · MP4)                          raw + processed bytes
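The two-step shape of that flow (filter catalog rows first, then fetch bytes by `storage_uri`) can be sketched with an in-memory stand-in. `ToyInterface` and `Row` are hypothetical and are not `ursa.DataInterface`; the real verbs' signatures are not given on this page, so only the verb names `query` and `materialize` are borrowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Toy catalog row: which recording, which modality, and where its bytes live.
    recording_hash: str
    modality: str
    storage_uri: str


class ToyInterface:
    """In-memory stand-in for the two-step flow; not ursa.DataInterface."""

    def __init__(self, catalog: list[Row], objects: dict[str, bytes]) -> None:
        self._catalog = catalog  # stands in for the Lance catalog
        self._objects = objects  # stands in for the R2 object store

    def query(self, modality: str) -> list[Row]:
        # Step 1: filter catalog rows; no object bytes are touched here.
        return [row for row in self._catalog if row.modality == modality]

    def materialize(self, row: Row) -> bytes:
        # Step 2: resolve the row's storage_uri against the object store.
        return self._objects[row.storage_uri]


# Hypothetical example data.
catalog = [
    Row("abc123", "eeg", "r2://example-bucket/abc123/eeg.zarr"),
    Row("abc123", "video", "r2://example-bucket/abc123/cam.mp4"),
]
objects = {
    "r2://example-bucket/abc123/eeg.zarr": b"eeg-bytes",
    "r2://example-bucket/abc123/cam.mp4": b"video-bytes",
}

di = ToyInterface(catalog, objects)
rows = di.query("eeg")
payload = di.materialize(rows[0])
```

Keeping the catalog lookup separate from the byte fetch is what lets callers narrow a selection cheaply before any object storage is read.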
The current implementation is a catalog and byte-access layer: the Lance catalog plus the resolved-input read verbs ursa.DataInterface.query(), .materialize(), .stream(), and .download() over raw and processed (Zarr/Lance/MP4) modalities, including a flat windowed iterator. Vector search, lifecycle GC, and Polaris cache sync are roadmap work.
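The page does not define "flat windowed iterator". One plausible reading is a single iterator that yields fixed-length windows across all selected recordings in turn. The sketch below illustrates that reading only; `iter_flat_windows` and its parameters are hypothetical and may not match Ursa's behavior, for example in how trailing partial windows are handled.

```python
from collections.abc import Iterator, Mapping, Sequence


def iter_flat_windows(
    recordings: Mapping[str, Sequence[float]],
    window: int,
    step: int,
) -> Iterator[tuple[str, Sequence[float]]]:
    """Yield (recording_hash, window) pairs across all recordings in one stream.

    Trailing samples that do not fill a whole window are dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for recording_hash, samples in recordings.items():
        # The last valid start index is len(samples) - window.
        for start in range(0, len(samples) - window + 1, step):
            yield recording_hash, samples[start:start + window]


# Hypothetical example: two short recordings keyed by recording_hash.
example = {"a": [0, 1, 2, 3, 4], "b": [10, 11, 12]}
windows = list(iter_flat_windows(example, window=3, step=2))
```

Here `windows` holds two windows from recording "a" followed by one from "b", so a consumer sees one flat sequence without nesting by recording.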
Contents
- Setup
- Quickstart
- Architecture
- Layered overview
- Current state versus roadmap
- Storage tiers and R2 layout: src/ursa/layout.py
- Lance catalog schemas: src/ursa/catalog/schemas.py
- Store abstraction: src/ursa/store/
- DataInterface read verbs: src/ursa/data_interface.py:216
- Register / write side: src/ursa/register/ and DataInterface.register_*
- Temporal classes and the temporaldata extension: src/ursa/temporal.py
- Active-catalog pointer: src/ursa/catalog/_pointer.py
- Time model: src/ursa/time.py
- Cross-repo contracts
- Lifecycle (Planned)
- Concepts
- Tutorials
- API Reference
  - ursa
  - ursa.data_interface
  - ursa._query_types
  - ursa._temporaldata_compat
  - ursa.backends
  - ursa.backends._zarr
  - ursa.backends.lance
  - ursa.backends.video
  - ursa.catalog
  - ursa.catalog.catalog
  - ursa.catalog.exceptions
  - ursa.catalog.schemas
  - ursa.filters
  - ursa.layout
  - ursa.participant
  - ursa.raw
  - ursa.recovery
  - ursa.recovery.first_epoch
  - ursa.recovery.timing
  - ursa.register
  - ursa.register.orchestrator
  - ursa.register.payload
  - ursa.status
  - ursa.store
  - ursa.store.base
  - ursa.store.config
  - ursa.store.factory
  - ursa.store.uri
  - ursa.temporal
  - ursa.time
- Admin runbooks
Where this fits
Ursa is one of three packages in Constellation’s research stack:
- Ursa (this site): database / data layer
- Virgo: DAG-based preprocessing
- Orion: research / training / benchmarking
Full architecture: Research Stack Architecture (Notion).
Status
Current shipped scope is raw catalog + raw byte access only. Implementation is tracked in the Linear Ursa project.