Ursa

Database / data-access layer for Constellation’s research stack.

Ursa is Constellation’s data-access layer for multimodal recordings: EEG, video, eye tracking, biometrics, questionnaires, keyboard/mouse/screen captures, and more. It answers two questions for everything else in the stack — what recordings exist? and give me the bytes — so that Virgo and Orion never touch raw object storage directly.

The data model is intentionally minimal: Participant → Recording (recording_hash) → Modality + Events + flexible metadata. No sessions, no trials, no stimuli.

Concretely, Ursa is three things stacked together:

  • a Lance metadata catalog — a small set of tables describing every participant, recording, modality, and event;

  • an R2 object store holding the actual modality bytes, addressed by absolute URI from each catalog row; and

  • format-aware readers that turn those bytes into Python objects — Zarr for dense streams, Lance for sparse/variable events, MP4 for video.

How a read flows through the layers:

Researcher / Virgo / Orion
          │  query()                          (filter catalog rows)
          ▼
   ursa.DataInterface ──────────────────►  Lance catalog
          │   materialize()           ◄─────  participants · recordings ·
          │   stream()                          modalities · events · …
          │   download()
          ▼
   Format-aware readers ──read by storage_uri──►  R2 object store
   (Zarr · Lance · MP4)                            raw + processed bytes

The current implementation is a catalog and byte-access layer: the Lance catalog plus the resolved-input read verbs ursa.DataInterface.query(), .materialize(), .stream(), and .download() over raw and processed (Zarr/Lance/MP4) modalities, including a flat windowed iterator. Vector search, lifecycle GC, and Polaris cache sync are roadmap work.

Contents

Where this fits

Ursa is one of three packages in Constellation’s research stack:

  • Ursa (this site) — database / data layer

  • Virgo — DAG-based preprocessing

  • Orion — research / training / benchmarking

Full architecture: Research Stack Architecture (Notion).

Status

Current shipped scope is raw catalog + raw byte access only. Implementation is tracked in the Linear Ursa project.