Ursa

Database / data-access layer for Constellation’s research stack.

Ursa is Constellation’s data-access layer for multimodal recordings: EEG, video, eye tracking, biometrics, questionnaires, keyboard/mouse/screen captures, and more.

The data model is intentionally minimal: Participant → Recording (recording_hash) → Modality + Events + flexible metadata. No sessions, no trials, no stimuli.

The current M2 implementation is a raw catalog and byte-access layer: a Lance metadata catalog on Cloudflare R2 plus ursa.DataInterface.query(), .get(), and .download() for raw modalities. Processed Zarr arrays, temporal slicing, streaming windows, vector search, lifecycle GC, and Polaris cache sync are roadmap work.

Where this fits

Ursa is one of three packages in Constellation’s research stack:

  • Ursa (this site) — database / data layer

  • Virgo — DAG-based preprocessing

  • Orion — research / training / benchmarking

Full architecture: Research Stack Architecture (Notion).

Status

🌱 Early bootstrap. M2 is raw catalog + raw byte access only. Implementation is tracked in the Linear Ursa project.

Phasing (mirrors the Linear project milestones):

  • M1 — Foundations

  • M2 — Underbuilt MVP (Phase 1a, solo) (current)

  • M3 — Production Database (Phase 1b, with Killian)

  • M4 — Production-scale (Phase 4)

  • M5 — Benchmarks Integration (Phase 5)

  • M6 — Polish & Onboarding (Phase 6)