ursa.download

Phase 1a (M2) — raw path only

For each modality with ingestion_status="raw", list the segment files under ModalityRow.raw_storage_uri (a prefix), stream each segment’s bytes to disk under a per-modality directory, and return the list of files written. ObjectStore.open() is used (not get()) so multi-GB objects (RAW_VIDEO, RAW_AUDIO) never sit in memory.

The ticket’s {modality}.{ext} form presupposes single-file canonical formats — that’s Phase 1b. Raw modalities are multi-segment prefixes (N segment files under a directory), so M2 writes a directory tree per modality and the layout selector controls only the parent path. The ticket annotation is captured in the PR body.

Phase 1b — processed path

Deferred to ENG-1093. download currently raises :class:NotImplementedError for any modality with ingestion_status="processed"; the processed-path read (using storage_uri and the canonical Zarr/Lance/MP4_INDEX/Parquet tiers) lands with the rest of ENG-1093.

Stream raw-modality bytes from object storage to local disk (ENG-1091).

Third verb in the M2 three-verb read API (query, get, download): query() selects, get() (ENG-890) materializes bytes into memory, and download() writes those same bytes to disk without parsing. download and get are symmetric: both consume a :class:QueryResult (or an iterable of them), both resolve URIs via :func:ursa.store.parse_storage_uri, and both are M2-gated to ingestion_status="raw".

Module Contents

Classes

_PlannedWrite

One (store, source key, destination path) tuple produced by the plan phase and consumed by the execute phase.

Functions

download

Stream raw-modality bytes for target to disk under dest.

_plan_writes

Enumerate every write before any I/O.

_check_modality_eligibility

Raise if mrow is not eligible for M2 raw-path download.

_dest_path

Compute the on-disk destination for one source segment.

_stream_to_disk

Copy bytes from store[key] to dest via a temp-file rename.

Data

API

ursa.download.__all__

['download']

ursa.download._PROCESSED_PATH_TICKET

'ENG-1093'

ursa.download._LayoutMode

None

ursa.download._VALID_LAYOUTS: frozenset[str]

‘frozenset(…)’

ursa.download._STREAM_CHUNK

None

class ursa.download._PlannedWrite

Bases: typing.NamedTuple

One (store, source key, destination path) tuple produced by the plan phase and consumed by the execute phase.

Holds the :class:ObjectMeta too so the execute phase can carry size info for diagnostics without re-listing the prefix.

store: ursa.store.ObjectStore

None

source_key: str

None

dest_path: pathlib.Path

None

meta_size: int

None

ursa.download.download(target: ursa.query.QueryResult | collections.abc.Iterable[ursa.query.QueryResult], dest: str | os.PathLike[str], *, layout: ursa.download._LayoutMode = 'by_recording', overwrite: bool = False) -> list[pathlib.Path]

Stream raw-modality bytes for target to disk under dest.

Parameters

target
    A single :class:QueryResult or any iterable of them. Single-vs-iterable disambiguation is by type (isinstance(target, QueryResult)), not by length; a one-element iterable still yields the flat-list contract.

dest
    Destination directory. Created if missing. Coerced to :class:pathlib.Path.

layout
    Per-segment dest path scheme:

* ``"by_recording"`` (default) →
  ``dest/{recording_hash}/{modality}/{relative_segment_key}``
* ``"by_modality"`` →
  ``dest/{modality}/{recording_hash}/{relative_segment_key}``
* ``"flat"`` →
  ``dest/{recording_hash}__{modality}/{relative_segment_key}``

overwrite
    False (default) raises :class:FileExistsError if any destination file already exists, before any I/O. True lets the temp-file rename in :func:_stream_to_disk atomically replace each existing target. There is no explicit unlink, so a mid-stream crash leaves the prior file intact at the canonical path.

Returns

list[Path]
    Files written, always a flat list even for a single :class:QueryResult input. Ordering is deterministic:

1. input-recording order (``target`` iteration order),
2. then ``qr.modalities.items()`` insertion order,
3. then lexicographic order on ``ObjectMeta.key`` within a
   modality.

Raises

NotImplementedError
    If any matched modality has ingestion_status="processed" (deferred to ENG-1093 / Phase 1b).

FileNotFoundError
    If a registered modality's raw_storage_uri lists zero objects. This is treated as a catalog/upload bug rather than a silent no-op.

FileExistsError
    If overwrite=False and one or more destination files already exist. Every collision is collected and surfaced in the error message (truncated past 20 entries).

ValueError
    If layout is not one of the three accepted modes, if the plan produces two writes for the same destination path (defense-in-depth across all layouts), or if layout="flat" would join an unsafe recording_hash or modality name (one containing "__").

Notes

Streaming uses :meth:ObjectStore.open (forward-only), never :meth:ObjectStore.get, so multi-GB raw video/audio is never fully materialized in memory. A crash or signal mid-stream leaves a .part file that is unlinked by the except handler; no half-written file is left at the canonical destination.

The single-file {modality}.{ext} form implied by the ticket signature lands with the Phase 1b processed-path work (ENG-1093); M2 writes a directory tree per modality regardless of layout.

ursa.download._plan_writes(qrs: list[ursa.query.QueryResult], dest_root: pathlib.Path, *, layout: ursa.download._LayoutMode, overwrite: bool) -> list[ursa.download._PlannedWrite]

Enumerate every write before any I/O.

One pass over the inputs produces:

  • the full ordered list of :class:_PlannedWrite tuples (consumed by the execute phase — no re-listing of R2 prefixes),

  • an intra-call duplicate-destination check that catches collisions across all layouts (cheap dict lookup; future-proofs against new layout modes),

  • a pre-existing-destination check that batches every collision into one :class:FileExistsError when overwrite=False.
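The two checks in the list above can be sketched roughly as follows. This is a simplified illustration under assumptions, not the actual implementation: plan_sketch is a hypothetical name, it takes already-computed destination paths instead of :class:_PlannedWrite tuples, and the exact error wording is invented.

```python
from pathlib import Path

MAX_SHOWN = 20  # the docs say collisions are truncated past 20 entries


def plan_sketch(dest_paths: list[Path], *, overwrite: bool) -> list[Path]:
    seen: set[Path] = set()
    collisions: list[Path] = []
    for path in dest_paths:
        # Intra-call duplicate check: two planned writes to one destination.
        if path in seen:
            raise ValueError(f"duplicate destination in plan: {path}")
        seen.add(path)
        # Pre-existing check: collect every collision rather than failing fast.
        if not overwrite and path.exists():
            collisions.append(path)
    if collisions:
        shown = ", ".join(str(p) for p in collisions[:MAX_SHOWN])
        extra = len(collisions) - MAX_SHOWN
        suffix = f" (+{extra} more)" if extra > 0 else ""
        raise FileExistsError(
            f"{len(collisions)} destination(s) already exist: {shown}{suffix}"
        )
    # Nothing has been written yet; the caller can now execute the plan.
    return list(dest_paths)
```

Because both checks run before any bytes are copied, a failing call leaves the destination tree untouched.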

ursa.download._check_modality_eligibility(modality_name: str, mrow: ursa.catalog.schemas.ModalityRow, *, layout: ursa.download._LayoutMode) -> None

Raise if mrow is not eligible for M2 raw-path download.
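Going by the Raises section of download, an eligibility check of this kind might look like the sketch below. It is an assumption-laden illustration: ModalityRowStub is a hypothetical stand-in for ModalityRow with only the two fields this page mentions, and the real function may check more conditions than the two shown.

```python
from dataclasses import dataclass


@dataclass
class ModalityRowStub:
    """Hypothetical stand-in for ursa.catalog.schemas.ModalityRow."""
    ingestion_status: str
    raw_storage_uri: str


def check_eligibility_sketch(
    modality_name: str, mrow: ModalityRowStub, *, layout: str
) -> None:
    # Processed modalities are deferred to Phase 1b (ENG-1093).
    if mrow.ingestion_status == "processed":
        raise NotImplementedError(
            f"{modality_name}: processed-path download is deferred to ENG-1093"
        )
    # The flat layout joins names with "__", so a name containing "__"
    # would make the joined directory name ambiguous.
    if layout == "flat" and "__" in modality_name:
        raise ValueError(
            f"{modality_name!r} contains '__' and is unsafe for layout='flat'"
        )
```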

ursa.download._dest_path(dest_root: pathlib.Path, *, layout: ursa.download._LayoutMode, recording_hash: str, modality: str, rel_key: str) -> pathlib.Path

Compute the on-disk destination for one source segment.

rel_key is the source key’s path relative to its modality prefix; passing it through as a sub-path preserves any data-engine-internal worker structure (e.g. multi-file segments under date-partitioned subdirectories). FORMAT_EXT[mrow.format] is intentionally unused in M2 — Phase 1b (ENG-1093) consumes it for the single-file {modality}.{ext} form.
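As an illustration of the three layout schemes documented for download, the path computation could be sketched like this. The function name dest_path_sketch and the VALID_LAYOUTS constant are hypothetical; only the three path templates come from this page.

```python
from pathlib import Path

# Assumed contents; the page shows _VALID_LAYOUTS only as an elided frozenset.
VALID_LAYOUTS = frozenset({"by_recording", "by_modality", "flat"})


def dest_path_sketch(
    dest_root: Path,
    *,
    layout: str,
    recording_hash: str,
    modality: str,
    rel_key: str,
) -> Path:
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"unknown layout: {layout!r}")
    if layout == "by_recording":
        parent = dest_root / recording_hash / modality
    elif layout == "by_modality":
        parent = dest_root / modality / recording_hash
    else:  # "flat": one directory per (recording, modality) pair
        parent = dest_root / f"{recording_hash}__{modality}"
    # rel_key may contain sub-directories; they are preserved as a sub-path.
    return parent / rel_key
```

For example, with layout="flat", recording_hash="abc", modality="RAW_VIDEO" and rel_key="2024/seg0.bin", the sketch yields dest_root/abc__RAW_VIDEO/2024/seg0.bin, so the layout only changes the parent directory and never the segment's relative path.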

ursa.download._stream_to_disk(store: ursa.store.ObjectStore, key: str, dest: pathlib.Path) -> None

Copy bytes from store[key] to dest via a temp-file rename.

Uses :meth:ObjectStore.open (forward-only BinaryIO), not :meth:ObjectStore.get, so multi-GB raw video/audio never sits in memory. The .part rename ensures partial writes are not visible at the canonical destination: if the stream fails or the process is interrupted, the temp file is unlinked and the canonical path either doesn't exist (first write) or still holds the previous bytes (overwrite case; os.replace is atomic and only swaps in the new file once it's fully written).
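The temp-file rename pattern described here can be sketched with the standard library alone. This is a minimal illustration, not the module's code: it accepts any readable binary stream in place of the handle returned by ObjectStore.open, and the CHUNK value and the exact .part naming are assumptions.

```python
import io
import os
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK = 1024 * 1024  # assumed; the real _STREAM_CHUNK value is not shown


def stream_to_disk_sketch(src: BinaryIO, dest: Path) -> None:
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        with open(tmp, "wb") as out:
            # Copy in fixed-size chunks so the whole object is never in memory.
            shutil.copyfileobj(src, out, CHUNK)
        # Swap the finished file into place in one step.
        os.replace(tmp, dest)
    except BaseException:
        # Covers I/O errors and KeyboardInterrupt: drop the partial file.
        tmp.unlink(missing_ok=True)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "rec" / "seg0.bin"
        stream_to_disk_sketch(io.BytesIO(b"abc"), target)
        print(target.read_bytes())
```

Writing to a sibling .part file keeps the temp file on the same filesystem as the destination, which is what allows os.replace to act as a single rename.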