ursa.download¶
Phase 1a (M2) — raw path only¶
For each modality with ingestion_status="raw", list the segment
files under ModalityRow.raw_storage_uri (a prefix), stream each
segment’s bytes to disk under a per-modality directory, and return the
list of files written. ObjectStore.open() is used (not get()) so
multi-GB objects (RAW_VIDEO, RAW_AUDIO) never sit in memory.
The ticket’s {modality}.{ext} form presupposes single-file canonical
formats — that’s Phase 1b. Raw modalities are multi-segment prefixes
(N segment files under a directory), so M2 writes a directory tree per
modality and the layout selector controls only the parent path. The
ticket annotation is captured in the PR body.
Phase 1b — processed path¶
Deferred to ENG-1093. download currently raises NotImplementedError for
any modality with ingestion_status="processed"; the processed-path read
(using storage_uri and canonical Zarr/Lance/MP4_INDEX/Parquet tiers)
lands with the rest of ENG-1093.
Stream raw-modality bytes from object storage to local disk (ENG-1091).
Third verb in the M2 three-verb read API (query → get →
download): query() selects, get() (ENG-890) materializes
bytes into memory, and download() writes those same bytes to disk
without parsing. download and get are symmetric: both
consume a QueryResult (or an iterable of them), both resolve URIs
via ursa.store.parse_storage_uri, and both are M2-gated to
ingestion_status="raw".
Module Contents¶
Classes¶
_PlannedWrite | One (store, source key, destination path) tuple produced by the plan phase and consumed by the execute phase.
Functions¶
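download | Stream raw-modality bytes for target to disk under dest.
_plan_writes | Enumerate every write before any I/O.
_check_modality_eligibility | Raise if mrow is not eligible for M2 raw-path download.
_dest_path | Compute the on-disk destination for one source segment.
_stream_to_disk | Copy bytes from store[key] to dest via a temp-file rename.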
Data¶
API¶
- ursa.download.__all__¶
[‘download’]
- ursa.download._PROCESSED_PATH_TICKET¶
‘ENG-1093’
- ursa.download._LayoutMode¶
None
- ursa.download._VALID_LAYOUTS: frozenset[str]¶
‘frozenset(…)’
- ursa.download._STREAM_CHUNK¶
None
- class ursa.download._PlannedWrite[source]¶
Bases: typing.NamedTuple
One (store, source key, destination path) tuple produced by the plan phase and consumed by the execute phase.
Holds the ObjectMeta too, so the execute phase can carry size info for diagnostics without re-listing the prefix.
- store: ursa.store.ObjectStore¶
None
- source_key: str¶
None
- dest_path: pathlib.Path¶
None
- meta_size: int¶
None
- ursa.download.download(target: ursa.query.QueryResult | collections.abc.Iterable[ursa.query.QueryResult], dest: str | os.PathLike[str], *, layout: ursa.download._LayoutMode = 'by_recording', overwrite: bool = False) → list[pathlib.Path][source]¶
Stream raw-modality bytes for target to disk under dest.

Parameters
target
    A single QueryResult or any iterable of them. Single-vs-iterable disambiguation is by type (isinstance(target, QueryResult)), not by length; a one-element iterable still yields the flat-list contract.
dest
    Destination directory. Created if missing. Coerced to pathlib.Path.
layout
    Per-segment dest path scheme:
    * "by_recording" (default) → dest/{recording_hash}/{modality}/{relative_segment_key}
    * "by_modality" → dest/{modality}/{recording_hash}/{relative_segment_key}
    * "flat" → dest/{recording_hash}__{modality}/{relative_segment_key}
overwrite
    False (default) raises FileExistsError if any destination file already exists, before any I/O. True lets the temp-file rename in _stream_to_disk atomically replace each existing target; there is no explicit unlink, so a mid-stream crash leaves the prior file intact at the canonical path.

Returns
list[Path]
    Files written, always a flat list even for a single QueryResult input. Ordering is deterministic:
    1. input-recording order (target iteration order),
    2. then qr.modalities.items() insertion order,
    3. then lexicographic order on ObjectMeta.key within a modality.

Raises
NotImplementedError
    If any matched modality has ingestion_status="processed" (deferred to ENG-1093 / Phase 1b).
FileNotFoundError
    If a registered modality’s raw_storage_uri lists zero objects; this is treated as a catalog/upload bug rather than a silent no-op.
FileExistsError
    If overwrite=False and one or more destination files already exist. Every collision is collected and surfaced in the error message (truncated past 20 entries).
ValueError
    If layout is not one of the three accepted modes, if the plan produces two writes for the same destination path (defense-in-depth across all layouts), or if layout="flat" would join an unsafe recording_hash or modality name (one containing "__").

Notes
Streaming uses ObjectStore.open (forward-only), never ObjectStore.get, so multi-GB raw video/audio is never fully materialized in memory. A crash or signal mid-stream leaves a .part file that is unlinked by the except handler; no half-written file is left at the canonical destination.
The single-file {modality}.{ext} form implied by the ticket signature lands with the Phase 1b processed-path work (ENG-1093); M2 writes a directory tree per modality regardless of layout.
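The type-based input handling and the three-level ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's code: FakeQueryResult, normalize_target, planned_order, and the listing dict are hypothetical stand-ins, and the recording_hash attribute on the result object is assumed from the layout templates.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class FakeQueryResult:
    """Hypothetical stand-in for ursa.query.QueryResult."""
    recording_hash: str  # assumed attribute name
    modalities: dict[str, object] = field(default_factory=dict)


def normalize_target(
    target: "FakeQueryResult | Iterable[FakeQueryResult]",
) -> list[FakeQueryResult]:
    # Disambiguate by type, not by length: a single result is wrapped,
    # anything else is treated as an iterable of results.
    if isinstance(target, FakeQueryResult):
        return [target]
    return list(target)


def planned_order(
    qrs: list[FakeQueryResult],
    listing: dict[tuple[str, str], list[str]],
) -> list[tuple[str, str, str]]:
    """Order writes by input order, then modality insertion order, then key.

    `listing` is a hypothetical substitute for listing a storage prefix:
    it maps (recording_hash, modality) to the keys found under it.
    """
    ordered = []
    for qr in qrs:                      # 1. input-recording order
        for modality in qr.modalities:  # 2. dict insertion order
            for key in sorted(listing[(qr.recording_hash, modality)]):  # 3. lexicographic
                ordered.append((qr.recording_hash, modality, key))
    return ordered
```

Because the result is always a list, callers can iterate it the same way whether they passed one result or many.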
- ursa.download._plan_writes(qrs: list[ursa.query.QueryResult], dest_root: pathlib.Path, *, layout: ursa.download._LayoutMode, overwrite: bool) → list[ursa.download._PlannedWrite][source]¶
Enumerate every write before any I/O.
One pass over the inputs produces:
* the full ordered list of _PlannedWrite tuples (consumed by the execute phase, so R2 prefixes are not re-listed),
* an intra-call duplicate-destination check that catches collisions across all layouts (cheap dict lookup; future-proofs against new layout modes),
* a pre-existing-destination check that batches every collision into one FileExistsError when overwrite=False.
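The two plan-phase checks can be illustrated with a small standalone function. This is only a sketch: check_plan and MAX_LISTED are hypothetical names, the error wording is invented, and the real function also builds the _PlannedWrite tuples, which is omitted here. The limit of 20 comes from the truncation noted in the download docstring.

```python
from pathlib import Path

MAX_LISTED = 20  # collisions shown in the message before truncating


def check_plan(dest_paths: list[Path], *, overwrite: bool) -> None:
    """Validate planned destinations before any bytes are written."""
    # Intra-call duplicate check: two planned writes must never share
    # a destination, whichever layout produced them.
    seen: set[Path] = set()
    for path in dest_paths:
        if path in seen:
            raise ValueError(f"duplicate destination in plan: {path}")
        seen.add(path)

    if overwrite:
        return

    # Pre-existing check: collect every collision first, then raise once.
    existing = [p for p in dest_paths if p.exists()]
    if existing:
        shown = "\n".join(str(p) for p in existing[:MAX_LISTED])
        extra = len(existing) - MAX_LISTED
        suffix = f"\n... and {extra} more" if extra > 0 else ""
        raise FileExistsError(
            f"{len(existing)} destination file(s) already exist:\n{shown}{suffix}"
        )
```

Reporting all collisions in one error lets a caller clean up or pass overwrite=True once, rather than rerunning after each individual failure.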
- ursa.download._check_modality_eligibility(modality_name: str, mrow: ursa.catalog.schemas.ModalityRow, *, layout: ursa.download._LayoutMode) → None[source]¶
Raise if mrow is not eligible for M2 raw-path download.
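As a rough sketch of what such an eligibility gate might look like, based only on the errors documented for download: FakeModalityRow and check_eligibility are hypothetical names, and which conditions this helper actually checks (as opposed to other parts of the plan phase) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeModalityRow:
    """Hypothetical stand-in for ursa.catalog.schemas.ModalityRow."""
    ingestion_status: str
    raw_storage_uri: str


def check_eligibility(modality_name: str, mrow: FakeModalityRow, *, layout: str) -> None:
    # Processed modalities are deferred to Phase 1b (ENG-1093).
    if mrow.ingestion_status == "processed":
        raise NotImplementedError(
            f"{modality_name}: processed-path download is deferred to ENG-1093"
        )
    # Under the flat layout the directory name is
    # "{recording_hash}__{modality}", so a name that itself contains
    # "__" would make the joined name ambiguous.
    if layout == "flat" and "__" in modality_name:
        raise ValueError(
            f"{modality_name!r} contains '__' and is unsafe for layout='flat'"
        )
```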
- ursa.download._dest_path(dest_root: pathlib.Path, *, layout: ursa.download._LayoutMode, recording_hash: str, modality: str, rel_key: str) → pathlib.Path[source]¶
Compute the on-disk destination for one source segment.
rel_key is the source key’s path relative to its modality prefix; passing it through as a sub-path preserves any data-engine-internal worker structure (e.g. multi-file segments under date-partitioned subdirectories).
FORMAT_EXT[mrow.format] is intentionally unused in M2; Phase 1b (ENG-1093) consumes it for the single-file {modality}.{ext} form.
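The three layout templates documented for download map onto path construction roughly as below. This is a sketch that mirrors the documented signature; the body is an assumption, not the module's implementation.

```python
from pathlib import Path


def dest_path(
    dest_root: Path,
    *,
    layout: str,
    recording_hash: str,
    modality: str,
    rel_key: str,
) -> Path:
    """Build the destination for one segment under the chosen layout."""
    if layout == "by_recording":
        parent = dest_root / recording_hash / modality
    elif layout == "by_modality":
        parent = dest_root / modality / recording_hash
    elif layout == "flat":
        parent = dest_root / f"{recording_hash}__{modality}"
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    # rel_key is appended unchanged, so any sub-directories inside the
    # modality prefix are preserved on disk.
    return parent / rel_key
```

For example, with dest_root=Path("out"), recording_hash="abc", modality="RAW_VIDEO", and rel_key="2024/seg0.bin", the flat layout gives out/abc__RAW_VIDEO/2024/seg0.bin. The layout only changes the parent directory; the segment's relative path is the same in all three modes.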
- ursa.download._stream_to_disk(store: ursa.store.ObjectStore, key: str, dest: pathlib.Path) → None[source]¶
Copy bytes from store[key] to dest via a temp-file rename.
Uses ObjectStore.open (forward-only BinaryIO), not ObjectStore.get, so multi-GB raw video/audio never sits in memory. The .part rename ensures partial writes are not visible at the canonical destination: if the stream fails or the process is interrupted, the temp file is unlinked and the canonical path either doesn’t exist (first write) or still holds the previous bytes (overwrite case). os.replace is atomic and only swaps in the new file once it’s fully written.
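The temp-file-then-rename pattern described here can be sketched with the standard library alone. In this illustration the function takes an already-open binary reader in place of the store and key, since the exact ObjectStore.open signature is not shown on this page; stream_to_disk, CHUNK, and the chunk size are assumptions.

```python
import os
import shutil
from pathlib import Path
from typing import BinaryIO

CHUNK = 1024 * 1024  # assumed chunk size; the real _STREAM_CHUNK value is not shown


def stream_to_disk(src: BinaryIO, dest: Path) -> None:
    """Copy a forward-only stream to dest through a sibling .part file."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        with open(tmp, "wb") as out:
            # Chunked copy: at most CHUNK bytes are held in memory at once.
            shutil.copyfileobj(src, out, CHUNK)
        # Swap the finished file into place; an existing dest is replaced
        # only after the new bytes are fully written.
        os.replace(tmp, dest)
    except BaseException:
        # Also covers KeyboardInterrupt: remove the partial file and
        # leave whatever was at dest untouched.
        tmp.unlink(missing_ok=True)
        raise
```

Keeping the .part file in the same directory as the destination keeps the rename on one filesystem, which os.replace needs in order to be atomic.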