ursa.backends.video

MP4-backed lazy video reader for StorageFormat.MP4_INDEX modalities.

Ported from neuro-galaxy/temporaldata@ian/lazy-everything/temporaldata/lazy_video.py (2026-05 snapshot) with four adaptations:

  1. video_file paths → uri: str + :class:ObjectStore. Segments are downloaded to a process-global LRU cache keyed by (uri, etag) on first slice; PyAV decodes the local copy. Cache directory and quota are controlled by the module-level :data:_DEFAULT_CACHE_DIR and :data:_DEFAULT_CACHE_QUOTA_GB constants; tests and operators override them by monkeypatching and then calling :func:_reset_cache_for_tests.

  2. segment_frame_counts / segment_pts_indices → :class:LanceFrameIndex. The frame index is a sibling Lance table at :func:ursa.layout.lance_frame_index_uri(uri) with columns (segment_idx, local_frame_idx, pts, timestamp). The original PyAV demux fallback survives for callers that open an MP4 without an index (tests, ad-hoc inspection); it is a transitional path slated for removal once every processed recording carries an index.

  3. to_hdf5 / from_hdf5 removed. Persistent state is the catalog :class:ModalityRow + storage URI; HDF5 would just duplicate it.

  4. metadata: ModalityRow | None slot mirroring the rest of :mod:ursa.temporal. .slice() propagates the same instance to the returned IrregularTimeSeries (identity-preserving; pinned by test).
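The cache described in adaptation 1 can be pictured with a minimal in-memory sketch. This is an illustration only: the class name SegmentCache, the fetch callback, and the byte-based quota are hypothetical, and the real module caches files on disk rather than bytes in memory. It shows the two properties the text names: entries are keyed by (uri, etag), so a changed etag is a miss, and the least recently used entry is evicted once the quota is exceeded.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy LRU cache keyed by (uri, etag) with a byte quota (hypothetical)."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self._entries = OrderedDict()  # (uri, etag) -> bytes, oldest first
        self._used = 0

    def get(self, uri, etag, fetch):
        key = (uri, etag)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = fetch(uri)  # cache miss: download once
        self._entries[key] = data
        self._used += len(data)
        # Evict least recently used entries until back under quota,
        # but never evict the entry that was just inserted.
        while self._used > self.quota_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)
        return data

    def keys(self):
        return list(self._entries)
```

Keying on the etag as well as the URI means a re-uploaded object is fetched again instead of being served stale from the cache.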

Fork-safety: stream.thread_count = 1 + thread_type = "NONE" is kept verbatim — FFmpeg’s frame/slice worker threads deadlock in avcodec_free_context if the codec context was created in the parent and released in a forked child (PyTorch DataLoader(num_workers>0) defaults to fork on Linux). The constraint is read-side; the writer side imposes the same setting on its own PyAV containers.
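As a small sketch of the fork-safety setting, the helper below applies the two attributes named above to a stream object. The helper name force_single_threaded is hypothetical; the attribute names and values are taken from the text, and the commented usage assumes a PyAV container opened elsewhere.

```python
def force_single_threaded(stream):
    """Disable libavcodec worker threads on a PyAV-style video stream."""
    stream.thread_type = "NONE"  # no frame or slice threading
    stream.thread_count = 1      # decode on the calling thread only


# Assumed usage with PyAV (not executed here):
#   container = av.open(local_path)
#   force_single_threaded(container.streams.video[0])
```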

PyAV is an optional dep (ursa[video]); construction raises a clear ImportError with the install hint if missing.
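The optional-dependency behaviour can be sketched as follows. The helper name require_optional and the exact wording of the hint are assumptions; only the extra name ursa[video] comes from the text above.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for video support; "
            f"install it with `pip install {extra}`."
        ) from exc
```

Calling require_optional("av", "ursa[video]") at construction time, rather than at module import, would keep the rest of the package importable without PyAV.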

Module Contents

Classes

LazyVideo

Lazy-decoded video for StorageFormat.MP4_INDEX modalities.

LanceFrameIndex

Sibling Lance table at <mp4_uri>.lance describing one MP4’s frames.

API

class ursa.backends.video.LazyVideo(uri: str, *, store: ursa.store.base.ObjectStore, frame_index: ursa.backends.video.LanceFrameIndex | None = None, metadata: ursa.catalog.schemas.ModalityRow | None = None, segment_uris: Sequence[str] | None = None, resize: tuple[int, int] | None = None, colorspace: str = 'RGB', channel_format: str = 'NCHW')[source]

Lazy-decoded video for StorageFormat.MP4_INDEX modalities.

Construction is metadata-only when a :class:LanceFrameIndex is provided: no MP4 GET and no frame decode happen until the first :meth:slice call, which downloads the required segment(s) into the process-global LRU cache and decodes the requested frames via PyAV.

When frame_index=None the constructor falls back to PyAV-demuxing every segment up-front to recover frame counts + PTS tables; this path DOES download bytes and probe PTS at construction time. Use the Lance-frame-index path (the default through :meth:from_uri) for the metadata-only contract.

Seeks are keyframe-aligned (container.seek(pts, any_frame=False, backward=True)) and the decoder is then advanced frame-by-frame to the target PTS, so frames returned mid-GOP are bit-correct (no mmco: unref short failure corruption).
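The seek-then-advance strategy can be simulated without a decoder. The sketch below is not PyAV code: decode_to_target is a hypothetical function operating on plain lists of PTS values, and it only models which frames would have to be decoded to reach a mid-GOP target correctly.

```python
from bisect import bisect_right


def decode_to_target(frame_pts, keyframe_pts, target_pts):
    """Simulate a backward keyframe seek followed by forward decoding.

    frame_pts:    PTS of every frame in presentation order (sorted).
    keyframe_pts: sorted subset of frame_pts that are keyframes.
    Returns (seek_pts, decoded), where decoded lists the PTS values the
    decoder walks through, ending at the first frame at or past the target.
    """
    # Backward seek: the last keyframe at or before the target.
    k = bisect_right(keyframe_pts, target_pts) - 1
    if k < 0:
        raise ValueError("target precedes the first keyframe")
    seek_pts = keyframe_pts[k]
    decoded = []
    for pts in frame_pts:
        if pts < seek_pts:
            continue  # skipped by the seek, never decoded
        decoded.append(pts)  # decoded so reference frames stay intact
        if pts >= target_pts:
            break
    return seek_pts, decoded
```

With keyframes at PTS 0 and 40 and a target of 60, the simulated decoder starts at 40 and walks 40, 50, 60 instead of jumping straight to 60 with missing reference frames.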

Picklable: no PyAV container/stream/reformatter handles are stored on self (they live as locals inside :meth:_load_frames), so a LazyVideo survives the pickle.dumps that DataLoader workers perform across the fork boundary.
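The picklability rule generalises to any reader that wraps an OS-level handle. The sketch below contrasts two hypothetical classes using ordinary file handles as a stand-in for PyAV containers: one keeps the handle as a local, the other stores it on self. The helper survives_pickle checks only the instance state.

```python
import pickle


class LazyReader:
    """Keeps only plain state on self; handles live as locals."""

    def __init__(self, path):
        self.path = path  # plain string: safe to pickle

    def read(self, n):
        # The handle is opened and closed per call, never stored on self.
        with open(self.path, "rb") as handle:
            return handle.read(n)


class EagerReader:
    """Stores an open handle on self, which pickle cannot serialize."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")


def survives_pickle(obj):
    """True if the object's instance state can be serialized by pickle."""
    try:
        pickle.dumps(vars(obj))
        return True
    except Exception:
        return False
```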

Args:
    uri: r2:// URI of the MP4 (or the first segment if multi-segment).
    store: :class:ObjectStore to read from, typically :attr:DataInterface.assets_ro_store.
    frame_index: :class:LanceFrameIndex for the sibling sidecar, or None to fall back to PyAV-demuxing the segments at construction time (test-only fallback; see module docstring).
    metadata: optional :class:ModalityRow to carry through to :meth:slice results.
    segment_uris: additional URIs for multi-segment videos. If absent, uri is the sole segment.
    resize: (height, width) to resize frames to, or None for original dimensions.
    colorspace: "RGB" or "G".
    channel_format: "NCHW" or "NHWC".

Initialization

property metadata: ursa.catalog.schemas.ModalityRow | None
__len__() int[source]
__repr__() str[source]
classmethod from_uri(uri: str, *, store: ursa.store.base.ObjectStore, metadata: ursa.catalog.schemas.ModalityRow | None = None) ursa.backends.video.LazyVideo[source]

Open uri as a lazy MP4 video.

Mirrors :meth:ursa.RegularTimeSeries.from_uri / :meth:ursa.LazyIrregularTimeSeries.from_uri so callers can use a uniform construction surface across backends. Resolves the sibling Lance frame index via :func:ursa.layout.lance_frame_index_uri; on a missing sidecar, falls back to demuxing the MP4 itself.

Video is intentionally lazy-only. An eager equivalent would decode every frame at construction, which defeats the point of MP4_INDEX for any realistic recording. DataInterface.materialize(..., lazy=False) therefore still returns a LazyVideo for video subfields.

Video isn’t dispatched through :class:_BackendOpeners because the regular/irregular split doesn’t fit per-frame video access. :class:ursa.DataInterface invokes this classmethod directly when ModalityRow.format == StorageFormat.MP4_INDEX.

classmethod concat(videos: Sequence[ursa.backends.video.LazyVideo]) ursa.backends.video.LazyVideo[source]

Concatenate segments of one logical video.

Single-recording only — cross-recording concat is not supported because aligned time domains across recordings are only established by the per-recording processed path.

Multi-video concat with frame indexes is also rejected: the receiver would have to merge the LanceFrameIndex instances from each input, and the writer-side helper for that merge has not been built yet. Calling concat([a]) (a single video, i.e. a metadata-only re-wrap) stays supported.

lazy_slice(start: float, end: float) ursa.backends.video.LazyVideo[source]

Return a new :class:LazyVideo windowed to [start, end) with no PyAV decode at call time.

The returned object is still a :class:LazyVideo carrying a recorded frame-range window; frames decode only on the next :meth:slice call, and only for the windowed range. _apply_time_window calls this form so that stream(time_range=…) never triggers a segment download or decode.

Implementation: np.searchsorted on the in-memory timestamp array finds idx_l:idx_r, then object.__new__ plus a shallow attribute copy builds the new object with the windowed timestamps, frame_indices, and frame_count substituted. All per-segment metadata (segment_frame_counts, segment_frame_offsets, _pts_cache, etc.) is shared by reference; these use global frame indices, so they remain correct for the windowed object’s :meth:_segment_for_frame lookups.
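The windowing step can be sketched with the standard library alone. This is an assumption-laden illustration: the class Window is hypothetical, it carries only a few of the fields named above, and bisect_left stands in for np.searchsorted with side="left".

```python
from bisect import bisect_left


class Window:
    """Hypothetical stand-in holding a few of the fields named in the text."""

    def __init__(self, timestamps, segment_frame_counts):
        self.timestamps = list(timestamps)
        self.frame_indices = list(range(len(self.timestamps)))  # global indices
        self.frame_count = len(self.timestamps)
        self.segment_frame_counts = segment_frame_counts  # shared, never copied

    def lazy_slice(self, start, end):
        # Two binary searches give the half-open window [start, end).
        idx_l = bisect_left(self.timestamps, start)
        idx_r = max(idx_l, bisect_left(self.timestamps, end))
        out = object.__new__(type(self))    # bypass __init__ entirely
        out.__dict__.update(self.__dict__)  # shallow copy: values are shared
        out.timestamps = self.timestamps[idx_l:idx_r]
        out.frame_indices = self.frame_indices[idx_l:idx_r]
        out.frame_count = idx_r - idx_l
        return out
```

Because frame_indices keeps global values, slicing an already windowed object still yields indices that are valid against the shared per-segment metadata.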

reset_origin is always False: the returned series keeps original recording-relative coordinates.

If start >= end or the window covers no frames, an empty :class:LazyVideo (frame_count=0) is returned rather than raising.

slice(start: float, end: float, reset_origin: bool = True) temporaldata.IrregularTimeSeries[source]

Return an :class:IrregularTimeSeries of decoded frames in [start, end) (end-exclusive).

reset_origin=True (default) shifts the returned timestamps to be relative to start; False keeps the original, unshifted timestamps.
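The effect of reset_origin on timestamps can be shown with a small sketch. The function window_timestamps is hypothetical and handles timestamps only; frame decoding is out of scope here.

```python
def window_timestamps(timestamps, start, end, reset_origin=True):
    """Select timestamps in [start, end) and optionally rebase them to start."""
    selected = [t for t in timestamps if start <= t < end]
    if reset_origin:
        return [t - start for t in selected]  # first possible value is 0.0
    return selected
```

For timestamps 0.0, 0.5, 1.0, 1.5, 2.0 and the window [0.5, 2.0), the default returns 0.0, 0.5, 1.0, while reset_origin=False returns 0.5, 1.0, 1.5.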

_segment_for_frame(frame_index: int) tuple[int, int][source]
_ensure_pts_table(segment_idx: int) numpy.ndarray[source]
_empty_frames_array() numpy.ndarray[source]
_resolve_segment_path(segment_idx: int) str[source]

Cache-resolve a segment URI to a local-fs path for PyAV.

_load_frames(frame_indices: numpy.ndarray) numpy.ndarray[source]

Decode the requested presentation-ordered frames.

Implementation notes (verbatim from Ian’s port — keep these):

  • Indices are sorted by (segment, local_index) so each segment is walked forward in presentation order; the only seeks are at segment boundaries (or when the caller passes a non-monotonic sequence).

  • Single-threaded decode (stream.thread_count = 1, stream.thread_type = "NONE") avoids the libavcodec frame/slice thread + os.fork() deadlock in avcodec_free_context. This shows up in any consumer that loads LazyVideo from a forked child (PyTorch DataLoader(num_workers>0) defaults to fork on Linux).

  • A single av.video.reformatter.VideoReformatter does colorspace + resize via libswscale.
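The first note above, sorting by (segment, local_index), can be sketched as follows. The function plan_decode is hypothetical, and it covers only the mapping from global frame indices to per-segment positions; restoring the caller's original order after decoding is not shown.

```python
from bisect import bisect_right
from itertools import accumulate


def plan_decode(frame_indices, segment_frame_counts):
    """Map global frame indices to sorted (segment, local_index) pairs."""
    # offsets[i] is the global index of the first frame of segment i.
    offsets = [0, *accumulate(segment_frame_counts)]
    total = offsets[-1]
    pairs = []
    for g in frame_indices:
        if not 0 <= g < total:
            raise IndexError(f"frame index {g} out of range [0, {total})")
        seg = bisect_right(offsets, g) - 1
        pairs.append((seg, g - offsets[seg]))
    # Sorted order walks each segment forward exactly once.
    return sorted(pairs)
```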

class ursa.backends.video.LanceFrameIndex(dataset: lance.LanceDataset, uri: str)[source]

Sibling Lance table at <mp4_uri>.lance describing one MP4’s frames.

Schema (the writer side emits this; this module only reads):

.. code-block:: text

    segment_idx: int64        # which underlying segment file
    local_frame_idx: int64    # 0..segment_frame_count-1
    pts: int64                # PyAV PTS (codec-time-base units)
    timestamp: float64        # recording-relative seconds

Sorted by (segment_idx, local_frame_idx) (i.e. presentation order within each segment). Construction is metadata-only; per-segment PTS arrays are loaded lazily.

Rebuildable from the source MP4 via :func:_probe_segment as a fallback.

Initialization

classmethod open(uri: str, store: ursa.store.base.ObjectStore) ursa.backends.video.LanceFrameIndex[source]

Open the frame-index dataset. Lance pulls only metadata + footer.

segment_frame_counts() numpy.ndarray[source]

Frames per segment as an int64 ndarray of length n_segments.

pts_table(segment_idx: int) numpy.ndarray[source]

Return the PTS array for segment_idx in presentation order.

timestamps() numpy.ndarray[source]

Flat presentation-ordered timestamps across all segments.
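To make the three accessors concrete, here is a sketch over an in-memory list of rows in the documented column order. It does not use Lance: the ROWS data is made up for illustration, the functions are free-standing stand-ins for the methods, and they return lists rather than ndarrays. Rows are assumed to be sorted by (segment_idx, local_frame_idx) already.

```python
from collections import Counter

# Illustrative rows: (segment_idx, local_frame_idx, pts, timestamp)
ROWS = [
    (0, 0, 0, 0.0),
    (0, 1, 512, 0.04),
    (1, 0, 0, 0.08),
    (1, 1, 512, 0.12),
    (1, 2, 1024, 0.16),
]


def segment_frame_counts(rows):
    """Frames per segment, indexed by segment_idx."""
    counts = Counter(seg for seg, _, _, _ in rows)
    n_segments = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_segments)]


def pts_table(rows, segment_idx):
    """PTS values of one segment in presentation order."""
    return [pts for seg, _, pts, _ in rows if seg == segment_idx]


def timestamps(rows):
    """Flat presentation-ordered timestamps across all segments."""
    return [ts for _, _, _, ts in rows]
```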