ursa.catalog._arrow

PyArrow schemas for the nine catalog tables.

Each schema is hand-written rather than auto-generated from the Pydantic row class. Reasons:

  • Lance/Arrow type choices (timestamp("us", tz="UTC") vs the default timestamp("ns"); duration("us") for timedelta; list_(float64) for embedding vectors) are explicit and reviewable.

  • MetadataDict-typed columns serialize to JSON strings in M2 — see ENG-1066 for the M3 promotion to Lance MapType + hot-key columns.

  • Nested submodels (TimeWindow, EmbeddingSource) materialize as pa.struct with a fixed field set; extras are rejected at write time because Arrow struct columns are fixed-schema.

A drift test (tests/catalog/test_catalog_arrow_coverage.py) asserts every Pydantic field is covered, so adding a field to a row class without touching this module fails CI.

API

ursa.catalog._arrow.__all__

['ARROW_SCHEMAS', 'EXTRAS_COLUMN', 'JSON_METADATA_COLUMNS', 'NULLABLE_JSON_METADATA_COLUMNS', 'PRIMA…

ursa.catalog._arrow.EXTRAS_COLUMN

pydantic_extra

ursa.catalog._arrow._TS_UTC

'timestamp(…)'

ursa.catalog._arrow._DUR

'duration(…)'

ursa.catalog._arrow._STR

'string(…)'

ursa.catalog._arrow._F64

'float64(…)'

ursa.catalog._arrow._I64

'int64(…)'

ursa.catalog._arrow._VECTOR

'list_(…)'

ursa.catalog._arrow.JSON_METADATA_COLUMNS: dict[str, frozenset[str]]

None

ursa.catalog._arrow.NULLABLE_JSON_METADATA_COLUMNS: dict[str, frozenset[str]]

None

ursa.catalog._arrow._TIME_WINDOW

'struct(…)'

ursa.catalog._arrow._EMBEDDING_SOURCE

'struct(…)'

ursa.catalog._arrow._DOMAIN_INTERVAL

'struct(…)'

ursa.catalog._arrow._DOMAIN_INTERVALS

'list_(…)'

ursa.catalog._arrow._PARTICIPANT_IDS

'list_(…)'

ursa.catalog._arrow._extras_field() → pyarrow.Field

ursa.catalog._arrow.ARROW_SCHEMAS: dict[str, pyarrow.Schema]

None

ursa.catalog._arrow.ROW_CLASSES

None

ursa.catalog._arrow.PRIMARY_KEYS: dict[str, list[str]]

None