ursa.catalog._arrow
PyArrow schemas for the nine catalog tables.
Each schema is hand-written rather than auto-generated from the Pydantic row class. Reasons:
- Lance/Arrow type choices are explicit and reviewable: timestamp("us", tz="UTC") instead of the default timestamp("ns"), duration("us") for timedelta, and list_(float64) for embedding vectors.
- MetadataDict-typed columns serialize to JSON strings in M2; see ENG-1066 for the M3 promotion to a Lance MapType plus hot-key columns.
- Nested submodels (TimeWindow, EmbeddingSource) materialize as pa.struct with a fixed field set; extras are rejected at write time because Arrow struct columns are fixed-schema.
A drift test (tests/catalog/test_catalog_arrow_coverage.py) asserts
every Pydantic field is covered, so adding a field to a row class
without touching this module fails CI.
Module Contents
API
- ursa.catalog._arrow.__all__
['ARROW_SCHEMAS', 'EXTRAS_COLUMN', 'JSON_METADATA_COLUMNS', 'NULLABLE_JSON_METADATA_COLUMNS', 'PRIMA…
- ursa.catalog._arrow.EXTRAS_COLUMN
'pydantic_extra'
- ursa.catalog._arrow._TS_UTC
'timestamp(…)'
- ursa.catalog._arrow._DUR
'duration(…)'
- ursa.catalog._arrow._STR
'string(…)'
- ursa.catalog._arrow._F64
'float64(…)'
- ursa.catalog._arrow._I64
'int64(…)'
- ursa.catalog._arrow._VECTOR
'list_(…)'
- ursa.catalog._arrow.JSON_METADATA_COLUMNS: dict[str, frozenset[str]]
None
- ursa.catalog._arrow.NULLABLE_JSON_METADATA_COLUMNS: dict[str, frozenset[str]]
None
- ursa.catalog._arrow._TIME_WINDOW
'struct(…)'
- ursa.catalog._arrow._EMBEDDING_SOURCE
'struct(…)'
- ursa.catalog._arrow._DOMAIN_INTERVAL
'struct(…)'
- ursa.catalog._arrow._DOMAIN_INTERVALS
'list_(…)'
- ursa.catalog._arrow._PARTICIPANT_IDS
'list_(…)'
- ursa.catalog._arrow.ARROW_SCHEMAS: dict[str, pyarrow.Schema]
None
- ursa.catalog._arrow.ROW_CLASSES
None
- ursa.catalog._arrow.PRIMARY_KEYS: dict[str, list[str]]
None