Module catalog

Parquet data catalog for efficient storage and retrieval of financial market data.

This module implements a data catalog that stores financial market data as Apache Parquet files on an object store backend. Supported data types include quotes, trades, bars, order book data, and other market events.

Key Features

  • Object Store Integration: Works with local filesystems, S3, and other object stores.
  • Data Type Support: Handles all major financial data types (quotes, trades, bars, etc.).
  • Time-based Organization: Organizes data by timestamp ranges for efficient querying.
  • Consolidation: Merges multiple files to optimize storage and query performance.
  • Validation: Ensures data integrity with timestamp ordering and interval validation.
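To make the Validation bullet concrete, here is a minimal sketch of the kind of checks involved. It assumes timestamps are `u64` UNIX nanoseconds and that each file covers a closed interval `(start, end)`. The helper names (`is_monotonic_non_decreasing`, `intervals_disjoint`, `intervals_contiguous`) are hypothetical and are not this module's API; in particular, whether the module's own `are_intervals_disjoint` and `are_intervals_contiguous` sort their input first is an assumption here.

```rust
/// Closed interval [start, end] of UNIX nanoseconds (assumed representation).
type Interval = (u64, u64);

/// True if timestamps never go backwards (equal neighbours are allowed).
fn is_monotonic_non_decreasing(timestamps: &[u64]) -> bool {
    timestamps.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Copy of `intervals` sorted by start (then by end).
fn sorted(intervals: &[Interval]) -> Vec<Interval> {
    let mut v = intervals.to_vec();
    v.sort_unstable();
    v
}

/// True if no two closed intervals share a point. After sorting by start,
/// it suffices to check neighbours: each must start after the previous ends.
fn intervals_disjoint(intervals: &[Interval]) -> bool {
    sorted(intervals).windows(2).all(|w| w[1].0 > w[0].1)
}

/// True if the sorted intervals neither overlap nor leave gaps: each one
/// starts exactly one nanosecond after the previous one ends.
fn intervals_contiguous(intervals: &[Interval]) -> bool {
    sorted(intervals)
        .windows(2)
        .all(|w| w[0].1.checked_add(1) == Some(w[1].0))
}

fn main() {
    assert!(is_monotonic_non_decreasing(&[1_000, 1_000, 2_000]));
    assert!(!is_monotonic_non_decreasing(&[3_000, 2_000]));

    let tiled: [Interval; 3] = [(1, 10), (11, 20), (21, 30)];
    let gap: [Interval; 2] = [(1, 10), (12, 20)];
    let overlap: [Interval; 2] = [(1, 10), (10, 20)];

    assert!(intervals_disjoint(&tiled) && intervals_contiguous(&tiled));
    assert!(intervals_disjoint(&gap) && !intervals_contiguous(&gap));
    assert!(!intervals_disjoint(&overlap) && !intervals_contiguous(&overlap));
    println!("validation checks passed");
}
```

Sorting first means the result does not depend on the order in which files were listed from the object store, at the cost of an O(n log n) copy.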

Architecture

The catalog organizes data in a hierarchical structure:

```text
data/
├── quotes/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
├── trades/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
└── bars/
    └── INSTRUMENT_ID/
        └── start_ts-end_ts.parquet
```
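The layout above amounts to joining a data-type directory, an instrument directory, and a file name. A minimal sketch follows; `object_store_key` and `local_file_path` are hypothetical helpers, not part of this module, and `example.parquet` stands in for a real `start_ts-end_ts` file name.

```rust
use std::path::PathBuf;

/// Hypothetical helper: object store key for one catalog file, following the
/// layout `data/<data_type>/<instrument_id>/<file_name>`. Object store keys
/// always use '/', regardless of the host platform.
fn object_store_key(data_type: &str, instrument_id: &str, file_name: &str) -> String {
    ["data", data_type, instrument_id, file_name].join("/")
}

/// Hypothetical helper: the same location on a local filesystem, built with
/// `PathBuf::push` so the platform's own separator is used.
fn local_file_path(base: &str, data_type: &str, instrument_id: &str, file_name: &str) -> PathBuf {
    let mut path = PathBuf::from(base);
    for part in ["data", data_type, instrument_id, file_name] {
        path.push(part);
    }
    path
}

fn main() {
    let key = object_store_key("quotes", "EURUSD.SIM", "example.parquet");
    assert_eq!(key, "data/quotes/EURUSD.SIM/example.parquet");

    let local = local_file_path("/tmp/catalog", "quotes", "EURUSD.SIM", "example.parquet");
    // `Path::ends_with` compares whole components, not raw characters.
    assert!(local.ends_with("data/quotes/EURUSD.SIM/example.parquet"));
    println!("{key}\n{}", local.display());
}
```

Keeping two builders reflects the split between object store keys, which always use `/`, and local paths, where `PathBuf::push` picks the platform separator.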

Usage

```rust
use std::path::PathBuf;
use nautilus_persistence::backend::catalog::ParquetDataCatalog;

// Create a new catalog
let catalog = ParquetDataCatalog::new(
    PathBuf::from("/path/to/data"),
    None,        // storage_options
    Some(5000),  // batch_size
    None,        // compression (defaults to SNAPPY)
    None,        // max_row_group_size (defaults to 5000)
);

// Write data to the catalog
// catalog.write_to_parquet(data, None, None)?;
```

Structs

ParquetDataCatalog
A high-performance data catalog for storing and retrieving financial market data using Apache Parquet format.

Traits

CatalogPathPrefix
Trait for providing catalog path prefixes for different data types.
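One way such a trait can work is an associated function that returns a static directory name per data type, which lets generic code choose the directory from a type parameter alone. The signature `fn path_prefix() -> &'static str` and the `Quote`/`Trade` types below are assumptions for illustration, not this crate's definitions.

```rust
/// Sketch of a trait mapping a data type to its catalog directory name
/// (assumed shape, for illustration only).
trait CatalogPathPrefix {
    fn path_prefix() -> &'static str;
}

// Hypothetical stand-in data types, not the library's own structs.
struct Quote;
struct Trade;

impl CatalogPathPrefix for Quote {
    fn path_prefix() -> &'static str {
        "quotes"
    }
}

impl CatalogPathPrefix for Trade {
    fn path_prefix() -> &'static str {
        "trades"
    }
}

/// Generic code resolves the directory from the type parameter; no value
/// of type `T` is needed.
fn directory_for<T: CatalogPathPrefix>(instrument_id: &str) -> String {
    format!("data/{}/{}", T::path_prefix(), instrument_id)
}

fn main() {
    assert_eq!(directory_for::<Quote>("EURUSD.SIM"), "data/quotes/EURUSD.SIM");
    assert_eq!(directory_for::<Trade>("EURUSD.SIM"), "data/trades/EURUSD.SIM");
    println!("{}", directory_for::<Quote>("EURUSD.SIM"));
}
```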

Functions

are_intervals_contiguous
Checks if intervals are contiguous (adjacent with no gaps).
are_intervals_disjoint
Checks whether all closed integer intervals in a list are mutually disjoint.
extract_identifier_from_path
Extracts the identifier from a file path.
extract_path_components
Extracts path components using platform-appropriate path parsing.
extract_sql_safe_filename
Extracts the filename from a file path and makes it SQL-safe.
local_to_object_store_path
Converts a local PathBuf to an object store path string.
make_local_path
Creates a platform-appropriate local path using PathBuf.
make_object_store_path
Creates an object store path using forward slashes.
make_object_store_path_owned
Creates an object store path using forward slashes with owned strings.
make_sql_safe_identifier
Makes an identifier safe for use in SQL table names.
parse_filename_timestamps
Parses timestamps from a Parquet filename.
timestamps_to_filename
Converts timestamps to a filename using ISO 8601 format.
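The path helpers listed above can be approximated with the standard library alone. The sketch below assumes the identifier is the name of the directory that holds the file (for example `EURUSD.SIM` in the layout under Architecture) and that input paths are already normalized; `path_components`, `to_object_store_key`, and `identifier_from_path` are hypothetical names, not this module's functions.

```rust
use std::path::{Component, Path};

/// Normal components of `path` as owned strings. All other components
/// (root, drive prefix, `.`, `..`) are skipped, so the input is assumed
/// to be normalized already.
fn path_components(path: &Path) -> Vec<String> {
    path.components()
        .filter_map(|c| match c {
            Component::Normal(part) => Some(part.to_string_lossy().into_owned()),
            _ => None,
        })
        .collect()
}

/// Joins the components with '/' to form an object store key.
fn to_object_store_key(path: &Path) -> String {
    path_components(path).join("/")
}

/// Name of the directory containing the file, taken here to be the identifier.
fn identifier_from_path(path: &Path) -> Option<String> {
    let dir = path.parent()?.file_name()?;
    Some(dir.to_string_lossy().into_owned())
}

fn main() {
    let path = Path::new("/tmp/catalog/data/quotes/EURUSD.SIM/example.parquet");
    println!("{:?}", path_components(path));
    assert_eq!(
        to_object_store_key(path),
        "tmp/catalog/data/quotes/EURUSD.SIM/example.parquet"
    );
    assert_eq!(identifier_from_path(path).as_deref(), Some("EURUSD.SIM"));
}
```

Parsing through `Path::components` rather than splitting on a separator character lets the standard library handle platform differences such as Windows drive prefixes.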
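For the SQL-safe helpers, a common approach is to replace every character outside `[A-Za-z0-9_]` with an underscore so the name can be used as an unquoted table name. This is an assumed rule, not necessarily the one this module applies, and the sketch does not handle names that start with a digit, which some SQL dialects reject unquoted. `sql_safe` and `sql_safe_file_stem` are hypothetical names.

```rust
use std::path::Path;

/// Replaces every character that is not an ASCII letter, digit, or '_'
/// with '_' (assumed sanitization rule).
fn sql_safe(identifier: &str) -> String {
    identifier
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect()
}

/// Takes the file stem (the name without its `.parquet` extension) and
/// makes it SQL-safe. Returns None if there is no stem or it is not UTF-8.
fn sql_safe_file_stem(path: &Path) -> Option<String> {
    let stem = path.file_stem()?.to_str()?;
    Some(sql_safe(stem))
}

fn main() {
    assert_eq!(sql_safe("EUR/USD.SIM"), "EUR_USD_SIM");

    let path = Path::new("data/quotes/EURUSD.SIM/2024-01-01T00-00-00Z.parquet");
    assert_eq!(
        sql_safe_file_stem(path).as_deref(),
        Some("2024_01_01T00_00_00Z")
    );
    println!("{}", sql_safe("EUR/USD.SIM"));
}
```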
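The two file-name functions can be sketched as a round trip between UNIX nanoseconds and ISO 8601 text. The exact layout here is an assumption: each timestamp is written as `YYYY-MM-DDTHH-MM-SS-NNNNNNNNNZ` (ISO 8601 with `:` and `.` replaced by `-`, since colons are not allowed in Windows file names), and the two timestamps are joined with `_`. The date arithmetic uses Howard Hinnant's civil-calendar algorithms, restricted to dates from 1970 onward; all function names are hypothetical.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;
const SECS_PER_DAY: u64 = 86_400;

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z / 146_097; // 400-year eras
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    (yoe + era * 400 + u64::from(month <= 2), month, day)
}

/// (year, month, day) -> days since 1970-01-01; caller ensures year >= 1970.
fn days_from_civil(year: u64, month: u64, day: u64) -> u64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y / 400;
    let yoe = y - era * 400;
    let mp = if month > 2 { month - 3 } else { month + 9 };
    let doy = (153 * mp + 2) / 5 + day - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// UNIX nanoseconds -> "YYYY-MM-DDTHH-MM-SS-NNNNNNNNNZ".
fn ns_to_file_timestamp(ns: u64) -> String {
    let (secs, nanos) = (ns / NANOS_PER_SEC, ns % NANOS_PER_SEC);
    let (days, sod) = (secs / SECS_PER_DAY, secs % SECS_PER_DAY);
    let (y, mo, d) = civil_from_days(days);
    format!(
        "{y:04}-{mo:02}-{d:02}T{:02}-{:02}-{:02}-{nanos:09}Z",
        sod / 3_600,
        (sod % 3_600) / 60,
        sod % 60
    )
}

/// Inverse of `ns_to_file_timestamp`; returns None for malformed input.
fn file_timestamp_to_ns(s: &str) -> Option<u64> {
    let b = s.as_bytes();
    let layout_ok = b.len() == 30
        && [4, 7, 13, 16, 19].iter().all(|&i| b[i] == b'-')
        && b[10] == b'T'
        && b[29] == b'Z';
    if !layout_ok {
        return None;
    }
    // Parse a fixed-width field that must consist of ASCII digits only.
    let field = |start: usize, end: usize| -> Option<u64> {
        let text = s.get(start..end)?;
        if text.bytes().all(|c| c.is_ascii_digit()) {
            text.parse().ok()
        } else {
            None
        }
    };
    let (y, mo, d) = (field(0, 4)?, field(5, 7)?, field(8, 10)?);
    let (h, mi, sec) = (field(11, 13)?, field(14, 16)?, field(17, 19)?);
    let nanos = field(20, 29)?;
    if y < 1970 || mo == 0 || mo > 12 || d == 0 {
        return None; // keeps the unsigned date arithmetic from underflowing
    }
    let secs = days_from_civil(y, mo, d) * SECS_PER_DAY + h * 3_600 + mi * 60 + sec;
    let ns = secs.checked_mul(NANOS_PER_SEC)?.checked_add(nanos)?;
    // Round-trip check rejects values such as 2023-02-31 or hour 25.
    (ns_to_file_timestamp(ns) == s).then_some(ns)
}

/// Assumed file-name layout: "<start>_<end>.parquet".
fn timestamps_to_file_name(start_ns: u64, end_ns: u64) -> String {
    format!("{}_{}.parquet", ns_to_file_timestamp(start_ns), ns_to_file_timestamp(end_ns))
}

fn parse_file_name(name: &str) -> Option<(u64, u64)> {
    let (start, end) = name.strip_suffix(".parquet")?.split_once('_')?;
    Some((file_timestamp_to_ns(start)?, file_timestamp_to_ns(end)?))
}

fn main() {
    let name = timestamps_to_file_name(0, 86_400_000_000_001);
    assert_eq!(name, "1970-01-01T00-00-00-000000000Z_1970-01-02T00-00-00-000000001Z.parquet");
    assert_eq!(parse_file_name(&name), Some((0, 86_400_000_000_001)));
    assert_eq!(parse_file_name("not-a-catalog-file.parquet"), None);
}
```

Re-formatting the parsed value and comparing it with the input rejects impossible dates such as February 31 without a separate calendar table.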