Data Descriptors

Complete schema reference for Datafye Data Descriptors.

Overview

A DataSpec defines what market data your algo needs, including:

  • Which datasets to use (SIP, PrecisionAlpha, TotalView, Synthetic)

  • Which symbols to subscribe to

  • What tick and aggregate schemas to stream

  • How long to retain each type of data

  • What mode to operate in (live, paper, backtest)

Schema

Root Structure

apiVersion: datafye.io/v1
kind: DataSpec
metadata:
  name: <string>
  description: <string>
  requestedBy:
    actorType: user | algo
    actorId: <string>
mode: live | paper | backtest
datasets:
  - <dataset-object>

Field: apiVersion

  • Type: string

  • Required: Yes

  • Value: datafye.io/v1

  • Description: Schema version for this descriptor

Field: kind

  • Type: string

  • Required: Yes

  • Value: DataSpec

  • Description: Identifies this as a Data Descriptor

Field: metadata

  • Type: object

  • Required: Yes

  • Description: Metadata about this descriptor

metadata.name

  • Type: string

  • Required: Yes

  • Format: lowercase alphanumeric with hyphens (DNS-1123 subdomain)

  • Description: Unique identifier for this data specification

  • Example: my-trading-data, backtest-2024-q1

metadata.description

  • Type: string

  • Required: No

  • Description: Human-readable description of this data specification

  • Example: SIP trades and quotes for momentum strategy

metadata.requestedBy

  • Type: object

  • Required: No

  • Description: Actor requesting this data (used for audit and billing)

metadata.requestedBy.actorType

  • Type: string

  • Required: Yes (if requestedBy present)

  • Values: user | algo

  • Description: Type of actor requesting data

metadata.requestedBy.actorId

  • Type: string

  • Required: Yes (if requestedBy present)

  • Description: Unique identifier for the requesting actor

  • Example: user-123, algo-momentum-v2
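Putting the metadata fields together, a minimal sketch that reuses the example values listed above (only name is required; the other keys are optional):

```yaml
metadata:
  name: my-trading-data              # required; lowercase alphanumerics and hyphens
  description: SIP trades and quotes for momentum strategy   # optional
  requestedBy:                       # optional; used for audit and billing
    actorType: algo                  # user | algo
    actorId: algo-momentum-v2        # required whenever requestedBy is present
```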

Field: mode

  • Type: string

  • Required: Yes

  • Values: live | paper | backtest

  • Description: Operating mode for this data specification

| Mode | Description |
|------|-------------|
| live | Real-time production trading |
| paper | Paper trading with live data streams |
| backtest | Historical data only, no live streams |

Field: datasets

  • Type: array of dataset objects

  • Required: Yes

  • Description: List of datasets to provision

Dataset Object

Field: dataset

  • Type: string

  • Required: Yes

  • Values: SIP | PrecisionAlpha | TotalView | Synthetic

  • Description: Which dataset to use

| Dataset | Description | Provider |
|---------|-------------|----------|
| SIP | US equities trades and quotes | Polygon |
| PrecisionAlpha | Alternative data signals | PrecisionAlpha |
| TotalView | NASDAQ Level 2 depth | NASDAQ |
| Synthetic | Synthetic market data for testing | Datafye |

Field: provider

  • Type: string

  • Required: No (defaults based on dataset)

  • Description: Data provider for this dataset

| Dataset | Default Provider |
|---------|------------------|
| SIP | Polygon |
| PrecisionAlpha | PrecisionAlpha |
| TotalView | NASDAQ |
| Synthetic | Datafye |
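As an illustration of the defaulting behaviour, the fragment below omits provider on one dataset entry and sets it explicitly on another. It is a sketch only: the tickers are placeholders, and the other dataset fields (live, history, reference) are left out for brevity.

```yaml
datasets:
  - dataset: SIP
    # provider omitted: per the table above, SIP defaults to Polygon
    symbols:
      tickers: ["AAPL"]
  - dataset: TotalView
    provider: NASDAQ      # explicit; same value as the documented default
    symbols:
      tickers: ["AAPL"]
```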

Field: symbols

  • Type: object

  • Required: Yes

  • Description: Which symbols to subscribe to

symbols.tickers

  • Type: array of strings

  • Required: No

  • Description: Explicit ticker symbols or wildcards

  • Examples:

    • ["AAPL", "GOOGL", "MSFT"] — Explicit list

    • ["NV*"] — Wildcard pattern (all tickers starting with NV)

    • ["*"] — All available symbols

symbols.universes

  • Type: array of strings

  • Required: No

  • Description: Pre-defined symbol universes

  • Values: SP500 | NDX100 | RUSSELL2000 | RUSSELL1000

Note: tickers and universes can be used together; the effective symbol list is the union of both. At least one of the two must be specified.
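A short sketch of a symbols block that combines both fields. The specific tickers and universe are illustrative choices; the resulting subscription would be the three listed tickers plus every constituent of NDX100.

```yaml
symbols:
  tickers: ["AAPL", "GOOGL", "MSFT"]   # explicit tickers
  universes: ["NDX100"]                # pre-defined universe
```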

Field: reference

  • Type: boolean

  • Required: No

  • Default: false

  • Description: Whether to provision reference data (security master, corporate actions, calendar)

  • Recommendation: Set to true for trading algos

Field: live

  • Type: object

  • Required: Yes (if mode is live or paper)

  • Description: Real-time data streams to subscribe to

live.ticks

  • Type: string (comma-separated schema list)

  • Required: No

  • Special values: none, all, *

  • Description: Which tick schemas to stream in real-time

  • Examples:

    • "trades" — Trades only

    • "trades,quotes" — Trades and quotes

    • "all" or "*" — All available tick schemas

    • "none" — No tick streams

live.aggregates

  • Type: string (comma-separated schema list)

  • Required: No

  • Special values: none

  • Description: Which aggregate schemas to stream in real-time

  • Examples:

    • "ohlc-1m" — 1-minute bars only

    • "ohlc-1m,ema-1m-20,vwap-1m" — Multiple aggregates

    • "none" — No aggregate streams
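For illustration, a live block that streams two tick schemas and two aggregates might look like the sketch below. The schema names are taken from the tables in this reference and assume a dataset that offers trades and quotes, such as SIP.

```yaml
live:
  ticks: "trades,quotes"          # comma-separated list; "all" or "*" would select every tick schema
  aggregates: "ohlc-1m,vwap-1m"   # comma-separated list; "none" would disable aggregate streams
```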

Field: history

  • Type: object

  • Required: No (Yes if mode is backtest)

  • Description: Historical data retention policies

history.ticks

  • Type: array of retention objects

  • Required: No

  • Description: Retention policy per tick schema

Retention object structure:

  • schema: Which tick schema (e.g., trades, quotes)

  • duration: How long to retain (e.g., 7d, 30d, P90D)

history.aggregates

  • Type: array of retention objects

  • Required: No

  • Description: Retention policy per aggregate schema

Retention object structure:

  • schema: Which aggregate schema (e.g., ohlc-1m, ema-1m-20)

  • duration: How long to retain (e.g., 30d, 180d, 1y)

history.reference

  • Type: object

  • Required: No

  • Description: Retention policy for reference data

Structure:

  • duration: How long to retain reference data (e.g., 365d, 1y)
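The three retention settings can be combined in a single history block. The sketch below uses schema names and durations drawn from the examples above; the particular pairings are illustrative, not recommended values.

```yaml
history:
  ticks:                  # one retention object per tick schema
    - schema: trades
      duration: 7d
    - schema: quotes
      duration: 7d
  aggregates:             # one retention object per aggregate schema
    - schema: ohlc-1m
      duration: 180d
  reference:              # a single object, not an array
    duration: 1y
```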

Duration Format

Durations can be specified in two formats:

| Format | Description | Example |
|--------|-------------|---------|
| Simple | <number><unit> | 7d, 30d, 6m, 1y |
| ISO-8601 | P<number><unit> | P7D, P6M, P1Y |

Units (written in lowercase in the simple format and in uppercase in the ISO-8601 format):

  • d — Days

  • m — Months (30 days)

  • y — Years (365 days)

Special value: 0d means no retention (live only)
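To make the two formats and the special value concrete, here is a small sketch of retention entries. The day counts in the comments follow from the unit definitions above (30-day months, 365-day years) and assume the ISO-8601 form is interpreted with the same conventions.

```yaml
history:
  ticks:
    - schema: trades
      duration: 0d      # special value: no retention (live only)
  aggregates:
    - schema: ohlc-1m
      duration: 6m      # simple format: 6 months = 180 days
    - schema: ohlc-1d
      duration: P1Y     # ISO-8601 format: 1 year = 365 days
```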

Tick Schemas by Dataset

SIP Tick Schemas

| Schema | Description |
|--------|-------------|
| trades | Trade ticks |
| quotes | NBBO quotes |

Synthetic Tick Schemas

| Schema | Description |
|--------|-------------|
| trades | Synthetic trade ticks |
| quotes | Synthetic NBBO quotes |

PrecisionAlpha Tick Schemas

| Schema | Description |
|--------|-------------|
| pa-1s | 1-second signals |
| pa-1m | 1-minute signals |
| pa-1d | Daily signals |

TotalView Tick Schemas

| Schema | Description |
|--------|-------------|
| depth | Level 2 order book depth |

Aggregate Schemas

Aggregate schemas are available across all datasets (applied to the dataset's tick data):

| Schema | Description |
|--------|-------------|
| ohlc-1s | 1-second OHLC bars |
| ohlc-1m | 1-minute OHLC bars |
| ohlc-1h | 1-hour OHLC bars |
| ohlc-1d | 1-day OHLC bars |
| ema-1m-20 | 1-minute 20-period EMA |
| ema-1m-50 | 1-minute 50-period EMA |
| vwap-1m | 1-minute VWAP |
| signal-1m-12-26-9 | MACD signal (12, 26, 9) |

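Pattern: <type>-<interval>[-<params>]. The parameter segment is present only for schemas that take parameters: ohlc-1m and vwap-1m have none, while ema-1m-20 carries the period (20).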
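The naming pattern can be read off the listed schemas as follows. This sketch only decomposes names that appear in the table above; it does not imply that other type, interval, or parameter combinations are supported.

```yaml
live:
  # ohlc-1m            -> type: ohlc,   interval: 1m, no parameters
  # ema-1m-20          -> type: ema,    interval: 1m, parameter: 20 (period)
  # signal-1m-12-26-9  -> type: signal, interval: 1m, parameters: 12, 26, 9 (MACD)
  aggregates: "ohlc-1m,ema-1m-20,signal-1m-12-26-9"
```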

Complete Examples

Example 1: Basic Live Trading
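A minimal sketch of a basic live-trading descriptor, assembled from the fields documented in this reference. The ticker list and retention values are illustrative, and placing live and history inside each dataset entry is an assumption based on where those fields appear in the Dataset Object section.

```yaml
apiVersion: datafye.io/v1
kind: DataSpec
metadata:
  name: my-trading-data
  description: SIP trades and quotes for momentum strategy
mode: live
datasets:
  - dataset: SIP                    # provider omitted; defaults to Polygon
    symbols:
      tickers: ["AAPL", "GOOGL", "MSFT"]
    reference: true                 # recommended for trading algos
    live:
      ticks: "trades,quotes"
      aggregates: "ohlc-1m"
    history:
      ticks:
        - schema: trades
          duration: 7d
      aggregates:
        - schema: ohlc-1m
          duration: 30d
```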

Example 2: Multi-Dataset with PrecisionAlpha
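A hedged sketch of a descriptor that combines SIP with PrecisionAlpha in paper mode. The descriptor name, universe, and schema selection are hypothetical; nesting live under each dataset entry is assumed from the Dataset Object section.

```yaml
apiVersion: datafye.io/v1
kind: DataSpec
metadata:
  name: multi-dataset-signals       # hypothetical name
  requestedBy:
    actorType: algo
    actorId: algo-momentum-v2
mode: paper
datasets:
  - dataset: SIP
    symbols:
      universes: ["NDX100"]
    reference: true
    live:
      ticks: "trades"
      aggregates: "ohlc-1m,vwap-1m"
  - dataset: PrecisionAlpha
    symbols:
      universes: ["NDX100"]
    live:
      ticks: "pa-1m"                # PrecisionAlpha-specific tick schema
      aggregates: "none"
```

Each dataset entry lists only tick schemas that are valid for that dataset, which is why pa-1m appears under PrecisionAlpha and trades under SIP.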

Example 3: Backtest Configuration
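A minimal backtest sketch under the same assumptions as the field reference above: history is nested in the dataset entry, and no live section is given because backtest mode uses historical data only. The universe and durations are illustrative; 3y is the documented maximum.

```yaml
apiVersion: datafye.io/v1
kind: DataSpec
metadata:
  name: backtest-2024-q1
mode: backtest
datasets:
  - dataset: SIP
    symbols:
      universes: ["SP500"]
    reference: true
    history:
      ticks:
        - schema: trades
          duration: 90d
      aggregates:
        - schema: ohlc-1m
          duration: 1y
        - schema: ohlc-1d
          duration: 3y              # upper limit for any duration
      reference:
        duration: 1y
```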

Example 4: Wildcard Symbols
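A short sketch of a wildcard subscription, using the NV* pattern from the symbols.tickers examples. The descriptor name is hypothetical, and the live block is assumed to sit inside the dataset entry.

```yaml
apiVersion: datafye.io/v1
kind: DataSpec
metadata:
  name: nv-wildcard-stream          # hypothetical name
mode: live
datasets:
  - dataset: SIP
    symbols:
      tickers: ["NV*"]              # all tickers starting with NV
    live:
      ticks: "trades"
      aggregates: "none"
```

Replacing the ticker list with ["*"] would subscribe to all available symbols; per the symbol rules, that wildcard cannot be combined with other ticker patterns.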

Validation Rules

Required Fields

  • apiVersion must be datafye.io/v1

  • kind must be DataSpec

  • metadata.name is required

  • mode is required

  • datasets must have at least one dataset

Mode-Specific Rules

  • live/paper: Must have a live section with at least one tick or aggregate schema

  • backtest: Must have a history section; a live section is ignored if present

Symbol Rules

  • Must specify at least one of tickers or universes

  • Wildcard * cannot be mixed with other ticker patterns

  • Empty arrays are not allowed

Duration Rules

  • Must be a positive value (except 0d for no retention)

  • Cannot exceed 3 years (3y or P1095D)

  • Tick retention is typically shorter than aggregate retention

Schema Rules

  • Tick schemas must be valid for the chosen dataset

  • Aggregate schemas are universal across datasets

  • Cannot specify all for aggregates (only for ticks)
