Data APIs and Datasets
Understanding how datasets relate to the REST and WebSocket APIs is key to effectively using Datafye in own-container scenarios. This page explains how the API structure (organized by asset class) works with your deployment structure (organized by datasets).
The Orthogonal Design
The Datafye REST and WebSocket APIs are organized by asset class (stocks, crypto, etc.), while your deployment contains specific datasets (SIP, TotalView, PrecisionAlpha, etc.). These two structures are orthogonal: they are independent organizational systems that intersect only through dataset routing.
API Structure (Asset Class Based)
APIs are organized by asset class and data category:
http://api.rumi.local/datafye-api/v1/<assetClass>/<category>/<path>
Examples:
/stocks/live/trades/lasttrade - Get last trade for stocks
/stocks/reference/securities - Get security master for stocks
/stocks/history/ohlcs - Get historical OHLC bars for stocks
/crypto/live/trades/lasttrade - Get last trade for crypto
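As a minimal sketch, the URL pattern above can be assembled with a small helper. The base URL and path segments come from this page; the helper name `build_url` is hypothetical and is not part of any Datafye client library.

```python
from urllib.parse import urlencode

# Base URL as shown on this page; adjust for your own deployment.
BASE_URL = "http://api.rumi.local/datafye-api/v1"

def build_url(asset_class, category, path, **params):
    """Assemble <base>/<assetClass>/<category>/<path>[?query] (hypothetical helper)."""
    url = f"{BASE_URL}/{asset_class}/{category}/{path.strip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url

print(build_url("stocks", "live", "trades/lasttrade"))
print(build_url("crypto", "live", "trades/lasttrade"))
```

Only the asset class segment differs between the two printed URLs, which is the point of the asset-class-based layout.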
This structure is consistent regardless of which datasets you've deployed. The API surface remains the same whether you have one dataset or ten.
Deployment Structure (Dataset Based)
Your deployment consists of one or more datasets, where each dataset contains four types of services (see Datasets for details):
Reference - Security master and static metadata
Live Ticks - Real-time tick-level market data
Live Aggregates - Real-time pre-computed analytics
Historical - Historical data storage and retrieval
For example, if you deploy SIP and TotalView datasets for stocks, you have:
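The resulting layout can be sketched as a nested mapping, assuming each dataset carries exactly the four service types listed above. The identifiers used for the service types are illustrative, not official service names.

```python
# The four service types listed above (identifiers here are illustrative).
SERVICE_TYPES = ("reference", "live-ticks", "live-aggregates", "historical")

# Assumed deployment: SIP and TotalView datasets for the stocks asset class.
deployment = {"stocks": {name: SERVICE_TYPES for name in ("SIP", "TotalView")}}

def list_services(deployment):
    """Return one 'assetClass/dataset/service' string per deployed service."""
    return [
        f"{asset_class}/{dataset}/{service}"
        for asset_class, datasets in deployment.items()
        for dataset, services in datasets.items()
        for service in services
    ]

for line in list_services(deployment):
    print(line)
```

Two datasets with four service types each yield eight services, all under the single stocks asset class.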
How They Work Together
The API uses the asset class to determine which endpoints are available, and the dataset parameter to route requests to the appropriate dataset service within that asset class.
Dataset Routing
Most API endpoints accept a dataset parameter that bridges these two structures. The API routes your request to the appropriate service within the specified dataset.
Example: Multiple Datasets
Let's say your deployment has both SIP and Nasdaq TotalView datasets running:
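A minimal sketch of the two requests, built as URLs only (no network call is made). The `dataset` parameter is described on this page; `symbol` is an assumed parameter name used for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.rumi.local/datafye-api/v1"
ENDPOINT = "/stocks/live/trades/lasttrade"

def last_trade_url(dataset, symbol):
    """Same endpoint, different dataset. `symbol` is an assumed parameter name."""
    query = urlencode({"dataset": dataset, "symbol": symbol})
    return f"{BASE_URL}{ENDPOINT}?{query}"

print(last_trade_url("SIP", "AAPL"))
print(last_trade_url("TotalView", "AAPL"))
```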
The same endpoint (/stocks/live/trades/lasttrade) can serve data from different datasets. The API internally routes to:
SIP's live ticks service when dataset=SIP
TotalView's live ticks service when dataset=TotalView
Example: Different Categories
The routing applies across all API categories:
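As a sketch, the same `dataset` value can be attached to one endpoint per category. The endpoint paths are the ones listed earlier on this page; the helper name is hypothetical, and any further query parameters a real request needs are omitted.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.rumi.local/datafye-api/v1"

# Endpoint paths taken from the examples earlier on this page.
STOCK_ENDPOINTS = {
    "reference": "/stocks/reference/securities",
    "live": "/stocks/live/trades/lasttrade",
    "history": "/stocks/history/ohlcs",
}

def urls_for_dataset(dataset):
    """Return one URL per category, all routed to the same dataset."""
    query = urlencode({"dataset": dataset})
    return {cat: f"{BASE_URL}{path}?{query}" for cat, path in STOCK_ENDPOINTS.items()}

for category, url in urls_for_dataset("SIP").items():
    print(category, url)
```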
Why This Design?
This orthogonal design provides several benefits:
1. Consistent API Surface
The API endpoints remain the same regardless of which datasets you've deployed. Your code structure doesn't change when you:
Add new datasets to your deployment
Switch providers for a dataset (e.g., Polygon → Alpaca for SIP)
Deploy different dataset combinations for dev vs prod
2. Dataset Flexibility
You can easily switch between datasets without restructuring your code:
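One way to picture this: the dataset is a single value that can change while the endpoint path stays fixed. In the sketch below, `ohlc_url` and the `symbol` parameter are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.rumi.local/datafye-api/v1"

# Changing this one value is the only edit needed to switch datasets.
DATASET = "SIP"  # e.g. "TotalView"

def ohlc_url(symbol, dataset=DATASET):
    """Build a historical OHLC URL; `symbol` is an assumed parameter name."""
    query = urlencode({"dataset": dataset, "symbol": symbol})
    return f"{BASE_URL}/stocks/history/ohlcs?{query}"

print(ohlc_url("AAPL"))
print(ohlc_url("AAPL", dataset="TotalView"))
```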
3. Multi-Dataset Support
Query different datasets for the same symbol to:
Compare data quality across providers
Implement failover logic (primary dataset → backup dataset)
Use specialized datasets for specific symbols (e.g., TotalView for high-frequency, SIP for others)
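The failover idea above can be sketched as a loop over datasets in priority order. The fetch function is injected so the logic stays independent of any HTTP client; `fake_fetch` is a stub that simulates an outage, and the choice of SIP as primary is only an assumption.

```python
def fetch_with_failover(fetch, symbol, datasets=("SIP", "TotalView")):
    """Try each dataset in order; return (dataset, result) from the first success."""
    errors = []
    for dataset in datasets:
        try:
            return dataset, fetch(symbol, dataset)
        except Exception as exc:  # narrow this to your HTTP client's error types
            errors.append(f"{dataset}: {exc}")
    raise RuntimeError("all datasets failed: " + "; ".join(errors))

# Stub standing in for a real HTTP call; it simulates a SIP outage.
def fake_fetch(symbol, dataset):
    if dataset == "SIP":
        raise ConnectionError("SIP unavailable")
    return {"symbol": symbol, "price": 101.5}

print(fetch_with_failover(fake_fetch, "AAPL"))
```

Because only the `dataset` value changes between attempts, the backup request uses the same endpoint and the same code path as the primary one.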
4. Provider Independence
The API abstracts away which provider serves a dataset. You specify:
In your DataSpec: SIP dataset provided by Polygon
In your API calls: dataset=SIP
Your code doesn't need to know that SIP comes from Polygon. If you later switch to Alpaca for SIP, your API calls remain unchanged.
Default Dataset Behavior
If you omit the dataset parameter, the API applies default logic:
Single Dataset Deployed
If only one dataset for that asset class is deployed, it's used automatically:
Multiple Datasets Deployed
If multiple datasets are deployed, the API returns an error asking you to specify which dataset:
You must be explicit:
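The default logic described in this section can be mirrored client-side as a small function. This is a sketch of the described behavior, not the API's actual implementation; the error messages are invented, and rejecting an undeployed dataset is an assumption.

```python
def resolve_dataset(requested, deployed):
    """Pick the dataset for one asset class, following the rules described above."""
    if not deployed:
        raise ValueError("no datasets deployed for this asset class")
    if requested is not None:
        if requested not in deployed:  # assumed behavior, not stated on this page
            raise ValueError(f"dataset {requested!r} is not deployed")
        return requested
    if len(deployed) == 1:
        return deployed[0]  # single dataset: used automatically
    raise ValueError("multiple datasets deployed; specify one of: " + ", ".join(deployed))

print(resolve_dataset(None, ["SIP"]))
print(resolve_dataset("TotalView", ["SIP", "TotalView"]))
```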
Best Practices
1. Always Specify Dataset in Multi-Dataset Deployments
Even if you primarily use one dataset, explicitly specify it:
2. Parameterize Dataset Selection
Make dataset selection configurable:
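A minimal sketch using an environment variable. The variable name `DATAFYE_DATASET` and the `symbol` parameter are hypothetical; any configuration mechanism would serve the same purpose.

```python
import os

def configured_dataset(default="SIP"):
    """Read the dataset name from the environment (hypothetical variable name)."""
    return os.environ.get("DATAFYE_DATASET", default)

def request_params(symbol):
    """Query parameters for a request; `symbol` is an assumed parameter name."""
    return {"dataset": configured_dataset(), "symbol": symbol}

print(request_params("AAPL"))
```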
3. Document Dataset Dependencies
Document which datasets your algo requires:
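One option is to keep the documentation next to the code as a declared mapping that can also be checked at startup. The entries and reasons below are illustrative only.

```python
# Datasets this algo depends on, with the reason for each (illustrative entries).
REQUIRED_DATASETS = {
    "SIP": "last trades and OHLC bars for the full symbol universe",
    "TotalView": "tick-level data for high-frequency symbols",
}

def missing_datasets(deployed):
    """Return required datasets that are absent from the deployment, sorted."""
    return sorted(set(REQUIRED_DATASETS) - set(deployed))

print(missing_datasets(["SIP"]))
```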
4. Handle Dataset-Specific Schemas
Different datasets may have different schemas for the same data type:
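A common way to handle this is to normalize each dataset's records into one internal shape. The field names in this sketch are hypothetical and do not reflect the actual SIP or TotalView schemas; check each dataset's documentation for the real ones.

```python
# Field names below are hypothetical; check each dataset's actual schema.
FIELD_MAPS = {
    "SIP": {"price": "p", "size": "s"},
    "TotalView": {"price": "price", "size": "shares"},
}

def normalize_trade(dataset, raw):
    """Map a dataset-specific trade record onto common 'price' and 'size' keys."""
    mapping = FIELD_MAPS[dataset]
    return {common: raw[source] for common, source in mapping.items()}

print(normalize_trade("SIP", {"p": 10.0, "s": 5}))
print(normalize_trade("TotalView", {"price": 10.0, "shares": 5}))
```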
Real-World Scenarios
Scenario 1: Development vs Production
Use different datasets for different environments:
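A sketch of a per-environment mapping. Which dataset belongs to which environment is an assumption here; substitute the datasets your own environments deploy.

```python
# Hypothetical mapping; choose the datasets your own environments deploy.
DATASET_BY_ENV = {"dev": "SIP", "prod": "TotalView"}

def dataset_for(env):
    """Return the dataset configured for an environment name."""
    try:
        return DATASET_BY_ENV[env]
    except KeyError:
        raise ValueError(f"unknown environment {env!r}") from None

print(dataset_for("dev"))
```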
Scenario 2: Dataset Comparison
Compare data quality across datasets:
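A minimal comparison sketch: fetch the same value from each dataset and report the spread. The fetcher is injected, `fake_fetch` is a stub with made-up prices, and the `price` field name is an assumption.

```python
def compare_last_trade(fetch, symbol, datasets=("SIP", "TotalView")):
    """Fetch the last trade price from each dataset and report the spread."""
    prices = {ds: fetch(symbol, ds)["price"] for ds in datasets}
    spread = max(prices.values()) - min(prices.values())
    return {"prices": prices, "spread": spread}

# Stub standing in for a real HTTP call; the prices are made up.
def fake_fetch(symbol, dataset):
    return {"price": {"SIP": 100.0, "TotalView": 100.25}[dataset]}

print(compare_last_trade(fake_fetch, "AAPL"))
```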
Scenario 3: Specialized Dataset Usage
Use different datasets for different symbols:
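This can be sketched as a per-symbol override table with a default. The symbols listed are hypothetical examples of names an algo might trade at high frequency.

```python
DEFAULT_DATASET = "SIP"

# Hypothetical list of symbols traded at high frequency.
SYMBOL_OVERRIDES = {"AAPL": "TotalView", "MSFT": "TotalView"}

def dataset_for_symbol(symbol):
    """Use the override for a symbol if one exists, else the default dataset."""
    return SYMBOL_OVERRIDES.get(symbol, DEFAULT_DATASET)

for sym in ("AAPL", "XOM"):
    print(sym, dataset_for_symbol(sym))
```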
Related Concepts
Datasets - Understanding what datasets are and how they're structured
Data Access Modes - The two API mechanisms and data delivery modes
Data Descriptors - How to configure datasets in your deployment
What is Datafye? - Overall platform architecture
Last updated: 2025-10-14