The Datafye CLI
The Datafye CLI (datafye) is the command-line tool you use to provision, manage, and control your Datafye environments. It's the primary interface for creating private cloud deployments, whether you're setting up a development Foundry or a production Trading Environment.
New to Datafye deployments? Start with Private Cloud Deployments to understand deployment models, then review Descriptors to learn about configuration files.
What the CLI Does
The Datafye CLI bridges the gap between your configuration (descriptors) and your infrastructure (private cloud deployments). Here's the flow:
You create descriptors — YAML or JSON files specifying what data, algos, and broker connections you need
You run CLI commands — The CLI reads your descriptors and provisions the infrastructure
Datafye creates your environment — A complete, isolated deployment with the services you specified
The CLI handles all the complexity: spinning up services, configuring networking, connecting to data providers, deploying containers, and ensuring everything works together.
Core Workflow
Every Datafye environment follows the same lifecycle, managed through CLI commands:
1. Provision
Create a new environment from descriptors. The command structure varies by deployment environment:
# Standalone Foundry (single deployment per machine)
datafye foundry local provision --descriptor foundry.yaml

# Distributed Foundry on AWS (requires name and cloud-specific params)
datafye foundry aws provision \
  --name my-dev-foundry \
  --descriptor foundry.yaml \
  --profile my-aws-profile \
  --cidr 10.0.0.0/16

# Standalone Trading environment
datafye trading local provision --descriptor trading.yaml

# Distributed Trading environment on AWS
datafye trading aws provision \
  --name production-trading \
  --descriptor trading.yaml \
  --profile my-aws-profile \
  --cidr 10.1.0.0/16

What happens during provisioning:
CLI validates your descriptors (schema, semantics, dependencies)
Infrastructure is allocated based on your deployment model (Standalone, Distributed Self-managed, or Distributed Datafye Managed)
Services are deployed (Data Cloud, Algo Container, Backtesting Engine, Broker Connector — depending on scenario)
Network connectivity is established
Data providers and brokers are connected using your credentials
Health checks verify everything is running correctly
Provisioning time:
Local deployments: 5-10 minutes depending on internet connection and system resources
AWS deployments: 10-20 minutes
Do not interrupt the provisioning process. If provisioning fails, deprovision and try again.
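The recovery advice above (deprovision, then try again) can be sketched as a small shell wrapper. This is only a sketch under assumptions: retry_provision is a hypothetical helper, the provision and cleanup steps are passed in as function names that you define, and the deprovision syntax in the usage comment is assumed rather than taken from this page.

```shell
# retry_provision MAX_ATTEMPTS PROVISION_FN CLEANUP_FN
# Runs PROVISION_FN; after each failure runs CLEANUP_FN and tries again,
# up to MAX_ATTEMPTS times. Both *_FN arguments are names of shell
# functions (or commands) supplied by the caller.
retry_provision() {
  local max_attempts="$1" provision_fn="$2" cleanup_fn="$3"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$provision_fn"; then
      echo "provisioned on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed; cleaning up before retrying" >&2
    "$cleanup_fn" || return 1   # stop if the cleanup itself fails
    attempt=$((attempt + 1))
  done
  echo "giving up after $max_attempts attempts" >&2
  return 1
}

# Usage (hypothetical; the deprovision form is an assumption):
#   do_provision() { datafye foundry local provision --descriptor foundry.yaml; }
#   do_cleanup()   { datafye foundry local deprovision; }
#   retry_provision 2 do_provision do_cleanup
```

Keeping the two steps as caller-supplied functions means the wrapper does not need to know which environment or descriptor is in use.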
2. Start
Start a provisioned environment that is currently stopped.
Starting an environment is much faster than provisioning (typically under a minute) because the infrastructure already exists — services just need to resume.
3. Stop
Stop a running environment without deleting it.
Stopping preserves your environment configuration and data while freeing up compute resources. This is useful for:
Saving costs when you're not actively using the environment
Performing maintenance or updates
Pausing development work
4. Deprovision
Completely remove an environment and all its resources.
Warning: Deprovisioning is irreversible. All deployment data will be permanently deleted, including databases, configurations, and all infrastructure. Ensure you have backed up any critical data before proceeding.
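The four lifecycle stages can be lined up side by side. In this hedged sketch, only the provision form comes from this page; the start, stop, and deprovision subcommands are assumed to follow the same "datafye <product> <environment> <action>" shape and should be checked against the CLI Command Reference. The hypothetical lifecycle_cmd helper only prints the command line, so the sketch is safe to run.

```shell
# lifecycle_cmd PRODUCT ENVIRONMENT ACTION [ARGS...]
# Prints (does not execute) the command line for a lifecycle action.
# ASSUMPTION: start, stop, and deprovision mirror the provision form.
lifecycle_cmd() {
  local product="$1" environment="$2" action="$3"
  shift 3
  case "$action" in
    provision|start|stop|deprovision) ;;
    *) echo "unknown lifecycle action: $action" >&2; return 2 ;;
  esac
  # Append the remaining arguments only when there are any.
  echo "datafye $product $environment $action${*:+ $*}"
}

lifecycle_cmd foundry local provision --descriptor foundry.yaml
lifecycle_cmd foundry local stop
lifecycle_cmd foundry local start
lifecycle_cmd foundry local deprovision
```

Printing the commands first makes it easy to review a destructive step such as deprovision before running it by hand.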
How the CLI Uses Descriptors
The CLI's power comes from its descriptor-driven approach. Instead of remembering dozens of command-line flags and options, you declare what you want in a descriptor file.
Descriptor Input Methods
Single combined descriptor: pass one file, for example complete.yaml, that contains the data, algo, and broker sections.
Separate descriptor files: pass one file per descriptor type (data, algo, broker).
This modular approach lets you:
Reuse common descriptors across environments (e.g., same data descriptor for dev and prod)
Version control each descriptor type independently
Share descriptor fragments with team members
Learn more: See Descriptors for complete details on descriptor format and organization.
Deployment Model Selection
The CLI uses environment-specific commands to handle the differences between deployment models. The environment is specified as part of the command (e.g., local, aws, azure, gcp), and each environment accepts different parameters.
Note: Developers can provision Standalone (local) and Distributed Self-managed (aws/azure/gcp) deployments using the CLI. Distributed Datafye Managed deployments (required for live trading) can only be provisioned by Datafye Ops. To get a managed deployment, contact Datafye to request provisioning.
Standalone Deployment
All services run on your local machine (or a single cloud VM). Commands use the local environment, for example datafye foundry local provision --descriptor foundry.yaml.
Key difference: Local deployments do not use a --name parameter since only one deployment is allowed per machine. You provide descriptor file(s) to specify what to provision.
See CLI Command Reference for complete parameter lists.
The CLI:
Uses Docker to run all services as containers on your machine
Sets up Docker networks for inter-service communication
Binds ports to localhost for API access
Manages container lifecycle (start, stop, health checks)
Best for: Development, learning, paper trading on local hardware
Distributed Self-managed
Services are spread across multiple machines in your cloud account. Commands use the cloud provider name (aws, azure, gcp) as the environment, for example datafye foundry aws provision.
Key differences: Distributed deployments require:
--name: A unique name for the deployment (multiple deployments can exist in the same cloud account)
--profile: The AWS CLI profile to use for provisioning
Descriptor file(s) specifying what to provision
Optional cloud-specific parameters like --region and --cidr
See CLI Command Reference for complete parameter lists.
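The parameter rules for Standalone and Distributed Self-managed deployments can be expressed as a pre-flight check. This is a minimal sketch: check_params is a hypothetical helper, not part of the CLI, and it encodes only the rules stated on this page (no --name for local, --name for cloud environments, --profile for AWS). Parameters for azure and gcp beyond --name are not described here, so the sketch does not check them.

```shell
# check_params ENVIRONMENT NAME PROFILE DESCRIPTOR
# Pass "" for any value you are not supplying.
# Prints one line per problem and returns 1 if any were found.
check_params() {
  local environment="$1" name="$2" profile="$3" descriptor="$4"
  local problems=0
  if [ -z "$descriptor" ]; then
    echo "a descriptor file is required"; problems=1
  fi
  case "$environment" in
    local)
      # Only one deployment is allowed per machine, so no name is taken.
      if [ -n "$name" ]; then
        echo "local deployments do not take --name"; problems=1
      fi ;;
    aws|azure|gcp)
      if [ -z "$name" ]; then
        echo "--name is required for $environment"; problems=1
      fi
      if [ "$environment" = aws ] && [ -z "$profile" ]; then
        echo "--profile is required for aws"; problems=1
      fi ;;
    *)
      echo "unknown environment: $environment"; problems=1 ;;
  esac
  return "$problems"
}

# Usage:
#   check_params aws my-dev-foundry my-aws-profile foundry.yaml && echo "ok"
```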
The CLI:
Provisions cloud resources (VMs, networking, storage) in your account
Deploys services across multiple machines for scalability
Sets up private networking (VPCs, subnets, security groups)
Configures monitoring and logging
Uses native cloud infrastructure (no Docker/containerization)
Best for: Production-scale development, paper trading with cloud resources, teams with cloud expertise
Distributed Datafye Managed
Services are spread across multiple machines in a Datafye-operated cloud account, provisioned and managed by Datafye Ops.
What Datafye Ops does:
Provisions infrastructure in a dedicated account under Datafye's root
Deploys services across multiple machines for scalability and resilience
Handles all operations, monitoring, updates, and compliance
Uses native cloud infrastructure (no Docker/containerization)
Configures for live trading requirements when applicable
Best for: Live trading (this model is required for live trading), production environments, teams wanting fully managed infrastructure
To request: Contact Datafye with your descriptors and deployment requirements.
Learn more: See Private Cloud Deployments for detailed comparison of deployment models.
Scenario-Specific Provisioning
Different scenarios require different combinations of descriptors:
Foundry: Data Cloud Only
Descriptors required: Data only
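Standalone and Distributed (AWS example): use the datafye foundry local provision and datafye foundry aws provision forms shown under Provision, with descriptor file(s) that cover only the data section.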
What gets provisioned: Data Cloud services for historical and live replay data.
You bring: Your own algo containers that connect to Data Cloud APIs.
Foundry: Full Stack
Descriptors required: Data + Algo
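Standalone and Distributed (AWS example): use the datafye foundry local provision and datafye foundry aws provision forms shown under Provision, with descriptor file(s) that cover the data and algo sections.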
What gets provisioned: Data Cloud, Algo Container runtime, Backtesting Engine, MCP Server for AI-assisted vibe coding.
You bring: Nothing — complete stack provided by Datafye.
Trading: Data Cloud + Broker
Descriptors required: Data + Broker
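For paper trading: use Standalone (datafye trading local provision) or Distributed Self-managed (datafye trading aws provision for AWS), as shown under Provision, with descriptor file(s) that cover the data and broker sections.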
For live trading: Contact Datafye Ops to provision a Distributed Datafye Managed deployment.
What gets provisioned: Data Cloud services for live market data, Broker Connector for trade execution.
You bring: Your own algo containers that connect to Data Cloud and Broker Connector.
Trading: Full Stack
Descriptors required: Data + Algo + Broker
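For paper trading: use Standalone (datafye trading local provision) or Distributed Self-managed (datafye trading aws provision for AWS), as shown under Provision, with descriptor file(s) that cover the data, algo, and broker sections.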
For live trading: Contact Datafye Ops to provision a Distributed Datafye Managed deployment.
What gets provisioned: Data Cloud, Algo Container runtime, Broker Connector, MCP Server for AI-assisted vibe coding.
You bring: Nothing — complete trading stack provided by Datafye.
Learn more: See Choose Your Path for detailed scenario descriptions.
CLI Configuration
The CLI stores configuration in ~/.datafye/config.yaml.
This configuration provides defaults so you don't have to specify common options on every command.
Environment Variables
The CLI supports environment variable substitution in descriptors, keeping secrets out of version control:
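In your descriptor, reference an environment variable in place of the secret value. Before you run the CLI, set that variable in your shell. The CLI reads environment variables and substitutes their values before provisioning.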
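Because substitution happens when the CLI runs, an unset variable is easiest to catch before provisioning starts. The sketch below lists variables that a descriptor references but that are missing from the environment. It assumes placeholders are written as ${NAME}, which this page does not confirm, and missing_vars is a hypothetical helper, not a CLI feature.

```shell
# missing_vars FILE
# Prints the name of every variable referenced as ${NAME} in FILE that is
# unset or empty in the exported environment (what a child process sees).
# ASSUMPTION: descriptors use ${NAME} placeholders.
missing_vars() {
  local file="$1" placeholder name
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$file" | sort -u |
  while IFS= read -r placeholder; do
    name="${placeholder#??}"   # drop the leading "${"
    name="${name%?}"           # drop the trailing "}"
    [ -n "$(printenv "$name")" ] || echo "$name"
  done
}

# Usage:
#   missing=$(missing_vars trading.yaml)
#   [ -z "$missing" ] || { echo "unset variables: $missing" >&2; exit 1; }
```

The check uses printenv rather than a plain shell lookup so that variables which are set but not exported are also reported.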
Validation and Error Handling
Before provisioning, the CLI validates your descriptors to catch errors early:
Validation Checks
Schema validation:
All required fields present
Correct data types (strings, numbers, arrays)
Values within valid ranges
Semantic validation:
Referenced resources exist (container images, data symbols)
Combinations are valid (e.g., can't request live trading in Standalone deployment)
Resource requests are feasible (enough memory, valid CPU counts)
Dependency validation:
Data providers support requested asset classes
Broker supports requested permissions
Container images are accessible from the deployment environment
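One of the semantic rules above, that live trading cannot run in a Standalone deployment, can be illustrated with a small check. This is a sketch of the rule as described on this page, not the CLI's actual validator; check_trading_mode and the model and mode labels are chosen for the example.

```shell
# check_trading_mode MODEL MODE
#   MODEL: standalone | self-managed | datafye-managed  (labels for this sketch)
#   MODE:  paper | live
# Returns 0 when the combination is allowed, 1 when live trading is
# requested outside a Distributed Datafye Managed deployment, and 2 for
# unrecognized input.
check_trading_mode() {
  local model="$1" mode="$2"
  case "$mode:$model" in
    paper:standalone|paper:self-managed|paper:datafye-managed|live:datafye-managed)
      return 0 ;;
    live:standalone|live:self-managed)
      echo "live trading requires a Distributed Datafye Managed deployment" >&2
      return 1 ;;
    *)
      echo "unrecognized combination: mode=$mode model=$model" >&2
      return 2 ;;
  esac
}

# Usage:
#   check_trading_mode standalone live || echo "rejected before provisioning"
```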
Error Messages
When validation fails, the CLI reports clear, actionable error messages before provisioning begins. This prevents provisioning failures and saves troubleshooting time.
Status and Monitoring
Check the status of your environments from the CLI.
Status output shows:
Environment ID and name
Deployment model (Standalone, Distributed Self-managed, Distributed Datafye Managed)
Current state (provisioning, running, stopped, failed)
Services running (Data Cloud, Algo Container, etc.)
Resource usage (CPU, memory, network)
Health check status
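The state values listed above lend themselves to a simple polling loop in scripts. The sketch below waits until an environment reports running. How to extract the state from the CLI's status output is not shown on this page, so the hypothetical wait_for_running helper takes a caller-supplied function that prints the current state.

```shell
# wait_for_running STATE_FN MAX_POLLS INTERVAL_SECONDS
# STATE_FN is the name of a function you define that prints one of:
# provisioning, running, stopped, failed.
wait_for_running() {
  local state_fn="$1" max_polls="$2" interval="$3"
  local poll=1 state=""
  while [ "$poll" -le "$max_polls" ]; do
    state=$("$state_fn")
    case "$state" in
      running)
        echo "running after $poll poll(s)"
        return 0 ;;
      failed|stopped)
        # These states will not become "running" on their own.
        echo "environment is $state" >&2
        return 1 ;;
    esac
    sleep "$interval"
    poll=$((poll + 1))
  done
  echo "gave up after $max_polls polls (last state: $state)" >&2
  return 1
}

# Usage (current_state is a function you would write):
#   wait_for_running current_state 40 30
```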
Common Workflows
Development Workflow
Create minimal descriptors for fast iteration
Provision Standalone deployment on your laptop
Develop and test your algo locally
Stop environment when not in use to save resources
Testing Workflow
Create full descriptors with production data
Provision Distributed Self-managed for cloud resources
Run comprehensive backtests with parallelization
Validate results and optimize parameters
Deprovision when testing is complete
Production Workflow
Create production descriptors with live data and broker
For paper trading: Provision Standalone or Distributed Self-managed to validate
For live trading: Contact Datafye Ops to provision Distributed Datafye Managed deployment
Datafye Ops handles all operations and monitoring for managed environments
Prerequisites and Installation
The Datafye CLI requires Java 17+ and Docker (for local deployments). For AWS deployments, you'll also need the AWS CLI v2 configured with your credentials.
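These prerequisites can be checked from a shell before installing. This is a rough sketch, not the official installer check: java_major and check_prereqs are hypothetical helpers. It relies on the "version" line that java -version prints and on the aws-cli/2 prefix that aws --version prints for AWS CLI v2.

```shell
# java_major BANNER_LINE
# Prints the major version from a line such as: openjdk version "17.0.8" ...
java_major() {
  local version
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$version" in
    1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_381 means Java 8
  esac
  echo "${version%%[!0-9]*}"          # keep only the leading digits
}

# check_prereqs TARGET   (TARGET: local | aws)
# Prints one line per missing prerequisite; returns 1 if any are missing.
check_prereqs() {
  local target="$1" banner="" major="" status=0
  if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | grep 'version "' | head -n 1)
    major=$(java_major "$banner")
    if [ "${major:-0}" -lt 17 ]; then
      echo "Java 17+ required, found: ${banner:-unknown}"; status=1
    fi
  else
    echo "java not found on PATH"; status=1
  fi
  case "$target" in
    local)
      command -v docker >/dev/null 2>&1 || { echo "docker not found on PATH"; status=1; } ;;
    aws)
      case "$(aws --version 2>&1)" in
        aws-cli/2.*) ;;
        *) echo "AWS CLI v2 not found"; status=1 ;;
      esac ;;
  esac
  return "$status"
}

# Usage:
#   check_prereqs local && echo "ready for a Standalone deployment"
```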
Complete installation guide: See CLI Installation for detailed prerequisites, installation instructions, and troubleshooting.
Command Reference
This page provides a conceptual overview of the CLI. For detailed command syntax, parameters, and examples, see the CLI Command Reference.
Next Steps
Install the CLI — Follow CLI Installation for step-by-step installation instructions
Understand deployment models — Read Private Cloud Deployments to understand Standalone, Distributed Self-managed, and Distributed Datafye Managed deployments
Learn about descriptors — Review Descriptors to understand how to configure your environments
Get started with a scenario — Choose your path in Choose Your Path
See command details — Browse CLI Command Reference for specific command syntax
Last updated: 2025-10-22