Scorecarding

Scorecarding is the systematic evaluation of algo performance using industry-standard metrics and risk-adjusted return measures. In institutional trading, scorecarding is table stakes — it's how professionals assess whether a strategy is worth deploying capital.

Datafye provides comprehensive scorecarding for all algos developed in the Full Stack Foundry, ensuring that strategies are rigorously evaluated before they see real money.

Why Scorecarding Matters

Objective Performance Assessment

Scorecarding provides objective, quantitative measures of algo performance:

  • No guesswork — Clear metrics tell you if a strategy works

  • Apples-to-apples comparisons — Compare different strategies fairly

  • Risk-adjusted returns — Understand returns relative to risk taken

  • Industry standards — Use the same metrics as professional funds

Without scorecarding, you're flying blind — you might see profits but not understand the risks you took to achieve them.

Risk Management

Scorecards reveal the risk characteristics of your strategy:

  • Maximum drawdown — Worst peak-to-trough decline

  • Volatility — How much returns fluctuate

  • Tail risk — Probability of extreme losses

  • Correlation — How the strategy behaves relative to markets

Understanding risk is crucial for position sizing, capital allocation, and avoiding catastrophic losses.

Strategy Comparison

Scorecarding enables informed decision-making:

  • Which algo to trade? — Deploy capital to the best strategies

  • How to allocate? — Size positions based on risk-adjusted returns

  • When to stop trading? — Detect when a strategy stops working

  • Portfolio construction — Combine strategies for optimal diversification

Marketplace Requirement

For developers planning to publish to the Datafye Marketplace:

  • Scorecards are required — Investors need objective performance data

  • Standard format — All marketplace algos use the same scoring framework

  • Transparency — Investors can compare algos fairly

  • Quality bar — Minimum score thresholds ensure marketplace quality

Key Performance Metrics

Datafye scorecards provide comprehensive trading performance analysis through 18 key metrics organized into five categories. These metrics are automatically calculated during backtesting, paper trading, and live trading, giving you consistent performance measurement across all stages of strategy development and deployment.

The current scorecard is optimized for day-trading style strategies where positions are cleanly entered and exited. A different scorecard for long-term strategies (such as portfolios that are periodically rebalanced) may be introduced in the future.

Datafye scorecards include the following categories of metrics:

Performance Metrics

These metrics provide a high-level view of your strategy's profitability and risk over the trading period:

Trading Days (tradingDays)

  • The number of days the market was open during the analyzed period

  • Counts full or partial trading days, excluding weekends and holidays

  • Used to normalize other frequency metrics

Cumulative P/L (cumulativePL)

  • Total profit or loss over the entire trading period

  • Calculated as: Current portfolio value - Starting portfolio value

  • Portfolio value = Cash + Market value of positions (using previous close if market is closed)

  • Important: Scorecards currently require no cash additions or removals during the scored period

Max Drawdown Amount (maxDrawdownAmount)

  • The largest observed loss from a portfolio value peak to a subsequent trough before a new peak is achieved

  • Calculated by tracking the highest portfolio value seen (peak), then measuring the largest decline before the next peak

  • Expressed in currency units (e.g., $2,156.88)

  • Critical metric for understanding worst-case losses

Max Drawdown Percent (maxDrawdownPercent)

  • The maximum drawdown as a percentage of the peak portfolio value

  • Calculated as: (maxDrawdownAmount / Portfolio value at peak) × 100

  • Easier to interpret than absolute drawdown for comparing strategies of different sizes

  • Example: 42.15% means the strategy lost 42.15% of its peak value at its worst point

Risk-Adjusted Return (riskAdjReturn)

  • How much profit was generated for every dollar of maximum risk endured

  • Calculated as: cumulativePL / maxDrawdownAmount

  • Higher values indicate better risk-adjusted performance

  • Example: 1.98 means you earned $1.98 for every $1 of maximum drawdown risk
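
To make these definitions concrete, here's a minimal Python sketch that derives the performance metrics above from a daily series of portfolio values. It's illustrative only — the function name and input format are assumptions, not Datafye's internal implementation:

```python
def performance_metrics(portfolio_values: list[float]) -> dict:
    """Derive performance metrics from end-of-day portfolio values (non-empty)."""
    cumulative_pl = portfolio_values[-1] - portfolio_values[0]

    peak = portfolio_values[0]
    max_dd_amount = 0.0
    max_dd_percent = 0.0
    for value in portfolio_values:
        if value > peak:
            peak = value                  # a new peak resets the reference point
        drawdown = peak - value           # decline from the most recent peak
        if drawdown > max_dd_amount:
            max_dd_amount = drawdown
            max_dd_percent = drawdown / peak * 100

    return {
        "cumulativePL": cumulative_pl,
        "maxDrawdownAmount": max_dd_amount,
        "maxDrawdownPercent": max_dd_percent,
        # riskAdjReturn is undefined when no drawdown was observed
        "riskAdjReturn": cumulative_pl / max_dd_amount if max_dd_amount > 0 else None,
    }
```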

Trade Frequency and Volume

These metrics describe how often your strategy trades:

Total Trades (totalTrades)

  • Count of trades made over the trading period

  • A "trade" is defined as a paired entry and exit execution

  • Important: If a position has multiple entries but a single exit, these count as multiple trades (each entry pairs with the exit)

  • Helps assess statistical significance and trading costs

Average Trades Per Week (avgTradesPerWeek)

  • Average number of trades per week

  • Calculated as: (totalTrades / tradingDays) × 5

  • Example: 250 trades over 120 trading days gives (250 / 120) × 5 ≈ 10.4 trades per week

  • Helps classify strategy type (day trading, swing trading, position trading)

  • Higher frequency = higher transaction costs and slippage impact

Win/Loss Statistics

These metrics break down your trading success rate:

Winning Trades (winningTrades)

  • Count of trades that generated a profit

  • Each trade's profit/loss is calculated from the paired entry and exit executions

  • Breakeven trades (exact $0 P/L) are not counted as winning trades

Losing Trades (losingTrades)

  • Count of trades that generated a loss

  • Breakeven trades are not counted as losing trades

  • Note: winningTrades + losingTrades may be less than totalTrades if there are breakeven trades

Win Rate (winRate)

  • Percentage of trades that were profitable

  • Calculated as: (winningTrades / totalTrades) × 100

  • Example: 72.1% means 72.1% of trades were winners

  • Important: A high win rate doesn't guarantee profitability; it must be evaluated alongside the profit/loss ratios

Profit and Loss Analysis

These metrics quantify the profitability of your trades:

Gross Profit (grossProfit)

  • The sum of profits across all winning trades

  • Only includes profitable trades (winners)

  • Example: $19,613.25 in total profits from all winners

Gross Loss (grossLoss)

  • The sum of losses across all losing trades

  • Only includes unprofitable trades (losers)

  • Typically shown as a negative number (e.g., -$8,840.00)

Profit Factor (profitFactor)

  • Ratio of gross profits to gross losses

  • Calculated as: grossProfit / |grossLoss|

  • Values > 1.0 indicate net profitability

  • Example: 2.12 means you made $2.12 in profits for every $1.00 in losses

  • Important: If there are no losing trades (grossLoss = 0), profitFactor is undefined

Profit Per Trade (profitPerTrade)

  • Average profit per trade across all trades

  • Calculated as: grossProfit / totalTrades

  • Averages over all trades — the denominator counts winners, losers, and breakevens, while the numerator is gross profit only

  • Example: $5.74 average profit per trade

Average Profit Trade (avgProfitTrade)

  • Average profit for winning trades only

  • Calculated as the mean profit across all winning trades

  • Example: $18.95 average profit when the trade wins

Average Losing Trade (avgLosingTrade)

  • Average loss for losing trades only

  • Calculated as the mean loss across all losing trades

  • Typically shown as a negative number (e.g., -$22.10)

  • Compared with avgProfitTrade to understand risk/reward per trade

Extreme Values

These metrics identify your best and worst trades:

Best Trade (bestTrade)

  • The single most profitable trade

  • Maximum per-trade profit across all winning trades

  • Example: $298.45 for the best-performing trade

  • Helps identify whether the strategy relies on occasional big wins

Worst Trade (worstTrade)

  • The single largest losing trade

  • Minimum (deepest loss) across all losing trades

  • Typically shown as a negative number (e.g., -$445.20)

  • Critical for understanding tail risk and maximum single-trade loss
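
The trade-level metrics in the last three categories can all be derived from a list of per-trade profit/loss figures (one entry per paired entry/exit). Here's a minimal sketch with assumed names and input format, not Datafye's actual code:

```python
def trade_metrics(trade_pls: list[float]) -> dict:
    """Derive win/loss, P/L, and extreme-value metrics from per-trade P/L."""
    wins = [p for p in trade_pls if p > 0]
    losses = [p for p in trade_pls if p < 0]   # breakeven trades fall in neither bucket
    total = len(trade_pls)
    gross_profit = sum(wins)
    gross_loss = sum(losses)                   # a negative number

    return {
        "totalTrades": total,
        "winningTrades": len(wins),
        "losingTrades": len(losses),
        "winRate": len(wins) / total * 100 if total else 0.0,
        "grossProfit": gross_profit,
        "grossLoss": gross_loss,
        # profitFactor is undefined when there are no losing trades
        "profitFactor": gross_profit / abs(gross_loss) if gross_loss else None,
        "profitPerTrade": gross_profit / total if total else 0.0,
        "avgProfitTrade": gross_profit / len(wins) if wins else 0.0,
        "avgLosingTrade": gross_loss / len(losses) if losses else 0.0,
        "bestTrade": max(trade_pls) if trade_pls else 0.0,
        "worstTrade": min(trade_pls) if trade_pls else 0.0,
    }
```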

Scorecard Generation

During Backtesting

In Full Stack Foundry, scorecards are automatically generated when you run backtests. The Backtesting Engine continuously updates metrics as your algo executes against historical data:

  1. Run backtest — Execute your algo against historical data using the Backtesting Engine

  2. Track events — Datafye monitors all trades (entries and exits) and portfolio value changes

  3. Update metrics — Scores are calculated incrementally as events occur

  4. Generate report — Complete scorecard is produced when backtest completes

When Metrics Are Calculated

The scorecard metrics are updated at specific events during backtesting, paper trading, and live trading:

When a trade is entered or exited:

  • Calculate profit/loss on the completed trade(s)

  • Update trade counts: totalTrades, winningTrades, losingTrades

  • Update profit/loss totals: grossProfit, grossLoss

  • Update derived metrics: winRate, profitFactor, avgProfitTrade, avgLosingTrade, profitPerTrade

  • Update extremes if applicable: bestTrade, worstTrade

  • Accumulate into cumulativePL

  • If the trade was a loss, update maxDrawdownAmount and maxDrawdownPercent

  • Update riskAdjReturn
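
A simplified sketch of this update step, assuming a running `state` dict that starts with zeroed totals (illustrative only, not Datafye's event pipeline):

```python
def on_trade_closed(state: dict, trade_pl: float) -> None:
    """Incrementally fold one completed trade's P/L into the running totals."""
    state["totalTrades"] += 1
    if trade_pl > 0:
        state["winningTrades"] += 1
        state["grossProfit"] += trade_pl
    elif trade_pl < 0:
        state["losingTrades"] += 1
        state["grossLoss"] += trade_pl
    state["cumulativePL"] += trade_pl
    state["bestTrade"] = max(state["bestTrade"], trade_pl)
    state["worstTrade"] = min(state["worstTrade"], trade_pl)
    # Derived metrics (winRate, profitFactor, the averages, riskAdjReturn) are
    # then recomputed from these running totals, and the drawdown is re-checked
    # against the portfolio-value peak when the trade was a loss.
```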

At the end of each trading day:

  • Increment tradingDays counter

  • Recalculate avgTradesPerWeek

Optional - During the trading day (for open positions):

  • Mark-to-market P/L on open positions can be tracked

  • These unrealized P/L values are temporary and recalculated as market prices change

  • When position closes, unrealized P/L is replaced by final realized P/L

Example Scorecard

Here's an illustrative example of a complete scorecard in JSON format. The field names match the metrics above; the values are mutually consistent sample numbers, not results from a real strategy:
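
```json
{
  "tradingDays": 120,
  "cumulativePL": 6000.00,
  "maxDrawdownAmount": 2400.00,
  "maxDrawdownPercent": 2.35,
  "riskAdjReturn": 2.5,
  "totalTrades": 250,
  "avgTradesPerWeek": 10.42,
  "winningTrades": 160,
  "losingTrades": 88,
  "winRate": 64.0,
  "grossProfit": 12000.00,
  "grossLoss": -6000.00,
  "profitFactor": 2.0,
  "profitPerTrade": 48.00,
  "avgProfitTrade": 75.00,
  "avgLosingTrade": -68.18,
  "bestTrade": 310.00,
  "worstTrade": -280.00
}
```

Note that winningTrades + losingTrades is 248, not 250: the remaining two trades in this sample were breakeven, which is why they count toward totalTrades but neither bucket.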

Scorecard Components

A complete Datafye scorecard includes:

Performance Summary

  • Key metrics at a glance

  • Overall score/rating [Does Datafye provide an overall score?]

  • Pass/fail criteria for marketplace submission

Equity Curve

  • Visual representation of account value over time

  • Drawdown visualization

  • [Chart format: interactive? static?]

Returns Distribution

  • Histogram of returns

  • Normal distribution overlay

  • Tail analysis

Rolling Metrics

  • Sharpe ratio over time windows

  • Rolling volatility

  • [Window sizes used?]

Monthly Returns Heatmap

  • Performance by month and year

  • Seasonal patterns

  • Consistency visualization

Trade Analysis

  • Win/loss distribution

  • Hold time distribution

  • Profit/loss by trade

Drawdown Analysis

  • All drawdowns ranked

  • Duration and recovery

  • Underwater equity chart

Interpreting Scorecards

Understanding what your scorecard metrics mean is crucial for evaluating strategy quality and making informed trading decisions.

Key Metric Relationships

The power of scorecarding comes from understanding how metrics interact:

Profitability Triangle:

  • cumulativePL shows total profit, but doesn't reveal risk taken

  • maxDrawdownPercent shows risk endured

  • riskAdjReturn combines both: higher values mean better risk-adjusted profits

Win Rate vs. Profit Factor:

  • High winRate (e.g., 70%+) doesn't guarantee profitability if losses are larger than wins

  • profitFactor must be > 1.0 for overall profitability

  • Best strategies combine decent win rate with strong profit factor

  • Example: 72% win rate + 2.12 profit factor = profitable strategy with good consistency

Average Win vs. Average Loss:

  • Compare avgProfitTrade to avgLosingTrade (absolute value)

  • Ideally avgProfitTrade > |avgLosingTrade| for positive expectancy

  • If avgProfitTrade < |avgLosingTrade|, you need higher win rate to be profitable
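
One way to make this trade-off precise is standard per-trade expectancy math (not a scorecard metric itself). A short sketch:

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected P/L per trade. win_rate in [0, 1]; avg_loss may be negative."""
    return win_rate * avg_win - (1 - win_rate) * abs(avg_loss)

def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Minimum win rate needed for zero expectancy."""
    return abs(avg_loss) / (avg_win + abs(avg_loss))

# Example: with a $75 average win and a -$68.18 average loss,
# breakeven_win_rate(75, -68.18) ≈ 0.476, so any win rate above
# roughly 47.6% yields positive expectancy.
```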

Good vs. Poor Metrics

profitFactor:

  • < 1.0: Losing strategy (gross losses exceed gross profits)

  • 1.0 - 1.5: Marginal (barely profitable after transaction costs)

  • 1.5 - 2.0: Good (solid profitability)

  • > 2.0: Excellent (strong edge)

  • Warning: If undefined (no losing trades), strategy may be undertested or overfitted

maxDrawdownPercent:

  • < 15%: Excellent risk control

  • 15% - 25%: Good, manageable risk

  • 25% - 40%: Acceptable if returns justify it

  • > 40%: High risk, requires strong justification

  • Important: Consider your personal risk tolerance and capital size

winRate:

  • Varies significantly by strategy type

  • Day trading: 55-70% typical

  • Mean-reversion: 60-75% typical

  • Trend-following: 35-50% typical (fewer wins, but larger average wins)

  • Key insight: Win rate alone doesn't determine profitability

riskAdjReturn:

  • < 1.0: Poor (earned less than maximum risk endured)

  • 1.0 - 2.0: Acceptable

  • 2.0 - 3.0: Good

  • > 3.0: Excellent risk-adjusted performance

Red Flags

Watch out for these warning signs that may indicate problems with your strategy:

Too-Good-To-Be-True Metrics:

  • winRate > 85% (suspiciously high, may indicate overfitting or look-ahead bias)

  • profitFactor > 5.0 (exceptional, verify strategy logic carefully)

  • maxDrawdownPercent < 5% with high returns (unusual, check for issues)

  • Undefined profitFactor (no losing trades suggests insufficient testing or overfitting)

High Risk Indicators:

  • maxDrawdownPercent > 50% (very high risk, most traders can't tolerate this)

  • Large difference between bestTrade and typical avgProfitTrade (may depend on rare outliers)

  • Large difference between worstTrade and typical avgLosingTrade (tail risk concern)

  • |avgLosingTrade| > avgProfitTrade with winRate < 60% (likely negative expectancy)

Insufficient Statistical Significance:

  • totalTrades < 30 (too few trades for reliable statistics)

  • tradingDays < 60 (insufficient time period, needs more data)

  • Very low avgTradesPerWeek (< 1.0) combined with short backtest (limited data points)

Execution Concerns:

  • Very high avgTradesPerWeek (> 100) suggests high-frequency strategy with significant slippage/commission impact

  • profitPerTrade too small relative to expected transaction costs (strategy may not be profitable after fees)

Marketplace Thresholds

For algos to be accepted in the Datafye Marketplace, they must meet minimum quality standards based on scorecard metrics:

Note: Final marketplace thresholds are subject to change. Contact Datafye for current requirements.

Preliminary minimum requirements:

  • profitFactor ≥ 1.5 (demonstrates clear edge)

  • maxDrawdownPercent ≤ 40% (manageable risk for investors)

  • tradingDays ≥ 120 (sufficient historical validation)

  • totalTrades ≥ 50 (adequate statistical significance)

  • cumulativePL > 0 (net profitable over period)

Preferred characteristics:

  • riskAdjReturn > 2.0 (strong risk-adjusted performance)

  • winRate between 40% and 85% (reasonable range, not suspiciously high or low)

  • Consistent performance across different market conditions

  • Reasonable avgTradesPerWeek (not excessively high-frequency)

  • Clean trade distribution (no extreme dependence on single outlier trades)
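
The preliminary minimums above can be checked mechanically before submission. This is a sketch against the thresholds as listed, not the marketplace's actual validation logic (which is subject to change):

```python
def meets_preliminary_minimums(sc: dict) -> list[str]:
    """Return a list of failed requirements; an empty list means all minimums pass."""
    failures = []
    if sc["profitFactor"] is None or sc["profitFactor"] < 1.5:
        failures.append("profitFactor must be defined and >= 1.5")
    if sc["maxDrawdownPercent"] > 40:
        failures.append("maxDrawdownPercent must be <= 40%")
    if sc["tradingDays"] < 120:
        failures.append("tradingDays must be >= 120")
    if sc["totalTrades"] < 50:
        failures.append("totalTrades must be >= 50")
    if sc["cumulativePL"] <= 0:
        failures.append("cumulativePL must be > 0")
    return failures
```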

Comparing Algos

Scorecards enable systematic comparison:

Within Your Portfolio

Rank by Sharpe Ratio

  • Identify best risk-adjusted performers

  • Allocate more capital to top Sharpe strategies

Diversification Analysis

  • [Does Datafye provide correlation matrices?]

  • Combine low-correlation strategies

  • Portfolio-level metrics

Risk Budgeting

  • Allocate risk based on volatility

  • Ensure no single strategy dominates risk

Marketplace Algos

Filter by metrics

  • [Can users filter marketplace by scorecard metrics?]

  • Set minimum thresholds

  • Sort by preferred metrics

Side-by-side comparison

  • [Compare multiple algos simultaneously?]

  • Visual comparison charts

  • Aggregated portfolio impact

Walk-Forward and Out-of-Sample Analysis

Robust scorecarding includes validation beyond in-sample backtesting:

Walk-Forward Analysis

How it works:

  1. Optimize on period 1 (in-sample)

  2. Test on period 2 (out-of-sample)

  3. Roll forward and repeat

  4. Aggregate out-of-sample results
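
A schematic sketch of this loop, assuming hypothetical optimize() and backtest() functions you supply yourself (whether Datafye's engine supports this natively is an open question below):

```python
def walk_forward(data, train_len, test_len, optimize, backtest):
    """Roll an optimize-then-test window across the data; return OOS results."""
    out_of_sample_results = []
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start : start + train_len]                        # in-sample
        test = data[start + train_len : start + train_len + test_len]  # out-of-sample
        params = optimize(train)          # fit parameters on in-sample data only
        out_of_sample_results.append(backtest(test, params))
        start += test_len                 # roll the window forward
    return out_of_sample_results          # aggregate these for the final scorecard
```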

Why it matters:

  • Reveals if optimization overfits

  • More realistic performance expectations

  • Detects regime changes

[Does Datafye's backtesting engine support walk-forward analysis?]

Out-of-Sample Validation

Best practices:

  • Hold out 20-30% of data for out-of-sample testing

  • Never optimize on out-of-sample data

  • Compare in-sample vs. out-of-sample metrics

  • Significant degradation suggests overfitting
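
A minimal sketch of the holdout split, assuming chronologically ordered data (illustrative only):

```python
def holdout_split(data, holdout_frac=0.25):
    """Reserve the most recent fraction of history for out-of-sample testing."""
    cut = int(len(data) * (1 - holdout_frac))
    return data[:cut], data[cut:]   # (in_sample, out_of_sample) — never optimize on the latter
```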

[How does Datafye facilitate out-of-sample validation?]

Scorecarding Best Practices

During Development

1. Start with long backtests

  • [Minimum recommended period?]

  • Include multiple market regimes

  • Bull markets, bear markets, sideways markets

2. Focus on risk-adjusted returns

  • Don't chase raw returns

  • High Sharpe is better than high return with high risk

  • Sustainable strategies have acceptable risk profiles

3. Monitor trade count

  • Too few = not statistically significant

  • Too many = potential overfitting

  • [Recommended range?]

4. Check consistency

  • Performance should be relatively stable over time

  • Large variations suggest luck or overfitting

Before Trading

1. Paper trade first

  • Verify scorecard with live (paper) data

  • Look for degradation from backtest

  • Small degradation is normal (slippage, real-world conditions)

  • Large degradation is a red flag

2. Start small

  • Don't bet the farm on backtested results

  • Scale up as forward performance validates backtest

  • Use the Kelly criterion or similar for position sizing

3. Set stop conditions

  • Define when to stop trading an algo

  • Example: Stop if Sharpe drops below X for Y days

  • Example: Stop if drawdown exceeds Z%
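
A minimal sketch of such a stop condition; the thresholds are placeholders to be tuned to your strategy:

```python
def should_stop(rolling_sharpe: float, current_drawdown_pct: float,
                sharpe_floor: float = 0.5, max_dd_pct: float = 25.0) -> bool:
    """Illustrative kill switch. A real monitor would also require the Sharpe
    breach to persist for Y consecutive days before triggering."""
    return rolling_sharpe < sharpe_floor or current_drawdown_pct > max_dd_pct
```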

Advanced Scorecarding

Monte Carlo Simulation

[Does Datafye support Monte Carlo analysis?]

Purpose:

  • Assess robustness of results

  • Understand range of possible outcomes

  • Estimate probability of drawdowns

How it works:

  • Resample trades with replacement

  • Generate thousands of alternate histories

  • Analyze distribution of outcomes
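
A minimal sketch of this resampling, using per-trade P/L figures (illustrative, independent of whatever Monte Carlo support Datafye may offer):

```python
import random

def monte_carlo_drawdowns(trade_pls, n_sims=10_000, start_equity=100_000.0):
    """Resample trade P/L with replacement and collect each run's max drawdown."""
    drawdowns = []
    for _ in range(n_sims):
        sample = random.choices(trade_pls, k=len(trade_pls))  # resample with replacement
        equity, peak, max_dd = start_equity, start_equity, 0.0
        for pl in sample:
            equity += pl
            peak = max(peak, equity)
            max_dd = max(max_dd, peak - equity)
        drawdowns.append(max_dd)
    return drawdowns  # e.g., inspect the 95th percentile for tail-risk planning
```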

Parameter Sensitivity Analysis

[How does Datafye visualize parameter sensitivity?]

Purpose:

  • Understand how sensitive strategy is to parameters

  • Identify robust parameter regions

  • Avoid parameter overfitting

Approach:

  • Vary parameters systematically

  • Score each combination

  • Visualize parameter space performance

  • Look for plateaus (robust regions) vs. peaks (overfitting)
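
A schematic grid sweep over the parameter space; backtest_score() is a hypothetical placeholder that runs a backtest with the given parameters and returns a scalar score:

```python
from itertools import product

def parameter_grid(param_ranges: dict, backtest_score) -> dict:
    """Score every parameter combination in a grid."""
    names = list(param_ranges)
    results = {}
    for combo in product(*(param_ranges[n] for n in names)):
        params = dict(zip(names, combo))
        results[combo] = backtest_score(params)   # score each combination
    return results  # visualize as a heatmap; prefer plateaus over isolated peaks

# Example: parameter_grid({"fast": range(5, 30, 5), "slow": range(20, 120, 20)}, score_fn)
```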

Multi-Market Validation

Purpose:

  • Verify strategy works across different markets

  • Ensure not curve-fit to specific securities

Approach:

  • Test on different symbol sets

  • Test on different timeframes

  • Test on different asset classes

  • Consistent performance = robust strategy

Scorecards in the Trading Environment

Scorecarding doesn't stop after backtesting:

Live Performance Tracking

[Does Trading Environment provide live scorecards?]

Continuous scoring:

  • Track same metrics in live trading

  • Compare to backtest expectations

  • Detect when algo stops working

Alerts:

  • Notify when metrics degrade

  • Example: Sharpe drops below threshold

  • Example: Drawdown exceeds limit

Forward Testing

Paper trading scores:

  • Generate scorecards for paper trading period

  • Must be close to backtest scores

  • Large discrepancies require investigation

Live trading scores:

  • Track live performance

  • Adjust position sizing based on realized performance

  • Stop trading if metrics degrade significantly

Next Steps

Now that you understand scorecarding, here's how to put it into practice:


Last updated: 2025-10-22
