Project

Astralanx

Astralanx is a system built to discover long-term quantitative stock-picking strategies and then run the best of them as a simulated portfolio. This page describes the methodology: where the data is imported from, how a candidate strategy is backtested, how realistic costs are simulated, the filters used to define a realistic investable universe, and how out-of-sample testing is applied.

  • ~57K lines of Python
  • ~11K lines of native C
  • ~3,500 names in the tradable universe
View the strategies: live performance for every deployed strategy, with additional open-formula strategies broken out on their own.

The engine

A genetic-programming search

Astralanx is a highly performant genetic-programming engine. It generates large populations of candidate strategies, evaluates each one against historical market data, and evolves them across many generations.

How those strategies are represented and evolved is intentionally kept private. What is documented here is everything around the GP algorithm: the data, the backtesting system, the cost-simulation model, and the validation and safety measures.

Data source

Tiingo price data

Every strategy is purely price-based and cross-sectional. There are no fundamentals, no macro inputs, no alternative data, and no discretionary overrides — all signal is derived from each ticker's own price and volume history, plus a single benchmark series.

  • Per-ticker market data comes from Tiingo: adj_close, adj_open, adj_high, adj_low, adj_volume, and the unadjusted close.
  • The adjusted OHLCV fields drive every signal and ranking feature. The unadjusted close is used only for the minimum-price eligibility gate, specifically so a stock split cannot leak future information into a historical decision.
  • The benchmark is the S&P 500 series, used for benchmark-relative and market-relative measurements.

The starting universe is built from Tiingo listing metadata for the NYSE, NASDAQ, AMEX, NYSE MKT, NYSE ARCA, and BATS exchanges. Forex, crypto, and mutual-fund entries are dropped at build time, along with obvious non-common share classes. This selection of exchanges is somewhat arbitrary: given the scale of the resulting universe, the strategies will likely work with any large stock-based universe. (For the live trading component of this site, the Yahoo Finance API is used instead, but the simulated cost model from the engine is still applied.)

Universe & filters

What is allowed into the backtest

The selectable universe is then narrowed by two kinds of filters: filters that remove obviously malformed or partially missing data, and filters that remove niche, risky, or structurally unconventional tickers. It's worth knowing that the second category largely subsumes the first — for the conventional names these strategies trade, Tiingo's data is very reliable.

  • Tickers flagged with extreme or malformed prices are excluded under a strict default setting.
  • Non-common instruments — warrants, units, rights, preferreds, and structurally non-tradable symbols — are removed, as are FX, crypto, mutual-fund, and OTC-style entries.
  • Any ticker whose adjusted-close max/min span is at least 100,000× over its history, or that prints an absolute one-day return of 10,000% or more, is thrown out as corrupt.
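The corrupt-series rule in the last bullet can be sketched as below. This is a minimal illustration, not the engine's code: the function name `looks_corrupt`, its parameters, and the choice to skip missing or non-positive prices before comparing are all assumptions.

```python
def looks_corrupt(adj_close, span_limit=100_000.0, return_limit=100.0):
    """Return True if an adjusted-close history trips either corruption check.

    adj_close    : adjusted closes in date order (None allowed for gaps)
    span_limit   : max/min ratio at or above which the series is rejected
    return_limit : absolute one-day return at or above which the series is
                   rejected (100.0 corresponds to 10,000%)
    """
    # Assumption: missing and non-positive prints are ignored here and left
    # to the other data-quality filters.
    prices = [p for p in adj_close if p is not None and p > 0]
    if len(prices) < 2:
        return False

    # Check 1: lifetime max/min span.
    if max(prices) / min(prices) >= span_limit:
        return True

    # Check 2: any single-day absolute return of 10,000% or more.
    for prev, curr in zip(prices, prices[1:]):
        if abs(curr / prev - 1.0) >= return_limit:
            return True
    return False
```

Both thresholds are inclusive, matching the "at least" and "or more" wording of the rule.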

On top of the static universe, every date applies causal eligibility rules — using only information that existed on that date:

  • The ticker must already have begun trading, and must not be past its last real observation.
  • It must not be in a stale-data gap. Once a price has been forward-filled for more than seven trading days the name is marked stale and excluded from new selection.
  • Its unadjusted close on the rebalance date must be at least $10.
  • It must clear a 63-day trailing median dollar-volume screen of at least $5M, shifted by a day so the screen is never computed on the day it acts.
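A rough sketch of how the price, staleness, and liquidity rules above could be checked for one ticker on one rebalance date. The function name, argument layout, and the decision to reject names with fewer than 63 prior days of history are assumptions made for illustration. The listing-window rule (already trading, not past the last real observation) is assumed to be handled upstream.

```python
from statistics import median

def is_eligible(raw_close, dollar_volume, filled_days, t,
                min_price=10.0, min_dollar_volume=5_000_000.0,
                window=63, max_stale_days=7):
    """Causal eligibility check for one ticker at rebalance index t.

    raw_close[i]     : unadjusted close on day i
    dollar_volume[i] : dollar volume traded on day i
    filled_days[i]   : consecutive forward-filled days as of day i (0 = real print)
    """
    # Stale-data gap: forward-filled for more than seven trading days.
    if filled_days[t] > max_stale_days:
        return False

    # Minimum price uses the unadjusted close on the rebalance date.
    if raw_close[t] < min_price:
        return False

    # Assumption: a name without a full window of prior history is rejected.
    if t < window:
        return False

    # One-day shift: the window covers days t-window .. t-1 and excludes day t.
    trailing = dollar_volume[t - window:t]
    return median(trailing) >= min_dollar_volume
```

Because the slice stops at `t`, the dollar volume printed on the rebalance date itself never influences the decision made on that date.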

Survivorship bias is handled strictly. There is no present-day-survivors universe: a name becomes eligible only once it has actually begun trading in the historical record, and it leaves the universe once it is past its last real observation. When a name goes stale or delists, the engine applies a one-time 100% penalty to the transition and then zeroes its returns — so a position that quietly disappears from the data is forced to take a realistic exit hit rather than coasting at its last good mark.
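One possible reading of the exit penalty, sketched on a single name's daily return series. The helper name and the assumption that the penalty lands on the first bar after the last real observation are illustrative. The source does not say exactly which bar carries the penalty.

```python
def apply_exit_penalty(daily_returns, last_real_index, penalty=1.0):
    """Rewrite a return series so a vanished name takes a one-time loss.

    daily_returns   : per-day returns, possibly forward-filled after the name stops
    last_real_index : index of the last genuine observation
    penalty         : fractional loss charged on the transition (1.0 = 100%)
    """
    adjusted = []
    for i, r in enumerate(daily_returns):
        if i <= last_real_index:
            adjusted.append(r)          # real history is untouched
        elif i == last_real_index + 1:
            adjusted.append(-penalty)   # one-time hit on the transition bar
        else:
            adjusted.append(0.0)        # zeroed afterwards
    return adjusted
```

For example, `apply_exit_penalty([0.01, 0.02, 0.0, 0.0], last_real_index=1)` returns `[0.01, 0.02, -1.0, 0.0]`.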

Backtesting

The backtesting model

The backtest is deliberately simple, so the results are easy to reason about and hard to game.

  • Daily bars. Strategies are evaluated on daily price data — no intraday or tick assumptions.
  • Signal on the rebalance date. On each rebalance the formula is evaluated against the data available as of that date to produce target weights. Between rebalances the basket is held fixed and marked to market.
  • Next-open fills. When the basket changes, fills land at the next bar's open — never at the close that produced the signal. This removes the look-ahead bias of trading on a price you could not have acted on.
  • Four regimes over twenty years. Training always spans four non-overlapping five-year regimes. This tests the strategy across a variety of market conditions and ensures it isn't just optimized for one particular stretch.
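The next-open fill rule can be illustrated with a small single-name sketch. The function names and the single-asset framing are assumptions made here to keep the example short; the real backtest works on whole baskets.

```python
def next_open_fill_price(adj_open, signal_index):
    """Price at which a trade decided on bar `signal_index` is filled.

    The signal is computed from data available on the signal bar, and the
    fill lands at the following bar's open.
    """
    fill_index = signal_index + 1
    if fill_index >= len(adj_open):
        raise ValueError("no later bar available to fill on")
    return adj_open[fill_index]


def round_trip_return(adj_open, entry_signal, exit_signal):
    """Return of a position opened and closed via next-open fills."""
    entry_price = next_open_fill_price(adj_open, entry_signal)
    exit_price = next_open_fill_price(adj_open, exit_signal)
    return exit_price / entry_price - 1.0
```

With opens of `[100, 102, 105, 110]`, a position signalled on bar 0 and closed on a signal from bar 2 is entered at 102 and exited at 110, never at the prices of the bars that produced the signals.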

The headline statistics mean what they usually mean: CAGR (compound annual growth rate), Sharpe (risk-adjusted return), and max drawdown (worst peak-to-trough decline).
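For reference, the three headline statistics can be computed from a daily equity curve roughly as follows. This is a generic sketch: the 252-day annualization factor and the zero risk-free rate in the Sharpe ratio are assumptions, not settings confirmed by the engine.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor

def cagr(equity, periods_per_year=TRADING_DAYS):
    """Compound annual growth rate of an equity curve (one value per bar)."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

def sharpe(daily_returns, periods_per_year=TRADING_DAYS):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(periods_per_year)

def max_drawdown(equity):
    """Worst peak-to-trough decline, returned as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```

For instance, `max_drawdown([100, 120, 90, 130])` returns `-0.25`: the curve falls from a peak of 120 to 90 before recovering.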

Costs

The cost model

Ignoring trading costs overstates returns, so every fill is charged across several components rather than a flat fee. All of them are configurable; the defaults below are the ones behind the published numbers. (Strategies can be rerun with different cost assumptions on request.)

  • Commission — a base of 5 bps on traded notional (1 bp = 0.01%).
  • Slippage — a base of 5 bps per 1.0 of turnover, where turnover is the exact sum of absolute weight changes between the drifted prior holdings and the new targets.
  • Volatility scaling. Both commission and slippage are scaled by 1 + 0.75 · √(realized_vol_63d / long_term_vol_252d), so trading into turbulent markets costs more.
  • Price scaling. Slippage is further scaled by 50 / harmonic_mean_portfolio_price — lower-priced baskets pay proportionally more.
  • Market impact. A square-root impact term, 0.5 · √(trade_value / ADV), is applied weight by weight against each name's rebalance-date dollar volume. If average daily volume is missing, a punitive 5% impact charge is applied to that weight.
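The components above might be combined along the following lines. This is a sketch under several loudly stated assumptions: the function names are hypothetical, the harmonic mean price is weighted by target weights, the impact term is read as a fractional cost on each traded weight, and the total is expressed as a fraction of portfolio value. The source does not spell out those details, so treat the arithmetic as illustrative.

```python
import math

def drifted_weights(prior_weights, period_returns):
    """Let prior weights drift with each name's return, then renormalize."""
    grown = {n: w * (1.0 + period_returns.get(n, 0.0))
             for n, w in prior_weights.items()}
    total = sum(grown.values())
    return {n: g / total for n, g in grown.items()} if total > 0 else {}

def rebalance_cost(drifted, targets, prices, adv, portfolio_value,
                   realized_vol_63d, long_term_vol_252d,
                   commission_bps=5.0, slippage_bps=5.0):
    """Cost of one rebalance, as fractions of portfolio value."""
    names = set(drifted) | set(targets)
    changes = {n: abs(targets.get(n, 0.0) - drifted.get(n, 0.0)) for n in names}
    turnover = sum(changes.values())

    # Volatility scaling applies to both commission and slippage.
    vol_scale = 1.0 + 0.75 * math.sqrt(realized_vol_63d / long_term_vol_252d)

    # Price scaling: 50 / harmonic mean price (weighting by target weight
    # is an assumption; an all-cash target is given a neutral scale of 1).
    held = {n: w for n, w in targets.items() if w > 0}
    if held:
        harmonic = sum(held.values()) / sum(w / prices[n] for n, w in held.items())
        price_scale = 50.0 / harmonic
    else:
        price_scale = 1.0

    commission = turnover * commission_bps / 1e4 * vol_scale
    slippage = turnover * slippage_bps / 1e4 * vol_scale * price_scale

    # Square-root impact, name by name; missing ADV gets a flat 5% charge.
    impact = 0.0
    for n, dw in changes.items():
        if dw == 0.0:
            continue
        name_adv = adv.get(n)
        if not name_adv:
            rate = 0.05
        else:
            rate = 0.5 * math.sqrt(dw * portfolio_value / name_adv)
        impact += dw * rate

    return {"turnover": turnover, "commission": commission,
            "slippage": slippage, "impact": impact,
            "total": commission + slippage + impact}
```

Turnover is measured against the drifted holdings rather than the previous targets, so a basket that has moved with the market is not charged for trades it never made.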

The exact commission and slippage used for each deployed strategy are also published alongside it on the dashboard.

Validation

Out-of-sample testing & validity measures

The single most important rule in the engine is the train / out-of-sample firewall. Fitness is computed on training data only. An out-of-sample test is reserved for only the most promising strategies.

Crucially, the out-of-sample test is run only rarely, on a handful of specific contenders. That rules out the obvious objection that strong out-of-sample numbers are just an artifact of testing thousands of variants against the holdout, which would amount to using those years as extra training data.

When a strategy is replayed on unseen data, several validity measures are computed to check that it is a real edge rather than a fragile curve fit:

  • Directional consistency between the development period and the unseen window — CAGR, Sharpe, and drawdown should stay in the same ballpark, not collapse.
  • Rolling-window stress — worst rolling 3- and 5-year CAGR, and rolling Sharpe, to see how bad a bad stretch actually gets.
  • Benchmark relationship — correlation and beta against the S&P 500, so the return stream isn't just leverage on the index.
  • Factor decomposition — a Fama-French regression (market, size, value, profitability, investment, momentum, reversal) to measure annualized alpha and confirm the strategy isn't a repackaging of one standard style sleeve.
  • Sector exposure & active share — how far the basket departs from an equal-weight eligible universe, reported as aggregate sector exposure.
  • Capacity. Two views: a liquidity-screening estimate (how large the book could get while staying a small fraction of each name's volume), and a stricter impact-consistent estimate that uses the same square-root impact model as the backtest and allows at most 100 bps of slippage.
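Two of the measures above, rolling-window stress and the benchmark relationship, are simple enough to sketch directly. The function names, the 252-day year, and the use of sample (n - 1) covariance are assumptions for illustration.

```python
import math

def worst_rolling_cagr(equity, window_years, periods_per_year=252):
    """Lowest annualized growth over any window of the given length."""
    window = int(window_years * periods_per_year)
    if window <= 0 or len(equity) <= window:
        return None  # not enough history for a single full window
    worst = None
    for start in range(len(equity) - window):
        growth = equity[start + window] / equity[start]
        annualized = growth ** (1.0 / window_years) - 1.0
        worst = annualized if worst is None else min(worst, annualized)
    return worst

def beta_and_correlation(strategy_returns, benchmark_returns):
    """Beta and correlation of strategy returns against a benchmark series."""
    n = len(strategy_returns)
    if n != len(benchmark_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = sum(strategy_returns) / n
    mean_b = sum(benchmark_returns) / n
    cov = sum((s - mean_s) * (b - mean_b)
              for s, b in zip(strategy_returns, benchmark_returns)) / (n - 1)
    var_s = sum((s - mean_s) ** 2 for s in strategy_returns) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for b in benchmark_returns) / (n - 1)
    beta = cov / var_b
    corr = cov / math.sqrt(var_s * var_b)
    return beta, corr
```

A return stream that is exactly twice the benchmark shows a beta of 2 and a correlation of 1, which is the "just leverage on the index" case the benchmark check is meant to flag.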

Open strategies show all of this end to end, including the full basket. Secured live strategies publish performance and aggregate sector exposure only.

Everything on the dashboard is a simulated paper portfolio. This is not investment advice.