Quick definition
A machine learning approach where Treeova's AI agents learn from the outcomes of past trades to improve future decision-making, adjusting conviction scores and strategy parameters over time.
What the agent actually learns
Each agent maintains a Bayesian posterior over the parameters it controls: tool weights, conviction thresholds, and exit timing. A trade that closes profitably after a high-conviction entry reinforces the parameter set that produced it. A loss taken at high conviction counts as a stronger negative signal than a small loss taken at low conviction. The RL writer aggregates these signals into stable per-knob priors, one for each tunable parameter.
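The conviction-weighted update described above can be sketched with a Beta posterior over a single knob's win rate. This is only an illustration under stated assumptions: the source does not say which distribution or weighting Treeova uses, and the `KnobPosterior` class, its fields, and the rule of scaling evidence by conviction are hypothetical. Loss size is ignored here for brevity.

```python
from dataclasses import dataclass


@dataclass
class KnobPosterior:
    """Hypothetical Beta posterior over the chance that one parameter
    setting produces a profitable trade. Not Treeova's actual model."""

    alpha: float = 1.0  # pseudo-count of profitable trades (uniform prior)
    beta: float = 1.0   # pseudo-count of unprofitable trades (uniform prior)

    def update(self, profitable: bool, conviction: float) -> None:
        # Assumption: conviction is a score in [0, 1] and scales how much
        # evidence a single trade contributes to the posterior.
        weight = min(max(conviction, 0.0), 1.0)
        if profitable:
            self.alpha += weight
        else:
            self.beta += weight

    @property
    def mean(self) -> float:
        # Posterior mean of the win probability.
        return self.alpha / (self.alpha + self.beta)


# A high-conviction loss pulls the estimate down further than a
# low-conviction loss does.
high = KnobPosterior()
high.update(profitable=False, conviction=0.9)

low = KnobPosterior()
low.update(profitable=False, conviction=0.2)

print(high.mean < low.mean)
```

Starting both posteriors from the same uniform prior makes the asymmetry easy to see: the only difference between the two estimates is the conviction attached to the losing trade.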
Integrity rails
RL is only useful if the data is clean. When a run was guard-blocked, deduplicated, or budget-truncated, Treeova's RL writer removes it from the learning population on the specific axes that event corrupts. This stops a runaway loop in which the engine "learns" from its own failure modes, a common pitfall in naïve RL deployments.
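One way to picture per-axis exclusion is a lookup from each integrity event to the axes it is assumed to corrupt. Everything in this sketch is hypothetical: the `Run` type, the `eligible_runs` helper, and especially the event-to-axis mapping, which the source does not specify and which is filled in here purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping, for illustration only. The source does not say
# which axes each event corrupts.
CORRUPTED_AXES = {
    "guard_blocked": {"conviction_threshold"},
    "deduplicated": {"tool_weights"},
    "budget_truncated": {"exit_timing"},
}


@dataclass(frozen=True)
class Run:
    run_id: str
    flags: frozenset = field(default_factory=frozenset)  # integrity events


def eligible_runs(runs, axis, corrupted_axes=CORRUPTED_AXES):
    """Return the runs that may contribute to learning on `axis`.

    A run is dropped only if one of its flags corrupts this axis; it
    stays in the learning population for every other axis.
    """
    return [
        run
        for run in runs
        if not any(axis in corrupted_axes.get(flag, ()) for flag in run.flags)
    ]


runs = [
    Run("clean"),
    Run("truncated", frozenset({"budget_truncated"})),
]

# The truncated run is excluded on the axis it corrupts...
print([r.run_id for r in eligible_runs(runs, "exit_timing")])
# ...but still counts on an axis it does not affect.
print([r.run_id for r in eligible_runs(runs, "tool_weights")])
```

Filtering per axis rather than discarding the run entirely keeps usable evidence on the axes the event did not touch, which matches the "specific axes" wording above.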
Where it shows up
The reinforcement loop runs upstream of conviction scoring, position sizing, and the ASI Evolution Engine's candidate-promotion gate. Over weeks, an agent's parameter signature drifts toward the settings that have actually worked for its instrument and regime, so an SPX iron-condor agent and a meme-stock momentum agent end up with very different equilibria.