I wanted to share some of the behind the scenes notes from my first use of ML while building the rush detector
first up, here’s the actual signal cheat sheet I built before writing any code. this is what the bot is scanning for:
12 Pool signals:
- No natural expansion by ~1:20 to 1:30
- Spawning pool already done when your scout arrives
- Super low drone count in main (~12 to 14 instead of ~20+ by 2:00)
- No gas taken or gas not being mined
- First 6 lings show up around ~1:50, hitting by ~2:15
- No queen started after pool finishes (full ling commitment)
Speedling signals:
- Early gas geyser taken with the pool
- Natural hatch timing feels “off” or fake
- Zergling speed finishing at ~2:30 to 2:45 vs ~3:30 standard
- Gas mining stops around 100 gas if it’s a pure speedling flood
- Early Baneling Nest around ~2:00 to 2:30 for ling bane variant
- Low drone count and delayed queens compared to standard
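for illustration, here’s a rough sketch of how a checklist like the 12 pool one could become a point score. heads up: the `ScoutInfo` fields, the point values, and the exact cutoffs are placeholders I made up for this example, not the bot’s real ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoutInfo:
    """What the scout has seen so far. All field names are hypothetical."""
    game_time: float                       # seconds since game start
    t_nat_started: Optional[float] = None  # None = no natural seen yet
    pool_done_on_scout: bool = False
    drone_count: Optional[int] = None
    gas_taken: bool = False
    t_ling_seen: Optional[float] = None    # None = no lings seen yet
    queen_started: bool = False


def twelve_pool_score(s: ScoutInfo) -> int:
    """Add up points for each 12 pool signal (point values are made up)."""
    score = 0
    # no natural expansion by ~1:30 (90s)
    if s.t_nat_started is None and s.game_time >= 90:
        score += 2
    # spawning pool already done when the scout arrives
    if s.pool_done_on_scout:
        score += 3
    # ~12 to 14 drones at 2:00 (120s) instead of ~20+
    if s.drone_count is not None and s.game_time >= 120 and s.drone_count <= 14:
        score += 2
    # no gas taken
    if not s.gas_taken:
        score += 1
    # first lings out around ~1:50 (110s)
    if s.t_ling_seen is not None and s.t_ling_seen <= 110:
        score += 3
    # no queen started after the pool finishes
    if s.pool_done_on_scout and not s.queen_started:
        score += 1
    return score


rushy = ScoutInfo(game_time=125, pool_done_on_scout=True, drone_count=13, t_ling_seen=108)
macro = ScoutInfo(game_time=125, t_nat_started=40, drone_count=21,
                  gas_taken=True, queen_started=True)
print(twelve_pool_score(rushy), twelve_pool_score(macro))
```

the point of keeping it as a plain sum is that the score stays readable: you can look at a game and see exactly which signals fired.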
after training the ML model I looked at the coefficients, basically how much each signal pushes the model toward saying “yep that’s a rush.” and my hand-written rule scores? they were doing almost all the work.
the 12 pool rule score had a coefficient of +1.519. huge. the raw timings, like when the natural started? +0.065. not much.
so the ML wasn’t replacing what I built by hand. it was just calibrating it. smoothing out the messy edge cases where the rules weren’t confident.
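to make “how much each signal pushes the model” concrete: in logistic regression a coefficient is a change in log-odds, so `exp(coef)` is how much the odds of “rush” multiply when that feature goes up by one. small sketch using the two coefficients above. the 50/50 starting point is an assumption for the example; the real model’s intercept isn’t shown here.

```python
import math

# coefficients quoted in the post
COEF_RULE_SCORE = 1.519
COEF_T_NAT_STARTED = 0.065


def odds_multiplier(coef: float, delta: float = 1.0) -> float:
    """How much the odds of "rush" multiply when a feature rises by `delta`."""
    return math.exp(coef * delta)


def sigmoid(z: float) -> float:
    """Turn log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# one extra point of rule score vs. one extra unit of natural timing
print(round(odds_multiplier(COEF_RULE_SCORE), 2))
print(round(odds_multiplier(COEF_T_NAT_STARTED), 2))

# assumed starting point: model sits at 50/50 (log-odds 0), then one rule point is added
print(round(sigmoid(0.0 + COEF_RULE_SCORE), 2))
```

so one rule point multiplies the odds by roughly 4.6, while one unit of natural timing multiplies them by roughly 1.07. one caveat: comparing coefficient sizes directly is only apples to apples if the features are on similar scales.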
I used scikit-learn for the model. very simple logistic regression. here’s the actual training code, under 20 lines:
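    import joblib
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # one row per game; was_rushed is the label
    df = pd.read_csv("rush_data.csv")
    X = df[["rule_score", "auto_true", "t_ling_seen", "pool_start_est", "t_nat_started", "gas_mined"]]
    y = df["was_rushed"]
    model = LogisticRegression()
    model.fit(X, y)
    # save step (implied by the joblib.load at runtime)
    joblib.dump(model, "rush_model.joblib")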
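and if you want to peek at the coefficients the way I described, something like this works. it’s a sketch on synthetic data so it runs on its own: the fake `rule_score` column is built to track the label and the fake timing column is pure noise, so don’t read anything into the exact numbers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# synthetic stand-in for rush_data.csv (NOT real game data)
rng = np.random.default_rng(0)
n = 200
was_rushed = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "rule_score": was_rushed * 3 + rng.normal(0, 1, size=n),  # tracks the label
    "t_nat_started": rng.normal(0, 1, size=n),                # pure noise
    "was_rushed": was_rushed,
})

feature_cols = ["rule_score", "t_nat_started"]
model = LogisticRegression()
model.fit(df[feature_cols], df["was_rushed"])

# coef_ has shape (1, n_features) for a binary problem
coefs = dict(zip(feature_cols, model.coef_[0]))

# print features from most to least influential
for name, value in sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>15}  {value:+.3f}")
```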
and at runtime the bot uses it like this:
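    import joblib

    # load once at startup, not on every check
    model = joblib.load("rush_model.joblib")

    # the wrapper name is_rush is just for illustration
    def is_rush(rule_score, auto_true, t_ling_seen, pool_start_est, t_nat_started, gas_mined):
        # hard rule fired: skip the model entirely
        if auto_true:
            return True

        # -1 marks "not seen yet". check for None explicitly so a real 0
        # (like 0 gas mined) isn't turned into -1 the way `x or -1` would
        def or_missing(v):
            return -1 if v is None else v

        features = [[
            rule_score,
            auto_true,
            or_missing(t_ling_seen),
            or_missing(pool_start_est),
            or_missing(t_nat_started),
            or_missing(gas_mined),
        ]]
        # column 1 is the probability of the positive class, assuming 0/1 labels
        p_rush = model.predict_proba(features)[0][1]
        return p_rush >= 0.5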
the lesson here, and this one’s for anyone thinking about adding ML to their bot: your rules don’t have to be perfect. they just have to be good enough for the model to build on top of.
I only had 4 speedling games in my training data, so the model thinks it knows speedlings but it really doesn’t. yet. more games, more labels, better model. that’s the grind.
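a cheap way to catch this before trusting the model is to count games per label. sketch below. the per-type labels and the `MIN_GAMES` cutoff are assumptions for the example (the training CSV above only has `was_rushed`), and apart from the 4 speedling games the counts are made up.

```python
from collections import Counter

# hypothetical per-game labels; only the 4 speedling games come from the post
labels = ["12_pool"] * 30 + ["speedling"] * 4 + ["no_rush"] * 60

MIN_GAMES = 20  # assumed cutoff, pick whatever you're comfortable with


def undersampled(labels, min_games=MIN_GAMES):
    """Return {label: count} for every label with fewer than min_games examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_games}


print(undersampled(labels))
```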
if you’re working on detecting anything in your bot (cannon rushes, proxy rax, other cheese builds), the pattern is honestly the same:
- figure out the signals humans use to spot it
- turn those into a point system
- collect game data
- let a simple model learn the weights for you
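the “collect game data” step can be as boring as appending one row per finished game to a CSV. minimal sketch, assuming the same column names as the training snippet and -1 for anything the scout never saw. `log_game` and the example values are made up for illustration.

```python
import csv
import os
import tempfile

# column names follow the training snippet in the post
COLUMNS = ["rule_score", "auto_true", "t_ling_seen", "pool_start_est",
           "t_nat_started", "gas_mined", "was_rushed"]


def log_game(path, row):
    """Append one finished game to the CSV, writing the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        # anything missing or None is stored as -1 ("not seen")
        writer.writerow({c: (-1 if row.get(c) is None else row[c]) for c in COLUMNS})


# example run in a temp folder so nothing real gets overwritten
path = os.path.join(tempfile.mkdtemp(), "rush_data.csv")
log_game(path, {"rule_score": 12, "auto_true": False, "t_ling_seen": 108,
                "pool_start_est": 20, "t_nat_started": None, "gas_mined": 0,
                "was_rushed": 1})
log_game(path, {"rule_score": 0, "auto_true": False, "t_ling_seen": None,
                "pool_start_est": 55, "t_nat_started": 40, "gas_mined": 150,
                "was_rushed": 0})

with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
print(len(rows), rows[0]["t_nat_started"], rows[0]["gas_mined"])
```

the label (`was_rushed`) gets filled in after the game, once you know what actually happened.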
anyone try this with other cheeses?


