The full range — from a distributed computer-vision phone cluster, to live AI demos, to the repeatable protocols I use to make AI output reliable, to early reading-education games, down to a single-screen game.
Five real detection scenes — the model locks a tracking ring on every worker it sees. Tap the left or right half, or swipe, to move through them.
This training footage is synthetic — generated on a single RTX 3060. That lets me manufacture the edge cases ordinary datasets never capture: workers leaning, crouching, half-hidden behind equipment, crowded together, in bad light — thousands of them, on demand.
Most PPE (personal protective equipment) detection systems learn from easy, well-lit, one-person-at-a-time footage, so they fail exactly when it matters. Mine is built on the hard cases from day one, then sharpened on real site B-roll. The rings and labels above are the detector's genuine output running on that footage.
The inference doesn't run in a central cloud that every camera reports to; it runs on the phones, on-site. That's a deliberate choice, not a cost trick. When the model lives at the edge, each site decides, and can update, what should be detected (the safety gear that matters for that job), and the detection happens locally. The raw feed never has to leave the site to be useful.
It matters because of where everyone else is headed. The easy, invasive version of this future tracks your face — central servers, identities, a permanent record of who was where. We're building the opposite: a system that watches for a hardhat, a vest, a harness, reports whether the gear is on, and has no reason to know whose face it is. Not tracking your face is the creed — and putting the model on the phones is how that promise is kept structurally, not just written into a policy.
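To make the "gear status only" idea concrete, here is a minimal sketch of what an on-device report could look like. It is an illustration under assumptions, not the actual PPE-Vision code: the names `WorkerDetection`, `REQUIRED_GEAR`, `build_report`, and `report_to_json` are hypothetical, and the required-gear set stands in for whatever each site configures.

```python
import json
from dataclasses import dataclass

# Hypothetical per-site setting: the gear this particular job requires.
REQUIRED_GEAR = frozenset({"hardhat", "vest", "harness"})


@dataclass(frozen=True)
class WorkerDetection:
    track_id: int         # anonymous ring id for this session, not an identity
    gear_seen: frozenset  # gear classes the detector found on this worker


def build_report(detections, required=REQUIRED_GEAR):
    """Summarise one frame as gear status only: no pixels, no faces, no names."""
    workers = []
    for det in detections:
        missing = sorted(required - det.gear_seen)
        workers.append(
            {"track_id": det.track_id, "compliant": not missing, "missing": missing}
        )
    return {
        "workers": workers,
        "all_compliant": all(w["compliant"] for w in workers),
    }


def report_to_json(report):
    """Serialise the report; this is the only payload that would leave the phone."""
    return json.dumps(report, sort_keys=True)
```

In this sketch the structure has no field for a face crop or a name, so nothing downstream can ask for one. That is the sense in which the promise is kept by design and not only by policy.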
Every detection got a box and a confidence percentage. Accurate, but on a crowded frame the boxes stacked up and buried the picture.
Swapped raw boxes for per-object emoji badges and confidence gauges. Easier to read at a glance — still busy when the scene filled up.
One clean ring per worker, PPE linked per person. Stays legible even in a dense crowd — the current overlay.
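"PPE linked per person" can be sketched as a small association step: each gear box is attached to the worker box that contains most of it. This is one plausible approach, assumed for illustration, and not necessarily how the kiosk does it. The function names, the `(x1, y1, x2, y2)` box format, and the `min_overlap` threshold are all hypothetical.

```python
def overlap_fraction(gear_box, person_box):
    """Fraction of the gear box's area that lies inside the person box."""
    gx1, gy1, gx2, gy2 = gear_box
    px1, py1, px2, py2 = person_box
    inter_w = max(0.0, min(gx2, px2) - max(gx1, px1))
    inter_h = max(0.0, min(gy2, py2) - max(gy1, py1))
    gear_area = (gx2 - gx1) * (gy2 - gy1)
    return (inter_w * inter_h) / gear_area if gear_area > 0 else 0.0


def link_gear_to_people(people, gear_items, min_overlap=0.5):
    """Assign each (label, box) gear item to the best-overlapping person index.

    Gear that does not overlap any person by at least min_overlap is left
    unlinked instead of being guessed onto the nearest worker.
    """
    linked = {index: [] for index in range(len(people))}
    for label, box in gear_items:
        scores = [overlap_fraction(box, person) for person in people]
        if not scores:
            continue
        best = max(range(len(people)), key=scores.__getitem__)
        if scores[best] >= min_overlap:
            linked[best].append(label)
    return linked
```

Once gear is grouped per worker, the overlay only has to draw one ring per entry, which is why it can stay legible in a dense frame.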
This is one screen of the PPE-Vision kiosk's version history — from draft v0.0.4 to v1.25.0+, dozens of standalone iterations built in a matter of days.
Every version is its own file. Nothing is overwritten; each step is reviewable and reversible, and the next one builds on what worked. The overlay evolution you just watched — boxes → emoji → rings — is the same method applied to one feature.
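The "nothing is overwritten" rule can be enforced by the file system itself. The sketch below is an assumed illustration, not the project's actual tooling: `save_version` and the `kiosk-v<version>.html` naming pattern are hypothetical. It relies on Python's exclusive-create mode `"x"`, which fails if the file already exists.

```python
from pathlib import Path


def save_version(directory, version, content):
    """Write one iteration to its own file and refuse to overwrite an old one."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"kiosk-v{version}.html"  # hypothetical naming scheme
    # Mode "x" raises FileExistsError if this version was already saved.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(content)
    return path
```

Saving the same version number twice raises `FileExistsError`, so an earlier step can only be revisited, never silently replaced.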
That cadence is the point: AI doesn't replace the iteration — it makes each loop faster, so the work compounds instead of stalling.
Six scenes at once — every worker tracked, PPE linked per person, frame after frame. The footage is synthetic, generated on local hardware; the detection riding on top is the live output of the system I built. No cloud, no stock footage.
One ordinary phone runs this. A fleet of them runs a whole site — in real time, on the edge, on hardware you already own.