For Scheller College of Management

What I build with AI.

The full range — from a distributed computer-vision phone cluster, to live AI demos, to the repeatable protocols I use to make AI output reliable, to early reading-education games, down to a single-screen game.

▶ Click any card to open the embedded demo. Mic/camera demos ask permission; nothing leaves your device.

See the model run — real output

Not a mockup. This is a real detection sequence the computer-vision model produced, rendered with the current "rings" overlay. Press play.


Five real detection scenes — the model locks a tracking ring on every worker it sees. Tap the left or right half, or swipe, to move through them.

Trained on the hard cases — on purpose

Where this pulls ahead of the field.

The footage is AI-generated. The detection is real.

This training footage is synthetic — generated on a single RTX 3060. That lets me manufacture the edge cases ordinary datasets never capture: workers leaning, crouching, half-hidden behind equipment, crowded together, in bad light — thousands of them, on demand.

Most PPE-detection systems learn from easy, well-lit, one-person-at-a-time footage, so they fail exactly when it matters. Mine is built on the hard cases from day one, then sharpened on real site B-roll. The rings and labels above are the detector's genuine output running on that footage.

1 Generate edge-case video · RTX 3060 → 2 Train the detector on the hard cases → 3 Sharpen on real site B-roll

Why it runs on the phones — and why it won't watch faces

The architecture is the ethics. They're the same decision.

Detection pushed to the edge, on purpose.

The inference doesn't run in a central cloud that every camera reports up to — it runs on the phones, on-site. That's a deliberate choice, not a cost trick. When the model lives at the edge, the site itself decides and updates what can and should be detected — the safety gear that matters for that job — and the detection happens locally. The raw feed never has to leave the site to be useful.

It matters because of where everyone else is headed. The easy, invasive version of this future tracks your face — central servers, identities, a permanent record of who was where. We're building the opposite: a system that watches for a hardhat, a vest, a harness, reports whether the gear is on, and has no reason to know whose face it is. Not tracking your face is the creed — and putting the model on the phones is how that promise is kept structurally, not just written into a policy.
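As a rough illustration of the idea above, here is a minimal sketch of an on-site compliance summary that carries gear counts and nothing about identity. The names `REQUIRED_GEAR`, `WorkerDetection`, and `compliance_report` are hypothetical and are not taken from the actual system; the sketch assumes the detector has already linked gear labels to each tracked worker.

```python
from dataclasses import dataclass

# Hypothetical site configuration: the gear this particular job requires.
REQUIRED_GEAR = {"hardhat", "vest", "harness"}


@dataclass(frozen=True)
class WorkerDetection:
    """One tracked worker in one frame. Deliberately has no face or identity field."""
    track_id: int         # short-lived tracking number for the session, not an identity
    gear_seen: frozenset  # PPE class names the detector linked to this worker


def compliance_report(detections, required=REQUIRED_GEAR):
    """Summarise a frame as counts only; no image data or identity is included."""
    missing_counts = {item: 0 for item in sorted(required)}
    compliant = 0
    for det in detections:
        missing = required - det.gear_seen
        if not missing:
            compliant += 1
        for item in missing:
            missing_counts[item] += 1
    return {
        "workers": len(detections),
        "compliant": compliant,
        "missing": missing_counts,
    }
```

Because the report is built from gear labels alone, there is nothing in it that could identify a person even if it did leave the site.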

Our approach

Edge inference, on-site. Detects PPE — what should be worn — and reports compliance. No identity, no face; nothing leaves the site by default.

The invasive default

Central servers ingesting every camera, recognizing and tracking faces, building a record of who was where and when.

Computer vision · the phone cluster · data engine

The heavier engineering — open each to explore the live artifact.

In-browser AI demos — live URLs

Public websites whose response headers were checked to confirm they allow embedding; each modal also has an open-in-new-tab link.

Security-oriented work

AI/ML security research — the supply-chain risks that come with downloading models and trusting what they run.

My AI protocols — in plain English

Repeatable "recipes" I run to make AI answers more reliable: have the model attack its own work, make two models compete, check from several angles, or simply do it twice and merge. Each card opens the real tool — the blurb explains it in normal terms.
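The simplest of these recipes, "do it twice and merge", can be sketched in a few lines. This is an illustrative sketch only, not the real tool: `ask` stands in for any function that sends a prompt to a model and returns a list of short answer points, and the exact-match merge rule is an assumption.

```python
from collections import Counter
from typing import Callable, List


def run_twice_and_merge(ask: Callable[[str], List[str]], prompt: str, runs: int = 2) -> dict:
    """Ask the same question several times and split agreed points from disputed ones."""
    counts = Counter()
    for _ in range(runs):
        # De-duplicate within a run so one answer cannot vote twice for itself.
        counts.update(set(ask(prompt)))
    agreed = sorted(point for point, n in counts.items() if n == runs)
    disputed = sorted(point for point, n in counts.items() if n < runs)
    return {"agreed": agreed, "disputed": disputed}
```

Points that appear in every run are kept as agreed; anything that shows up in only some runs is flagged for a human or a second model to check.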

Education · speech & reading

Where this started: early browser games built on Firebase with ChatGPT-4 and the original Claude, covering speech recovery for aphasia and reading for kids.

Entrepreneurial · built & taught with AI

An AI-development training guild and AI-assisted games — playable / live.

The other end of the range

Same toolchain, smallest target: a single-screen build.

Also in the stack

Systems without a clickable demo — what they actually do.

ArcCoach Sports CV: MediaPipe pose + YOLOv8 ball tracking → parabolic trajectory regression (release/apex/entry).

Marlin-2B auto-labeler: VLM quality control in a self-healing loop; 40k+ labels produced despite device crashes.

Multi-agent swarm: drives the vision pipeline with a memory store, hooks routing, dynamic agent spawning, and vector-DB retrieval.

The Ellis Metric: ingest → extract → align → track → view pipeline over long-form public-figure video.

Chop Shop: Whisper transcription + emotion + NLP scoring over an audio corpus (400+ clips).

MusicGen / AudioGen: local Meta AudioCraft generative-audio serving with model caching.
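To make the "parabolic trajectory regression" in the ArcCoach item concrete, here is a minimal standard-library sketch of the general technique: a least-squares fit of y = a·t² + b·t + c to tracked ball positions, then reading the apex off the fitted curve. It is not ArcCoach's code. The function names are hypothetical, and it assumes y is height measured upward (image coordinates, where y grows downward, would need flipping first).

```python
def fit_parabola(ts, ys):
    """Least-squares fit of y = a*t**2 + b*t + c; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]                   # s[k] = sum of t^k
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]   # r[k] = sum of y*t^k
    # Augmented matrix, unknowns ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return tuple(x)


def apex(a, b, c):
    """Time and height of the vertex of y = a*t**2 + b*t + c (at t = -b / (2a))."""
    t = -b / (2 * a)
    return t, a * t * t + b * t + c
```

Release and entry points could be read off the same fitted curve by evaluating it at the first and last tracked timestamps; at least three distinct timestamps are needed for the fit to be defined.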

How the overlay evolved

Same detector — three generations of how it shows what it sees. Each pass cut visual noise while keeping the signal. Press play on any.

Boxes + confidence

first

Every detection got a box and a percentage. Accurate — but on a crowded frame the boxes stacked up and buried the picture.

Emoji + gauges

then

Swapped raw boxes for per-object emoji badges and confidence gauges. Easier to read at a glance — still busy when the scene filled up.

Tracking rings + links

now

One clean ring per worker, PPE linked per person. Stays legible even in a dense crowd — the current overlay.
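One plausible way to get "PPE linked per person" is sketched below, purely as an assumption about how such linking can work rather than a description of the actual overlay code. Each gear box is assigned to the person box that contains its center, and when person boxes overlap the smallest containing box wins. Boxes are `(x1, y1, x2, y2)` tuples and all function names are hypothetical.

```python
def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(box, point):
    """True if the point lies inside the box (edges included)."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def area(box):
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def link_ppe(persons, gear):
    """persons: {track_id: box}; gear: list of (label, box). Returns {track_id: set of labels}."""
    linked = {track_id: set() for track_id in persons}
    for label, box in gear:
        c = center(box)
        owners = [tid for tid, pbox in persons.items() if contains(pbox, c)]
        if owners:
            # Heuristic (assumed): with overlapping workers, prefer the smallest containing box.
            best = min(owners, key=lambda tid: area(persons[tid]))
            linked[best].add(label)
    return linked
```

Gear whose center falls inside no person box is simply left unlinked, so a stray hardhat on a table never counts toward anyone's compliance.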

The iterative method behind it

None of this was one-shot. It was built the way I build everything — version after version, each one standalone and testable.


This is one screen of the PPE-Vision kiosk's version history — from draft v0.0.4 to v1.25.0+, dozens of standalone iterations built in a matter of days.

Every version is its own file. Nothing is overwritten; each step is reviewable and reversible, and the next one builds on what worked. The overlay evolution you just watched — boxes → emoji → rings — is the same method applied to one feature.

That cadence is the point: AI doesn't replace the iteration — it makes each loop faster, so the work compounds instead of stalling.

The capability, end to end

This is the model actually running.

Six scenes at once — every worker tracked, PPE linked per person, frame after frame. The footage is synthetic, generated on local hardware; the detection riding on top is the live output of the system I built. No cloud, no stock footage.