Systems & Automation
Tooling that lets small teams run on AI agents, reliably
I’ve been writing software since 2017: web applications in Ruby on Rails and Django, theatre tech tools, data pipelines. But the work I find most interesting right now sits one level up: building the systems and tooling that let a small team — often an ops team of one — safely hand real work to AI agents.
That looks like two things in practice.
Operational systems. CRMs, databases, and import pipelines designed for correctness and maintainability: schema architecture, change-detection automations, documented conventions that survive staff turnover. Most of this work is for AI safety organisations, where budgets are tight and the ops team is tiny.
Agent tooling. A set of open-source tools that make AI-augmented work verifiable rather than vibes-based: an agent that screenshots and checks its own visual output, hooks that force agents to verify code before claiming it works, utilities for reading real office documents, and Airtable tooling for schema diffing and standards checking. The common thesis: agents are only useful for operations if you can trust their output, and trust comes from verification loops, not hope.
Everything in this area is on my GitHub.
Projects in this area
CRM & Data Infrastructure for an AI Safety Organisation
Designed and built the central CRM architecture, alumni database, change-detection automations, and donor-import pipelines for an AI safety organisation
Vischeck
A hook, skill, and CLI that make AI agents screenshot and visually check their own UI changes
Readoc
CLI tools that let AI agents read and edit Word documents, Excel sheets, and PDFs
Rails Toolkit
Agent skills encoding Ruby on Rails 8 conventions, so AI-written Rails code follows the framework instead of fighting it
Dev Hooks
Hooks and skills that force AI coding agents to verify their work instead of pattern-matching their way to 'done'
Airtable Utils
Schema export, schema diffing, standards checking, and access auditing for Airtable bases — plus an agent skill for writing correct Airtable scripts
Edinburgh Festivals Chat
Chatbot providing information about the Edinburgh Festivals, helping you decide which of more than 4,000 shows to see
Spotify Tools
Listening goals and statistics for Spotify. Retired since Spotify stopped verifying apps from small developers
ImpAmp 3
Web soundboard used live in improv comedy shows — an early experiment in building real software with AI agents
Black Lightning
Maintainer of the Rails application that runs Bedlam Theatre
An AI safety organisation needed its scattered contact, alumni, and donor data turned into one reliable system. I designed and built it.
CRM architecture and documentation. I designed the central base (People and Organisations tables linked many-to-many through join tables, with a stable primary key), then did a field-by-field documentation pass across the entire base: table guides, consistent naming conventions for consent-sensitive fields, and dead fields flagged for deletion. The goal was a CRM whose structure a new team member can understand without me in the room.
Alumni database consolidation. I merged two parallel fellowship-alumni tables into one, while deliberately keeping team-assessed hiring data in a separate 1:1-linked table — self-reported and team-assessed data have different provenance and different sharing rules, and the schema should enforce that distinction.
Change detection without a changelog API. Airtable can’t tell you what changed on a record, only that it changed. For a self-service alumni profile form, I built a snapshot-and-diff automation: each record carries a JSON snapshot of its fields, and on every form submission the automation diffs the current values against that snapshot and writes one changelog row, split into “needs processing” and “other changes”. The team reviews actual diffs instead of re-reading whole records.
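The snapshot-and-diff idea can be sketched in a few lines. This is an illustration in Python, not the Airtable automation itself (Airtable scripts are written in JavaScript). The field names, the `NEEDS_PROCESSING_FIELDS` set, and the record shape are all hypothetical, and refreshing the snapshot after each diff is my assumption about how the loop closes.

```python
import json

# Hypothetical: which fields the team must act on. The real list is not given in the text.
NEEDS_PROCESSING_FIELDS = {"Email", "Current employer"}


def diff_snapshot(previous_json: str, current_fields: dict) -> dict:
    """Compare the stored JSON snapshot with the record's current field values."""
    previous = json.loads(previous_json) if previous_json else {}
    changes = {"needs_processing": {}, "other_changes": {}}
    for name in sorted(set(previous) | set(current_fields)):
        old, new = previous.get(name), current_fields.get(name)
        if old == new:
            continue  # unchanged fields never reach the changelog
        bucket = "needs_processing" if name in NEEDS_PROCESSING_FIELDS else "other_changes"
        changes[bucket][name] = {"from": old, "to": new}
    return changes


def process_submission(record: dict) -> tuple:
    """Return one changelog row plus the refreshed snapshot to store on the record."""
    changes = diff_snapshot(record.get("snapshot", ""), record["fields"])
    new_snapshot = json.dumps(record["fields"], sort_keys=True)
    return changes, new_snapshot


# Made-up record: the email changed and a city was added since the last snapshot.
record = {
    "snapshot": json.dumps({"Email": "a@example.org", "Bio": "Researcher"}),
    "fields": {"Email": "b@example.org", "Bio": "Researcher", "City": "Leeds"},
}
changelog_row, snapshot = process_submission(record)
print(changelog_row)
```

Storing the snapshot on the record itself means the diff needs no history table: the previous state travels with the row.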
Donor import pipeline. An email-triggered upsert script handling ~1,500 donation records from a major effective-giving platform: CSV parsing, donor matching by email then platform ID, ID backfill, and create-or-update keyed on donation reference. The key architectural fix was moving a downstream webhook call out of the import script — which was blowing through Airtable’s 50-fetch ceiling — into a separate record-created automation that Airtable throttles naturally.
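The matching and upsert logic described above might look roughly like this. It is a minimal in-memory sketch, not the production script: the CSV column names (`reference`, `email`, `platform_id`, `amount`), the dictionary shapes, and the function names are assumptions made for illustration, and the webhook hand-off is left out.

```python
import csv
import io


def find_donor(donors, email, platform_id):
    """Match a donor by email first, then fall back to the platform ID."""
    email = (email or "").strip().lower()
    if email:
        for donor in donors:
            if (donor.get("email") or "").lower() == email:
                return donor
    if platform_id:
        for donor in donors:
            if donor.get("platform_id") == platform_id:
                return donor
    return None


def import_donations(csv_text, donors, donations):
    """Create or update donations keyed on their reference; return simple counts."""
    stats = {"created": 0, "updated": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        donor = find_donor(donors, row["email"], row["platform_id"])
        if donor is None:
            donor = {"email": row["email"], "platform_id": row["platform_id"]}
            donors.append(donor)
        elif not donor.get("platform_id"):
            donor["platform_id"] = row["platform_id"]  # ID backfill on an email match
        fields = {"donor_email": donor["email"], "amount": float(row["amount"])}
        if row["reference"] in donations:
            donations[row["reference"]].update(fields)
            stats["updated"] += 1
        else:
            donations[row["reference"]] = fields
            stats["created"] += 1
    return stats


# Made-up data: one known donor without a platform ID, one existing donation.
donors = [{"email": "ada@example.org", "platform_id": ""}]
donations = {"D-1": {"donor_email": "ada@example.org", "amount": 10.0}}
csv_text = (
    "reference,email,platform_id,amount\n"
    "D-1,ada@example.org,P-9,25\n"
    "D-2,ADA@example.org,P-9,5\n"
)
stats = import_donations(csv_text, donors, donations)
print(stats)
```

Keying on the donation reference makes the import idempotent: re-running the same CSV updates rows rather than duplicating them.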
AI coding agents will confidently claim a UI change “looks good” without ever rendering it. Vischeck closes that loop: a hook, skill, and CLI that let an agent take authenticated screenshots of its local dev server and critically review the result before reporting back.
The screenshot CLI handles dev-server auth automatically, supports dark mode, mobile viewports, element-level captures, and batch screenshots of whole page sets. The hook fires whenever the agent edits a view or template file, so visual verification happens by default rather than when someone remembers to ask.
The skill is deliberately opinionated about how to review: zoom in on the changed component, compare against an existing known-good page, and walk an explicit checklist (contrast, spacing, house-style conformance, overflow) before giving a verdict. “Looks good” without evidence counts as a failure.
I built it because I run AI agents on real projects daily — including this website — and unverified agent output is the main thing standing between “agents help” and “agents can be trusted with the work”.
Operations work lives in Word documents, Excel sheets, and PDFs — formats AI agents can’t natively open from the command line. Readoc gives agents three CLIs: readoc to extract the contents of a .docx, .xlsx, or PDF as structured text, readir to explore and search whole folders of mixed documents, and editdoc to make targeted edits to Office files without mangling their formatting.
The practical effect: an agent can be pointed at a shared drive full of policies, budgets, and reports and actually work with them, instead of being limited to plain text and code.
I built it for my own operations work, where “can the agent read the document” was the recurring blocker between AI assistance and the tasks that actually fill an ops person’s week.
AI agents write Rails code the way the average of the internet writes Rails code — which is to say, fighting the framework. Rails Toolkit is a set of Claude Code skills that encode modern Rails 8 conventions and hard-won project rules, so agent-written code comes out idiomatic: thin controllers, concern-based models, Solid Queue jobs, Stimulus controllers with modern JavaScript, Hotwire/Turbo patterns, fixture-based tests.
It also includes higher-level workflows: a full application audit that orchestrates the individual skills into a severity-ranked health report for inherited codebases, a database performance review, and an upgrade analyzer covering breaking changes from Rails 2.3 through 8.1.
I maintain Rails applications (including the administration system behind a 100-year-old theatre company), and this toolkit is how I make AI agents productive on them without sacrificing code quality.
The default failure mode of AI coding agents is declaring victory: “this should work now” after a change that was never run. Dev Hooks is a collection of Claude Code hooks and skills that counteract that, built from patterns I kept reapplying across projects.
Highlights:
- but-for-real — intercepts premature success claims and demands the agent actually run, test, and inspect before saying something works.
- board — assembles a panel of adversarial expert critics to poke holes in a plan, draft, or pitch, as an antidote to agreeable-by-default responses.
- premortem — before executing a risky plan, asks “it’s six months later and this failed — why?”.
- humanizer — strips the tell-tale signs of AI-generated writing from text, based on Wikipedia’s signs-of-AI-writing guide.
- code-simplifier, readability, self-rate — quality passes for code and prose.
The shared thesis with my other agent tooling: agents become genuinely useful when scepticism and verification are built into the workflow itself, not left to the human reviewing the output.
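To make the idea behind but-for-real concrete, here is a toy Python sketch of the check it performs. It is not the actual hook and does not use the Claude Code hook interface. The phrase list, the `commands_run` input, and the pushback wording are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional

# Assumed phrases that signal a success claim. A real hook would need a broader list.
SUCCESS_CLAIM = re.compile(
    r"\b(should work now|works now|is fixed|all done|looks good)\b",
    re.IGNORECASE,
)


def review_claim(message: str, commands_run: List[str]) -> Optional[str]:
    """Return a pushback prompt when success is claimed with no evidence of a run."""
    if not SUCCESS_CLAIM.search(message):
        return None  # no success claim, nothing to challenge
    if commands_run:
        return None  # the agent ran something, so the claim has some backing
    return (
        "You claimed success without running anything. "
        "Run it, test it, and inspect the output before saying it works."
    )


print(review_claim("This should work now.", []))
```

The point of the pattern is that the challenge is automatic: the human does not have to notice the unverified claim for it to be caught.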
Airtable is the operational backbone of many small organisations, and it has no native tooling for the questions a responsible admin asks: what changed in this base since last month? Does our structure follow our own conventions? Who has access to what?
Airtable Utils answers those:
- Schema export — dump a base’s full structure (tables, fields, views) to JSON for documentation and version control.
- Schema diff — compare two exports to see exactly what was added, removed, renamed, or retyped between snapshots.
- Standards check — validate a base against a written set of naming and structure conventions, producing a compliance report.
- Access audit — scrape collaborator and permission data across bases to see who can touch what.
It also includes a comprehensive agent skill for writing Airtable scripts — both scripting-extension and automation scripts — encoding the API’s quirks and limits (like its 50-fetch ceiling) so generated scripts are correct the first time.
These grew directly out of my CRM and database work for AI safety organisations, where the base is the institutional memory and silent schema drift is a real operational risk.
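The schema diff listed above can be illustrated with a short Python sketch. It assumes an export shaped as tables containing fields, each with an `id`, `name`, and `type`; the actual export format of Airtable Utils may differ. Matching on field IDs rather than names is what lets a rename be told apart from a remove-and-add.

```python
def index_fields(export: dict) -> dict:
    """Map field ID -> (table name, field name, field type)."""
    return {
        field["id"]: (table["name"], field["name"], field["type"])
        for table in export["tables"]
        for field in table["fields"]
    }


def diff_schemas(old: dict, new: dict) -> dict:
    """Report fields added, removed, renamed, or retyped between two exports."""
    before, after = index_fields(old), index_fields(new)
    report = {"added": [], "removed": [], "renamed": [], "retyped": []}
    for field_id in sorted(after.keys() - before.keys()):
        table, name, _ = after[field_id]
        report["added"].append(f"{table}.{name}")
    for field_id in sorted(before.keys() - after.keys()):
        table, name, _ = before[field_id]
        report["removed"].append(f"{table}.{name}")
    for field_id in sorted(before.keys() & after.keys()):
        table, old_name, old_type = before[field_id]
        _, new_name, new_type = after[field_id]
        if old_name != new_name:
            report["renamed"].append(f"{table}.{old_name} -> {new_name}")
        if old_type != new_type:
            report["retyped"].append(f"{table}.{new_name}: {old_type} -> {new_type}")
    return report


# Made-up snapshots of a single table, one month apart.
old_export = {"tables": [{"name": "People", "fields": [
    {"id": "fld1", "name": "Name", "type": "singleLineText"},
    {"id": "fld2", "name": "Email", "type": "singleLineText"},
    {"id": "fld3", "name": "Notes", "type": "multilineText"},
]}]}
new_export = {"tables": [{"name": "People", "fields": [
    {"id": "fld1", "name": "Full name", "type": "singleLineText"},
    {"id": "fld2", "name": "Email", "type": "email"},
    {"id": "fld4", "name": "Consent", "type": "checkbox"},
]}]}
report = diff_schemas(old_export, new_export)
print(report)
```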
The Edinburgh Festival Fringe programme lists around 3,500 shows. Nobody reads it cover to cover, which means most people pick from the handful of shows with the biggest marketing budgets. I was frustrated enough by this to build the tool I wanted: a chat interface where you describe what you’re in the mood for and get recommendations from the full programme.
Under the hood it’s a RAG pipeline: semantic search over the festival data with ChromaDB vector storage, category filtering, and a FastAPI backend serving a React/TypeScript frontend.
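The retrieval step of such a pipeline can be sketched without any dependencies. In this toy version a bag-of-words vector stands in for a real embedding model and a plain list stands in for ChromaDB. The show data is invented and the `recommend` function is hypothetical. Only the shape of the logic (filter by category, rank by similarity, return the top matches) reflects the description above.

```python
import math
from collections import Counter

# Invented sample data. The real programme has thousands of entries.
SHOWS = [
    {"title": "Late Night Improv", "category": "comedy",
     "blurb": "fast improvised comedy sketches with audience suggestions"},
    {"title": "Quiet Strings", "category": "music",
     "blurb": "intimate acoustic folk music by candlelight"},
    {"title": "Deadpan Hour", "category": "comedy",
     "blurb": "dry stand-up about office life"},
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(query: str, category=None, top_k: int = 2) -> list:
    """Filter by category first, then rank the remaining shows by similarity."""
    candidates = [s for s in SHOWS if category is None or s["category"] == category]
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda s: cosine(query_vec, embed(s["blurb"])), reverse=True)
    return [s["title"] for s in ranked[:top_k]]


print(recommend("improvised comedy with audience", category="comedy", top_k=1))
```

In a full RAG setup the retrieved shows would then be passed to a language model as context for the chat reply; that generation step is omitted here.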
Spotify Tools lets you set listening goals for your favourite artists, albums, and tracks. The application syncs with your Spotify listening history and helps you reach your goals by generating personalised playlists based on your preferences and what you still need to listen to. It also gives you detailed statistics on your listening history.
Unfortunately, Spotify no longer verifies small apps from individual developers, so the app is not publicly accessible. If you would like to try it, please contact me with your email address and I can give you access.
Improv comedy needs its sound effects instantly: a technician fumbling through folders kills the joke. ImpAmp is Bedlam Theatre’s purpose-built soundboard for live improv, with multiple soundbanks, instant search, armed tracks, multiple sounds per pad, and Google Drive sync so each show’s sounds follow the team.
This third version was also a deliberate experiment in AI-driven development: I built it in Next.js using Claude 3.7 Sonnet and Gemini 2.5, back when letting agents write most of the code was still a novelty. It has been running real shows since.
Black Lightning is the Ruby on Rails application behind bedlamtheatre.co.uk: the public website, show archive, and internal administration system of the Edinburgh University Theatre Company. This app keeps proposals, shows, seasons, and members organised across yearly committee turnover.
I took over maintenance of a codebase that had outlived several generations of student developers and carried it through major upgrades, from Rails 5 all the way to Rails 8, modernising the stack and Dockerising deployment along the way. It is where I learned most of my web development and product skills, because production code with real users is the best teacher.