Detailed case studies from recent engagements. More projects added as they ship.
Independent ad evaluation pipeline for a brand that couldn't get straight answers from its agency. Pulls Meta and Shopify data nightly, evaluates every active ad through Claude and delivers recommendations via Discord.
Read case study →AI-powered pipeline that extracts entities from trade documents, scores risk across 13 commodity categories and enables natural language querying.
Read case study →
Complete Shopify rebuild for a US probiotic kombucha brand. Custom storefront, subscription architecture, DDP shipping and ongoing SEO campaign.
Read case study →An eval-first pipeline that gives a brand owner an independent read on their Meta ad performance. Pulls data from Meta and Shopify each night, evaluates every active ad through Claude Sonnet and delivers recommendations with supporting reasoning via Discord. No agency interpretation required.
The brand had been running paid ads through an external agency and couldn't get clear answers about whether they were working. Reports arrived in the agency's format. Performance questions were deflected. Decisions about scaling, killing and refreshing individual ads were being made on incomplete information. The owner needed an independent, automated way to evaluate ad performance against his actual Shopify sales data, not the agency's interpretation of it.
A nightly pipeline that pulls ad performance data from the Meta Ads API and order data from Shopify, aligns them in the same format and stores the combined dataset in Postgres. That dataset is passed to Claude Sonnet, which produces a recommendation for each active ad: scale, kill, refresh or monitor. Each recommendation includes a confidence rating, supporting reasoning that cites specific metrics and flags for unusual data or tracking discrepancies between Meta and Shopify.
Recommendations are summarised in a daily report delivered to the client via Discord. Beyond the daily report, the client has access to a Discord bot for follow-up questions about any recommendation: why an ad was flagged for refresh rather than kill, why a confidence rating came back medium, what a tracking issue flag means in practice. The bot only uses the data and reasoning the evaluated prompt produced, so it cannot generate answers that aren't backed by the analysis. If it can't answer with confidence, it says so.
No version of the prompt went into the live pipeline without first being scored against a set of test cases covering eight scenarios: clear winners, clear losers, creative fatigue, tracking issues, low-data cases and edge cases. Two graders ran in combination: a code grader checking output structure and field validity, and a model grader assessing reasoning quality, metric citation, confidence accuracy and whether tracking issues were flagged correctly. Each prompt version was measured against the previous one. Only a version that beat a defined score threshold made it into the live system. The same eval-first approach was applied to the Q&A bot prompt.
The brand owner now has an independent basis for evaluating his agency's reporting. The pipeline catches under-performing ads earlier than weekly agency reviews. It surfaces tracking discrepancies between Meta and Shopify before they lead to bad scaling decisions. Every recommendation comes with a reasoning trail the client can interrogate directly.
An end-to-end AI pipeline that processes trade finance documents, extracts structured data, scores risk and provides a natural language interface for querying the full document set.
A Singapore-based trade finance firm was reviewing bills of lading, commercial invoices and certificates of origin manually. Staff were cross-referencing documents by eye, checking commodity pricing against market rates in spreadsheets and building risk assessments in Word documents. The process was slow, inconsistent and impossible to audit retroactively.
A complete document processing pipeline powered by AI. Documents are uploaded through a web interface and routed through an automated workflow that calls the Claude API for entity extraction. The system pulls structured fields from each document type - shipper details, port pairs, commodity descriptions, weights, values and dates.
Extracted data is stored in PostgreSQL and run through a deterministic risk scoring engine. The engine checks for pricing anomalies against a commodity lookup table covering 13 commodity categories, detects potential duplicate financing across the document set and flags routing or date inconsistencies. Every score is logged with a full audit trail.
The frontend is a React application with a two-panel layout: document details and scores on the left, a persistent AI chat interface on the right. Users can ask questions like "show me all palm oil shipments from Indonesia above market rate" and get answers drawn from the structured data.
Risk scoring was deliberately built as a deterministic JavaScript engine rather than an AI-generated assessment. This means scores are reproducible, auditable and free from the variability that comes with LLM-generated evaluations. The AI handles what it's good at (entity extraction from unstructured text) and the rules engine handles what needs to be consistent.
The platform processes documents in seconds that previously took staff 30-45 minutes each. Risk flags that were previously caught only by experienced reviewers are now surfaced automatically. The natural language interface lets junior staff query the document library without needing to understand the underlying data structure.
A ground-up Shopify rebuild for a US-based probiotic kombucha brand, covering custom storefront development, subscription architecture, international DDP shipping and an ongoing SEO campaign with AI-driven reporting.
The brand had outgrown its original Shopify setup. The cart was broken (a third-party app that had stopped working), subscriptions weren't converting, international orders were generating surprise customs fees and refunds, and the site had no SEO foundation. The store needed a complete rebuild - not a patch job.
A custom Shopify theme built from scratch in Liquid. The product page features a subscribe-and-save toggle, Mix & Match flavour selection for multi-packs and a custom cart drawer with a progress bar and milestone rewards. Subscription and bundle logic is handled through Appstle, with inventory automation via Shopify Flow.
International shipping was solved with a DDP (Delivered Duty Paid) configuration using EasyPost and DHL eCommerce, eliminating surprise customs charges for international customers. The setup prints as standard USPS labels, meaning zero workflow change for the existing 3PL.
SEO was built into the site architecture from the start: schema markup, optimised heading structures, FAQ pages with structured data, a blog with proper content hierarchy and a disavow file for existing spam backlinks. Google Search Console was configured and a sitemap submitted as part of the go-live checklist.
The engagement continues with a monthly SEO campaign and AI-driven performance reporting. Reports cover search console data, keyword visibility, sales trends and actionable recommendations - delivered as interactive HTML dashboards in the brand's own visual identity.
We're always interested in hearing about new problems to solve.
Get in Touch