
A single catalogue for a fragmented market. Veilingen.ai ingests 25,000+ active lots from 18+ Dutch and international auction houses, enriches them with Gemini vision and text models, and surfaces them through a unified discovery experience with creator analytics, saved searches, and a premium tier.
18+ Sources
Auction house coverage
Gemini 2.0
Vision + categorisation
Stripe Billing
Premium subscriptions
PWA
Push-ready, offline-first
// Enrichment pipeline
Scrape
Playwright + Selenium
Sync
Batched API ingest
Classify
Gemini + keyword
Serve
Cached Next.js
// What we built
- Unified catalogue — 14-dimension filtering, including category, creator, period, source, condition, and price, gives one search across the entire Dutch auction landscape
- Scraper fleet — 18+ independent scrapers in Python (Playwright, Selenium, Requests) run daily via GitHub Actions with batched sync to the API
- AI categorisation — 45 main categories and 894 subcategories resolved via hybrid keyword matching (~65%) plus Gemini 2.0 Flash for the long tail (~35%)
- Creator analytics — Dedicated maker pages with 10-year price trends, monthly distributions, top sales, and source breakdown charts
- Dynamic taxonomy — Admin hub edits categories live; the Gemini system prompt rebuilds on mutation with zero downtime and optional git auto-sync
- Saved searches & favourites — Per-user filter presets, wishlist with optimistic UI, search history and trending suggestions
- Premium subscriptions — Stripe billing with webhook-driven lifecycle, billing portal, and server-side gating for extended analytics and alerts
- Admin hub — Scraper monitoring, taxonomy management, lot quality checks, orphan clustering, and audit logging
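As a rough illustration of the dynamic taxonomy item above, the sketch below rebuilds a system prompt every time the category tree is mutated. It is not the production code: the `Taxonomy` class, its method names, and the prompt wording are all hypothetical, and it is written in Python for consistency with the scrapers even though the real backend is described as Node.

```python
class Taxonomy:
    """In-memory category tree whose system prompt is rebuilt on every change.

    Hypothetical sketch: the real hub persists overrides to MongoDB.
    """

    def __init__(self, categories: dict[str, list[str]]):
        # Copy the input so later mutations do not leak back to the caller.
        self._categories = {main: list(subs) for main, subs in categories.items()}
        self.system_prompt = self._build_prompt()

    def _build_prompt(self) -> str:
        # Assumed prompt wording; one line per main category, sorted for stability.
        lines = ["Classify the auction lot into exactly one of these categories:"]
        for main in sorted(self._categories):
            subs = ", ".join(sorted(self._categories[main]))
            lines.append(f"- {main}: {subs}")
        return "\n".join(lines)

    def add_subcategory(self, main: str, sub: str) -> None:
        subs = self._categories.setdefault(main, [])
        if sub not in subs:
            subs.append(sub)
        # Rebuild in-process so the next classification call sees the new entry.
        self.system_prompt = self._build_prompt()


taxonomy = Taxonomy({"Art": ["Paintings"]})
taxonomy.add_subcategory("Art", "Prints")
print(taxonomy.system_prompt)
```

Because the prompt is a plain string derived from the tree, swapping it is a single assignment, which is one plausible way to get a change live without restarting anything.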
// Architecture
Scraping layer
Python 3.11 with Playwright and Selenium for JS-heavy houses, plain Requests for API-friendly sources. GitHub Actions schedules daily runs and posts in 25-lot batches to the Node backend.
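The 25-lot batching can be sketched as follows. Only the batch size comes from the text; the `batched` and `sync` helpers and the `post` callable (standing in for whatever HTTP client the scrapers use) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 25  # batch size stated in the text


def batched(lots: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` lots."""
    batch: list[dict] = []
    for lot in lots:
        batch.append(lot)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly partial, batch
        yield batch


def sync(lots: Iterable[dict], post: Callable[[list[dict]], None]) -> int:
    """Send every batch through `post` and return how many batches were sent.

    `post` is a hypothetical callable, e.g. a thin wrapper around an HTTP POST.
    """
    sent = 0
    for batch in batched(lots):
        post(batch)
        sent += 1
    return sent
```

For example, `sync(scraped_lots, send_to_api)` with 60 scraped lots would call `send_to_api` three times, with 25, 25, and 10 lots.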
API & enrichment
Express on Render with MongoDB Atlas. A text-enrichment job that runs every 2 hours and a nightly vision job route lots through Gemini 2.0 Flash with confidence scoring and a keyword fallback.
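A minimal sketch of how keyword matching and a model call with confidence scoring might be combined. Everything named here is an assumption for illustration: the keyword table, the `MIN_CONFIDENCE` threshold, the keyword-first ordering, and the `model` callable, which is a stub standing in for the Gemini request rather than a real client.

```python
from typing import Callable, Optional

Category = tuple[str, str]  # (main category, subcategory)

# Hypothetical keyword dictionary; the real one is far larger.
KEYWORDS: dict[str, Category] = {
    "wristwatch": ("Watches", "Wristwatches"),
    "oil on canvas": ("Art", "Paintings"),
}

MIN_CONFIDENCE = 0.7  # assumed threshold, not taken from the source


def classify_by_keyword(title: str) -> Optional[Category]:
    """Return the category of the first keyword found in the title, if any."""
    text = title.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in text:
            return category
    return None


def classify(title: str, model: Callable[[str], tuple[Category, float]]) -> dict:
    """Try the free keyword path first; call the model only for the remainder."""
    hit = classify_by_keyword(title)
    if hit is not None:
        return {"category": hit, "method": "keyword", "confidence": 1.0}
    category, confidence = model(title)
    if confidence < MIN_CONFIDENCE:
        # Low-confidence answers are left unresolved (e.g. for manual review).
        return {"category": None, "method": "unresolved", "confidence": confidence}
    return {"category": category, "method": "model", "confidence": confidence}
```

Recording the `method` alongside the category makes it straightforward to measure what share of lots each path resolves.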
Discovery app
Next.js 15 on Vercel with a service worker for network-first HTML and cache-first assets. In-memory API caching, dynamic sitemap, and JSON-LD for SEO.
// Engineering challenges
- Cost control at AI scale — Naively routing every lot through Gemini would have been uneconomical. A keyword dictionary handles roughly two-thirds of lots for free; only the ambiguous remainder hits the model, bringing per-lot enrichment cost down to ~$0.001.
- Taxonomy that evolves without downtime — Auction stock shifts constantly. The hub writes category changes to MongoDB overrides and rebuilds the Gemini system prompt in-process, so a new subcategory reaches the classifier and goes live within seconds of approval.
- Heterogeneous sources, one schema — Each auction house exposes lots differently. A shared normaliser and per-source adapter layer turn 18+ very different data shapes into one canonical lot document with consistent dating, pricing, and condition fields.
- Long-horizon price analytics — 10-year rolling price history is precomputed per creator and served via cached routes, so maker pages render instantly even with hundreds of thousands of historical sales behind them.
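To make the "one schema" challenge from the list above concrete, here is a hedged sketch of a shared normaliser with per-source adapters. The `Lot` fields, the two source names, their payload shapes, and the assumption that string prices use Dutch notation (`.` for thousands, `,` for decimals) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Lot:
    """Simplified canonical lot; the real document has many more fields."""
    source: str
    title: str
    price_eur: Optional[float]


def parse_price(raw: Union[str, int, float, None]) -> Optional[float]:
    """Shared helper: numbers pass through, strings are read as Dutch notation."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    cleaned = raw.replace("€", "").replace("EUR", "").strip()
    cleaned = cleaned.replace(".", "").replace(",", ".")  # '1.250,00' -> '1250.00'
    try:
        return float(cleaned)
    except ValueError:
        return None


def adapt_house_a(raw: dict) -> Lot:
    # Hypothetical source with flat fields and a formatted price string.
    return Lot("house_a", raw["name"].strip(), parse_price(raw.get("estimate")))


def adapt_house_b(raw: dict) -> Lot:
    # Hypothetical source with a nested payload and numeric prices.
    lot = raw["lot"]
    return Lot("house_b", lot["title"].strip(), parse_price(lot.get("price")))


ADAPTERS: dict[str, Callable[[dict], Lot]] = {
    "house_a": adapt_house_a,
    "house_b": adapt_house_b,
}


def normalise(source: str, raw: dict) -> Lot:
    """Dispatch to the adapter registered for this source."""
    return ADAPTERS[source](raw)
```

Under this layout, adding a new auction house means writing one adapter function and registering it, while parsing rules such as `parse_price` stay shared.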
