work_experience

archived

Ilghar Consulting

Built a machine that scraped and structured the entire UK job market, daily, without human input.

Co-founder & Data Systems Architect · Jan 2021 – Aug 2024

// what problem this solves

The UK visa sponsorship market is high-stakes and painfully opaque. Thousands of roles go live every day across fragmented job boards, but figuring out which ones actually sponsor visas is a needle-in-a-haystack problem. For applicants, stale or incomplete data means missed opportunities. For a platform trying to serve them, manual tracking simply does not scale.

// what I built

I built a high-concurrency data ingestion and processing system that turned the UK job market into a structured, searchable, sponsorship-focused database. It continuously scraped, cleaned, validated, and indexed job data at scale, creating the data engine behind ukvisajobs.com. What used to be scattered across hundreds of sites became a reliable, queryable intelligence layer.

// how it works

The system ran as a resilient multi-node scraping cluster built with Python and Selenium, designed to handle JavaScript-heavy sites, anti-bot friction, and large-scale ingestion without constant human babysitting. A centralized ETL pipeline normalized wildly inconsistent source data into a canonical PostgreSQL schema, then validated and enriched records for downstream use. On top of that, I optimized the query layer so dashboards and filters could slice through millions of records in real time without sacrificing reliability.
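As a rough illustration of the normalization step described above (not the production code), the sketch below maps differently shaped source records onto one canonical record and rejects anything incomplete. The field aliases, date formats, sponsorship phrases, and the `JobPosting` and `normalize` names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Optional


@dataclass(frozen=True)
class JobPosting:
    """Hypothetical canonical record; the real schema is not shown in this write-up."""
    source: str
    title: str
    company: str
    location: str
    posted_on: date
    sponsors_visa: bool


# Assumed per-source field names that all mean the same canonical field.
FIELD_ALIASES = {
    "title": ("title", "job_title", "position"),
    "company": ("company", "employer", "org_name"),
    "location": ("location", "city", "job_location"),
    "posted_on": ("posted_on", "date_posted", "published"),
    "description": ("description", "body", "summary"),
}

# Assumed date formats seen across sources (ISO and UK day-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

# Assumed phrases that hint at sponsorship; a naive keyword check.
SPONSOR_HINTS = ("visa sponsorship", "skilled worker visa", "sponsorship available")


def pick(raw: dict[str, Any], field: str) -> Optional[str]:
    """Return the first non-empty alias value, with whitespace collapsed."""
    for key in FIELD_ALIASES[field]:
        value = raw.get(key)
        if isinstance(value, str) and value.strip():
            return " ".join(value.split())
    return None


def parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize(source: str, raw: dict[str, Any]) -> Optional[JobPosting]:
    """Map one raw scraped record to the canonical shape, or None if invalid."""
    title = pick(raw, "title")
    company = pick(raw, "company")
    location = pick(raw, "location")
    posted_raw = pick(raw, "posted_on")
    posted_on = parse_date(posted_raw) if posted_raw else None

    # Validation: drop records missing any required field.
    if not (title and company and location and posted_on):
        return None

    description = (pick(raw, "description") or "").lower()
    sponsors_visa = any(hint in description for hint in SPONSOR_HINTS)
    return JobPosting(source, title, company, location, posted_on, sponsors_visa)
```

For example, `normalize("board_a", {"job_title": "Data  Engineer", "employer": "Acme Ltd", "city": "Leeds", "date_posted": "03/02/2024", "body": "Visa sponsorship available."})` yields a `JobPosting` dated 3 February 2024 with `sponsors_visa=True`. A keyword check like this would also match phrases such as "no visa sponsorship", so a real pipeline would need stricter validation before trusting the flag.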

// result

  • 90% reduction in manual data entry through fully autonomous discovery and indexing
  • Millions of historical and live job postings ingested and structured into a searchable database
  • 99.9% system uptime from a self-healing scraping cluster built for sustained heavy load
  • Sub-second filtering and dashboard performance across enterprise-scale datasets
  • Provided the data foundation for a visa sponsorship platform users could actually trust

// the stack

Python · Selenium · PostgreSQL · GCP