We're building the data backbone for European public procurement. Our platform aggregates tender data from 100+ e-procurement portals — each with its own quirks, anti-bot protections, and legacy HTML.
We're looking for a scraping engineer who can navigate this landscape: someone who's comfortable with headless browsers, knows how to handle sessions and CAPTCHAs, and won't panic when the same platform serves three different HTML layouts across pages.
What you'll do
Build and maintain async scrapers (Python + Playwright) for Italian public procurement portals, and later other European ones (Maggioli PortaleAppalti, ANAC, MePA, and others)
Handle real-world challenges: JSESSIONID session management, FriendlyCaptcha/Mosparo anti-bot challenges, Cloudflare WAF, IP rotation with rate-limit backoff
Parse Italian data formats — amounts (€ 1.234.567,89), dates (DD/MM/YYYY, textual), CIG/CUP identifiers with placeholder detection
Extract and process documents: PDF, .p7m (PKCS#7 signed), ZIP/7Z archives, with OCR fallback
Integrate scrapers into our Prefect orchestration pipeline with monitoring, alerting, and anomaly detection
Work with PostgreSQL, Supabase, ClickHouse, and S3 for dual-sink storage with upsert/idempotency patterns
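To give a feel for the Italian formats listed above, here is a minimal standard-library sketch. It assumes amounts use `.` for thousands and `,` for decimals, and that textual dates look like `15 marzo 2024`. The function names and the placeholder list are hypothetical illustrations, not part of any existing codebase.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Optional

# Italian month names for textual dates such as "15 marzo 2024".
_MESI = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Hypothetical filler values for identifier fields; a real list would come from the data.
_PLACEHOLDERS = {"", "n/a", "n.d.", "nd", "non presente"}


def parse_italian_amount(text: str) -> Optional[Decimal]:
    """'€ 1.234.567,89' -> Decimal('1234567.89'); None if it cannot be parsed."""
    # Keep only digits, separators and a minus sign (drops '€', spaces, NBSPs).
    cleaned = re.sub(r"[^\d.,-]", "", text)
    # Italian convention: '.' groups thousands, ',' marks the decimals.
    normalized = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(normalized) if normalized else None
    except InvalidOperation:  # e.g. '1,2,3' becomes '1.2.3'
        return None


def parse_italian_date(text: str) -> Optional[date]:
    """Parse 'DD/MM/YYYY' or 'D mese YYYY' (e.g. '15 marzo 2024'); None otherwise."""
    s = text.strip().lower()
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", s)
    if m:
        day, month, year = (int(g) for g in m.groups())
    else:
        m = re.fullmatch(r"(\d{1,2})\s+([a-z]+)\s+(\d{4})", s)
        if m is None or m.group(2) not in _MESI:
            return None
        day, month, year = int(m.group(1)), _MESI[m.group(2)], int(m.group(3))
    try:
        return date(year, month, day)
    except ValueError:  # impossible dates such as 31/02/2024
        return None


def is_placeholder_id(value: str) -> bool:
    """Hypothetical heuristic: flag empty or filler CIG/CUP values such as '0000000000'."""
    v = value.strip().lower()
    return v in _PLACEHOLDERS or len(set(v)) == 1
```

Returning `None` instead of raising lets a multi-strategy extractor try the next strategy and record the miss, rather than failing the whole page.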
What we're looking for
Strong async Python — you think in asyncio, not time.sleep()
Playwright or Selenium experience — you've intercepted XHR responses, handled SPAs, and debugged timing issues
Resilience mindset — retry with backoff, graceful degradation, circuit breakers. Your scraper doesn't crash at 3 AM.
Comfort with messy HTML — you can write a multi-strategy extractor that handles
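A minimal asyncio sketch of the retry-with-backoff pattern mentioned above: capped exponential backoff with full jitter, using `asyncio.sleep` rather than `time.sleep()`. The helper name and its defaults are illustrative assumptions, not an existing API. Circuit breakers and graceful degradation would sit on top of something like this.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    fn: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await fn(), retrying on `retry_on` with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller (or a circuit breaker) decide
            # Ceiling grows as base_delay, 2*base_delay, 4*base_delay, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, ceiling] so parallel workers
            # don't retry against the same portal in lockstep.
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In a Playwright scraper you might call `await retry_with_backoff(lambda: page.goto(url), retry_on=(PlaywrightTimeoutError,))`, with `TimeoutError` imported from `playwright.async_api` as `PlaywrightTimeoutError`. That way only transient timeouts are retried and genuine bugs still surface. Because `asyncio.CancelledError` derives from `BaseException` (Python 3.8+), the default `(Exception,)` does not swallow task cancellation.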