Job description
Contractor role at a US-based company. We operate fully remotely; most of the Engineering team works on CET. Our data team collects large-scale web and API data to power the insights behind our data product, and a small, focused group owns source coverage and freshness. The Data Acquisition Lead sets priorities and reviews complex fixes; the Data Engineer maintains schemas, pipelines, and SLAs.

Your focus is roughly 80 percent web data collection and spider reliability. You will keep pipelines healthy, support internal users, and run QA checks so data stays accurate at all times. This is an early-career role with significant room for growth.

80 percent - Spiders and data collection
- Build and maintain spiders and API collectors in Python/JavaScript (a minimal collector sketch appears at the end of this description).
- Write validation checks and source-level QA to prevent bad data from entering the warehouse.
- Adjust small Python or SQL transformations when a source's output changes.
- Raise and resolve data quality issues.
- Collaborate with Data Engineers on schemas and idempotent loads into the warehouse.
- Provide lightweight technical support to internal consumers.
- Follow legal and ethical guidelines for data collection; respect terms of service, privacy, and access controls.
- Communicate clearly in English with engineers and non-technical stakeholders.

Our stack (you do not need all of it)
- Node.js in JavaScript or TypeScript.
- Proxy providers: integration and rotation, residential or datacenter, country targeting, session stickiness (a proxy-selection sketch appears at the end of this description).
- GCP: Cloud Run or Cloud Functions, Pub/Sub, Cloud Storage, Cloud Scheduler.
- Data: BigQuery or Postgres fundamentals, CSV or Parquet handling.

What we look for (personal projects or internships are fine)
- Core web fundamentals: headers and cookies, session handling, JSON APIs, simple auth flows.
- Comfort in Node.js with TypeScript or JavaScript.
- Experience with other web crawling frameworks, for example Scrapy, is valued and a plus.
- Ability to schedule and orchestrate runs reliably using Cloud Scheduler and Airflow or Mage where appropriate, with clear SLAs and alerting.
- Basic SQL; comfort reading or writing simple queries for QA (a QA query sketch appears at the end of this description).
- Experience with GitHub Actions, Docker, and simple cost-aware choices on GCP.
- Exposure to data quality checks or anomaly detection.

First 30 days: ship your first spider, add monitoring and a QA checklist, and fix a real breakage end to end.

What we offer
- Real impact on a data product used by operators.
- Fully remote work with EU and US overlap.
- Learning from senior engineers and growth toward data engineering or platform paths.

When applying, include if possible one example of a scraper you built and what it collected.
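To give a concrete picture of the day-to-day work, here is a minimal sketch in TypeScript of an API collector with a source-level validation gate, the kind of check that keeps bad records out of the warehouse. The endpoint, field names, and record shape are hypothetical placeholders, not an actual source we collect.

```typescript
// Minimal sketch of an API collector with a source-level validation gate.
// The endpoint, field names, and record shape are hypothetical placeholders.

type ListingRecord = {
  id: string;
  name: string;
  price: number;
  updatedAt: string;
};

// Reject records that would pollute the warehouse: missing keys, wrong types,
// or nonsensical values.
function isValidRecord(raw: unknown): raw is ListingRecord {
  if (typeof raw !== "object" || raw === null) return false;
  const r = raw as Record<string, unknown>;
  return (
    typeof r.id === "string" && r.id.length > 0 &&
    typeof r.name === "string" &&
    typeof r.price === "number" && r.price >= 0 &&
    typeof r.updatedAt === "string" && !Number.isNaN(Date.parse(r.updatedAt))
  );
}

async function collect(): Promise<ListingRecord[]> {
  // Node 18+ ships a global fetch; headers and session cookies would be set here.
  const res = await fetch("https://api.example.com/v1/listings?page=1", {
    headers: { Accept: "application/json", "User-Agent": "acme-collector/0.1" },
  });
  if (!res.ok) throw new Error(`Source returned HTTP ${res.status}`);

  const payload = (await res.json()) as { items?: unknown[] };
  const items = payload.items ?? [];

  const valid = items.filter(isValidRecord);
  const rejected = items.length - valid.length;
  if (rejected > 0) {
    // Surface the quality issue instead of silently loading partial data.
    console.warn(`QA: rejected ${rejected}/${items.length} records from listings source`);
  }
  return valid;
}

collect().then((rows) => console.log(`collected ${rows.length} valid records`));
```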
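The proxy work mentioned in the stack often comes down to keeping a stable exit IP per target (session stickiness) and rotating it when a source starts blocking. The sketch below shows one way that might look; the provider hostname, port, and the credential pattern that encodes country and session in the username are assumptions, since each provider documents its own scheme.

```typescript
// Sketch of session-sticky proxy selection. The provider hostname, port, and
// the "username-country-x-session-y" credential pattern are assumptions.

const PROXY_HOST = "gw.proxy-provider.example";
const PROXY_PORT = 7000;
const PROXY_USER = "acme_user";
const PROXY_PASS = process.env.PROXY_PASS ?? "changeme";

// Keep one session id per target domain so the source sees a stable exit IP
// for the lifetime of a crawl; rotating means swapping the id.
const sessions = new Map<string, string>();

function proxyUrlFor(targetDomain: string, country = "us"): string {
  let session = sessions.get(targetDomain);
  if (!session) {
    session = Math.random().toString(36).slice(2, 10);
    sessions.set(targetDomain, session);
  }
  // Many providers encode country targeting and session stickiness in the username.
  const user = `${PROXY_USER}-country-${country}-session-${session}`;
  return `http://${user}:${PROXY_PASS}@${PROXY_HOST}:${PROXY_PORT}`;
}

// Force a new exit IP for a domain, e.g. after a block or captcha.
function rotate(targetDomain: string): void {
  sessions.delete(targetDomain);
}

// The returned URL would be handed to the HTTP client's proxy/agent option.
console.log(proxyUrlFor("example.com", "de"));
rotate("example.com");
```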
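Finally, a sketch of the kind of simple SQL QA check the role involves, run against BigQuery with the official Node.js client. The dataset, table, and column names are hypothetical, and it assumes default GCP credentials are available.

```typescript
// Sketch of a daily freshness/volume QA check against the warehouse.
// Dataset, table, and column names are hypothetical placeholders.
import { BigQuery } from "@google-cloud/bigquery";

const QA_QUERY = `
  SELECT
    DATE(loaded_at)        AS load_date,
    COUNT(*)               AS row_count,
    COUNTIF(price IS NULL) AS null_prices
  FROM \`acme_analytics.listings\`
  WHERE loaded_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  GROUP BY load_date
`;

async function runQaCheck(): Promise<void> {
  const bq = new BigQuery();
  const [rows] = await bq.query({ query: QA_QUERY });

  if (rows.length === 0) {
    // No rows loaded in the last day: likely a broken spider or a stuck pipeline.
    throw new Error("QA: no fresh rows in listings for the last 24h");
  }
  for (const r of rows) {
    if (r.null_prices > 0) {
      console.warn(`QA: ${r.null_prices} null prices on ${r.load_date.value ?? r.load_date}`);
    }
    console.log(`QA: ${r.row_count} rows loaded on ${r.load_date.value ?? r.load_date}`);
  }
}

runQaCheck().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

In practice a check like this would run on a schedule (for example via Cloud Scheduler) and alert someone when it fails, which is how the SLAs mentioned above stay honest.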