Data Engineer
Google Cloud Platform (GCP)
Apache Airflow, Spark, Python, Scala
Infrastructure as Code (IaC) tools such as Terraform or Ansible
Experience building Customer Data Platforms (CDPs)
Experience with AI-assisted developer tools (for example, IntelliJ plug-ins using OpenAI or Anthropic models, Codex CLI, Windsurf)
Experience: 5+ years
Canada (Remote)
Full-time
Role Overview:
We’re looking for a skilled Data Engineer to design, build, and optimize scalable, cloud-native data pipelines on Google Cloud Platform (GCP). The role involves extensive work with Apache Airflow, Spark, Python, and Scala to develop high-performance data solutions supporting analytics, streaming, and generative AI initiatives.
Key Responsibilities:
- Develop, automate, and maintain batch and streaming ETL pipelines using Apache Airflow, Apache Spark, Python, and Scala (a sample DAG sketch follows this list).
- Build and manage cloud-based data ecosystems on GCP (BigQuery, Bigtable, Dataproc, Pub/Sub, Cloud Storage, IAM, VPC).
- Design and optimize SQL and NoSQL data models for data lakes and warehouses (BigQuery, MongoDB, Snowflake).
- Write complex SQL queries for advanced data transformation, aggregation, and analytics optimization within BigQuery or equivalent platforms.
- Apply modern Test-Driven Development (TDD) methodologies to big data pipelines, ensuring test automation across Airflow workflows, Spark jobs, and transformation logic (a sample pipeline test follows this list).
- Apply data mesh and data-as-a-product principles to enable reusable and domain-driven datasets.
- Implement real-time ingestion with Kafka Connect and process streaming data using Spark Streaming, Apache Flink, or similar technologies (a sample streaming job follows this list).
- Optimize data performance, scalability, and cost efficiency across GCP components.
- Ensure PCI and PII data are handled in compliance with standards such as GDPR, PCI DSS, SOX, and CCPA.
- Integrate generative AI tools such as OpenAI, Gemini, and Anthropic LLMs to improve data quality and enhance analytics (a sample quality check follows this list).
- Collaborate with stakeholders, data scientists, and full-stack engineers to deliver trusted, documented, and reusable data products.
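
To give candidates a concrete sense of the pipeline work above, here is a minimal sketch of a daily batch ETL DAG. It assumes Airflow 2.x with the Google provider installed; the DAG ID, project, cluster, bucket, and job class names are all hypothetical, not a prescribed design.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="daily_orders_etl",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Run a Spark transformation on Dataproc (job spec is illustrative).
    transform = DataprocSubmitJobOperator(
        task_id="spark_transform",
        region="us-central1",
        project_id="example-project",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "spark_job": {
                "main_class": "com.example.OrdersJob",
                "jar_file_uris": ["gs://example-bucket/jobs/orders.jar"],
            },
        },
    )

    # Load the transformed Parquet output into BigQuery.
    load = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="example-bucket",
        source_objects=["output/orders/*.parquet"],
        source_format="PARQUET",
        destination_project_dataset_table="example-project.analytics.orders",
        write_disposition="WRITE_TRUNCATE",
    )

    transform >> load
```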
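For the TDD responsibility, a minimal sketch of a pipeline unit test, assuming pytest and a local SparkSession; the dedupe_orders transformation is a hypothetical example of the logic under test.

```python
import pytest
from pyspark.sql import SparkSession


def dedupe_orders(df):
    """Hypothetical transformation: keep one row per order_id."""
    return df.dropDuplicates(["order_id"])


@pytest.fixture(scope="session")
def spark():
    # Local single-threaded session so tests run without a cluster.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_dedupe_orders_removes_duplicates(spark):
    df = spark.createDataFrame(
        [(1, "a"), (1, "a"), (2, "b")], ["order_id", "item"]
    )
    assert dedupe_orders(df).count() == 2
```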
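For the streaming responsibility, a minimal sketch of a Spark Structured Streaming job reading from Kafka, assuming the spark-sql-kafka connector is on the classpath; the broker, topic, and storage paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Read raw events from a Kafka topic as an unbounded stream.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Parse the message value and aggregate per minute, tolerating late data.
counts = (
    events.select(F.col("value").cast("string").alias("payload"), F.col("timestamp"))
    .withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Write finalized window counts to cloud storage with checkpointing.
query = (
    counts.writeStream.outputMode("append")
    .format("parquet")
    .option("path", "gs://example-bucket/streams/order_counts")
    .option("checkpointLocation", "gs://example-bucket/checkpoints/order_counts")
    .start()
)
query.awaitTermination()
```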
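And for the GenAI responsibility, a minimal sketch of an LLM-assisted data-quality check using the OpenAI Python client; the model choice and record schema are assumptions, not a prescribed approach.

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def flag_suspect_records(records: list[dict]) -> str:
    """Ask an LLM to flag records that look malformed or anomalous."""
    prompt = (
        "Review these order records and list the IDs of any that look "
        "malformed or anomalous, with a one-line reason each:\n"
        + json.dumps(records, indent=2)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(flag_suspect_records([
    {"order_id": 1, "amount": 19.99, "currency": "CAD"},
    {"order_id": 2, "amount": -5000, "currency": "??"},
]))
```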
Required Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 5+ years of hands-on experience with large-scale data engineering in cloud environments.
- Advanced skills in Python, Scala, the Spark ecosystem, and SQL for building data pipelines.
- Strong GCP expertise (BigQuery, Bigtable, Dataproc, Pub/Sub, IAM, VPC).
- Proficiency in SQL/NoSQL modeling and data architecture for cloud data lakes.
- Familiarity with streaming frameworks (Kafka, Flume).
- Experience handling sensitive data and ensuring regulatory compliance.
- Working knowledge of Docker, CI/CD, and modern DevOps practices for data platforms.
Preferred Qualifications:
- Experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible.
- Contributions to open-source projects or internal developer tooling.
- Prior experience building Customer Data Platforms (CDPs) in-house.
- Experience with AI-assisted developer tools (for example, IntelliJ plug-ins using OpenAI or Anthropic models, Codex CLI, or Windsurf).
Job Type: Full-time
Pay: $70,503.61-$150,613.77 per year
Work Location: Remote