Job Description
We are looking for a skilled senior-level Site Reliability Engineer (SRE) to join our team.
The ideal candidate cares deeply about big data systems and automation, is fluent in Java, and is eager to learn as needed.
The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support.
* This involves regular cross-team collaboration with Data Engineering, Infrastructure, Engineering, Security, and Operations teams.
* The candidate will take ownership of the data platform, regularly interacting with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.
Requirements
To succeed in this role, you will need:
* A bachelor's degree in Computer Science or a similar technical major, or equivalent work experience.
* 8+ years of practical work experience.
* Experience architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes.
* Experience developing and maintaining custom Java and Python applications that enhance platform capabilities.
* Experience developing automation for deployments (CI/CD) using Ansible and Jenkins.
Bonus Requirements
To stand out in this role, you will benefit from:
* Experience ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry.
* Experience upgrading large-scale data platforms to improve system capabilities and security while minimizing customer impact.
* Ability to troubleshoot complex issues in large, distributed environments.
* Staying up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments.
About Us
We are a mission-focused, values-driven company where each individual can contribute to building a stronger, more secure internet.
We offer a dynamic and flexible work environment with competitive benefits and the ability to grow your career.