Site Reliability Engineer - Full Remote (EU only)

Mendrisio

Jobtome

Posted online since: 14 January

Description

About the company

At Jobtome, we are building a modern, cloud-native recruitment and marketing platform used at scale across multiple countries and brands.
Our systems power high-traffic job distribution, integrations with external partners, and real-time data pipelines, with a strong focus on reliability, observability, and automation.

Engineering is a core function of the company: we value ownership, pragmatic decision-making, and long-term technical excellence over short-term fixes.

The role

As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our production systems.

You will work closely with Backend, Frontend, and Product teams to:

* design resilient architectures

* define reliability standards

* improve observability and incident response

* reduce operational toil through automation

This is not a pure ops role: you will contribute to codebases, collaborate on system design, and help evolve our engineering culture toward SRE best practices.

What you will do

* Design, implement, and maintain reliable and scalable cloud infrastructure

* Define and evolve SLIs, SLOs, and error budgets

* Improve monitoring, alerting, and observability across services

* Lead and participate in incident response, post-mortems, and root-cause analysis

* Automate repetitive operational tasks to reduce toil

* Collaborate with Backend engineers on service design, scalability, and failure modes

* Improve CI/CD pipelines, deployment strategies, and release safety

* Contribute to infrastructure as code and platform tooling

* Act as a reliability advocate across the engineering organization

Tech stack

* Cloud: Google Cloud Platform (preferred), AWS

* Containers & orchestration: Docker, Kubernetes (GKE)

* Infrastructure as Code: Terraform

* CI/CD: GitLab CI/CD

* Observability: Cloud Monitoring, Logging, Prometheus, Grafana

* Languages: Go, Python, Bash

* Networking & security: IAM, VPCs, service accounts, secrets management

What we expect from a senior SRE

* Strong experience running production systems at scale

* Solid understanding of distributed systems and failure modes

* Proven experience with SLO-driven reliability

* Strong coding skills

* Cloud infrastructure automation experience

* Ability to debug complex cross-system issues

* Ownership mindset and strong communication skills

* Pragmatic approach to reliability, speed, and cost trade-offs

Working model

* Flexible working hours

* Remote-friendly setup

* Small autonomous teams

* Direct collaboration with product and leadership
