Site Reliability Engineer (SRE)
Posted on: 26 Apr 2025
Location: Huechuraba, Chile
Department: Customer Projects Deployment & Services
Job Family: Information Technology
AIM OF THE JOB:
As an SRE, you are responsible for responding to incidents and escalations. This includes on-call and escalation support, which may be required after office hours and on weekends; a support duty roster will be implemented. In Technical Support, you are competent in troubleshooting and investigating technical problems, performing root cause analysis (RCA), recommending resolutions, and implementing workarounds when a software fix is not yet available. In Solution and Observability Monitoring, you are competent in developing, customizing, and implementing monitoring of the solution. In Continuous Delivery, you are responsible for deploying new versions of applications. In Solution Quality Assurance, you participate with Product Development and DevOps in development testing activities (FAT) and drive solution testing during deployment (SAT). You proactively share knowledge with team members and the SRE community, and you bring a curious mindset, always learning new things and making improvements.
Main responsibilities and activities:
* Implement solution monitoring and observability, automate detections and responses
* Implement SLI and SLO measurements and monitoring in our Solution Monitoring
* Carry out service improvement actions and review them with the team using SLI and SLO data
* Troubleshoot incidents and perform post-incident and root cause analysis
* Implement workarounds to prevent recurrence of incidents and improve monitoring detection
* Implement observability monitoring and perform distributed tracing analysis of applications
* Deploy new application releases to pre-production and production environments
* Participate in and contribute to automation of deployment, testing, and monitoring detection
* Collaborate with the SQC team on testing automation deployment and with DevOps on continuous delivery
* Participate in planning and review sessions with Development, DevOps, and Platform teams
* Expand and grow technical knowledge, skillsets, and expertise expected of an SRE
* Create and document artifacts related to SRE practices, such as good practices, patterns, dashboards, workarounds, troubleshooting methods, and monitoring improvements
PROFILE:
* College degree or technical training in Computer Science, Software Engineering, or equivalent experience
* At least 5 years of work experience, including at least 3 years in software development and 2 years in IT operations, support, or system administration. Experience in application maintenance, troubleshooting, bug fixing, testing, and application management is essential.
TECHNICAL SKILLS:
* Troubleshooting and debugging applications and complex systems
* Application tracing and log analysis
* Linux and VM experience
* Hands-on experience with Shell scripting
* Application deployment and tools (e.g., Jenkins)
* Knowledge of at least one database (schema understanding, DML using SQL)
* Programming skills in at least one language (e.g., Python, C, Java)
* Incident resolution, root cause analysis, and incident management
* Experience with JIRA, ITSM ticketing, documentation tools (Wiki), Nagios, Splunk, Docker, OpenShift, Kubernetes, and automation tools (e.g., Ansible)