THE WORK
We are seeking a new Staff Software Engineer to join the Platform Engineering team of our Custody department. You'll play a key role in shaping our engineering practices, ensuring the reliability and performance of our platform across a multi-cloud environment, and empowering our engineering team to deliver innovative features quickly and efficiently. You will drive improvements in our automation, observability, and overall platform stability.

WHAT YOU'LL DO
Design, build, and maintain scalable and resilient infrastructure across cloud providers such as Azure, AWS, GCP, and IBM Cloud.
Automate work such as infrastructure tasks, testing, failover solutions, failure mitigation, and much more.
Implement and manage monitoring, alerting, and logging systems to ensure system visibility.
Develop and maintain our CI/CD pipelines using GitLab CI.
Proactively identify and resolve potential performance bottlenecks and reliability issues.
Participate in on-call rotations to address production incidents and support service engineers with customer incidents.
Collaborate closely with development teams to integrate and deploy services efficiently.
Contribute to the development and maintenance of our internal platform tools and services.
Lead the implementation of standard DevOps and SRE methodologies within the engineering organization.
Document processes and procedures.
Automate software maintenance processes that previously required manual procedures.

WHAT YOU'LL BRING
10 years' experience in software engineering, platform engineering, or system operations in highly available, high-traffic environments
Strong experience with Linux-based infrastructure and Linux/Unix administration
Experience with databases such as PostgreSQL
Experience administering Linux servers and container-based infrastructure (e.g., Docker, Kubernetes, AKS) in a highly available environment
Experience with scripting and programming languages such as Go or Bash
Experience with message broker/queue technologies and protocols such as RabbitMQ and AMQP 1.0
Experience with modern monitoring, logging, and observability tools in complex distributed systems, such as Application Insights, Grafana, the Elastic Stack, Datadog, and Prometheus
Practical experience with infrastructure as code (using tools like Terraform)
Familiarity with GitOps deployment practices using ArgoCD or Flux
Good understanding of cybersecurity fundamentals and best practices
Strong troubleshooting skills, with the ability to spot issues before they become problems
Excellent problem-solving and communication skills
Committed to process, with excellent documentation skills and a strong ability to work well in a team!

WHO WE ARE
Do Your Best Work
The opportunity to build in a fast-paced start-up environment with experienced industry leaders.
A learning environment where you can dive deep into the latest technologies and make an impact.
A professional development budget to support other modes of learning.
Thrive in an environment where every employee, no matter what race, ethnicity, gender, origin, or culture they identify with, is a respected, valued, and empowered part of the team.
In-office collaboration for moments that matter is important to our culture, and we give managers and teams the flexibility to decide which 10 days a month they come in.
Bi-weekly all-company meetings with business updates and ask-me-anything-style discussions with our Leadership Team.
We come together for moments that matter, including team offsites, team bonding activities, happy hours, and more!