Requirements:
* Previous experience managing RabbitMQ, Apache Kafka, Redis, or Solr/SolrCloud in critical environments.
* Minimum of 3 years in DevOps or Site Reliability Engineering, focusing on high-availability systems and production ownership.
* Experience with automation and self-service tools such as Git, Jenkins, Ansible, and Terraform for messaging infrastructure management.
* Proficiency with scripting languages (e.g., Python, Bash) for automation purposes.
* Knowledge of monitoring and alerting tools like Prometheus and Grafana for system performance tracking and proactive alerts.
* Strong understanding of security best practices, including access management and data encryption.
* Familiarity with containerization (Docker, Kubernetes) and its integration with messaging and search platforms.
* Experience with version control systems such as Git.
* Encryption expertise.
Nice-to-Haves:
* Experience with additional messaging platforms or middleware tools.
* Knowledge of disaster recovery and business continuity for messaging and search infrastructure.
* Experience troubleshooting distributed systems and managing their challenges.
* Experience designing and deploying cloud-native messaging/search systems or managing hybrid environments.
Responsibilities:
* Manage messaging and search software (Kafka, RabbitMQ, Redis, Solr) and the supporting tooling (Ansible, GitHub).
* Design and implement continuous deployment pipelines.
* Enhance automation of middleware installation via Ansible and Kubernetes.
* Collaborate with development and operations teams for technical support and troubleshooting.
* Monitor and manage KPIs using Grafana, Prometheus, Kibana, and PagerDuty.
* Build dashboards to track infrastructure health and SLA compliance.
* Configure and audit logging to meet compliance monitoring requirements.
* Provide clear insights into platform health.
* Develop alerting mechanisms based on metrics and logs.
* Ensure security compliance (Vault, TLS, SASL, Encryption, ACL, RBAC, LDAP).
* Adhere to bank security standards and FINMA regulations.
* Promote best practices for software usage.
* Maintain documentation and tooling (Teams, Jira, Readme, Confluence).
* Write and review operational procedures.
* Document infrastructure comprehensively.
* Support developers and operations teams with resources.
* Research and develop new solutions, including strategies for migrating to cloud platforms such as Kubernetes.
* Introduce and benchmark new tools and solutions.