
Senior Site Reliability Engineer
Overview
Join Manychat as a Senior Site Reliability Engineer to enhance our cloud infrastructure and ensure platform reliability. Work remotely with a diverse team and enjoy competitive salary and benefits. We help creators get more out of every conversation with Instagram-focused automations and support for other channels like Messenger, WhatsApp, and TikTok. The result? Better engagement, more sales, and real, sustainable growth. With a diverse team of 350+ people spread across three continents, we're building the leading Chat Marketing platform that is used, and loved, by more than 1.5 million customers worldwide.
What You'll Do
- Maintain and harden AWS infrastructure (EC2, ALB/NLB, WAF, IAM, CloudWatch)
- Operate and evolve our EKS clusters powering Python-based AI services
- Migrate existing services to Kubernetes using Terraform and Helm
- Codify infrastructure with Terraform and manage host-level automation via Ansible
- Build and improve CI/CD pipelines with GitHub Actions
- Own observability efforts: Prometheus, Grafana, alerting, and on-call readiness
- Support OS-level patching, certs, WAF rules, and general infra hygiene
- Partner with engineers to guide best practices and drive platform reliability
- Create clean, maintainable infrastructure documentation and playbooks
- Occasionally support rare off-hours incidents (don't worry, really rare)
What We Offer
- Hybrid onboarding to start work remotely and relocation support for you and your family.
- Comprehensive health insurance for both you and your family.
- Professional development budget for conference tickets, online courses, and other relevant resources to help you grow.
- Flexible benefits package to tailor perks that matter most for you.
- Hybrid work and generous leave options to prioritize your work-life balance.
- In-office perks, including free meals and snacks.
- Company-funded sport activities, annual offsites and team-building events.
To Shine in This Role
- 5+ years of experience managing Linux in production (Ubuntu, Amazon Linux)
- Strong experience with Kubernetes (ideally EKS), Helm, and Terraform
- Comfort with running and debugging Python workloads in containers
- Solid understanding of networking, IAM, and cloud security best practices
- Hands-on Nginx experience (Ingress and reverse proxy setups)
- Excellent communication skills; you can explain complex infra to devs clearly
Nice to Have Skills
- Strong Ansible skills beyond the basics
- PostgreSQL or Amazon RDS tuning and operations experience
- Deep understanding of observability tools (Prometheus, Grafana, Loki, etc.)
- Familiarity with PHP production environments
- Experience with TDD, CI/CD best practices, and agile development
- Any previous SRE-like exposure such as building resilience, automation, or incident tooling