Senior SRE Engineer
Kronos Research
Not specified
Office / on-site
senior
about 5 hours ago
Tech: Linux, Bash, Ansible, Python, HPC, AWS, Alibaba Cloud, GCP, Terraform, Docker, Kubernetes
Overview
Kronos Research is seeking a Senior SRE Engineer to manage large-scale Linux environments, operate HPC clusters, and manage multi-cloud infrastructures. The role requires strong autonomy and experience in Linux systems administration.
Linux Systems & Automation (Core)
- Manage large-scale Linux environments: troubleshooting and root-cause analysis
- Write maintainable, hand-off-ready Bash / Ansible / Python automation
- On-call for infrastructure, CI/CD, and production service incidents
HPC Cluster & Storage
- Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools
- Maintain and plan storage for compute environments (Lustre, NAS)
Cloud & Hybrid Infrastructure
- Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK
- Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows
CI/CD & Developer Experience
- Operate self-hosted GitLab server and Runner fleet
- Operate CI/CD systems and design deployment pipelines for research and other projects
GenAI / Internal Platform
- Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)
- Develop MCP servers, chatbots, AI agents, and similar services
Requirements
- 5+ years of hands-on Linux systems administration and infrastructure operations experience
- Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs
- Strong Bash / Shell scripting skills; able to write maintainable scripts that others can pick up
- Programming ability for data processing, CLI tools, and API services; Python proficiency preferred
- Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience
- Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)
- Familiar with containerization and orchestration (Docker, Kubernetes)
- CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)
- Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff
- Strong autonomy: can drive a problem from discovery, root-cause investigation, decision-making, to delivery with minimal supervision; able to make judgment calls under incomplete information and proactively communicate progress, risks, and rationale
- Self-directed: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently