Kronos Research

Senior SRE Engineer

6.0/10

Not specified
Office / on-site
senior
about 5 hours ago
Tech: Linux, Bash, Ansible, Python, HPC, AWS, Alibaba Cloud, GCP, Terraform, Docker, Kubernetes
AI Summary (verified by Aipplify AI)

The vacancy is well-defined but lacks compensation details, affecting overall attractiveness to applicants.

AI quality score: 6.5 / 10

Overview

Kronos Research is seeking a Senior SRE Engineer to manage large-scale Linux environments, operate HPC clusters, and run multi-cloud infrastructure. The role requires strong autonomy and hands-on experience in Linux systems administration.

Linux Systems & Automation (Core)

  • Manage large-scale Linux environments: troubleshooting and root-cause analysis
  • Write maintainable, hand-off-ready Bash / Ansible / Python automation
  • On-call for infrastructure, CI/CD, and production service incidents

HPC Cluster & Storage

  • Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools
  • Maintain and plan storage for compute environments (Lustre, NAS)

Cloud & Hybrid Infrastructure

  • Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK
  • Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows

CI/CD & Developer Experience

  • Operate self-hosted GitLab server and Runner fleet
  • Operate CI/CD systems and design deployment pipelines for research and other projects

GenAI / Internal Platform

  • Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)
  • Develop MCP servers, chatbots, AI agents, and similar services

Requirements

  • 5+ years of hands-on Linux systems administration and infrastructure operations experience
  • Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs
  • Strong Bash / Shell scripting skills: able to write maintainable scripts that others can pick up
  • Programming ability for data processing, CLI tools, and API services; Python proficiency preferred
  • Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience
  • Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)
  • Familiar with containerization and orchestration (Docker, Kubernetes)
  • CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)
  • Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff
  • Strong autonomy: can drive a problem from discovery through root-cause investigation and decision-making to delivery with minimal supervision; able to make judgment calls under incomplete information and proactively communicate progress, risks, and rationale
  • Self-directed: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently