Senior Site Reliability Engineer, Observability
Chainlink Labs
$129,000–$304,000 USD
Remote
senior
AI, crypto, dev, web3, AWS, Terraform, Kubernetes, Prometheus, Grafana, GitHub Actions, Packer
Description
What you'll do
- Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform.
- Support multiple telemetry types, such as metrics, logs, and traces.
- Define and support modern observability governance, and solve observability problems at scale.
- Ensure that reliability, security, and performance exceed our defined SLAs.
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load.
- Lead the design and deployment of monitoring and observability services that detect issues and alert the team when action is needed.
- Ingest, aggregate, transform, and use data from a multitude of sources in our real-time data pipeline.
- Oversee the availability, performance, and supportability of our observability infrastructure.
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
- Make recommendations to ensure that sufficient metrics are collected to support alerting for every new feature release.
- Champion reliability and security by taking the time to do your work right the first time.
Conditions
- Competitive salary ranging from USD 129,000 to USD 304,000.
- Opportunities for growth and learning in a remote environment.
- Commitment to equal opportunity and support for diverse backgrounds.
Requirements
- 7+ years of relevant professional experience.
  You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.
- Ability to develop software beyond the scope of typical infrastructure requirements and configurations.
- Experience programming in C, C++, Java, Python, Go, Perl, or Ruby.
- Expert knowledge of all aspects of designing, developing, and managing large real-time systems.
- Experience with monitoring and logging.
  You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution such as the ELK Stack, Splunk, or the Grafana stack.
- Experience with distributed systems and container orchestration.
  You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them.
- Strong communication skills.
  You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews.