Chainlink Labs

Senior Site Reliability Engineer, Observability

$129,000 – $304,000 USD
Remote
senior
about 4 hours ago
ai, crypto, dev, web3, AWS, Terraform, Kubernetes, Prometheus, Grafana, GitHub Actions, Packer

Description

What you'll do

  • Build and orchestrate a modern OTEL-based observability platform.
  • Support multiple telemetry types, such as metrics, logs, and traces.
  • Define and support modern governance for observability, and address problems at scale.
  • Ensure reliability, security, and performance exceed our defined SLAs.
  • Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load.
  • Lead the design and deployment of monitoring/observability services that detect problems and alert the team when action is needed.
  • Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline.
  • Oversee the availability, performance, and supportability of our observability infrastructure.
  • Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
  • Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release.
  • Champion reliability and security by taking the time to do your work right the first time.

Conditions

  • Competitive salary ranging from USD 129,000 to USD 304,000.
  • Opportunities for growth and learning in a remote environment.
  • Commitment to equal opportunity and support for diverse backgrounds.

Requirements

  • 7+ years of relevant professional experience.

You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.

  • Ability to develop software outside of the scope of typical infrastructure requirements and configurations.
  • Experience programming in C, C++, Java, Python, Go, Perl, or Ruby.
  • Expert knowledge in all aspects of designing, developing, and managing large real-time systems.
  • Experience with monitoring and logging.

You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution such as the ELK Stack, Splunk, or the Grafana Stack.

  • Experience with distributed systems and container orchestration.

You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them.

  • Strong communication skills.

You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews.
