Alpaca

Site Reliability Engineer

$98,000 – $162,000 USD
Remote
mid
6 days ago
crypto, dev, fintech, tech, web3, Incident Response, SRE, Linux, Python, Kubernetes, Cloud Networking, PostgreSQL

Description

Your Role

As a Site Reliability Engineer at Alpaca, you'll help keep our brokerage platform reliable, observable, and operable as we grow - working across our cloud infrastructure, Kubernetes platform, observability stack, messaging layer, and data layer.

Things You Get To Do

  • Operate production day-to-day - on-call, incident response, postmortems, and the follow-ups that actually close the loop.
  • Own reliability practice - define and refine SLIs/SLOs and error budgets, and help product teams live within them.
  • Strengthen our observability across metrics, logs, traces, and alerting.
  • Ship infrastructure through code in a GitOps workflow - cloud resources and Kubernetes workloads alike.
  • Look after PostgreSQL: performance tuning, schema and migration review, online migrations on large tables, HA/DR, and CDC pipelines.
  • Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.

How We Take Care of You

  • Competitive Salary & Stock Options
  • Health Benefits
  • New Hire Home-Office Setup: One-time USD $500
  • Monthly Stipend: USD $150 per month via a Brex Card.

Alpaca is proud to be an equal opportunity workplace dedicated to pursuing and hiring a diverse workforce.

Requirements

Who You Are (must-haves)

  • 4+ years in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
  • Hands-on experience operating production services on Kubernetes, and shipping infrastructure as code in a GitOps workflow.
  • Solid working knowledge of PostgreSQL in production: query plans, pg_stat_*, indexing and schema trade-offs, and what a safe online migration looks like on a non-trivial table.
  • Cloud networking fundamentals (VPCs, routing, L4/L7 load balancing, DNS, TLS) and comfort debugging cross-service connectivity.
  • Comfortable with a modern observability stack and proficient with Linux at the operator level.
  • Practiced in incident response - calm under pressure, structured debugging, postmortems that drive change.
  • At least working proficiency in Go or Python, plus strong written and verbal communication.
  • Genuine interest in databases and in growing your PostgreSQL/DBA expertise.