Principal Site Reliability Engineer
Overview
Join Copper as a Principal Site Reliability Engineer to enhance system reliability and operational excellence in a remote role focused on blockchain technology and financial services.
Since its founding in 2018, Copper has been building the standard for institutional digital asset infrastructure, with a focus on custody, collateral management, and prime services. Led by Amar Kuchinad, Copper's Global CEO, the firm provides a comprehensive suite of custody, trading and settlement solutions that reduce counterparty risk and bring greater capital and operational efficiency to digital asset markets.
At the heart of Copper's offering is Multi-Party Computation (MPC) technology, the gold standard in secure custody. Copper's multi-award-winning custody system is unique in that it can be connected to centralised exchanges, DeFi applications and even staking pools without the assets leaving custody.
Built on top of this state-of-the-art custody, ClearLoop is the first solution in the market that overcomes a growing industry challenge: counterparty risk with exchanges. This solution underpins a full prime services offering, connecting global exchanges and enabling customers to trade and settle directly from the safety of their MPC-secured wallets. By reducing settlement time for transfers to a few milliseconds (without blockchain network dependency) and offering enhanced security measures, ClearLoop is rapidly reshaping the way asset managers trade and manage capital.
In addition to industry-leading security certifications, Copper has one of the strongest insurance coverages in the industry from an A+ rated insurer, positioning the firm as the partner of choice for institutions seeking to safeguard their assets.
Key Responsibilities
- Shape SRE: Define how we think about reliability, observability, and operational excellence. Drive the adoption of SRE principles across the organisation while building the systems and processes that make those principles measurable (think SLIs, SLOs and error budgets).
- Scale Through Automation: Champion architectural improvements that enhance both system reliability and deployment velocity. Provide consultation on system architecture, building reusable platforms and frameworks, planning capacity needs, and conducting production readiness reviews to ensure services launch and operate successfully.
- Drive Technical Excellence: Engage in and improve the lifecycle of microservices, from inception through deployment, operation, observability, and continuous refinement.
- Lead Through Influence: Partner with engineering and product leadership to embed reliability into our product development lifecycle. Conduct blameless postmortems and drive systemic improvements in incident management. Mentor engineers across the organisation on SRE practices, helping teams take ownership of their service reliability.
Benefits
- 35 days paid time off per annum, inclusive of annual leave and public holidays. Employees also receive one additional day of annual leave for each year of service.
- Private health insurance.
Skills and Experience
#### Essential
- Experience in designing, analysing, and troubleshooting distributed systems or microservices architectures.
- Established expertise in observability and incident management.
- Proven experience in driving organisational change.
- Excellent communication skills, with a systematic problem-solving approach.
#### Desirable
- Experience working with production workloads in AWS.
- Experience working in financial services or similarly regulated environments.
- Interest in blockchain-based technologies and/or ‘decentralised’ finance.
- Master's degree in Computer Science or Engineering.