About the role

Ripple · Onsite

At Ripple, we’re building a world where value moves like information does today. It’s big, it’s bold, and we’re already doing it. Through our crypto solutions for financial institutions, businesses, governments, and developers, we are improving the global financial system and creating greater economic fairness and opportunity for more people, in more places around the world. And we get to do the best work of our careers and grow our skills surrounded by colleagues who have our backs.

If you’re ready to see your impact and unlock incredible career growth opportunities, join us and build real-world value.

Ripple Treasury, acquired in 2025 and now a Ripple solution, marks a significant expansion into the multi-trillion-dollar corporate finance arena. With more than 40 years of experience supporting some of the world’s largest and most sophisticated companies, Ripple Treasury integrates a treasury command center into Ripple’s technology stack, giving corporates the ability to move, manage, and optimize liquidity in real time across traditional and digital assets under one expanded umbrella.

THE WORK:

This is an engineering-first role with a coaching dimension—not the other way around. You will spend the majority of your time doing hands-on observability and reliability engineering work: building instrumentation, designing alert configurations, authoring Terraform, and troubleshooting production systems. Alongside that, you will coach and consult with stream-aligned product teams, helping them build operational maturity over time.

You will join Ripple’s Technical Operations team and work across Azure (80%) and AWS (20%) environments supporting infrastructure that is predominantly Windows-based (80%), handling significant payment volume for enterprise treasury customers. The incident management program you will help build is early-stage—you will be establishing practices, not inheriting a mature playbook.

WHAT YOU’LL DO:

Observability Engineering
- Design and implement monitoring, alerting, and dashboards in New Relic (APM, Infrastructure, Logs, Synthetics) across Azure and AWS; write NRQL queries for troubleshooting, analysis, and reporting.
- Define and implement SLOs/SLIs and error budgets; coach teams on using them to balance feature velocity with reliability and communicate system health to stakeholders.
- Lead alert noise reduction and signal quality engineering—tune thresholds, eliminate false positives, and ensure every alert is actionable.
- Optimize observability costs through log ingestion management, pipeline rules, and New Relic configuration governance.
- Partner with engineering teams to improve observability maturity: structured logging, metrics instrumentation (RED/USE methods), distributed tracing, and effective dashboard patterns.
Infrastructure & IaC
- Develop and maintain Terraform infrastructure as code for provisioning and managing monitoring resources, alert configurations, and observability infrastructure—this is a primary engineering responsibility, not an occasional task.
- Establish and enforce IaC governance standards for observability infrastructure across teams, providing a repeatable, auditable model for how monitoring resources are managed.
- Author and troubleshoot Azure DevOps pipelines; support teams with deployment visibility, change tracking, and release hygiene as it relates to production reliability.
Incident Management
- Administer and configure Incident.IO: alert routing, notification workflows, Slack and OpsGenie integration, and runbook management—operationalizing what exists today and expanding from there.
- Build out incident management foundations that are largely yours to establish: PIR/postmortem processes, on-call rotation design, escalation policies, incident severity classification, and response playbooks.
- Track and report on MTTR, MTTD, and incident frequency; identify trends and drive continuous improvement in partnership with engineering teams.
- Respond to and debrief on production incidents—providing real-time troubleshooting support and facilitating structured post-incident reviews.
Cross-Functional Enablement
- Enable stream-aligned engineering teams to adopt improved observability and incident management practices through workshops, consultation, and hands-on guidance.
- Collaborate with the Subsystems Platform Team to translate common needs into self-service observability and incident management capabilities.
- Build lasting team competency through documentation, training materials, and knowledge-sharing sessions that outlast any individual engagement.

WHAT YOU'LL BRING:

Core SRE Experience
- 7+ years in Site Reliability Engineering, DevOps, or Platform Engineering with a strong focus on observability and production operations.
- Proven ability to deliver hands-on engineering work while coaching and mentoring teams—comfortable switching between builder and consultant modes.
- Experience working in Agile/Scrum environments and collaborating effectively with cross-functional teams.
Observability & Incident Management Expertise — Required
- Expert-level hands-on experience with New Relic (APM, Infrastructure, Logs, Synthetics, Alerts) and strong NRQL proficiency for troubleshooting and analysis.
- Deep understanding of structured logging, metrics collection (RED/USE methods), distributed tracing, and designing effective dashboards and alerts.
- Expertise defining and implementing SLOs/SLIs and error budgets for reliability management.
- Hands-on experience with incident management platforms (Incident.IO, PagerDuty, OpsGenie, or similar).
- Experience designing incident response workflows, on-call rotations, escalation policies, and facilitating post-incident reviews that drive actionable improvements.
- Demonstrated ability to troubleshoot complex production issues using observability data across distributed systems.
Infrastructure & Tools — Required
- Strong Terraform experience: developing and maintaining IaC for cloud infrastructure and monitoring resources; familiarity with IaC governance patterns.
- Proficiency with PowerShell scripting (required given the 80% Windows environment).
- Strong experience with Azure cloud (App Services, Virtual Machines, Azure SQL, networking, monitoring) and working knowledge of AWS.
- Experience with Azure DevOps for CI/CD pipeline authoring and troubleshooting.
- Experience with Octopus Deploy for deployment management and release orchestration.
- Comfort working across both Windows and Linux server environments.
- Familiarity with Slack for operational workflows, alert routing, and incident communication.
Desired / Additional
- Experience with alert noise reduction strategies and observability cost optimization (log ingestion, pipeline rules, cardinality management).
- Background facilitating chaos engineering, game day exercises, or failure injection to build team resilience.
- Knowledge of VM-hosted SQL Server monitoring and performance optimization.
- Familiarity with FinTech compliance requirements (SOC 2, ISO 27001) and audit evidence collection.
- Experience measuring and improving key reliability metrics (MTTR, MTTD, availability, error budgets) at an organizational level.
- Python or Bash scripting experience in addition to PowerShell.
- Familiarity with Jira for incident tracking and workflow automation.

Other common names for this role: Senior Site Reliability Engineer, Observability Engineer, Incident Management Engineer

For positions that will be based in NY, the annual salary range for this position is below. Actual salaries may vary based on numerous factors including, among other things, an individual applicant’s experience and qualifications for the position. This range does not include equity or additional compensation, such as bonuses or commissions.

NY Annual Base Salary Range

$160,000–$200,000 USD

WHO WE ARE:

Do Your Best Work

- The opportunity to build in a fast-paced start-up environment with experienced industry leaders
- A learning environment where you can dive deep into the latest technologies and make an impact, plus a professional development budget to support other modes of learning
- An environment where every employee is a respected, valued, and empowered part of the team, no matter what race, ethnicity, gender, origin, or culture they identify with
- In-office collaboration for moments that matter is important to our culture, and we give managers and teams the flexibility to decide which 10+ days a month they come in
- Biweekly all-company meeting: business updates and an ask-me-anything-style discussion with our Leadership Team
- We come together for moments that matter, which include team offsites, team bonding activities, happy hours, and more

Take Control of Your Finances

- Competitive salary, bonuses, and equity
- Competitive benefits that cover physical and mental healthcare, retirement, family forming, and family support
- Employee giving match
- Mobile phone stipend

Take Care of Yourself

- R&R days so you can rest and recharge
- Generous wellness reimbursement and weekly onsite and virtual programming
- Generous vacation policy: work with your manager to take time off when you need it
- Industry-leading parental leave policies and family planning benefits
- Catered lunches, fully stocked kitchens with premium snacks and beverages, and plenty of fun events

Benefits listed above are for full-time employees.

Ripple is an Equal Opportunity Employer. We’re committed to building a diverse and inclusive team. We do not discriminate against qualified employees or applicants because of race, color, religion, gender identity, sex, sexual identity, pregnancy, national origin, ancestry, citizenship, age, marital status, physical disability, mental disability, medical condition, military status, or any other characteristic protected by local law or ordinance.

Please find our UK/EU Applicant Privacy Notice and our California Applicant Privacy Notice for reference.
