About the role

Ridgeline · Onsite

Senior Software Engineer, Site Reliability Engineering

Reno, NV; San Ramon, CA; NYC - Hybrid

Are you passionate about building resilient, highly available cloud platforms that enable engineering teams to move quickly and confidently? Do you enjoy automating complex operational challenges, improving observability, and eliminating manual toil through thoughtful engineering? Are you excited by the opportunity to support mission-critical production systems while collaborating with talented engineers in a fast-moving, innovative environment? If so, we invite you to be a part of our innovative team.

As a Site Reliability Engineer, you'll help ensure the reliability, scalability, and operational excellence of Ridgeline's mission-critical SaaS platform. You'll partner closely with product and platform engineers to improve service reliability, accelerate engineering velocity through automation, and build systems that are easier to operate from day one. Our team of engineers is building with cutting-edge technologies, like Claude Code and Cursor, in a fast-moving, creative, progressive work environment. You'll play a key role in advancing our observability, release engineering, incident response, and automation capabilities while contributing measurable improvements to platform stability and developer productivity.

At Ridgeline, how we work matters as much as what we build. Ridgeliners act like owners, choose growth over comfort, and communicate with transparency. We assume positive intent, bias toward action, and bring solutions—not just problems. We celebrate wins, learn from setbacks, and thrive in a resilient, collaborative, high-performing culture. If this excites you, we'd love to meet you!

You must be work authorized in the United States without the need for employer sponsorship.

The impact you will have

Improve the reliability, availability, and performance of Ridgeline's mission-critical production SaaS platform.
Build automation that measurably increases engineering velocity while reducing operational toil.
Own and improve production observability through metrics, structured logging, distributed tracing, dashboards, and actionable alerting.
Design and enhance CI/CD pipelines, deployment automation, progressive delivery strategies, and rollback mechanisms.
Define and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budget practices to proactively manage reliability.
Identify capacity constraints and reliability risks before they impact customers.
Participate in an on-call rotation with very infrequent after-hours support: triage production issues, coordinate incident response, and drive issues to resolution.
Lead blameless postmortems and implement long-term improvements that strengthen platform resilience.
Partner with software engineers on infrastructure design reviews to build highly operable, scalable services.
Develop Infrastructure as Code solutions using Terraform and AWS best practices.
Collaborate across a distributed engineering organization while fostering a culture of ownership, transparency, learning, and continuous improvement.

What we look for

3–6 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or a related discipline.
At least 2 years supporting mission-critical production SaaS workloads running on AWS.
Experience operating production systems where uptime, performance, and reliability are business critical.
Hands-on experience with AWS services including EC2, ECS or EKS, RDS, S3, IAM, CloudWatch, and managed database or messaging services.
Strong understanding of observability, including monitoring, alerting, distributed tracing, and production diagnostics.
Experience designing or significantly improving CI/CD pipelines using tools such as GitHub Actions, CircleCI, Buildkite, or similar platforms.
Experience with deployment strategies including blue/green, canary, or progressive rollouts.
Proficiency in Python, Go, Bash, or another scripting language used for automation and tooling.
Experience implementing Infrastructure as Code using Terraform.
Comfortable participating in an on-call rotation and leading incident response with composure.
Excellent communication skills with the ability to explain technical concepts to both technical and non-technical stakeholders.
Demonstrated ability to make measurable improvements to platform reliability, operational efficiency, or developer productivity.
Strong analytical and troubleshooting skills with a passion for solving complex technical challenges.
A collaborative mindset with a desire to learn, mentor others, and contribute to a positive engineering culture.

Bonus

Experience with Kubernetes and Helm.
Familiarity with chaos engineering or fault injection practices.
Experience building or contributing to SLO and error budget programs.
Working knowledge of Kotlin, Node.js, or TypeScript.
Experience supporting highly distributed cloud-native applications.
Bachelor's degree in Computer Science, Information Systems, or a related technical discipline.

About Ridgeline

Ridgeline is the industry cloud platform for investment management. It was founded by visionary tech entrepreneur Dave Duffield (co-founder of both PeopleSoft and Workday) to apply his successful formula of solving operational business challenges with bold innovation and human connectivity to the unique needs of the investment management industry.

Ridgeline started with a clean sheet of paper and a deep bench of experts bound by a set of core values and motivated to revolutionize an industry underserved by its current tech offerings. We are building a new, modern platform in the public cloud, purpose-built for the investment management industry, and we are prioritizing security, agility, and usability to empower business like never before.

With a growing campus in Reno and offices in New York, Lake Tahoe, and the Bay Area, Ridgeline is proud to have built a fast-growing, people-first company that has been recognized by Fast Company as a “Best Workplace for Innovators,” by The Software Report as a “Top 100 Software Company,” and by Forbes as one of “America’s Best Startup Employers.”

Ridgeline is proud to be a community-minded, discrimination-free equal opportunity workplace.

Ridgeline processes the information you submit in connection with your application in accordance with the Ridgeline Applicant Privacy Statement. Please review the Ridgeline Applicant Privacy Statement in full to understand our privacy practices and contact us with any questions.

Compensation and Benefits

The cash compensation amount for this role is targeted at $153,000 - $210,000. Final compensation amounts are determined by multiple factors, including candidate location, experience, and expertise, and may vary from the amount listed above.

As an employee at Ridgeline, you’ll have many opportunities for advancement in your career and can make a true impact on the product.

In addition to the base salary, 100% of Ridgeline employees can participate in our Company Stock Plan subject to the applicable Stock Option Agreement. We also offer rich benefits that reflect the kind of organization we want to be: one in which our employees feel valued and are inspired to bring their best selves to work. These include unlimited vacation, educational and wellness reimbursements, and $0 cost employee insurance plans. Please check out our Careers page for a more comprehensive overview of our perks and benefits.

#LI-Hybrid
