Senior Site Reliability Engineer (SRE)

Oowlish · Remote
Join Our Team

Oowlish, one of Latin America's rapidly expanding software development companies, is seeking experienced technology professionals to enhance our diverse and vibrant team.

As a valued member of Oowlish, you will collaborate with premier clients from the United States and Europe, contributing to pioneering digital solutions. Our commitment to creating a nurturing work environment is recognized by our certification as a Great Place to Work, where you will have opportunities for professional development, growth, and a chance to make a significant international impact.

We offer the convenience of remote work, allowing you to craft a work-life balance that suits your personal and professional needs. We're looking for candidates who are passionate about technology, proficient in English, and excited to collaborate remotely with teams around the world.

About the Role:

We are looking for an experienced Senior Site Reliability Engineer (SRE) to own the reliability, availability, and operational excellence of business-critical production systems.

This is a dedicated Site Reliability Engineering role—not a general DevOps or Infrastructure position. You will define how reliability is measured, lead incident response during production outages, drive observability strategy, and continuously improve operational practices across high-availability environments.

The ideal candidate has hands-on experience managing SLOs, leading major incidents, improving on-call operations, and building a strong reliability culture through automation, observability, and continuous improvement.

Responsibilities:

  • Define, implement, and continuously improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
  • Develop and maintain observability strategies, including monitoring, logging, tracing, and alerting.
  • Own observability configuration, instrumentation, and alert optimization.
  • Lead Incident Command during production incidents and coordinate cross-functional response efforts.
  • Drive blameless postmortems and ensure corrective actions are completed.
  • Own and continuously improve the on-call program, including rotations, escalation policies, runbooks, and alert tuning.
  • Establish production readiness standards for new services.
  • Partner with engineering teams on capacity planning, scalability, and disaster recovery initiatives.
  • Automate operational processes and reliability improvements using software engineering best practices.
  • Continuously improve system reliability, availability, and operational efficiency.

Requirements:

  • 5+ years of experience in Site Reliability Engineering, Production Engineering, Reliability Engineering, or similar roles.
  • Proven experience operating production systems in high-availability environments.
  • Hands-on experience defining and managing SLOs, SLIs, and Error Budgets.
  • Experience leading production incident response and Incident Command.
  • Strong observability and monitoring experience.
  • Strong software engineering skills using Python, Go, or TypeScript.
  • Experience working with cloud platforms.
  • Strong written and verbal English communication skills.

Must have:

  • Proven Site Reliability Engineering experience.
  • Experience defining and managing:
    • Service Level Indicators (SLIs)
    • Service Level Objectives (SLOs)
    • Error Budgets
  • Experience leading Incident Command during major production incidents.
  • Experience conducting blameless postmortems and driving follow-up actions.
  • Experience designing, maintaining, and improving on-call programs.
  • Experience developing runbooks and escalation policies.
  • Strong observability experience, including:
    • Monitoring
    • Logging
    • Alerting
    • Distributed Tracing
  • Experience tuning alerts to reduce operational noise.
  • Strong automation skills using Python, Go, or TypeScript.
  • Experience supporting mission-critical production systems.
  • Experience working in high-availability production environments.

Nice to have:

  • Experience with Datadog.
  • Experience with AWS.
  • Experience with Heroku.
  • Experience working in regulated industries (Healthcare, HIPAA, Financial Services, etc.).
  • Experience establishing or maturing an SRE practice.
  • Capacity planning experience.
  • Disaster recovery planning and execution.
  • Experience with Kubernetes.
  • Experience with PostgreSQL or SQL Server.
  • Experience supporting modern TypeScript-based applications.


Benefits & Perks:

  • Home office.
  • Competitive compensation based on experience.
  • Career plans to allow for extensive growth in the company.
  • International projects.
  • Oowlish English Program (Technical and Conversational).
  • Oowlish Fitness with Total Pass.
  • Games and competitions.


You can also apply here:

  • Website: https://www.oowlish.com/work-with-us/
  • LinkedIn: https://www.linkedin.com/company/oowlish/jobs/
  • Instagram: https://www.instagram.com/oowlishtechnology/

