About this Staff SRE - Observability role at Focused

Focused · Onsite · Chicago, Illinois, United States

 

Who we are:

At Focused, we move quickly to deliver quality software that achieves client outcomes and meets their customers' needs. We strategically partner with our clients to leverage our expertise in design and software, while our clients bring their own domain expertise. We work with a variety of clients from different industries, collaborating to bring new products to market, modernize legacy systems, or help teams learn the skills they need to be successful.

Our values:

  • Listen first • We are experts in product practices but lifelong learners in the domain of our customers. We research, collaborate, and understand.
  • Learn why • We ask questions and talk to users to understand problem spaces, objectives, and goals, which allows us to deeply invest and drive towards the outcomes of our clients. 
  • Love your craft • We love diving into a variety of domains and solving problems. We take pride in delivering value, in communicating progress, and in guiding our clients to success.

We are seeking an experienced Staff Observability Consultant with deep expertise in OpenTelemetry and strong Platform Engineering capabilities to help organizations implement, optimize, and scale their observability infrastructure. This role requires a seasoned consultant who can design comprehensive telemetry strategies, implement distributed tracing solutions, establish robust monitoring practices, and work closely with clients throughout their observability journey.

Key Responsibilities:

OpenTelemetry & Observability

  • Design and implement end-to-end OpenTelemetry solutions across diverse technology stacks
  • Configure and deploy OpenTelemetry Collectors for efficient data collection, processing, sampling, and routing
  • Establish telemetry pipelines for metrics, traces, and logs across microservices architectures
  • Optimize collector configurations for performance, reliability, and cost-effectiveness

Platform Engineering & Infrastructure

  • Augment existing infrastructure with integrated observability solutions
  • Implement Infrastructure as Code (IaC) solutions using Terraform, Pulumi, CloudFormation, etc.
  • Architect and manage Kubernetes clusters with comprehensive monitoring and logging
  • Build CI/CD pipelines with embedded observability and automated testing

Site Reliability Engineering (SRE)

  • Establish and maintain Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)
  • Implement error budgets, toil reduction strategies, and capacity planning
  • Support incident response procedures and post-mortem processes

Cloud & DevOps Engineering

  • Deploy and manage observability infrastructure across AWS, GCP, and Azure
  • Establish security, compliance, and governance frameworks for telemetry data
  • Automate agent evaluations in CI/CD pipelines and observability backends

Required Qualifications:

Core Observability & OpenTelemetry

  • 3-7 years of experience in observability, monitoring, and distributed systems
  • Deep hands-on experience with the OpenTelemetry ecosystem, including SDKs, APIs, and specifications
  • Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers
  • Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices

Platform Engineering & DevOps

  • 5+ years of Platform Engineering or DevOps experience with a focus on site reliability, observability, and incident response
  • Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)
  • Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)

Cloud & Infrastructure

  • Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services
  • Experience with container technologies (Docker, Podman) and container registries
  • Knowledge of networking, security, load balancing, and distributed systems concepts

Site Reliability Engineering

  • Experience implementing SRE practices including error budgets and toil metrics
  • Proficiency in incident management, on-call procedures, and post-mortem culture
  • Experience with capacity planning, performance optimization, and scalability design

Programming & Automation

  • Proficiency in multiple programming languages preferred (Go, Python, Java, Node.js, Rust)
  • Strong scripting and automation skills (Bash, Python, PowerShell)
  • Understanding of software engineering best practices and testing methodologies

Preferred Qualifications (Exceptional Candidates)

AI & Agentic Frameworks

  • Understanding of Large Language Models (LLMs) and their application in DevOps
  • Knowledge of vector databases, embeddings, and retrieval-augmented generation (RAG)
  • Experience with AI/ML model deployment and monitoring in production environments

Leadership & Communication

  • Strong technical writing and documentation skills
  • Ability to present complex technical concepts to diverse stakeholders
  • A passion for knowledge sharing

Key Competencies

  • Systems thinking and ability to design holistic observability solutions
  • Strong analytical and troubleshooting skills for complex distributed systems
  • Curiosity about emerging technologies, particularly AI applications in operations
  • Adaptability to rapidly evolving cloud-native and observability technologies
  • Collaborative mindset with a focus on enabling developer productivity and system reliability

What Sets Exceptional Candidates Apart:

  • Experience with Honeycomb
  • Contributions to open-source observability or AI framework projects
  • Track record of implementing platform engineering solutions that significantly improved developer experience
  • Experience scaling observability infrastructure to handle high event volume

What to know before you apply: 

  • This role will require being in the Chicago office three days per week and up to 20% travel within the United States.
  • Focused is unable to sponsor or take over sponsorship of an employment visa at this time.
  • The Chicago base salary range for this role is $160,000 - $200,000.

How this SRE salary compares

This role pays $180,000/yr, in line with the typical range for SRE roles.

Salary scale: $85,913 (low) · $176,000 (median) · $245,065 (high)

Typical range $130,000–$206,512/yr, from 460 comparable SRE listings on JobsRadar (pay annualized to USD).

About Focused

Careers with Focus

Hello, we're Focused

At Focused, we take a unique approach to developing high-quality, business-focused software. We believe that digital products can and should be built to evolve with your business. Our approach is structured around delivering products to market fast, testing with real customers, and iterating based on their feedback.

We work with people who are the best at what they do and who care about making others the best at what they do too. We want to be great people to work with first—who just happen to be exceptional at building software.

Our values:

  • Listen first - Every decision we make is informed by deep listening. We hear every perspective, and we keep listening to find the right solution.
  • Learn why - We keep digging, keep asking questions, and learn new skills continuously to make ourselves and our team better. 
  • Love your craft - We believe in becoming the best at what we do, which means finding the best answers to the hardest problems—not the expected ones.

See yourself working here? Join our team by applying below!

