Boeing Cloud Reliability Manager

Job Details

Posted date: May 20, 2026

Category: Information Technology

Location: Seattle, WA

Estimated salary: $197,475
Range: $161,500 - $233,450


Description

At Boeing, we innovate and collaborate to make the world a better place. We’re committed to fostering an environment for every teammate that’s welcoming, respectful and inclusive, with great opportunity for professional growth. Find your future with us.

The Boeing Company is looking for a Cloud Reliability Manager to join the team in Seattle, WA; Dallas, TX; North Charleston, SC; Chicago, IL; El Segundo, CA; Mesa, AZ; San Diego, CA; Berkeley, MO; or Hazelwood, MO.

The Cloud Reliability Manager will lead the Cloud Reliability organization, owning Runtime Site Reliability Engineering (SRE) and Cloud Operations. You will be accountable for the reliability, scalability, and operational excellence of the enterprise runtime and shared cloud platform. This role combines people leadership, incident command ownership, and operational strategy with a hands-on expectation to drive reliability automation, observability, and post-incident remediation across multi-cloud environments.

Position Responsibilities:

Own strategy, roadmap, and delivery for Runtime SRE and Cloud Operations to meet enterprise Service Level Objectives (SLOs) and operational Service Level Agreements (SLAs)

Lead, mentor, and grow teams responsible for runtime SRE (SLOs/SLIs, observability, performance engineering, Disaster Recovery (DR), chaos testing) and Cloud Operations (Tier 0/1 triage, incident bridge management, runbooks)

Establish and own incident management processes: detection, escalation, incident command, post-incident reviews, and remediation planning; ensure rapid detection and reduced Mean Time to Recover (MTTR)

Drive observability and telemetry strategy (metrics, tracing, logs) to ensure actionable alerts and proactive detection of platform issues

Lead capacity planning, performance tuning, and disaster recovery orchestration for platform services and multi-cluster fleets

Convert Root Cause Analysis (RCA) outcomes into prioritized engineering work: coordinate with Platform Acceleration, Foundations, Development Experience (DevEx), and Security to implement durable fixes and prevent recurrence

Define and measure operational Key Performance Indicators (KPIs), including Mean Time to Detect (MTTD), Mean Time to Acknowledge (MTTA), Mean Time to Recover (MTTR), SLO adherence, and alert noise, and implement automation to reduce manual toil

Own on-call and rotation policies, runbook quality, bridge setup SLAs, and operational playbooks; ensure teams are trained and drills are executed regularly

Ensure security, compliance, and change management controls are integrated into operational procedures and emergency responses

Represent Cloud Reliability in architecture reviews, executive incident briefings, and cross-team governance forums

Basic Qualifications (Required Skills/Experience):

5+ years in cloud operations, SRE, and/or related roles

3+ years managing technical teams with on-call responsibilities

3+ years of experience with Kubernetes at scale and multi-cloud runtime platforms (Elastic Kubernetes Service (EKS)/Azure Kubernetes Service (AKS)/Google Kubernetes Engine (GKE))

3+ years of experience with observability tooling (Prometheus, Grafana, OpenTelemetry, the ELK stack (Elasticsearch, Logstash, Kibana) or the EFK stack (Elasticsearch, Fluentd, Kibana), tracing) and alerting design

Experience owning incident response and improving reliability metrics in production environments

Experience with capacity planning, performance engineering, and disaster recovery at cloud scale

Experience with automation tooling (Terraform, Continuous Integration/Continuous Deployment (CI/CD), operators) and integrating reliability into Infrastructure as Code (IaC) pipelines

Preferred Qualifications (Desired Skills/Experience):

Excellent communication skills, with experience running incident command, leading post-incident reviews, and presenting technical and business impacts to stakeholders

Experience managing both strategic SRE and operational Tier 0/1 teams in a single organization

Experience in chaos engineering, resilience testing, and failure injection frameworks

Experience with policy as code, security incident response, and regulatory compliance needs in cloud operations

Experience with coding/automation in Go, Python, and/or scripting languages and building operational runbooks as code

Conflict Of Interest:

Successful candidates for this job must satisfy the Company’s Conflict Of Interest (COI) assessment process.

Drug Free Workplace:

Boeing is a Drug Free Workplace where post-offer applicants and employees are subject to testing for marijuana, cocaine, opioids, amphetamines, PCP, and alcohol when criteria are met as outlined in our policies.

Pay & Benefits:

At Boeing, we strive to deliver a Total Rewards package that will attract, engage and retain the top talent. Elements of the Total Rewards package include competitive base pay and variable compensation opportunities.

The Boeing Company also provides eligible employees with an opportunity to enroll in a variety of benefit programs, generally including health insurance, flexible spending accounts, health savings accounts, retirement savings plans, life and disability insurance programs, and a number of programs that provide for both paid and unpaid time away from work.

The specific programs and options available to any given employee may vary depending on eligibility factors such as geographic location, date of hire, and the applicability of collective bargaining agreements.

Pay is based upon candidate experience and qualifications, as well as market and business considerations.

Summary pay range: $161,500 - $233,450

Applications for this position will be accepted until May 27, 2026.

Export Control Requirements:

This position must meet U.S. export control compliance requirements. To meet U.S. export control compliance requirements, a “U.S. Person” as defined by 22 C.F.R. §120.62 is required. “U.S. Person” includes U.S. Citizen, U.S. National, lawful permanent resident, refugee, or asylee.

Export Control Details:

US based job, US Person required

Relocation

Relocation assistance is not a negotiable benefit for this position.

Visa Sponsorship

Employer will not sponsor applicants for employment visa status.

Shift

This position is for 1st shift.

Equal Opportunity Employer:

Boeing is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national origin, gender, sexual orientation, gender identity, age, physical or mental disability, genetic factors, military/veteran status or other characteristics protected by law.


