Microsoft Continuous Evaluation Program Manager


Job Details

Posted date: Feb 25, 2026

Category: Reliability Engineering

Location: Phoenix, AZ

Estimated salary: $188,900
Range: $119,800 - $258,000

Employment type: Full-Time

Travel amount: 25.0%

Work location type: 0 days / week in-office – remote

Role: Individual Contributor


Description

Overview

Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. At its core, datacenter availability isn't just a metric, but a promise of continuity. It is imperative to identify availability improvements & opportunities across Microsoft datacenters. This goal will continually allow our operational cloud to scale in a safe, secure, and reliable manner for our customers.

The Continuous Evaluation Program (CEP) is a strategic initiative within Microsoft’s global datacenter operations, designed to systematically assess, monitor, and optimize the ongoing operational readiness of our infrastructure.

CEP plays a critical role in strengthening Microsoft’s availability, business reputation, and customer experience by proactively identifying risks, mitigating exposure, and driving consistency in operational excellence. As we accelerate our speed to market, CEP ensures scalable, reliable, and high-quality solutions through continuous evaluation and behavioral influence.

Key Program Focus Areas:

Availability: Provides impartial assessments of operational readiness across the datacenter fleet, ensuring consistent uptime and performance.

Standardized Evaluation Framework: Utilizes clear, measurable benchmarks derived from Microsoft’s datacenter operational standards to guide ongoing site evaluations.

Data-Driven Risk Mitigation: Leverages historical data to identify patterns in equipment and system failures, enabling proactive risk identification and elimination.

Scalable Operational Processes: Implements optimized and standardized procedures that support rapid growth without compromising reliability or quality.

Proactive Issue Detection: Identifies potential risks early, with a goal to prevent disruptions and ensure higher availability across operational sites.

Culture of Continuous Improvement: Promotes innovation and agility, ensuring Microsoft’s datacenter infrastructure remains resilient, adaptable, and future-ready.

Responsibilities


Align with Microsoft’s culture, objectives and operational standards.

Deliver a best-in-class, objective and impartial evaluation program monitoring Microsoft’s datacenter infrastructure, operational capabilities and performance against our standards, best practices and programs.

Drive global consistency of processes, procedures, and reporting with local operations teams.

Develop methodologies and metrics to validate datacenter performance, system control parameters and operational efficiency against design intent.

Support Microsoft’s datacenter portfolio expansion to include new country and facility onboarding through operational and site risk reviews.

Manage programs associated with operational readiness.

Review compliance with existing corrective and preventative maintenance programs to enhance operational readiness.

Evolve operational excellence with key focus areas of risk management, uptime availability and safety.

Focus on improved environmental performance, compliance, and risk management.

Support and promote improvement, best practices, and corrective and preventive actions.

Engage with appropriate partner teams to support initiatives, tasks or projects.

Establish strong working relationships and engagement with our Engineering Groups (EGs), key partners and Landlord partners (including contributing to MBRs and QBRs).

Work with regional and global peers to share and build best practices across the entire datacenter portfolio. 

Partner with regional operational leadership and local teams to reduce high-impacting and human-error Critical Environment (CE) incidents year over year. 

Monitor and verify the implementation and effectiveness of remediation action plans.

Create an environment to promote learning and innovation opportunities.

Obtain a clear understanding of Microsoft’s day-to-day operation, management and maintenance expectations for all critical equipment, controls and processes including (but not limited to), operating procedures, standards, change management and drills.


...

Qualifications

Required/minimum qualifications

Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience OR 12+ years relevant technical engineering experience.

Additional or preferred qualifications

Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 9+ years technical engineering experience.

Background Check Requirements:

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Citizenship Verification: This position requires verification of US Citizenship to meet federal government security requirements.

#COICareers | #EPCCareers | #DCDCareers

Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.


