Microsoft Principal TPM Data & Telemetry - Windows Reliability

Job Details

Posted date: Apr 23, 2026

Category: Technical Program Management

Location: Redmond, WA

Estimated salary: $222,050
Range: $139,900 - $304,200

Employment type: Full-Time

Work location type: 3 days / week in-office

Role: Individual Contributor


Description

Overview

We’re hiring a Principal TPM, Data & Telemetry - Windows Reliability, an individual contributor role, to strengthen our Reliability Telemetry & Insights function. You will help ensure we can consistently operate and evolve the systems that measure Windows reliability, and translate their signals into clear, actionable decisions for engineering and partner teams.

This role is equal parts telemetry operations, data quality/governance, and insight-to-action program leadership. You will own critical reliability datasets and dashboards end-to-end (from ingestion and validation through reporting and operational rhythms), partner across Windows engineering and ecosystem stakeholders, and help the team scale by building repeatable processes, documentation, and broader bench strength.

Why This Role Matters

Windows reliability is only as strong as the telemetry and operational system behind it. This role ensures our teams can detect regressions early, confidently explain what’s happening, and drive the right corrective actions, without depending on any single person’s knowledge.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

1) Own Reliability Telemetry “Run-the-System” Operations

Own and operate core reliability data pipelines and reporting workflows (availability, correctness, latency, completeness). Establish operational rigor: runbooks, on-call/backup coverage, incident response, and clear escalation paths. Drive data quality improvements: schema management, identity resolution, deduplication, and metric definitions.

2) Deliver Executive-Ready Reliability Reporting & Insights

Build and maintain dashboards and recurring scorecards that track key reliability outcomes (e.g., crash trends, top drivers/components, device cohorts, regressions, risk flags). Proactively identify “what changed” and “why it matters” signals, and translate them into recommended actions with clear owners. Create and maintain clear metric definitions, methodology notes, and interpretation guidance to avoid confusion and misalignment.

3) Partner Deeply Across Engineering and Ecosystem Stakeholders

Collaborate with Windows engineering (e.g., kernel/driver/servicing stakeholders), quality teams, and partner-facing teams to align on measurement and priorities. Support OEM/silicon/partner conversations with accurate, explainable reliability telemetry and narratives. Drive cross-team alignment on what actions are required when telemetry indicates regressions or out-of-policy behavior.

4) Lead Programs That Improve Reliability Signal Quality and Actionability

Identify gaps in telemetry coverage and propose and drive work to close them (instrumentation improvements, new cuts, improved categorization). Improve automation and scale: reduce manual reporting, simplify repetitive analysis, and harden tools so others can self-serve. Establish durable operational rhythms: weekly/monthly reviews, action tracking, and follow-through mechanisms.

5) Build Team Resilience and Depth

Document critical workflows and institutional knowledge (how-to guides, data lineage, known pitfalls, “how to debug” playbooks). Create training and enablement materials so others can reliably back up the function. Design work so it is system-owned rather than person-owned (clear ownership maps, redundancy, measurable SLAs).

Qualifications

Required Qualifications:

Bachelor's Degree AND 6+ years’ experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.

Other Requirements:

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

3+ years of experience managing cross-functional and/or cross-team projects.
7+ years of experience in one or more of: program management, data/analytics engineering, reliability engineering, telemetry operations, or product analytics.
Demonstrated experience owning end-to-end telemetry/analytics systems (ingestion → validation → modeling → dashboards → operational consumption).
Solid skills in data querying and analysis (e.g., Kusto/ADX, SQL, or equivalent large-scale log analytics).
Experience building decision-grade reporting (e.g., Power BI or equivalent) and communicating insights to senior stakeholders.
Proven ability to drive cross-functional execution: aligning stakeholders, assigning ownership, and delivering outcomes through ambiguity.
Operational excellence mindset: quality bars, monitoring, incident management, documentation, and continuous improvement.
Familiarity with Windows reliability concepts (crash telemetry, drivers, servicing, regressions, device cohorts).
Experience with large-scale cloud data platforms (Azure data ecosystem, distributed pipelines, identity resolution).
Ability to automate analysis/reporting (Python, C#, Spark, data pipelines, workflow orchestration).
Prior experience working with hardware + software ecosystem partners (OEMs, IHVs, silicon vendors) or device quality programs.
Experience defining metrics/governance: semantic layers, taxonomy, standard definitions, and “single source of truth” design.

Working Style / Collaboration:

Comfort operating in a fast-paced environment with multiple stakeholders and shifting priorities.
Solid written communication skills (executive-ready narratives; clear action framing).
Ability to be both hands-on (querying/debugging) and high-leverage (driving alignment and ownership).

Technical Program Management IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
