Microsoft Principal Supercomputing Software Engineer

Job Details

Posted date: Sep 12, 2025

Category: Software Engineering

Location: Multiple Locations

Estimated salary: $215,800
Range: $137,600 - $294,000

Employment type: Full-Time

Travel amount: 25.0%

Work location type: 0 days / week in-office - remote

Role: Individual Contributor


Description

The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is looking for systems engineers, architects, and thought leaders to help customers deploy, monitor, profile, and debug their applications on hyperscale cloud infrastructure. Azure is enabling the largest supercomputing deployments to tackle complex computational problems in the public cloud, as evidenced by the HPC products that have already made their mark on the Top500, MLPerf, and Graph500 rankings.

At this supercomputing scale, we need specialized tools and techniques to maintain the reliability, runtime performance, and health of the system and its running jobs while continuing to meet customers' Service Level Agreements (SLAs). Your job would be to build and use state-of-the-art tools and techniques, find operational gaps, and instrument features to achieve the smooth operation of cloud-native supercomputers. As a Principal Supercomputing Engineer, you would also establish best practices, drive architectural changes, and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI and HPC in the cloud in general.



Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Be part of a comprehensive systems management team focused on operational excellence and customer success

Analyze key system metrics and telemetry to proactively identify and debug HPC system issues, build appropriate tooling, help develop processes, and ensure that solutions are responsive to emerging user needs

Partner with customers, vendors, and other teams within Azure to drive comprehensive solutions for operating world-class supercomputers in the public cloud environment

Ensure that the Azure platform is performant, scalable, and resilient

Foster a test-driven engineering culture to reduce regressions and bugs in production and set a higher bar for infrastructure quality

Qualifications

Required Qualifications:

Bachelor's Degree in Computer Science or related technical or scientific field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience

5+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure

3+ years of specialized experience with one of AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

Master's Degree or PhD in Computer Science or related technical or scientific field

Operational experience running large-scale HPC systems or infrastructure situated in Cloud environments

Previous experience with running and troubleshooting machine learning workloads on GPU-based HPC systems

Expertise in Cloud Computing, Virtualization, and Container Technologies

Familiarity with the HPC software stack

Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until September 26, 2025.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

#azurecorejobs


