Microsoft Site Reliability Engineer II - CTJ - Top Secret

Job Details

Posted date: Sep 16, 2024

Category: Software Engineering

Location: Redmond, WA

Estimated salary: $153,550
Range: $98,300 - $208,800

Employment type: Full-Time

Travel amount: 25%

Work location type: Up to 50% work from home

Role: Individual Contributor


Description

Microsoft is seeking a Site Reliability Engineer II (SRE) to join our Silver Infrastructure and Sovereign Operations team. This pivotal role involves defining operations for new, existing and emerging environments. We are looking for a candidate who thrives on solving complex issues, has a clear vision, and possesses the ability to execute end-to-end programs effectively.

As a Site Reliability Engineer II, you will be instrumental in defining operating models for deploying and managing systems within sovereign and air-gapped environments. This role offers the unique opportunity to collaborate with engineers dedicated to enabling a wide range of Azure services for both internal and external customers in highly secured and regulated industries. The systems, processes, and frameworks you develop will be essential in meeting the stringent security policy and assurance requirements of our diverse customer base in the public and private sectors.

If you are passionate about operational excellence and have a track record of success in similar environments, we encourage you to apply and help shape the future of our operations.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

The scale of our operations is enormous. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

Defines and develops standardized, repeatable, scalable solutions to guarantee quality and efficient operations. Drives the design, optimization, efficiency, and reliability of service management. Communicates on a deeply technical level with software engineers, project management, and operations teams to improve and optimize products, improve infrastructure, reduce manual toil, and evolve services.

Drives efforts to collect, classify, and analyze data on a range of metrics. Drives the refinement of products through data analytics and makes informed decisions in engineering products through data integration.

Drives efforts to integrate instrumentation for gathering telemetry data on system behavior such as performance, reliability, availability, and usage. Drives sustained feedback loops from telemetry that inform subsequent designs. Creates outputs of telemetry such as notifications or dashboards.

Applies debugging tools and examines logs, telemetry, and other sources to verify assumptions, writing and developing code proactively before issues occur and reactively as they occur. Conducts retrospective debugging of solutions to identify root causes of problems. Reviews and writes issue postmortems and shares insights with the team.

Builds, enhances, reuses, contributes to, and identifies new software developer tools/processes to support other programs and applications to create, debug, and maintain code for products. Uses open source when appropriate. Begins to develop skills in other tools/topics outside areas of experience. Identifies internal tools and/or creates tools that will be useful for creating the product, determining if methods are still applicable for the current solution. Shares best practices and teaches others about new tools and strategies.

Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions. Alerts stakeholders as to status and initiates actions to restore system/product/service for simple problems and complex problems when appropriate. Responds within Service Level Agreement (SLA) timeframe. Drives efforts to reduce incident and request volumes, looking globally at incidents and providing broad resolutions. Escalates issues to appropriate owners. Must be able to meet on-call responsibilities periodically to support 24x7 operations.

Qualifications

Minimum/Required Qualifications:

4+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.

Other Requirements:

Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:

The successful candidate must have an active U.S. Government Top Secret Security Clearance. The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination.

Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Criminal Justice Information Services: This position requires passing a background check conducted through the CJIS criminal justice information system by authorized local, state, and/or federal agencies and across multiple states. This role requires candidates to maintain CJIS screening eligibility.

Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports U.S. federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

Preferred/Additional Qualifications:

3+ years of experience with PowerShell, C#, or C++.
Experience working on large-scale distributed services with on-call responsibilities.
Ability to build and influence broadly towards common goals and priorities.
Ownership for end-to-end project lifecycle with solid project management and communication skills.
Experience applying SRE principles in a large production environment.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until September 23, 2024.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

#Silver


