Job Details
Posted date: May 04, 2026
Location: Seattle, WA
Level: Director
Estimated salary: $253,500
Range: $207,000 - $300,000
Description
Key responsibilities:
- Establish and maintain high-confidence correlation infrastructure between simulated performance and physical hardware (silicon) measurements.
- Architect and evolve the simulation layer to support deep exploration of complex, business-critical workloads (e.g., large language models, advanced kernels) and future system topologies.
- Identify and solve system-level hardware/software bottlenecks and optimization opportunities at the critical pre-silicon stage.
- Provide high-confidence lower-bound performance estimates for future ML systems and architectures.
Google Cloud’s mission is to make every business successful through AI by combining cutting-edge technology, infrastructure, and talent. AI/ML software engineers in Cloud bridge the gap between pioneering models and a massive product vehicle reaching billions. Our talent density and AI-powered tools drive rapid development, rooted in a culture of empowerment and a bias to action. In this role, you aren’t just building technology; you’re shaping the frontier of enterprise and driving the evolution of advanced models.
Our team is pioneering next-generation performance modeling and simulation technologies that drive multi-year system architecture roadmaps for cutting-edge machine learning accelerators. We are looking for a visionary technical lead to define and own the accuracy and fidelity of our critical co-design simulation platform. You will tackle the most complex system-level performance challenges in close collaboration with hardware designers, ML researchers, and product architects, defining the next decade of AI systems at data center scale. If you are excited about building the most powerful ML systems through HW-SW co-design and optimization, please join us.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
The US base salary range for this full-time position is $207,000-$300,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Qualifications
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 8 years of experience programming in C++ or Python.
- 5 years of experience testing and launching software products.
- 5 years of experience with performance, large-scale systems data analysis, visualization tools, or debugging.
- 3 years of experience with software design and architecture.
Preferred qualifications:
- Experience with hardware/software co-design problems, especially performance analysis and bottleneck identification at the pre-silicon stage.
- Experience with ML system architectures, including knowledge of compilers, intermediate representations (IRs), and hardware accelerators.
- Experience enabling and optimizing large-scale ML models (e.g., LLMs, large embedding models).
- Ability to lead technical strategy for complex systems, influencing both simulation toolchains and hardware roadmaps.
- Proven expertise in constructing custom IR dialects and leveraging open-source compiler frameworks (e.g., MLIR, XLA) for system-level analysis and exploration of software-hardware mapping opportunities.
- Expertise in architecting high-confidence, high-velocity system performance modeling and correlation infrastructure.