Do you want to contribute to critical solutions for the Azure Machine Learning Platform? Do you want to enable distributed model training workloads that are self-managing and can be operated with ease in datacenters around the world? The Azure Machine Learning Service team provides APIs that let Microsoft customers easily train their distributed or single-node ML/DL models using best-in-class GPUs and IB network connectivity. The team is integrated with the Data Movement and Model Operationalization workflow, making it easy for customers to meet their model training and operationalization needs in one place. The Azure Machine Learning Service gives users a simple API for training their model with any toolkit (e.g. TensorFlow, PyTorch, etc.), and it handles all the intricacies of mounting file systems, containers, container registries, GPU drivers, KeyVault integration, collecting metrics, etc. As part of the core team, you will be dealing with multi-tenancy, reliability, and gang scheduling.
As a senior member of the team, you will be responsible for proposing and developing innovative solutions that span multiple layers of technology. You will be responsible for fully understanding the requirements and driving AI workloads onto the platform by providing end-to-end architectures, tooling, etc., while keeping an eye on industry adoption of various open source technologies. You will also participate in live site, security reviews, audit reviews, etc.
A minimum of a Bachelor's degree in Computer Science or Engineering, or a related field, or equivalent alternative education, skills, and/or practical experience is required. 4+ years of experience as a software developer. Expertise in C#, Java, Scala, C++ or Python. Solid Computer Science fundamentals and fluency in multi-threaded programming. Strong experience in, or inclination for, architecting at scale. Excellent technical design, problem solving and debugging skills. Ability to plan, schedule and deliver quality software. Experience with Windows Azure, Kubernetes, or other open source scheduling platforms.