Applied Scientist

Microsoft
United States, Washington, Redmond
Aug 19, 2025
Overview

Outlook is one of the most widely used communication and productivity tools in the world, and Copilot is transforming how millions of users engage with it through the power of AI. As an Applied Scientist, you will play a critical role in advancing our Outlook Copilot efforts in the areas of Large Language Models (LLMs), prompt engineering, evaluation, relevance, and Responsible AI (RAI). This multifaceted role is responsible for developing an end-to-end infrastructure and measurement framework, fostering cross-functional collaboration, and leveraging data science and AI expertise to guide decision-making. The candidate will work with multiple large organizations and stakeholders to drive the evaluation of our LLM systems and associated components.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Strategic Leadership: Develop and execute a comprehensive strategy for LLM evaluation, encompassing LLM quality, costs, model performance, model utility (user experience and prompt effectiveness), and responsible AI considerations, in alignment with company-wide efforts and informed by emerging research.

Program Management: Oversee and manage large-scale, cross-functional evaluation programs, ensuring alignment with organizational objectives and timelines.

Develop and maintain a robust measurement framework to track and report on LLM performance and user impact.

Drive the engineering product roadmap to build automated evaluation pipelines integrated into the product workflow.

Data Science Expertise: Apply strong data science skills to design experiments, analyze data, define OKRs, measurements, and metrics, and derive actionable insights to enhance LLM systems.

Influence product and user experience decisions based on evaluation results.

Model and Prompt Evaluation: Lead efforts to assess and improve the performance and effectiveness of language models and prompts, driving iterative enhancements, including synthetic and manufactured data creation.

User Experience Enhancement: Collaborate with User Experience teams to evaluate and optimize user interactions with AI systems, enhancing user satisfaction.

Responsible AI (RAI): Implement RAI principles and guidelines in AI systems, ensuring ethical and unbiased practices in model development and deployment.

Contribute to LLM research: Form partnerships and lead deep research initiatives in LLM evaluation and user experience optimization that add to the body of scientific knowledge and deepen the product team's understanding of users' mental models of, and alignment with, LLM-powered experiences.

Cross-Functional Collaboration: Work with engineering, research, product, and other teams to ensure seamless integration of evaluation processes into the development lifecycle.

Other

Embody our Culture and Values