How industries are solving challenges using Ansible.

Kanav Gupta
6 min read · Dec 1, 2020

Microsoft automates to achieve more with Red Hat Ansible Automation Platform

To support its strategic mission, Microsoft has set a goal of end-to-end digitization. This effort simplifies processes and experiences for end users across all of its infrastructure teams managing services and applications. As part of this shift, the company is focused on building a culture of success, supported by automation technology. Using Red Hat Ansible Automation Platform and working closely with Red Hat Consulting, Microsoft created a standardized, centralized network automation environment that reduces routine, repeatable tasks and complexity. DevOps teams across the company can now focus on sharing knowledge, building skills, and creating innovative technology solutions.

Accommodating growth with new network approach

Microsoft Corporation develops, manufactures, and supports software, consumer electronics and computers, and related services. Its mission is to “empower every person and every organization on the planet to achieve more”.1 To support this mission, Microsoft has set a goal of end-to-end digitization, an approach that will simplify processes and experiences for end users across all of its services and applications. “Digital transformation at Microsoft is about how we’re reinventing our operations and radically improving customer experience by eliminating manual work,” said Ryan Mecca, Principal Software Engineering Group Manager, Engineering Platforms and Data Insights, at Microsoft.

Keeping pace with customer and partner expectations required addressing increased complexity across Microsoft’s corporate network infrastructure (comprising tens of thousands of endpoints, more than 400 engineers, and close to 150,000 total employees) that connects all of Microsoft’s offices, sites, and retail locations worldwide. “We have thousands of devices of various makes and models and software versions, so at times, it’s hard to keep up with all the different vendors and ways that we interact with those devices,” said Bart Dworak, Software Engineering Manager at Microsoft. Additionally, code created by development and engineering teams was not version-controlled or peer-reviewed, leading to duplication, quality issues, and further complexity. “We had to change our mindset in how we’re managing and deploying our global network, which included not only modernizing our platforms but modernizing our skill sets,” said Mecca.

To simplify and scale at pace with market demands, Microsoft looked to create a scalable, technology-agnostic automation framework that would reduce manual workloads with efficient tools and processes, as well as mitigate performance and security issues with standardized, tested code.
This new solution would replace its legacy production automation solution to provide comprehensive automation capabilities, supported by a more collaborative, iterative development approach.

Creating automation environments with a strategic partner

As one of the largest contributors to open source, Microsoft sought an enterprise open source solution that would provide effective automation across different network vendors and create opportunities for employee engagement and collaboration. The company chose to work with its strategic partner Red Hat to adopt Red Hat Ansible Tower and Red Hat Ansible Engine (both now part of Red Hat Ansible Automation Platform) running in Microsoft Azure. “Our strategic mission at Microsoft is to support and grow our Azure cloud. We’re seeing a tremendous increase in Ansible users in Azure, growing our cloud while also bringing more contributors to Ansible to help grow that platform,” said Mecca. “There’s drive and passion from the industry to use Ansible and Azure together.”

Microsoft’s teams established three automation environments:

• Development, where code is developed and tested on a small scale
• User acceptance testing (UAT), where code is peer-reviewed and tested at scale
• Production

Engineers now automate repeatable, day-to-day tasks by deploying Ansible Playbooks to the network through a centralized playbook version control system. Additionally, these three environments support a collaborative DevOps approach across the company’s network and engineering teams.

As part of its adoption of Ansible, Microsoft worked closely with Red Hat Consulting, including initial on-site planning in Redmond as well as ongoing architecture design and training over nine months. For example, Microsoft teams and Red Hat architects chose GitHub as the single version-controlled repository for source code. “The relationship was great. Red Hat sat directly in our offices, with our employees, and really built a camaraderie as part of the team,” said Mecca. “With any large deployment, specifically at Microsoft’s scale, we run into challenges. When we found some bugs for our use case, it was refreshing to see the level of engagement of Red Hat’s consultants to help us quickly develop and deploy new features to remediate the issues.”

Building a culture of modern development

Standardized network automation at scale

Microsoft has used its staged Ansible environments to automate routine, time-consuming engineering tasks, such as delivery of logic-based changes to ensure services are available to customers. Events in the network trigger other workflows, such as advanced telemetry, ticketing, logging, and analytics. Automating also helps the company follow a phased, iterative approach to code creation that protects code quality with scheduled releases of tested, verified network configurations.
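To make the staged approach concrete, here is a minimal Python sketch of a promotion gate across the three environments. It is an illustration only, not Microsoft's or Red Hat's actual tooling: the `PlaybookChange` record, the `promote` function, and the exact gate rules (tests must pass at every stage, peer review is required before leaving UAT) are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Stage names come from the article; the list order encodes the promotion path.
STAGES = ["development", "uat", "production"]


@dataclass
class PlaybookChange:
    """Hypothetical record for one playbook revision moving through the stages."""
    name: str
    stage: str = "development"
    tests_passed: bool = False
    peer_reviewed: bool = False
    history: list = field(default_factory=list)


def promote(change: PlaybookChange) -> PlaybookChange:
    """Move a change to the next stage, enforcing the assumed gates."""
    if change.stage == "production":
        raise ValueError(f"{change.name} is already in production")
    if not change.tests_passed:
        raise PermissionError(f"{change.name}: tests must pass before promotion")
    # Assumed gate: peer review must be complete before a change leaves UAT.
    if change.stage == "uat" and not change.peer_reviewed:
        raise PermissionError(f"{change.name}: peer review required before production")
    change.history.append(change.stage)
    change.stage = STAGES[STAGES.index(change.stage) + 1]
    # Each stage runs its own tests, so the flag resets after every promotion.
    change.tests_passed = False
    return change
```

In this sketch a change that skips peer review stays in UAT no matter how many times its tests pass, which is the kind of gated check-in the article credits with keeping defects out of production.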
Standardizing on a user-friendly automation solution has not only helped Microsoft solve complexity by creating a single source of truth for services, dependencies, and integrations, but also made it easier for non-engineers to focus on service creation with peer-reviewed code. DevOps teams can now work more efficiently to create new, valuable features and services for end users while maintaining production performance.

“Digital transformation is really changing the way that we think about how we solve problems. In the past, we had to manually do the same deployment again and again,” said Dworak. “With Ansible, we can create blueprints to deploy it multiple times. And every time we deploy, it’s exactly the same. Instead of redoing work and having a lot of different, single-use versions, we can continually fine-tune this shared code.” This approach creates opportunities for Microsoft to continue scaling to support customer demands at a much faster pace.

Establish collaborative, creative development mindset

To support its adoption of modern automation technology, Microsoft also underwent a cultural shift. There is an organization-wide commitment to learning new skills and technologies: developers are learning networking, while network engineers are learning software development and development-related tools like Git and Ansible. This DevOps approach to both work tasks and professional development has led to greater understanding and collaboration between teams. One engineer created Zero to Hero, a training series on automation concepts and writing playbooks. Additionally, there are now self-hosted Python learning groups, and more than 100 active participants discuss and share information in an internal automation community.

Equipped with new skills and confidence, Microsoft’s teams are finding new, creative ways to solve business challenges using code and playbooks. “Teams are coming together to solve engineering problems in a shared environment of co-creation,” said Sonika Munde, Remote Access Services Engineer, Core Service Engineering, at Microsoft. “We are truly seeing One Microsoft in action.” In turn, Microsoft can contribute back to the open source Ansible community. “From the beginning, we’ve used a lot of the ideas and available code from the community, but now as we are moving faster and developing solutions, we can contribute our own ideas back,” said Dworak.

Saved thousands of hours of operational work

Implementing Ansible has helped Microsoft save thousands of work hours per year, including several weeks’ worth of work by reducing production downtime and network configuration defects. By completing code peer reviews and gated check-ins through preproduction environments, the company has reduced the number of defects and bugs introduced into its production environment. This approach ultimately reduces major incidents and outages, improving network quality.
Additionally, faster issue resolution means less time is devoted to repetitive support work. “We looked at the types of alerts and tickets that our help desk was getting, and we were able to write automation to take care of most of the incidents,” said Dworak. “We actually have a process out right now that closes about 97–98% of the tickets that come in via automation.” With the time and money saved by adopting standardized, stable automation, Microsoft’s teams can now focus on creating innovative infrastructure solutions that provide higher service quality to end users.
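The ticket-handling process described above can be pictured as a lookup from alert type to an automated remediation, with anything unrecognized escalated to a person. The Python sketch below is only an illustration under stated assumptions: the alert types, the handler functions, and the ticket fields are invented for the example, and a real handler would presumably launch a playbook rather than return a constant.

```python
from typing import Callable, Dict, List


# Hypothetical remediation handlers; each returns True when the fix succeeds.
def restart_interface(ticket: dict) -> bool:
    return True


def clear_stale_session(ticket: dict) -> bool:
    return True


# Assumed mapping from alert type to handler (names are illustrative only).
HANDLERS: Dict[str, Callable[[dict], bool]] = {
    "interface_down": restart_interface,
    "stale_vpn_session": clear_stale_session,
}


def triage(tickets: List[dict]) -> dict:
    """Close tickets that have a working handler and escalate the rest."""
    closed, escalated = [], []
    for ticket in tickets:
        handler = HANDLERS.get(ticket["alert_type"])
        if handler is not None and handler(ticket):
            closed.append(ticket["id"])
        else:
            escalated.append(ticket["id"])
    # Share of tickets resolved without a human, the figure Dworak cites.
    rate = len(closed) / len(tickets) if tickets else 0.0
    return {"closed": closed, "escalated": escalated, "auto_close_rate": rate}
```

Tracking the auto-close rate alongside the escalated list shows which alert types still lack automation, so new handlers can be written where they save the most help desk time.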

Rethinking the possibilities of technology and culture

By focusing on people, process, and technology, Microsoft has evolved in its automation journey from manual scripting and changes to a continuous integration and delivery (CI/CD) approach supported by a centralized, service-based architecture. “The complexity and scale of our network and operations forced us to think like software engineers and develop something that not only works but is iterative, usable, and highly available,” said Munde. “We have strong support from our leadership team, and we’re seeing a growing interest among peers in automating operational tasks.”

Red Hat and Ansible technology will continue to play an important role in Microsoft’s automation strategy, for example, as part of its strategic Network as Code vision to operate, maintain, and deploy its global networks with automation. But people remain the heart of Microsoft’s transformation and strategic growth. “Every single process, service, and application at Microsoft is going through digitization and optimization. We are investing in automating all of our critical business processes,” said Ludovic Hauduc, General Manager, Core Platform Engineering, at Microsoft. “So technology is important. It’s critical. But culture comes first.”
