SaltStack: How IBM and Cloudflare use Salt to manage their global networks

When I look at smaller tech companies, I tend to judge them by the company that they keep. By that I mean who they partner with, who their customers are, and where their products are being used. By any of those metrics, SaltStack is in very good company indeed.

At SaltConf18, we heard from several large customers using Salt to run some very sophisticated and complex networks, such as Cloudflare and IBM Cloud. Both companies run their infrastructure with just a few staffers, which is another testament to how powerful Salt's automation and orchestration features can be.

Tom LeFebvre, a network engineer, was the presenter for Cloudflare. Cloudflare runs about a tenth of the total global Internet traffic across its infrastructure, and is used by some of the largest web properties to accelerate the delivery of their content. They manage more than seven thousand servers with Salt, located in more than 150 different data centers running more than 250 Salt Masters.

They are heavy users of Salt, and are constantly trying to improve their deployment to make it operate faster and more reliably. When you are connecting servers between China and the US, you have to keep network latencies and traffic to a minimum, especially as the traffic has to traverse the Great Firewall of China.

Among the things they have learned: use packages rather than scripts to update server operating systems, and use highstate calls whenever possible to reduce the load placed on the Minions. They also developed a series of graphical dashboards that keep track of the highstates, and set up special alerts to help troubleshoot failed conditions or cases where Minions were taking too long to complete their tasks. They tied these conditions to notifications that were sent out to the staff via Google chat messages, which shows how easy it is to extend Salt with other services. They also rewrote some of their Pillars in pure Python, again to help increase performance. Finally, they are increasing the number of Masters deployed in each data center to handle their canary deployments, which provide an early warning when something goes wrong with one of their massive system rollouts or upgrades.
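To make the alerting idea concrete, here is a minimal Python sketch of how failed or slow highstate results could be turned into a chat message. It is not Cloudflare's code. It assumes the highstate return is a dictionary keyed by state ID, with each entry carrying a `result` flag and a `duration` in milliseconds. The function names, the five-second threshold, and the message format are all hypothetical, and actually posting the message to a chat service is left out.

```python
from typing import Any, Dict, List, Optional

# Hypothetical threshold: flag any single state that runs longer than 5 seconds.
SLOW_STATE_MS = 5000.0


def summarize_highstate(
    minion_id: str,
    states: Dict[str, Dict[str, Any]],
    slow_ms: float = SLOW_STATE_MS,
) -> Dict[str, Any]:
    """Collect the IDs of failed and slow states from one Minion's highstate return.

    Assumes each entry has a boolean `result` and a `duration` in milliseconds.
    """
    failed: List[str] = sorted(
        sid for sid, ret in states.items() if ret.get("result") is False
    )
    slow: List[str] = sorted(
        sid for sid, ret in states.items()
        if float(ret.get("duration", 0.0)) > slow_ms
    )
    return {"minion": minion_id, "failed": failed, "slow": slow}


def format_alert(summary: Dict[str, Any]) -> Optional[str]:
    """Build a one-line alert message, or return None when nothing needs attention."""
    if not summary["failed"] and not summary["slow"]:
        return None
    parts = [f"Highstate alert for {summary['minion']}"]
    if summary["failed"]:
        parts.append("failed: " + ", ".join(summary["failed"]))
    if summary["slow"]:
        parts.append("slow: " + ", ".join(summary["slow"]))
    return "; ".join(parts)
```

A dashboard or notifier would call `summarize_highstate` once per Minion return and send the output of `format_alert` to the team's chat channel only when it is not `None`.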

Also presenting at the conference were an unlikely couple: Nathan Newton from IBM Cloud and Mike Wiebe from Cisco. The two have been active in working with SaltStack to modify its Minions and other code to work with the giant network gear that IBM Cloud uses to run its global network. Newton spoke about how he has just 12 team members who run their network, and a large part of that efficiency is due to Salt. IBM Cloud has tens of thousands of Cisco NX-OS and Arista EOS network switches spread across 80 data centers around the world.

Again, what impressed me was how both men were working with SaltStack to extend the original premise of the product to handle the completely different context of network management, by having the Minions run directly on the Cisco gear. Newton said during one of his presentations, “IBM is good at building data centers, but once they are built the next day we need automation to take care of them.” That’s where they need help. They reached a tipping point last year where they were maintaining 60,000 different devices and “we couldn’t do it manually. We needed to be more proactive and have better automated tools.” That’s where Salt came into play. One of the reasons why the duo went with Salt was its event-driven automation, and the ability to trigger particular actions rather than just notify the team when something went wrong.
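The difference between notifying and acting can be shown with a small Python sketch of event-driven dispatch. This is an illustration of the general pattern, not Salt's reactor API and not IBM's implementation: the `EventRouter` class, the event tags, and the two handlers are all hypothetical, and the handlers only return strings describing what a real system would do.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Tuple

# A handler receives the event tag and its data, and returns a description
# of the action it would take (a real handler would perform the action).
Handler = Callable[[str, Dict[str, Any]], str]


class EventRouter:
    """Route events to handlers by matching the event tag against glob patterns."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Handler]] = []

    def register(self, pattern: str, handler: Handler) -> None:
        self._routes.append((pattern, handler))

    def fire(self, tag: str, data: Dict[str, Any]) -> List[str]:
        # Run every handler whose pattern matches, in registration order.
        return [h(tag, data) for pattern, h in self._routes if fnmatchcase(tag, pattern)]


def notify_team(tag: str, data: Dict[str, Any]) -> str:
    return f"notify: {tag} on {data['device']}"


def reset_interface(tag: str, data: Dict[str, Any]) -> str:
    return f"remediate: reset {data['interface']} on {data['device']}"


router = EventRouter()
router.register("network/*", notify_team)                   # every network event notifies
router.register("network/interface/down", reset_interface)  # this one also takes action
```

Firing `network/interface/down` runs both handlers, so the team is told and a fix is attempted, while other network events only produce a notification.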

What impressed me most about both IBM’s and Cloudflare’s implementations was how willing they were to keep pushing Salt to do more and do it better. Both of them obviously believe in the product enough to trust it with such a critical part of their network infrastructure.

Web Informant

David Strom's musings on technology
