Server sprawl is a widespread problem for IT professionals. This week's advice column shares strategies for managing chaotic server environments.
March 3, 2025
[Root] Access is an advice column for questions about IT issues, career moves, and workplace concerns.
Our organization is rapidly expanding, and with each new project, someone spins up another server, either on-prem or in the cloud, without decommissioning the old ones.
We're trying to deal with the obvious consequences of this. We're wasting money on underutilized servers, our monitoring tools are overloaded with alerts, and our documentation has become a mess. I've lost track of how many servers sit idle because the people who set them up have moved on or forgotten about them. This situation is also a serious security risk, with old, unpatched servers just waiting to be exploited. How do we rein in server sprawl?
—Server Swamped
This is a complicated problem to solve, yet it is incredibly common. In fact, server sprawl is likely one of the most frequent challenges IT shops face today.
Here are some methods for taking back control.
Before you can manage server sprawl, you need an inventory of all existing servers. To build that inventory, you first need a reliable way to identify every virtual machine. In most cases, this means developing a set of tags to apply to existing VMs and any new ones created in the future.
Based on my experience, I recommend taking a slow and methodical approach when deciding on these tags. While it's tempting to rush into inventory collection, it's crucial to establish a solid tagging taxonomy first. Otherwise, you will inevitably find yourself needing additional tags in the middle of the process, which forces you to start over.
So, what kind of tags should you apply to your virtual machines? The specific tags will vary depending on your organization, but at a minimum, the tag set should identify:
Who is responsible for the server
Which department the server belongs to
The workload it supports
Beyond these basics, additional tags can capture context. For example, distributed applications often consist of multiple servers. In this case, you might create an "Application" tag to identify the application the server is part of. Likewise, many applications are tied to specific organizational projects. You might consider creating a "Project" tag to help determine whether a server is still relevant. For instance, if a server is part of an ongoing project, you should probably leave it alone. However, if the project wrapped up a year ago, that server might be a candidate for decommissioning.
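To make the tagging idea concrete, here is a minimal sketch of a check that flags VMs with missing required tags. The tag keys ("Owner", "Department", "Workload") and the sample inventory are my own placeholders, not a standard; substitute whatever taxonomy your organization settles on.

```python
# Hypothetical tag keys; adapt the names to your own taxonomy.
REQUIRED_TAGS = {"Owner", "Department", "Workload"}


def missing_tags(tags: dict[str, str]) -> set[str]:
    """Return the required tag keys that are absent or blank on a VM."""
    return {key for key in REQUIRED_TAGS if not tags.get(key, "").strip()}


# Illustrative inventory records, not real data.
inventory = {
    "vm-web-01": {
        "Owner": "alice",
        "Department": "Marketing",
        "Workload": "web front end",
        "Project": "Campaign site",
    },
    "vm-test-07": {"Owner": "", "Department": "Engineering"},
}

for name, tags in inventory.items():
    gaps = missing_tags(tags)
    if gaps:
        print(f"{name}: missing {sorted(gaps)}")
```

Running a check like this against your inventory export gives you a to-do list of VMs whose owners you still need to track down.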
You may also want to categorize servers based on their role. At a high level, this could mean distinguishing between infrastructure servers and project servers. This distinction is important because infrastructure servers are typically semi-permanent. For example, you wouldn't want to delete your organization's domain controllers just because they are several years old. In contrast, most projects eventually end, meaning you can likely decommission project-related servers at some point.
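As a rough illustration of how a role tag and a project end date might combine, the sketch below flags only project servers whose project ended at least a year ago. The role values, the one-year grace period, and the function name are assumptions chosen for the example.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed grace period: flag servers a year after the project ends.
GRACE_PERIOD = timedelta(days=365)


def is_decommission_candidate(
    role: str, project_end: Optional[date], today: date
) -> bool:
    """Flag project servers whose project ended at least GRACE_PERIOD ago."""
    if role == "infrastructure":
        # Infrastructure servers (e.g., domain controllers) are semi-permanent.
        return False
    if project_end is None:
        # The project is still ongoing, so leave the server alone.
        return False
    return today - project_end >= GRACE_PERIOD


today = date(2025, 3, 3)
print(is_decommission_candidate("infrastructure", date(2020, 1, 1), today))
print(is_decommission_candidate("project", date(2024, 1, 15), today))
print(is_decommission_candidate("project", None, today))
```

A flagged server is only a candidate. Someone should still confirm with the owner before anything is deleted.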
Some organizations take this further by implementing automated lifecycle management policies for project servers. For example, an organization may have a policy that when someone creates a virtual machine for a project, the VM is automatically assigned an expiration date. A few weeks before the expiration date, the owner receives an e-mail notification. At that point, they can either extend the server's lifespan if the project is still ongoing or do nothing, allowing the server to be deleted when it expires.
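The expiration policy described above could be sketched roughly as follows. The three-week notice window and 90-day extension are arbitrary values I picked for illustration, and the function names are hypothetical; a real implementation would hook into your virtualization or cloud platform's own scheduling and notification features.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    NONE = "none"
    NOTIFY_OWNER = "notify_owner"
    DELETE = "delete"


# Assumed values for "a few weeks" of notice and a default extension.
NOTICE_WINDOW = timedelta(weeks=3)
DEFAULT_EXTENSION = timedelta(days=90)


def lifecycle_action(expires_on: date, today: date) -> Action:
    """Decide what the policy should do with a project VM today."""
    if today >= expires_on:
        return Action.DELETE
    if today >= expires_on - NOTICE_WINDOW:
        return Action.NOTIFY_OWNER
    return Action.NONE


def extend(expires_on: date, extension: timedelta = DEFAULT_EXTENSION) -> date:
    """Push the expiration date out when the owner confirms the project is ongoing."""
    return expires_on + extension


expires = date(2025, 6, 30)
print(lifecycle_action(expires, date(2025, 6, 1)))
print(lifecycle_action(expires, date(2025, 6, 15)))
print(lifecycle_action(expires, date(2025, 6, 30)))
```

If the owner does nothing after the notice, the next scheduled run on or after the expiration date returns the delete action.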
Another way to manage server sprawl is to standardize the virtual machine deployment process. Start by limiting who in the organization can create (or even request) a new VM. As part of the provisioning process, the person creating the VM should apply the appropriate tags and write a detailed justification for why the new server is needed. While some of this does admittedly sound like bureaucratic nonsense, it serves two purposes:
Improved tracking: It helps the IT department identify the VM, its owner, and its purpose.
Fewer unnecessary VMs: Adding a few extra steps to the process makes it inconvenient enough to prevent people from spinning up new VMs on a whim, reducing the overall number of servers you must manage.
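A provisioning gate along these lines might look like the sketch below. The authorized-requester list, the required tag keys, and the minimum justification length are all hypothetical settings for the sake of the example.

```python
from dataclasses import dataclass, field

# Hypothetical policy settings; tune these to your organization.
AUTHORIZED_REQUESTERS = {"alice", "bob"}
REQUIRED_TAGS = ("Owner", "Department", "Workload")
MIN_JUSTIFICATION_WORDS = 20


@dataclass
class VmRequest:
    requester: str
    justification: str
    tags: dict[str, str] = field(default_factory=dict)


def review(request: VmRequest) -> list[str]:
    """Return the reasons a request should be sent back; empty means it passes."""
    problems = []
    if request.requester not in AUTHORIZED_REQUESTERS:
        problems.append("requester is not authorized to request VMs")
    missing = [key for key in REQUIRED_TAGS if not request.tags.get(key)]
    if missing:
        problems.append("missing tags: " + ", ".join(missing))
    if len(request.justification.split()) < MIN_JUSTIFICATION_WORDS:
        problems.append("justification is too short")
    return problems


request = VmRequest(
    requester="carol",
    justification="Need a box for testing.",
    tags={"Owner": "carol", "Department": "Engineering"},
)
for problem in review(request):
    print(problem)
```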
You might also implement a system of chargebacks or showbacks.
Chargebacks involve billing individual departments for the IT resources they consume.
Showbacks are similar, but instead of billing departments, you provide VM owners with a report showing how much their virtual machines cost the organization.
The goal is to make VM owners aware of the actual costs associated with the workloads they deploy.
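A showback report can be as simple as totaling cost per owner or per department. The sketch below assumes you can export usage records with hours and an hourly rate; the field names and figures are made up for illustration.

```python
from collections import defaultdict

# Illustrative usage records; in practice these would come from your
# hypervisor or cloud billing export.
usage = [
    {"vm": "vm-web-01", "owner": "alice", "department": "Marketing",
     "hours": 720, "hourly_rate": 0.10},
    {"vm": "vm-db-02", "owner": "bob", "department": "Engineering",
     "hours": 720, "hourly_rate": 0.25},
    {"vm": "vm-test-07", "owner": "bob", "department": "Engineering",
     "hours": 100, "hourly_rate": 0.50},
]


def cost_by(records: list[dict], key: str) -> dict[str, float]:
    """Total cost (hours x hourly rate) grouped by the given record field."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["hours"] * record["hourly_rate"]
    return {group: round(total, 2) for group, total in totals.items()}


for department, cost in cost_by(usage, "department").items():
    print(f"{department}: ${cost:,.2f}")
```

The same totals serve either model: send them to VM owners as a report for a showback, or pass them to finance as line items for a chargeback.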