Why Do We Need a Systems Management Tool?
Current trends in the IT world continue to accelerate the rate of change in every area. Applications, server platforms and networks are no longer the slow-moving entities they once were; they are subject to change on an almost daily basis. In this environment, it becomes more and more important for the IT Operations team to detect and respond quickly to changes and anomalous events.
My employer is a relatively new business. At the outset, its applications were to be customized packages, and a large part of its core IT systems was to be outsourced. Slowly, those package-based solutions morphed into custom applications and, for a variety of reasons, the outsourced systems were brought back in-house a couple of years ago. This presented those of us in the IT department with some interesting challenges. One of them was how to go about managing our newly re-acquired IT infrastructure and applications.
When we first decided to move our core systems from an outsourced to an in-house IT Operations Department, our requirements were limited. Checking the availability of some services and the load on the network and key servers was about as much as we thought we needed.
It became obvious over time that this was rather optimistic. Each new service added seemed to result in a new management tool being installed on a System Administrator's workstation. At one point we had three separate network monitoring systems, three separate performance management tools and a plethora of different scripts, web pages and command-line tools. The DBA team had one tool, the Network Admins another, the Unix and Windows teams yet another. We sent out critical alerts by email, pager, and SMS, often to completely inappropriate people.
The company was growing, and it looked like it was beginning to need a grown-up systems management tool, but which one?
What Do We Expect from a Systems Management Application?
There is definitely a "sweet spot" for systems management applications. Some are suited to smaller environments; others are most definitely suited to enterprise-scale environments with more demanding requirements. Unsurprisingly, the enterprise-scale products often come with enterprise-scale price tags and learning curves.
We had a few key requirements:
Platform independence: Our network management system would have to run on available hardware (at the time, SPARC/Solaris).
Performance: Any solution would need to scale from a few hundred nodes to a few thousand nodes.
Enterprise-level features: We required, at a minimum, SNMP trap management, configurable alert escalation, and availability and performance reports for the management team.
Rationalize support roles: We needed to be able to take individuals out of the process. That meant an end to emails sent by systems to developers in the middle of the night. Our operations team needed to be the first contact for every event.
Reduce tasks: It would need to lighten the burden on the Operations Team, not increase it.
Extensibility: Previous experience indicated that there was no such thing as a complete solution.
Low cost of entry: It needed to replace a portfolio of Open Source products.
Longevity: Some Open Source products seem to wither on the vine with no apparent cause, or fragment through disagreements between developers. Commercial products too are subject to the vagaries of the market.