What open source systems management tools are out there to help manage a virtual data center?
Open source systems management risks are relatively low. It is important to choose the product that will best manage the infrastructure, and to train staff to use the tools. I've been in Tivoli shops that could do anything and everything with Tivoli, and in others where it was hated -- the difference in these cases being the amount of expertise present and training provided. The same applies to Zenoss, Groundwork, etc. There's a lot of hype to be wary of, as well.
Virtualization's biggest risk is the "all the eggs in a few baskets" concern. However, this is not really a concern for IT managers with any significant degree of experience. Building the appropriate redundancy into your network, physical, and virtual systems is crucial. Other than the higher damage potential of a hardware fault, the risks of virtualized servers are actually lower than those of physical servers. The portability and hardware-agnosticism of guests make moving hardware platforms a breeze compared to the old "backup-build-lay OS down-lay apps down-restore" method of changing servers.
What are the rewards?
For open source systems management, the real rewards are reduced costs and a lack of vendor lock-in. Also, organizations willing to become active in an open source systems management project gain the ability to steer the development of the product to match the needs of the business, far beyond what is possible in a traditional proprietary vendor model.
For virtualization, there are many benefits -- disaster recovery is eased when you don't have individual servers dependent on hardware. Backup becomes as easy as a snapshot. Provisioning new server resources goes from taking weeks to taking days (if not down to just hours!). Hardware acquisitions are reduced. Electrical bills are reduced. The number of hardware problems goes down. Servers are consolidated.
Do data center managers need to be concerned about data center sprawl?
Virtual sprawl is no more serious a threat than physical sprawl. Virtual sprawl adds a layer of complexity in tracking which host systems are the "homes" to which guests, and it adds a security layer in keeping unauthorized personnel from creating "gray-box" guests. However, virtualizing mitigates aspects of physical sprawl that are equally complex, such as power and cooling concerns, space issues, and even many networking concerns. For the work it takes to manage virtual sprawl, much more work is actually saved in avoiding physical sprawl.
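As a rough illustration of that tracking task, the sketch below compares a documented host-to-guest inventory against what is actually found running and flags any guest that is not on record. The host and guest names, and the idea of supplying both inventories as plain dictionaries, are assumptions made for the example; in practice the "discovered" side would come from whatever management tool is in use.

```python
def find_undocumented_guests(
    documented: dict[str, set[str]],
    discovered: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per host, the guests that are running but not documented."""
    undocumented = {}
    for host, guests in discovered.items():
        # Guests found on this host that the documentation does not list.
        extra = guests - documented.get(host, set())
        if extra:
            undocumented[host] = extra
    return undocumented


# Hypothetical inventories, for illustration only.
documented = {
    "host-a": {"web01", "web02"},
    "host-b": {"db01"},
}
discovered = {
    "host-a": {"web01", "web02", "test-vm"},  # "test-vm" was never documented
    "host-b": {"db01"},
}

gray_box = find_undocumented_guests(documented, discovered)
print(gray_box)
```

Running a comparison like this on a schedule is one simple way to catch "gray-box" guests early, before the documentation and the real environment drift apart.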
That said, sprawl is an issue that needs to be addressed, and to avoid sprawl becoming a problem, IT managers should always maintain adequate documentation, and the initial virtualization project should consider sprawl during the design phase. For those shops where virtualization has been introduced without a formalized process, going back and documenting the architecture, as well as tuning it to meet the challenge of virtual sprawl, is a significant step forward in maximizing the value of virtualization.
This documentation can often be produced by systems management tools either within the product (such as VMware's VirtualCenter) or outside in systems management software.
What about hardware decisions -- should data center managers be considering scale-up instead of scale-out?
I personally prefer a scaled-up approach because there is a reduction in ongoing costs, such as power, space, cooling, and physical maintenance. Also, the complexity factor is reduced when there is less hardware to manage. An exception to that would be data centers without existing centralized storage -- the initial acquisition becomes more expensive in scale-up operations if a SAN infrastructure is not already in place.
Scale-out can work for a time, but in the end, having ten servers at $20,000 (or even $30,000!) each vs. 100 servers at $4,000 each is a big difference in cost, not to mention time spent on management and on controlling sprawl.
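To make the arithmetic concrete, here is a minimal sketch that totals the acquisition costs using only the server counts and prices quoted above. It covers purchase price alone; power, cooling, space, and management time are left out because no figures are given for them.

```python
def acquisition_cost(server_count: int, unit_price: int) -> int:
    """Total purchase price for a fleet of identical servers."""
    return server_count * unit_price


# Figures taken from the comparison above.
scale_up_low = acquisition_cost(10, 20_000)   # ten servers at $20,000
scale_up_high = acquisition_cost(10, 30_000)  # ten servers at $30,000
scale_out = acquisition_cost(100, 4_000)      # 100 servers at $4,000

print(f"Scale-up: ${scale_up_low:,} to ${scale_up_high:,}")
print(f"Scale-out: ${scale_out:,}")
print(f"Difference: ${scale_out - scale_up_high:,} to ${scale_out - scale_up_low:,}")
```

Even at the higher scale-up price, the ten-server option comes in $100,000 below the 100-server option on hardware alone.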