IT infrastructure is built once and then operated for years. The real value of hyper-converged infrastructure is in ensuring that ongoing operations are simple, saving operator time every month for years. However, there are still maintenance activities that should be attended to regularly to keep the HCI platform in tip-top shape.
Patching and updating are the most common software maintenance tasks, and these updates may also unlock new configuration options or enable new automation.
Updates: The never-ending story
Just like any other software, an HCI platform requires updates. Every platform uses a hypervisor, and many also use virtual storage appliances. Hypervisor updates are inevitable, and often beneficial, as they resolve issues and improve performance. Whatever hypervisor is used, it will have some mechanism for updates, whether that mechanism is built into the hypervisor or relies on an external service such as Windows Update or VMware vCenter.
If your hyper-converged infrastructure uses a virtual storage appliance, there will be updates to that appliance too. HCI vendors will often provide an update mechanism that integrates the storage appliance updates and hypervisor updates into a single tool. Ideally, the update tools are integrated into the HCI cluster management to enable updating of the entire cluster without VM downtime. You, as the customer, must still decide when to apply these updates.
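As a rough illustration of how a cluster-wide update can proceed without VM downtime, the sketch below updates one node at a time and moves that node's VMs to its peers first. The `Node` class and `rolling_update` function are hypothetical names invented for this example, not any vendor's API, and a real tool would also check capacity and cluster health before each step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one HCI node: its software version and running VMs."""
    name: str
    version: str
    vms: list[str] = field(default_factory=list)


def rolling_update(nodes: list[Node], target_version: str) -> list[str]:
    """Update nodes one at a time, evacuating each node's VMs to its peers first."""
    log = []
    for node in nodes:
        if node.version == target_version:
            continue  # already current, nothing to do
        peers = [n for n in nodes if n is not node]
        if not peers:
            raise RuntimeError("a rolling update needs at least two nodes")
        # Spread the VMs across the other nodes so they keep running.
        evacuated, node.vms = node.vms, []
        for i, vm in enumerate(evacuated):
            peers[i % len(peers)].vms.append(vm)
        log.append(f"evacuated {node.name}")
        # Assumption: the actual hypervisor/storage appliance update happens here.
        node.version = target_version
        log.append(f"updated {node.name} to {target_version}")
    return log
```

This simplified version does not move VMs back or rebalance afterward; it only shows that every VM always has a running host while each node is being updated.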
Updates can also be triggered if you experience an issue and the vendor support team advises you to update to a newer version to resolve your specific problem. In addition, you should take a proactive approach: schedule regular updates and apply them to a test environment before they are deployed to production.
If you do not have a suitable test environment, you may want to wait a few weeks before deploying an update. This gives third-party organizations and other businesses time to test the update before you deploy it to your production environment. If you take this wait-and-see approach, you should watch for news of failed updates and replacement update packages. Ensure there are no failures reported for a few weeks before you deploy.
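The wait-and-see rule can be written down as a simple check. The following is a minimal sketch, assuming you track the update's release date and the dates of any failure reports you have seen; the function name and the three-week default are illustrative choices, not a recommendation from any vendor.

```python
from datetime import date, timedelta


def safe_to_deploy(released: date, failure_reports: list[date],
                   today: date, quiet_weeks: int = 3) -> bool:
    """Return True only if the update is old enough and no failure was reported recently."""
    quiet_period = timedelta(weeks=quiet_weeks)
    # The update must have been available for the whole quiet period.
    if today - released < quiet_period:
        return False
    # Any failure report inside the quiet period means waiting longer.
    return all(today - reported >= quiet_period for reported in failure_reports)
```

For example, `safe_to_deploy(date(2024, 1, 1), [], date(2024, 2, 1))` returns `True`, while the same call with a failure reported on 25 January 2024 returns `False`. If the vendor issues a replacement package, the check would start again from the new release date.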
Not having a test environment and deploying straight to production is a significant risk. Whoever controls the IT budget should be made aware of this situation and notified of the costs to implement a test environment to reduce future risks.
One of the challenges with IT infrastructure is that it is deployed with a fixed configuration and then runs for a long time. When deploying an HCI platform, accepted best practices should be followed so that all associated technology runs on an optimally configured platform. However, best practice is not a static truth. Over time, accepted best practices change, but the deployed HCI environment usually is not modified to follow new procedures.
When the HCI environment is built, document any variations from default settings and the reasons for these variations. You should also note why important settings were left at defaults. At least once a year, review the current practices and consider whether changes are required to the deployed hyper-converged infrastructure. If configuration changes are made, remember to update all documentation.
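One way to keep that documentation honest is to compare the deployed settings against the defaults and flag any variation that has no recorded reason. The sketch below assumes the settings are available as plain dictionaries; the function name and the setting names in the usage example are hypothetical.

```python
_MISSING = object()  # sentinel for settings that have no default at all


def undocumented_deviations(defaults: dict, deployed: dict,
                            reasons: dict[str, str]) -> list[str]:
    """List deployed settings that differ from the default and have no documented reason."""
    missing = []
    for setting, value in deployed.items():
        differs = defaults.get(setting, _MISSING) != value
        if differs and not reasons.get(setting, "").strip():
            missing.append(setting)
    return sorted(missing)


# Usage with made-up setting names:
defaults = {"replication_factor": 2, "dedupe": True, "mtu": 1500}
deployed = {"replication_factor": 3, "dedupe": True, "mtu": 9000}
reasons = {"mtu": "jumbo frames on the storage network"}
print(undocumented_deviations(defaults, deployed, reasons))  # ['replication_factor']
```

Running a check like this during the yearly review highlights settings that were changed without a note explaining why.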
One of the core ideas of operational efficiency is to automate routine and repetitive tasks. Part of any ongoing optimization effort should be to identify everyday tasks -- such as incident management, event log creation and low-risk software updates -- that can be automated.
As routine tasks are automated, operations team members are freed up to review best practices and configuration details. Automated processes are also less error-prone, require less rework and deliver more consistent service.
An HCI platform should deliver simple operations with minimum human effort. Over time, maintenance activities like patching and updating will be required, along with reviews of the deployed environment for best practices, consistency and optimization.
Hyper-converged infrastructure is designed to enable rolling upgrades, so there is seldom a need to replace and review an entire deployment in one go. A regular review program and a structured update cycle will keep your HCI platform healthy.