Boost application resilience to keep workloads running

Increase application resiliency and availability for enterprise workloads through clustering, replication, snapshots, microservices and application design.

Application resilience and availability are critical attributes of modern enterprise workloads. Applications need to survive hardware failures, work through service faults, such as load balancer and domain name system errors, and tolerate the impacts of LAN and internet disruptions. Each event can impact business revenue, reputation and even regulatory compliance. Here are five ways to enhance application resilience and service availability.

Clustering adds application resilience

Clustering is almost universally used to bolster application resilience, performance and availability. One instance of an application has a finite throughput capacity -- it can only do so much work in a given time. If you push the application beyond its capacity -- perhaps expecting it to process more transactions per second than it can handle -- the application will experience reduced performance or crash and become unavailable. Each additional instance of that application workload can multiply the effective capacity of the application and allow the cluster to accomplish more work than a single instance of the workload could ever handle. This is the notion of clustering for scalability. If the enterprise needs more work from the application, you can deploy more load-balanced nodes to the application cluster.

Additional nodes also enhance application availability, and this is the key to application high availability (HA). If one node fails, the remaining nodes within the cluster share the computing load. Load balancers can identify a failed node and redistribute application traffic to the remaining nodes, so the application remains available. In many cases, users will never even notice the issue. Nodes can be local to the same data center or distributed across different data centers to guard against facility problems, internet disruptions and other potential threats.
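The failover behavior described above can be illustrated with a minimal sketch. It assumes a simple round-robin policy and a manually set health flag; the `Cluster` class and the node names are hypothetical, and a production load balancer would detect failures through its own health checks.

```python
class Cluster:
    """Toy round-robin load balancer that skips nodes marked as failed."""

    def __init__(self, nodes):
        self._order = list(nodes)
        self.healthy = {node: True for node in self._order}
        self._next = 0  # round-robin cursor

    def mark_failed(self, node):
        # Stand-in for a health check that detects a failed node.
        self.healthy[node] = False

    def route(self):
        # Try each node at most once, starting from the cursor.
        for _ in range(len(self._order)):
            node = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy nodes available")


cluster = Cluster(["node-a", "node-b", "node-c"])
before = [cluster.route() for _ in range(3)]

cluster.mark_failed("node-b")
after = [cluster.route() for _ in range(4)]

print(before)  # all three nodes receive traffic
print(after)   # traffic is shared by the two surviving nodes
```

Requests keep flowing after the failure because the surviving nodes absorb the load, which is why a cluster needs enough spare capacity to run with at least one node missing.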

RAID and replication are staples of storage availability

RAID remains the core of storage availability. RAID 0 (striping) offers no data protection but spreads data out across multiple disks, so that several drives can service reads and writes in parallel and enhance storage performance. RAID 1 (mirroring) duplicates the data of one disk to another. If one disk fails, the duplicate seamlessly takes over, and you can replace and rebuild the defective disk from the working copy. RAID 5 spreads data and parity across multiple disks, or a RAID group. If a disk fails, the remaining data and parity information can reconstitute the missing data and rebuild the defective disk's contents. This protects the storage group against single-disk failures. RAID 6 also spreads data across multiple disks but includes a double layer of parity information -- a technique called dual parity. This allows the group to tolerate two simultaneous disk failures and rebuild both disks.
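To make the parity idea concrete, here is a small sketch of how single-parity reconstruction works for one stripe. It assumes XOR parity over equal-sized blocks; the block contents and the `xor_blocks` helper are illustrative, and real RAID 5 implementations also rotate the parity block across the disks in the group.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each on its own disk, plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
surviving = [data_blocks[0], data_blocks[2], parity]

# XOR of the surviving data and parity yields the missing block.
rebuilt = xor_blocks(surviving)
print(rebuilt)
```

Because XOR is its own inverse, any single missing block in the stripe can be recovered from the others. Two missing blocks cannot, which is the gap that a second parity layer addresses.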

You can also combine RAID techniques to achieve multiple benefits. For example, you can mirror a RAID 5 or RAID 6 group to a second disk group using RAID 1, which layers mirroring on top of parity protection and keeps data rapidly accessible while failed disks rebuild. You can formulate a storage protection scheme that is most appropriate for each given enterprise application.

Replication can take place between compatible storage subsystems, but IT often uses it to duplicate data periodically from one data center site to a secondary site or the public cloud -- an off-site or remote location that can help protect against data loss in the wake of a serious facility problem. You can perform replication synchronously between local storage resources, where latency is not a significant factor, or asynchronously between distant storage resources, where latency can be substantial. You can use RAID and replication together.
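The difference between synchronous and asynchronous replication comes down to when the write is acknowledged. The sketch below models both with in-memory dictionaries; the `ReplicatedStore` class and its method names are hypothetical and stand in for real storage subsystems.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary store with one replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet shipped to the replica

    def write_sync(self, key, value):
        # Synchronous: the replica is updated before the write is acknowledged,
        # so the caller waits for the round trip to the replica.
        self.primary[key] = value
        self.replica[key] = value
        return "ack"

    def write_async(self, key, value):
        # Asynchronous: acknowledge immediately and ship the write later.
        self.primary[key] = value
        self.pending.append((key, value))
        return "ack"

    def drain(self):
        # Stand-in for the background transfer to the remote site.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = ReplicatedStore()
store.write_sync("invoice-1", "paid")
store.write_async("invoice-2", "open")

lag_before_drain = set(store.primary) - set(store.replica)
store.drain()
lag_after_drain = set(store.primary) - set(store.replica)

print(lag_before_drain, lag_after_drain)
```

The writes still sitting in `pending` represent the data that could be lost if the primary site failed before the transfer completed, which is the tradeoff accepted in exchange for not waiting on a distant replica.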

Snapshots and migration provide flexibility

Virtualization technology allows you to provision, deploy and manage modern enterprise applications as VMs on data center servers. VMs offer tremendous flexibility: They use compute resources more efficiently than workloads on dedicated physical servers, while each VM remains logically isolated from the others that share the same physical system. Even though VMs exist as images in a server's memory space, you must still protect them against server faults and application crashes that could compromise a VM and cause its application to become unavailable.

However, not all workloads are important enough to justify investments in clustering and other application availability options. Snapshots are one common means of copying the VM's point-in-time state in server memory to a file in storage. You can capture snapshots frequently and restore them easily, which returns the application to its state at the moment of capture. Often, IT uses snapshots for application rollbacks and quick recovery in the event of application disruptions. You can also use snapshots to create duplicate application instances -- often for application testing, development and evaluation.
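As a loose analogy for capture and rollback, the following sketch snapshots an in-memory state and restores it after a bad change. The `Workload` class and the labels are hypothetical; hypervisor snapshots capture VM memory and disk state rather than a Python dictionary, but the point-in-time semantics are the same.

```python
import copy


class Workload:
    """Toy workload whose state can be captured and rolled back."""

    def __init__(self):
        self.state = {"orders": [], "version": 1}
        self._snapshots = {}

    def snapshot(self, label):
        # Deep copy so later changes cannot alter the saved point-in-time state.
        self._snapshots[label] = copy.deepcopy(self.state)

    def restore(self, label):
        # Copy again so the snapshot itself stays reusable.
        self.state = copy.deepcopy(self._snapshots[label])


workload = Workload()
workload.state["orders"].append("order-1001")
workload.snapshot("before-upgrade")

# An upgrade changes the state and then turns out to be faulty.
workload.state["orders"].append("order-1002")
workload.state["version"] = 2

workload.restore("before-upgrade")
print(workload.state)
```

Note that the restore discards everything written after the snapshot, including `order-1002`. That is why snapshots suit rollbacks and test copies but do not replace replication for data that must not be lost.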

It's easy to migrate VMs between virtualized servers, either within the data center or between remote data centers. Migration is typically used for tasks like workload balancing. This allows IT administrators to adjust the number of workloads on a given server to optimize available compute resources or bolster application performance by moving the workload to another server with more available compute resources. To prevent workload disruptions, preemptively migrate VMs off a server when monitoring and management tools detect issues with that server's health. You can also manually invoke migration to perform routine maintenance procedures on a server.
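A preemptive evacuation like the one described above might follow a placement rule such as "move each VM to the server with the most free capacity." The sketch below assumes that rule and a single abstract capacity unit per server; the `evacuate` function, host names and sizes are all hypothetical, and real schedulers weigh CPU, memory, affinity and other constraints.

```python
def free_capacity(server):
    """Capacity units not yet claimed by the server's VMs."""
    return server["capacity"] - sum(server["vms"].values())


def evacuate(servers, unhealthy):
    """Move every VM off `unhealthy`, largest first, to the roomiest server."""
    moves = []
    source = servers[unhealthy]
    # sorted() builds a new list, so deleting from source["vms"] below is safe.
    for vm, size in sorted(source["vms"].items(), key=lambda item: -item[1]):
        candidates = [
            name for name in servers
            if name != unhealthy and free_capacity(servers[name]) >= size
        ]
        if not candidates:
            raise RuntimeError(f"no server has room for {vm}")
        target = max(candidates, key=lambda name: free_capacity(servers[name]))
        servers[target]["vms"][vm] = size
        del source["vms"][vm]
        moves.append((vm, target))
    return moves


servers = {
    "host-1": {"capacity": 16, "vms": {"web": 4, "db": 8}},
    "host-2": {"capacity": 16, "vms": {"cache": 4}},
    "host-3": {"capacity": 16, "vms": {}},
}

# Monitoring has flagged host-1, so its VMs are moved before it fails.
moves = evacuate(servers, "host-1")
print(moves)
```

The same routine would serve for planned maintenance: evacuate the server, service it, then let workload balancing move VMs back.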

Microservices, containers offer virtualization opportunities

An emerging trend in application design is to abandon traditional monolithic designs and reimagine applications as a collection of much smaller "stateless" functions or services that communicate commands and data using APIs. This is a microservices approach. You can build, test, deploy and scale each component separately. And since each component is ideally stateless, a failure or fault in one component won't cause significant data loss or instability in the overall application; you can simply restart a troubled component.

It's also easier to update a microservices application. While a monolithic application update would require complete functional regression testing, a component update only requires testing of that particular component. Since components exist independently, the stateful interrelationships that are common in monolithic applications are absent and do not require testing.

The components that make up a microservices application are frequently deployed to virtualized containers. Containers are an alternative virtualization technology used to provision a server's resources. Where each VM provides a completely isolated operating environment, each container shares the same underlying OS, drivers and other dependencies. This shared approach makes each container extremely small and resource-efficient, allowing many more containers to reside on the same server. Containers are quickly created and destroyed, so you can spin up and scale the components of a microservices application almost on demand.

Create application resilience in the design phase

Application resilience typically involves a workload's ability to survive after problems occur in one or more of its components -- and still provide the best possible measure of service to the business and its users. This means that you should integrate availability into modern application design and testing.

Part of the application resilience discussion centers on application architecture and design. The microservices approach is just one popular example of an emerging design paradigm that constructs complex, highly scalable applications. Automation and orchestration offer the ability to scale out components and automatically balance the load as traffic demands change over time. For example, if one component or function of the greater application needs to handle more traffic requests, you can replicate that component -- and only that component -- to handle the additional traffic.
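The per-component scaling described above can be reduced to simple arithmetic: divide a component's current traffic by what one replica can handle and round up. The sketch below assumes every replica handles the same load and keeps a floor of two replicas for availability; the function name, component names and traffic figures are all hypothetical.

```python
import math


def replicas_needed(requests_per_second, capacity_per_replica, minimum=2):
    """Replicas required to carry the load, never fewer than `minimum`."""
    required = math.ceil(requests_per_second / capacity_per_replica)
    return max(minimum, required)


# Assumed current traffic per component, in requests per second.
traffic = {"checkout": 950, "search": 120, "catalog": 300}

# Assumed throughput of a single replica of any component.
CAPACITY_PER_REPLICA = 200

plan = {
    component: replicas_needed(load, CAPACITY_PER_REPLICA)
    for component, load in traffic.items()
}
print(plan)
```

Only `checkout` grows beyond the floor here, which mirrors the point that you replicate the busy component and leave the rest alone. An orchestrator would rerun a calculation like this as traffic changes.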

Considerations for resilient application design

Resilient application designs, such as microservices, demand renewed attention to testing and vetting each change to the application. Understand how the loss of each component, module, service or dependency will affect the application's overall availability, and set availability goals accordingly. If the application does not meet its established availability goals during testing, you should not release it to production. Sometimes, you must improve the availability of related services or dependencies, such as storage, to meet an application's availability goals.

You must also evaluate applications for security, both in terms of user authorization and hardening against external and internal attacks. Weigh the safety of business data, and ensure that only authorized users can access the application's data and that the data is protected against loss or theft. This could include some form of identity and access management framework, as well as a level of data encryption in flight and at rest.

Additionally, evaluate applications for scalability to determine how readily you can scale the workload to meet traffic demands. Modern modular applications tend to scale better because you can often deploy additional components and balance loads faster -- often automatically -- with fewer computing resources.

All of these concepts are frequently accompanied by comprehensive application performance management tools designed to help you understand how the workload performs in production against its design goals.

Applications also often seek to embrace a stateless design. Basically, a stateless function gets all of the data it needs to function from the outside, performs its respective tasks and then delivers its result to the user or another function -- leaving behind no modes, selections, choices or configuration preferences to affect the function's behavior. If the function crashes, you can simply restart it with no net loss of data.
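The contrast between stateless and stateful behavior after a restart can be shown in a few lines. This is an illustrative sketch only; the discount example, the `apply_discount` function and the `StatefulDiscounter` class are hypothetical.

```python
def apply_discount(order_total, discount_percent):
    """Stateless: every input arrives as an argument, nothing is remembered."""
    return round(order_total * (1 - discount_percent / 100), 2)


class StatefulDiscounter:
    """Stateful: the discount is a setting held in the instance's memory."""

    def __init__(self):
        self.discount_percent = 0

    def configure(self, discount_percent):
        self.discount_percent = discount_percent

    def apply(self, order_total):
        return round(order_total * (1 - self.discount_percent / 100), 2)


# The stateless call gives the same answer before and after a restart,
# because the caller supplies everything it needs each time.
first_try = apply_discount(100.0, 15)
after_restart = apply_discount(100.0, 15)

# The stateful service loses its configuration when it is restarted.
service = StatefulDiscounter()
service.configure(15)
before_crash = service.apply(100.0)

service = StatefulDiscounter()  # simulated crash and restart
after_crash = service.apply(100.0)

print(first_try, after_restart, before_crash, after_crash)
```

The stateful version silently changes behavior after the restart, which is the kind of hidden dependency a stateless design avoids. Any state that must persist belongs in an external store rather than in the function itself.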
