How do I evaluate cloud providers? What strategies can I use to handle cloud outages?


Cloud computing has long been a focus of attention. In response to your questions, here is an introduction to evaluating cloud providers and handling cloud outages:

    How to evaluate cloud computing providers

(1) The software and hardware technologies used by cloud computing providers

Does the provider use the same hypervisor as the enterprise's own data center? This matters for compatibility with what the enterprise already runs, and it lets existing skills carry over. If the hypervisors differ, retraining may be a hidden cost. Also, is the provider's compute and storage infrastructure enterprise-class, and is it equivalent to or better than what the enterprise uses today? Enterprises must be able to count on the provider's infrastructure being as performant, resilient, and reliable as the equipment they would install in their own data centers.

(2) Evaluate shared (public cloud) or dedicated (private cloud) environments

Cloud computing providers offer different options for cloud-based infrastructure. If an enterprise places its environment on a shared platform, it shares resources with other tenants. That trade-off may be acceptable for lower-tier applications or development systems: the environment costs less, even if it cannot guarantee that every resource is available at all times. For more critical applications, the enterprise should verify that the provider is not oversubscribed and can meet the workload's requirements. If the enterprise needs complete control over the environment, a private cloud is the better fit.
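The placement criteria above can be written down as a simple decision rule. The following is a minimal sketch that assumes oversubscription is expressed as allocated vCPUs per physical core, with 1.0 (no overcommit) as the acceptable limit for critical workloads. The `Workload` fields, the ratio, and the threshold are illustrative assumptions. Providers do not necessarily publish such a figure, so in practice you may have to infer it from SLAs or benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    critical: bool            # mission-critical vs. lower-tier or development system
    needs_full_control: bool  # does the enterprise require complete control?


def oversubscription_ratio(allocated_vcpus: int, physical_cores: int) -> float:
    """vCPUs allocated to tenants per physical core; above 1.0 means overcommit."""
    return allocated_vcpus / physical_cores


def choose_environment(w: Workload, ratio: float, max_ratio: float = 1.0) -> str:
    """Hypothetical placement rule returning 'shared' (public) or 'private' (dedicated)."""
    if w.needs_full_control:
        return "private"  # complete control points to a private cloud
    if not w.critical:
        return "shared"  # lower-tier workloads can accept shared, cheaper resources
    # Critical workload: stay on shared infrastructure only if it is not oversubscribed.
    return "shared" if ratio <= max_ratio else "private"


erp = Workload("erp", critical=True, needs_full_control=False)
print(choose_environment(erp, oversubscription_ratio(384, 128)))  # prints private
```

The rule checks for full control first because that requirement settles the question regardless of cost or oversubscription.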

(3) Cloud Computing Environment Management

Enterprises choosing a cloud computing provider must understand, and be comfortable with, how they will interact with and manage the environment. Is it self-service, so that the enterprise can add compute instances or networks instantly on its own, or does the provider make these changes on the enterprise's behalf? Both models have advantages, and they come at different price points.

(4) Cloud Transparency and Management Levels vs. On-Premises Deployment

When an enterprise migrates its business to a cloud provider, the IT team may lose visibility into storage metrics, hypervisor utilization, or even network topology. The enterprise should therefore identify the tools it needs to keep insight into its environment so that IT can continue to deliver business value. Where such tools are unavailable, it should assess the risks and implications of operating without those management capabilities and that visibility. It is important to know exactly which tools the provider offers, from remote access via VPN all the way to integrated management consoles.
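One practical way to run this assessment is to list the metrics the IT team relies on today and compare that list with what the provider's tools expose. Anything left over is either a tool to acquire or a risk to accept. The sketch below is a minimal version of that comparison. The metric names are hypothetical placeholders, not the names any particular provider uses.

```python
def visibility_gaps(required: set[str], provided: set[str]) -> list[str]:
    """Metrics the IT team relies on today that the provider's tools do not expose."""
    return sorted(required - provided)


# Hypothetical metric names; substitute what your team actually tracks.
required = {"storage_latency", "hypervisor_utilization", "network_topology", "vm_cpu"}
provided = {"vm_cpu", "storage_latency"}  # e.g. what the provider's console reports

for metric in visibility_gaps(required, provided):
    print(f"no visibility into {metric}: find a tool or accept the risk")
```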

    Cloud outage handling strategies

1. Determine the business value of the disaster recovery plan

Determine what steps are needed to carry out the disaster recovery plan. Some plans are automated. For example, critical workloads are typically protected by some type of cluster, and the cluster should keep operating even if one node (or instance) fails. A disaster recovery strategy for secondary workloads, however, may require human intervention or a series of separate manual steps, such as restoring from a snapshot and restarting the workload, or switching over to a backup instance.

If human intervention is required, weigh the work and cost of the recovery process and decide whether initiating recovery has business value. Ask whether recovering the workload would take longer and cost more than simply waiting for the cloud provider to resolve the outage. The provider's communications, such as its estimate of when service will be restored, will strongly influence this decision.
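This comparison can be made concrete with a back-of-the-envelope cost model. The sketch below assumes a single hourly cost for downtime and a fixed labor cost for running the recovery. It treats the provider's estimated time to restore service as reliable, which a real outage may not justify. All parameter names and figures are hypothetical.

```python
def should_initiate_recovery(
    provider_eta_hours: float,      # provider's estimate until service is restored
    recovery_hours: float,          # how long our own recovery procedure takes
    downtime_cost_per_hour: float,  # business cost while the workload is down
    recovery_labor_cost: float,     # staff and resource cost of running the recovery
) -> bool:
    """Recover only if doing so is cheaper than waiting out the outage."""
    cost_if_waiting = provider_eta_hours * downtime_cost_per_hour
    cost_if_recovering = recovery_hours * downtime_cost_per_hour + recovery_labor_cost
    return cost_if_recovering < cost_if_waiting


# 4-hour provider ETA vs. a 1-hour recovery: 1500 < 4000, so recovering is worth it.
print(should_initiate_recovery(4, 1, 1000, 500))  # prints True
```

A fuller model might also account for the cost of failing back once the provider recovers.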

2. Implement a Disaster Recovery Plan

In many cases, disaster recovery plans for mission-critical workloads are fully automated, and administrators may not need to take any deliberate action. For example, a cluster that spans several AWS Availability Zones or Azure availability zones can keep running during a cloud outage even if one of its nodes becomes unavailable.
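Why spreading a cluster across zones helps can be shown with a small model. The sketch below assumes the cluster uses majority quorum, meaning it keeps running only while more than half of its nodes are reachable. Many clustering systems work this way, but real products differ (witness nodes, weighted votes), so treat this as an illustration. The zone names are hypothetical.

```python
def cluster_survives(node_zones: list[str], failed_zone: str) -> bool:
    """True if a strict majority of nodes sit outside the failed zone (majority quorum)."""
    surviving = sum(1 for zone in node_zones if zone != failed_zone)
    return 2 * surviving > len(node_zones)


# One node per zone: losing any single zone leaves 2 of 3 nodes, so quorum holds.
spread = ["zone-a", "zone-b", "zone-c"]
# Two nodes in one zone: losing that zone leaves 1 of 3 nodes, so quorum is lost.
lopsided = ["zone-a", "zone-a", "zone-b"]
```

The lopsided layout still survives the loss of `zone-b`, which is why the placement has to be checked against every possible zone failure, not just one.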

However, less critical workloads may require planned actions. Use prepared scripts, templates, or other resources to orchestrate an appropriate disaster recovery response. When the enterprise decides to initiate a disaster recovery plan that requires human intervention, administrators must take immediate action. This may include restarting from a snapshot or redirecting traffic to a standby instance during a cloud outage.
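A prepared script for such a response is essentially an ordered runbook: each step must succeed before the next one starts, and the run halts where a person needs to step in. The sketch below shows that pattern. The step functions are hypothetical stand-ins that always succeed. In a real runbook they would call your provider's SDK or CLI to restore the snapshot, start the standby, and update DNS or load-balancer routing.

```python
from collections.abc import Callable

Step = tuple[str, Callable[[], bool]]


def run_runbook(steps: list[Step]) -> list[str]:
    """Run recovery steps in order; stop at the first failure so an operator can intervene."""
    log = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later steps usually depend on earlier ones, so do not continue
    return log


# Hypothetical placeholders for real provider operations.
def restore_from_snapshot() -> bool:
    return True


def start_standby_instance() -> bool:
    return True


def redirect_traffic_to_standby() -> bool:
    return True


steps = [
    ("restore from snapshot", restore_from_snapshot),
    ("start standby instance", start_standby_instance),
    ("redirect traffic", redirect_traffic_to_standby),
]
for line in run_runbook(steps):
    print(line)
```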

The disaster recovery plan needs to be tested periodically. Test exercises confirm that the right processes and resources are in place to recover the workload. Testing also verifies the configuration of the resources involved, such as IP addresses, drivers, and dependencies between components. If recovery works reliably in routine tests, it is far more likely to work in a real disaster.
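Part of a test exercise can be automated by comparing the configuration the recovered environment should have with what it actually has. The sketch below assumes both sides are available as flat key/value maps. How you collect the observed values depends on your provider's tooling. The keys and values shown are hypothetical.

```python
def verify_config(expected: dict[str, str], observed: dict[str, str]) -> list[str]:
    """List every setting that is missing or differs after a recovery test."""
    problems = []
    for key, want in expected.items():
        got = observed.get(key)
        if got is None:
            problems.append(f"{key}: missing")
        elif got != want:
            problems.append(f"{key}: expected {want}, got {got}")
    return problems


# Hypothetical settings for a standby environment.
expected = {"standby_ip": "10.0.2.15", "db_endpoint": "db-standby.internal", "storage_driver": "nvme"}
observed = {"standby_ip": "10.0.2.15", "db_endpoint": "db-primary.internal"}
for problem in verify_config(expected, observed):
    print(problem)
```

Here the check would catch a standby still pointing at the primary database, exactly the kind of dependency error that is cheap to find in a drill and expensive to find during an outage.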

3. Monitor the disaster recovery strategy

Regardless of how much work a disaster recovery strategy requires or how automated it is, you still need to verify that the recovered workload is functioning properly. Managers should compare the performance of workloads running in the disaster recovery state with the performance of the same workloads under normal conditions.
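One simple form of this comparison is to measure a latency percentile in both states and flag the recovered workload if it is noticeably slower. The sketch below assumes response times in milliseconds, the 95th percentile as the metric, and a 20% tolerance. All three are illustrative choices, not recommendations from the source.

```python
import statistics


def p95(samples: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples, n=20)[18]


def degraded(baseline_ms: list[float], recovered_ms: list[float], tolerance: float = 0.2) -> bool:
    """True if the recovered workload's p95 latency exceeds the baseline by more than `tolerance`."""
    return p95(recovered_ms) > p95(baseline_ms) * (1 + tolerance)
```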

Application monitoring tools track the operational status of workloads. They also collect logs, metrics, and events that report how the recovered workloads are behaving, and they continue to monitor workload performance and availability throughout the cloud outage.
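At its simplest, availability monitoring means probing an endpoint repeatedly and reporting the fraction of successful checks. The sketch below assumes an HTTP health endpoint where any 2xx response counts as healthy. The probe uses only the standard library. Commercial monitoring tools do this with far more detail, so this shows the idea rather than a substitute for them.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """One health check: True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False  # unreachable, timed out, non-2xx (HTTPError), or malformed URL


def availability(probe_results: list[bool]) -> float:
    """Fraction of probes that succeeded over the observation window."""
    if not probe_results:
        raise ValueError("no probe results")
    return sum(probe_results) / len(probe_results)
```

In use, you would call `probe` on a schedule throughout the outage, append each result to a list, and report `availability` for the window.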