In this post, I want to focus on the availability aspects of migrating a typical application – near-legacy, single-VM solutions where Windows or Linux VMs have been spun up to deliver an application on one server, or perhaps a couple of servers.
One of the advantages of virtualization is the ability to encapsulate the contents of a physical server into a number of files on shared storage. In the event of the physical virtualization host failing, the VM can be automatically restarted on another available host in a crash-consistent manner. This is known as High Availability (HA) and protects VMs against unplanned downtime, albeit at the cost of a reboot.
Capabilities such as vMotion or Live Migration allow running VMs to be migrated from one host to another, without loss of service. This is great for planned downtime, when VMs can be migrated off a host to allow maintenance to be performed on the host without affecting production applications.
There is also the concept of dynamic resource scheduling (DRS), where VMs are initially placed optimally across a cluster of hosts and can then be moved around using vMotion to balance CPU and RAM load across the cluster. The least active VMs are usually the ones that get moved in this scenario.
Many IT teams have become accustomed to these capabilities for their traditional applications over the past few years and, understandably, expect the same facilities when migrating apps to the cloud.
However, the hyper-scale clouds such as Amazon Web Services and Microsoft Azure have used a different methodology from the outset – that of ‘design for failure’. When designing applications for the cloud (typically new builds), the idea is to take into account that your VMs are going to fail on a regular basis, and design around that using ‘availability zones or sets’. This ensures that there is always more than one VM carrying out a particular function to account for the fact that potentially at least one is likely to be down at any given time.
Indeed, single-VM applications will not qualify for an SLA guarantee, and Microsoft recommends against using single VMs for this very reason.
In the case of Azure, availability sets are designed using the concept of fault and update domains. Fault domains define the group of virtual machines that share a common power source and network switch, while update domains indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time.
In this way, VMs are grouped together to try to protect against unplanned failure in the event of host issues or host reboots following patching of the virtualization software. As the underlying hosts are running Hyper-V on Windows Server, they are subject to the usual ‘Patch Tuesday’ updates.
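To make the fault and update domain idea concrete, here is a minimal sketch of round-robin placement across domains. The domain counts used (3 fault domains, 5 update domains) reflect common Azure defaults but are assumptions here, and this is an illustration of the concept, not Azure's actual placement algorithm:

```python
# Illustrative sketch only: spreading the VMs of an availability set across
# fault domains (shared power/network) and update domains (rebooted together).
# Round-robin assignment means no single power failure or scheduled host
# reboot can take down every VM in the set.

def assign_domains(vm_count, fault_domains=3, update_domains=5):
    """Assign each VM a fault domain and an update domain round-robin."""
    return [
        {
            "vm": f"vm{i}",
            "fault_domain": i % fault_domains,
            "update_domain": i % update_domains,
        }
        for i in range(vm_count)
    ]

for placement in assign_domains(4):
    print(placement)
```

With four VMs, the first and fourth land in the same fault domain but different update domains, so a host reboot during patching and a rack-level power failure each still leave at least one VM running.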
While Hyper-V supports Live Migration, Microsoft took the view when designing Azure not to use Live Migration on the platform, and hence the need for availability sets.
This, of course, does not play well with the traditional workloads we have discussed: customers often have to 'double up' on VMs just to maintain service, and this can add significantly to cost.
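The trade-off behind 'doubling up' comes down to simple availability arithmetic. As a rough back-of-the-envelope sketch (the 99.9% per-VM figure is an assumed number for illustration, not a quoted Azure SLA), two redundant VMs are only down when both fail at once:

```python
# Assumed per-VM availability for illustration; real SLA figures vary.
single_vm = 0.999

# With two independent redundant VMs, the service is unavailable only
# when both VMs are down at the same time.
both_down = (1 - single_vm) ** 2
redundant_pair = 1 - both_down

print(f"single VM:      {single_vm:.3%}")
print(f"redundant pair: {redundant_pair:.4%}")
```

The availability gain is large, but so is the cost: the second VM roughly doubles the compute spend for that tier of the application, which is exactly the overhead traditional single-VM workloads are forced to absorb on a platform without live migration.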
This has led to many of the ‘pets vs cattle’ analogies for traditional versus cloud-native apps. Administrators cared for their traditional VMs, whereas the VMs associated with cloud native apps can be created and destroyed with no real feeling of ownership, and they might only exist for minutes or hours rather than months or years.
For customers wanting to migrate existing 'near-legacy' on-premises applications to the cloud, the iland Secure Cloud offers all the features that customers are used to in their own facilities: HA, vMotion and DRS, as well as host affinity and anti-affinity rules. With this architecture, iland is able to offer customers a 100% availability SLA and keep the cost of running legacy applications in the iland cloud to a minimum, without the need for multiple VMs to ensure availability.