DRaaS with Veeam – Failover and Failback Procedure

August 4, 2017
Now that we have replicated all of our servers and created a failover plan, it is time to test our failover process. There are several items you will want to test and verify before a real disaster strikes, such as network access, application functionality and data integrity. The best way to ensure the DR environment will operate as expected during a disaster is to periodically run a failover test of your replicas. Previously, we discussed the considerations and differences between a full failover and a partial failover with Veeam. Now we will walk through the technical process of starting a failover in both cases, as well as failing back to production.

Performing a Full Failover with the Veeam Failover Plan

Testing a failover plan is the best way to simulate a full site disaster on your end. The replica servers will power on in the iland DR environment and the Veeam NEA will enable your NAT rules if you do not use your own firewall. In this case, there is no communication between the production and DR sides, so all access is granted through the failover plan’s NAT rules. Running the failover plan does not impact production in any way, and all changes are erased once the failover is undone.

  1. It is helpful to disable any replica jobs to iland during a test failover. This prevents unnecessary failures or incomplete snapshots if a job runs while the server is failed over.
  2. In the Veeam console, under the Backup and Replication tab, select “Failover Plans.”
    a. Alternatively, you can start a failover plan in the Veeam web portal. The link and credentials for the portal will be provided by the iland Cloud Services team.
  3. Right click the failover plan you wish to test and select “Start.” You can also select “Start To,” which allows you to pick the most recent checkpoint from a given time. For example: Start To the most recent checkpoint from 7/20/2017.
    a. There is also a test option in this list. The test function powers up the servers and then runs a heartbeat test. Once the test is complete, Veeam shuts the servers down and ends the test. This is typically brief and does not allow actual access to the servers.
    Screenshot 1
  4. After selecting “Start,” you will see the Veeam statistics popup showing the status of the failover.
    Screenshot 2
  5. Once the failover plan is complete, all servers should be powered on and running.
    a. Keep in mind that it may take a few minutes for specific services, such as IIS, to start.
  6. At this point, you should be able to access the servers through the ports you have configured in the failover plan. For example: HTTP/HTTPS, SSH, RDP.
    a. If you are unable to access any of your servers or notice any other issues, please feel free to call or email the iland Support team.
  7. To undo the failover plan, right click the failover plan in Veeam and select “Undo.” This will power off the replica VMs and make the servers ready for replication again.
    Screenshot 3
  8. Once the undo failover operation completes, the replicas will be ready for replication again.
    a. All of the changes made to the servers during the failover will also be erased during the undo.
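The access check described in the steps above (confirming that the ports configured in the failover plan respond once the replicas are running) can be scripted. The following is a minimal Python sketch, not part of Veeam or iland tooling. The function names `port_is_open` and `check_failover_access`, the default port map, and the `connect` parameter are all assumptions made for illustration. Substitute the public address and ports from your own NAT rules.

```python
import socket

# Hypothetical default port map; replace it with the ports exposed by the
# NAT rules in your own failover plan.
DEFAULT_PORTS = {"HTTP": 80, "HTTPS": 443, "SSH": 22, "RDP": 3389}


def port_is_open(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The connect argument exists so that the check can be exercised without
    touching the network; by default it is socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    conn.close()
    return True


def check_failover_access(host, ports=None, timeout=3.0,
                          connect=socket.create_connection):
    """Map each service name to True or False depending on TCP reachability."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {
        name: port_is_open(host, port, timeout, connect)
        for name, port in ports.items()
    }
```

Calling `check_failover_access` with the public address of your DR environment returns a dictionary such as service name to reachability flag, which you can compare against the NAT rules you expect to be active. A successful TCP connection only shows that the port answers; it does not replace logging in and verifying the application and its data.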
Performing a Partial Failover in Veeam

You may just need to test the data or functionality of one particular server. This can be done with a partial failover within Veeam. If you are using the Veeam NEA, you do not have to make any networking changes, as the appliances on both ends create a Layer 2 bridge, so the server you fail over will be accessible internally from the production side. Keep in mind that this may cause issues when accessing the replica server if the production server, with the same IP and hostname, is still online. In these cases, it may be best to change the IP or temporarily disconnect the production server while you test. In a live failover, this is usually not an issue, as the production server would either be powered off or already inaccessible.

  1. In the Veeam console, under the Backup and Replication tab, click the Replicas icon in the left-hand navigation.
  2. Right click the server you wish to fail over and choose “Failover Now.”
    a. Keep in mind that partial failovers can only be started from your Veeam console. Individual replicas cannot be failed over from the Veeam web portal or from the service provider side.
    Screenshot 4
  3. Once the failover process is complete, this server should be accessible internally from the production end. Keep in mind that since the failover plan was not used, the NAT rules created within the plan will also not be in use.
  4. When testing has completed, you can right click the replica server again and choose to undo the failover. As with the failover plan, when the undo operation completes, the replica is ready for replication and any changes made during the test will be erased.
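Because a partially failed-over replica shares its IP and hostname with the production server, it can help to list likely conflicts before starting the test. The sketch below is a hypothetical pre-check in Python. The `Server` record and the `find_conflicts` function are illustrative names, and the inventory is assumed to be something you maintain yourself; nothing here is read from Veeam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical inventory record for a production server or a replica."""
    hostname: str
    ip: str
    online: bool = True


def find_conflicts(replicas, production):
    """Return (replica, production) pairs that would clash on the bridged network.

    A clash is reported when the production server is still online and
    shares a hostname (case-insensitive) or an IP address with the replica.
    """
    conflicts = []
    for replica in replicas:
        for prod in production:
            if not prod.online:
                # A powered-off or disconnected production server cannot conflict.
                continue
            same_host = replica.hostname.lower() == prod.hostname.lower()
            same_ip = replica.ip == prod.ip
            if same_host or same_ip:
                conflicts.append((replica, prod))
    return conflicts
```

Any pair returned by `find_conflicts` is a candidate for changing the replica's IP or temporarily disconnecting the production server before the partial failover, as described above.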
Failback to Production

When failing over servers in a live disaster scenario, you will most likely need to fail back to production. Because the undo operation erases all changes made on the replicas, you will want to avoid using it after a real disaster. When performing a failback to production, Veeam sets up reverse replication from the iland DR site back to your production environment and then fails over back to your site. This process does cause downtime, as both servers will be down during the failback. It is best to schedule this process during a maintenance or outage window.

  1. In the Veeam console, under the Backup and Replication tab, click the “Active” node, which is located under Replicas.
  2. This should show the replicas that are currently failed over. You can select one, multiple, or all servers, and either click “Failback to Production” at the top or right click the replica and choose “Failback to Production.”
    Screenshot 5
  3. In the Failback Wizard, confirm the servers you wish to fail back. Check each server to fail back and click “Next.”
  4. On the destination page, configure the destination in the source production environment.
    Screenshot 6
    a. Failback to the original VM: This sets up the server that was failed over at iland to replicate back to the production source environment, using the original server as a seed. This means that the original server on the source side will be powered off if it is still powered on, and the server at iland will overwrite the data on this server. Once the process is complete, the original source server is replaced by the failback server, with all the changes made while at iland still intact. During this replication, Veeam will only replicate the changes made to the server while it was powered on at iland.
    b. Failback to the original VM restored in a different location: This option is used if your vCenter infrastructure was changed during the failover event. If hosts or datastores have been removed from your vCenter environment, or if the resource pools have changed, this option allows you to select the new hosts, datastores, or resource pools that are in use. This option will overwrite the original server and use it as a seed the same way the first option does.
    c. Failback to the specified location: This option can be used if the original source server was completely removed from your environment. You can also use this option if you wish to fail back the target server without overwriting the original server. You will need to specify the host, datastore, folder and resource pool to be used for the replica. Please note that this option requires the server at the iland side to perform a full replica of the server, since it is not using the original server as a seed. You may also want to leave this server powered off or disconnect the NIC once the failback is completed to avoid any IP or hostname conflicts.
  5. If you are specifying the location, you will need to select the target datastore, host and other details for the source environment.
  6. Once you have everything configured to your preference, you can click “Finish” on the summary page. You will also have a checkbox to automatically power the server on once the failback is complete, or leave it powered off.
  7. Once the failback is initiated, you will see the status of the reverse replication. Veeam will power off the replica server at the DR site, and you will need to ensure the VM on the production site is powered off as well if it still exists. With both servers powered off, the failback can occur without any new changes being made that would not have been replicated during the reverse replication process.
    a. During this time, it may be wise to undo any other changes that were made outside of Veeam during the failover. These may be DNS changes or any changes to user access while running on the DR site.
  8. Once the failback operation completes, the replica server at the DR site remains powered off and ready for replication again.
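The choice between the three failback destinations described above follows a small set of conditions, which can be written down as a decision helper. The Python sketch below only restates the logic from this article; it does not call Veeam. The names `FailbackChoice` and `choose_failback_option` and their parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailbackChoice:
    """Result of the decision: which wizard option applies and what it implies."""
    option: str
    overwrites_original: bool
    needs_full_replica: bool


def choose_failback_option(original_vm_exists, infrastructure_changed,
                           keep_original=False):
    """Pick a failback destination based on the conditions in the article.

    original_vm_exists: the original source server is still in the environment.
    infrastructure_changed: hosts, datastores or resource pools changed
        during the failover event.
    keep_original: you want to fail back without overwriting the original.
    """
    if not original_vm_exists or keep_original:
        # No seed is used, so a full replica is required.
        return FailbackChoice("Failback to the specified location",
                              overwrites_original=False,
                              needs_full_replica=True)
    if infrastructure_changed:
        # The original VM is still used as a seed, but placed on new resources.
        return FailbackChoice(
            "Failback to the original VM restored in a different location",
            overwrites_original=True,
            needs_full_replica=False)
    return FailbackChoice("Failback to the original VM",
                          overwrites_original=True,
                          needs_full_replica=False)
```

For example, `choose_failback_option(original_vm_exists=False, infrastructure_changed=False)` selects the specified-location option and flags that a full replica is needed, which is worth knowing when you size the maintenance window.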
As mentioned previously, it is a good idea to inform iland of any live failover events. The iland Support team and engineers will ensure that your recovery environment is running as expected and will continue to monitor everything during the failover. We can also work with you to strategize and plan the failback operation to your production environment and ensure that it completes without issues.
Mike Mosley

Mike Mosley is a cloud engineer at iland and has worked at the company for over three years. He holds a number of VMware certifications, including VCP5, as well as the Veeam VMCE certification. Mike works closely with customers to build cloud solutions that fit their requirements.