2014/01/29

What the cloud?!!

Recently I have noticed a lot of people using Veeam B&R for "Cloud" solutions. Cloud for me personally has become something indefinable because people are just using it for everything that is even remotely connected to the Internet (literally and figuratively speaking).

For Veeam users, "cloud" seems to be the challenge of getting the data to a partner or a second location. But there is no one-size-fits-all solution here. Furthermore, the naming is sometimes a bit confusing (I admit) and causes people to use the terms incorrectly. So here is my attempt to clarify the different possibilities and the naming.

First of all I made this diagram to clarify some stuff. I suggest you take a look before continuing reading.

Replication Job (Part of Backup & Replication)

Replication job is what I started calling "Hypervisor Replication" as it is comparable to "Storage Replication" at the SAN level. With replication, Veeam will back up a VM and restore it on-the-fly to a second vSphere (or Hyper-V, but not across hypervisor types) environment. The effect is that data is stored directly at the hypervisor volume level (VMFS). The great advantage is that because you are replicating at a different layer in the infrastructure stack, Veeam is hardware independent, which is the promise of virtualisation in the first place. Starting up the VM doesn't require you to implement a solution or scripting to resignature LUNs or re-register VMs. The VMs are just there, ready to be started. The Veeam Backup & Replication console can assist you in changing the network settings to match the target side. Failover and Failback can also be done from the console. In this case, Failback will only need to sync back the changes.

The great thing about hypervisor replication is that while copying, Veeam can make the data consistent (VSS consistency). You can even have multiple restore points so that you don't have to Failover to the latest replication (which might contain the initial corruption). These restore points will be implemented as snapshots at the target side.

For replication it is advised to have a proxy at both sides. These will set up a stable TCP/IP connection to each other to transfer the data. The primary proxy will do disk-level deduplication and some compression before sending the data over the wire to the receiving proxy. This proxy will "unpack" the data and inject it back into the hypervisor environment. So for replication Veeam doesn't offer WAN Acceleration but does offer WAN Optimization. This is important as some solutions sell WAN Optimization as WAN Acceleration.

So Replication is mostly used for DR and does not create backup files. Although multiple restore points are possible, they are mostly limited by VMware (28 snapshots) and disk space. What I noticed recently is that people are using the term replication to describe the backup copy job.

Backup Job (Part of Backup & Replication)

Backup job is what people get right 99% of the time. It is a job that will take the data and store it in the Veeam proprietary format. Why not store it in the native format? Well, it is all about disk savings!

With Veeam there are 2 strategies: (Forward) Incremental and Reverse Incremental. Both have their advantages and disadvantages. This is sometimes an overlooked setting which can have an impact on off-loading jobs. These are also job level settings and so if you look in a repository, each job will create a folder which contains the proprietary backup chain.

A job can have multiple VMs and this is good because deduplication is done at the level of the job's backup chain files. So try to group similar VMs together to get better storage savings.

With Forward Incremental you start the first day by creating a full backup of all VMs and storing that data in the VBK format. Inside this file/format, compression and deduplication are applied. The next day an incremental VIB file is created, only saving the data that has changed. The process continues the following days and more VIB files are created. There is a catch, however: these VIB files are linked to each other and to the first full. So imagine you set up a retention policy of 3 restore points: you can't throw away anything after 4 backups because the last VIB is still dependent on that first full. To solve this issue, Veeam forces you to run an active full or synthetic full, effectively discontinuing the old chain and starting a new one. Active full means you just read all the data from the production storage and create a new full VBK like you did the very first time. Synthetic full means that you first run an incremental backup. Then, as a post process, a full VBK is created based on the data that is already in the repository, thus allowing you to create forever incremental backups. This post process is quite I/O intensive as it needs to read and write all the data. So you could, for example, spread the load of creating these synthetic fulls over multiple days by creating multiple jobs and letting one job run its synthetic full on Friday, another on Saturday, another on Sunday, etc.
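To make the retention catch a bit more tangible, here is a minimal Python sketch of a forward incremental chain. It is a toy model and not Veeam's actual logic: the function name `forward_files_on_disk`, the file names, and the rule that an old chain is only deleted once the newer chains hold enough restore points on their own are all assumptions for illustration.

```python
def forward_files_on_disk(days, retention, full_every):
    """Toy model of forward incremental retention (assumed, simplified rule).

    Each chain is a list of file names: one VBK followed by its dependent VIBs.
    A whole chain can only be removed once the newer chains on their own
    contain at least `retention` restore points.
    """
    chains = []
    for day in range(1, days + 1):
        if (day - 1) % full_every == 0:
            # An active or synthetic full starts a new chain.
            chains.append([f"day{day}.vbk"])
        else:
            # An increment depends on everything before it in its chain.
            chains[-1].append(f"day{day}.vib")
        # Drop the oldest chain only when the newer chains satisfy retention.
        while len(chains) > 1 and sum(len(c) for c in chains[1:]) >= retention:
            chains.pop(0)
    return [name for chain in chains for name in chain]


if __name__ == "__main__":
    # Retention of 3, a full every 7 days: nothing can be deleted on day 4.
    print(forward_files_on_disk(4, 3, 7))
    # On day 10 the second chain has 3 points, so the first chain can go.
    print(forward_files_on_disk(10, 3, 7))
```

In this model, a retention of 3 with weekly fulls still leaves 9 files on disk on day 9 (7 from the old chain plus 2 from the new one), which is the disk space you pay for forward incremental.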

With Reverse Incremental the strategy is different. The first day you create a VBK file like with Forward Incremental. The second day, however, you update this VBK file. I always explain it as "copy-on-write". Before Veeam makes a change to the VBK data, it first reads that data, stores it in a VRB (reverse increment file) and then overwrites it. The effect is that you get incremental backups but your latest backup is always the full backup. In effect you create a reverse backup chain. So if you configured 3 restore points, you can throw away stuff after the 4th run because nothing is dependent on the oldest VRB file. Rather, each VRB is dependent on the newer VRB/VBK files. With Reverse Incremental you can only run an active full, as a synthetic full wouldn't make sense. This will also discontinue the chain, as it creates a new VBK from scratch, leaving the old one in place.
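The same kind of toy model for the reverse chain could look like the sketch below. Again this is only an illustration under assumptions: the function name `reverse_files_on_disk` is made up, and the VBK is labelled with the day it currently represents, although on disk it is one file that gets updated in place.

```python
def reverse_files_on_disk(days, retention):
    """Toy model of reverse incremental retention (simplified assumption).

    The newest restore point is always the full VBK. Each run pushes the
    overwritten blocks into a VRB that preserves the previous state.
    Nothing depends on the oldest VRB, so it can be deleted as soon as
    the retention count is exceeded.
    """
    vrbs = []  # oldest first
    for day in range(2, days + 1):
        # Today's run overwrites blocks in the VBK; the old blocks
        # end up in a VRB that represents yesterday's restore point.
        vrbs.append(f"day{day - 1}.vrb")
        while len(vrbs) + 1 > retention:  # +1 for the VBK itself
            vrbs.pop(0)  # the oldest VRB has no dependants
    return vrbs + [f"day{days}.vbk"]


if __name__ == "__main__":
    # Retention of 3: after the 4th run the day-1 VRB is already gone.
    print(reverse_files_on_disk(4, 3))
```

Compared with the forward model, the number of files never exceeds the retention setting, which is where the disk savings come from.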

Veeam Support has made some great animations to make everything a bit more clear, if you got lost in translation: http://www.veeam.com/kb1799

I always advise people to run an active full once a month, refreshing all the data and making sure that a single corruption does not live on forever in your incremental backups. If you have multiple jobs, you can spread the load over multiple weekends: one job can create a full the first week, another one the second week, etc.

So when to use what? Use Reverse Incremental if you only back up to disk and optionally use the backup copy job. It will save space as it doesn't force you to create weekly (synthetic) fulls. However, if you want to copy the data with another tool, that copy process will have to process a full VBK on a daily basis because Veeam updates the VBK every day. The only exception is the backup copy job, because it has some built-in intelligence as it is native to Backup & Replication.

Furthermore, the copy-on-write process will require more I/O because of the read/write/update process. Finally, this I/O is fairly random, albeit with a relatively big block size. So some deduplication devices might not like it so much. In these cases, use the default Forward Incremental: you sacrifice disk space, but any other copy process only needs to copy a VBK once a week.
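A rough back-of-the-envelope calculation shows why this matters for a copy process that is not Veeam aware. The sketch below assumes constant sizes, a tool that copies every new or modified file as a whole, and one full per week for forward incremental; the function name and the numbers are purely illustrative.

```python
def weekly_copy_volume_gb(full_gb, daily_change_gb, mode, days=7):
    """Rough data volume a file-level copy tool must move in `days` days.

    Assumptions (illustrative only): sizes are constant, the tool copies
    whole files that are new or modified, and forward incremental creates
    one full per period.
    """
    if mode == "forward":
        # One VBK, then one new VIB per remaining day.
        return full_gb + (days - 1) * daily_change_gb
    if mode == "reverse":
        # The VBK is modified every day, and a new VRB appears from day 2 on.
        return days * full_gb + (days - 1) * daily_change_gb
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Example: 1000 GB full, 50 GB of changes per day.
    print(weekly_copy_volume_gb(1000, 50, "forward"))  # 1300
    print(weekly_copy_volume_gb(1000, 50, "reverse"))  # 7300
```

With these example numbers the external copy moves 1300 GB per week for forward incremental versus 7300 GB for reverse incremental.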

Backup Copy Job (Part of Backup & Replication)

First of all, the backup copy job is not replication :D. I can't stress this enough. The backup copy job does more or less what its name suggests, except not quite. It copies backup data, but not the VBK/VIB/VRB files themselves. This is confusing for people but it adds great flexibility. When you create a new backup copy job and link a primary job, it won't actually link the primary job but rather add all the VMs that you configured in the primary job to the backup copy job. When the backup copy job runs, it will look for restore points for each individual VM in all backups, copy the latest restore point and create its own backup chain in the second repository. This looks a bit like forward incremental, but once the retention policy is fulfilled the oldest increment (VIB) will be rolled into the VBK file. This is sometimes confusing when people use Reverse Incremental on the primary job, as they see VIB files instead of VRB files in the copy repository. A great thing is that the backup copy job is granular: you can pick out only a subset of VMs to copy.
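The "roll the oldest VIB into the VBK" behaviour can be sketched as follows. This is a simplified model of what is described above, not Veeam's implementation; `copy_job_chain` and its return shape are hypothetical names chosen for the example.

```python
def copy_job_chain(runs, retention):
    """Toy model of the backup copy job chain.

    The chain is one VBK plus VIBs. Once the retention count is exceeded,
    the oldest VIB is merged into the VBK, so the full moves forward by
    one restore point and no new full has to be created.
    """
    full_point = 1  # restore point currently represented by the VBK
    vibs = []       # restore points stored as increments, oldest first
    for run in range(2, runs + 1):
        vibs.append(run)
        if 1 + len(vibs) > retention:
            # Roll the oldest increment into the full.
            full_point = vibs.pop(0)
    return full_point, vibs


if __name__ == "__main__":
    # Retention of 3 after 5 runs: the VBK now holds point 3,
    # followed by increments for points 4 and 5.
    print(copy_job_chain(5, 3))  # (3, [4, 5])
```

The chain always stays at exactly the configured number of restore points, without the periodic fulls that a regular forward incremental job needs.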

GFS policy is part of the backup copy job because you could use the backup copy job to tier between fast and slower storage for longer retention. In this case it is important to understand that the backup copy job will run almost in sync with the primary job. So if you configure 7 restore points on the primary job and 14 restore points on the copy job, you won't have 21 unique restore points. Rather, you will have 7 restore points that have a duplicate twin because of the backup copy, and 7 older unique restore points, once the backup copy job has synced.
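The arithmetic is simple enough to check in a few lines. The sketch assumes one restore point per day in both jobs and that both chains end on the same day; the function name is made up for the example.

```python
def unique_restore_points(primary_points, copy_points):
    """Count restore points when the copy job runs in sync with the primary.

    Assumes one restore point per day in both jobs, both ending today.
    """
    primary_days = set(range(primary_points))  # 0 = today, 1 = yesterday, ...
    copy_days = set(range(copy_points))
    return {
        "duplicated": len(primary_days & copy_days),
        "copy_only": len(copy_days - primary_days),
        "unique_total": len(primary_days | copy_days),
    }


if __name__ == "__main__":
    # 7 on the primary job, 14 on the copy job: 14 unique points, not 21.
    print(unique_restore_points(7, 14))
```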

The great thing about the backup copy job is that it is the first and only job in v7 that can make use of WAN Acceleration (so not only optimization). This is done by adding a WAN accelerator role (Windows service) at both sides. These WAN accelerators will keep a global cache which can be shared by multiple backup copy jobs and multiple runs. Furthermore, fingerprinting will be done at a very small block level to give you real WAN acceleration. The downside is that processing will take longer, and this is the reason why it is part of the backup copy job and not of the primary job.
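The general idea of a fingerprint cache can be illustrated with a few lines of Python. To be clear, this is a generic sketch of block-level deduplication over a link and not how Veeam's WAN accelerator is implemented: the SHA-256 hash, the tiny block size, and the dictionary standing in for the global cache are all assumptions, and the cost of sending the fingerprints themselves is ignored.

```python
import hashlib


def send_with_cache(data, cache, block_size=4):
    """Split data into blocks and 'send' only blocks not yet in the cache.

    `cache` stands in for a global cache shared between jobs and runs.
    Returns the number of payload bytes that would cross the WAN.
    """
    sent = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in cache:
            # Unknown block: transfer it and remember it on both sides.
            cache[fingerprint] = block
            sent += len(block)
    return sent


if __name__ == "__main__":
    shared_cache = {}
    # First run: the repeated AAAA block is only sent once.
    print(send_with_cache(b"AAAABBBBAAAA", shared_cache))  # 8
    # A later run (or another job) reuses the cached BBBB block.
    print(send_with_cache(b"BBBBCCCC", shared_cache))      # 4
```

The extra hashing and cache lookups are also a hint as to why this kind of processing takes longer than a plain copy.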

An important thing about the backup copy job is that the target is a repository. So if you want to copy the data to another location, Veeam will need to have a data mover/agent at the other side. This agent can run on a Linux server or a Windows server. However, if you want to use WAN acceleration, you will require a Windows server as this service is Windows x64 only. The good news is that you can have both the WAN Acceleration and Repository roles installed on the same Windows machine. Furthermore, the source backup architecture will need to be able to talk to this Windows server over IP, so some kind of VPN/MPLS solution has to be in place.

Backup Copy Job and Cloud backup are more or less complementary "jobs"; however, the backup copy job requires you to have CPU, memory and storage at the "cloud side". Cloud backup does not, but then again it can't do any WAN Acceleration.

Backup to Tape (Part of Backup & Replication)

Going to be fairly short about it. It does exactly what you think it does. It copies the backup files (VBK/VIB) to tape. In this case it makes sense to use forward incremental if you do a daily backup to tape. With Reverse Incremental, it will have to copy a VBK file each day to tape potentially consuming a lot of tapes.

Important notice: Veeam will be restore point aware if you use backup to tape. That means it knows which restore point for which VM is located where. However, since it copies VBK/VIB files, restoring means it first has to stage those files to a repository and only then allow you to do the final restore. This will happen automagically but it might take time. It is one good reason to split up jobs, because restoring a VM that is 50GB but is stored in a VBK of 4TB (because you added a lot of VMs to the job) requires first staging the whole VBK (and additional VIBs if required).
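To get a feeling for the staging cost, here is a small illustrative calculation. It assumes a plain forward incremental chain on tape and that whole files are staged; the function name and the increment sizes are made up for the example.

```python
def staging_gb(job_full_gb, increments_gb, restore_point_index):
    """Data staged from tape to restore from a given point in a forward chain.

    Index 0 is the full (VBK); index n additionally needs VIBs 1..n.
    The size of the VM being restored does not matter here, because
    whole backup files are staged.
    """
    if not 0 <= restore_point_index <= len(increments_gb):
        raise ValueError("restore point outside the chain")
    return job_full_gb + sum(increments_gb[:restore_point_index])


if __name__ == "__main__":
    # A 4 TB (4096 GB) job VBK with three daily VIBs; restore from the
    # second increment, even if the VM itself is only 50 GB.
    print(staging_gb(4096, [100, 120, 90], 2))  # 4316
```

Splitting the same VMs over several smaller jobs shrinks the first argument, and with it the amount of data that has to come back from tape.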

If you are thinking about vaulting, you can create a second media pool/backup to tape job next to your local backups to tape. This job can be configured so that at the end of the run it will export the tapes to the I/O slots, ready for some field engineer to get them physically out of the library and bring them to a remote location.

Cloud backup (Not part of Backup & Replication)

Cloud backup is part of Veeam's Cloud Edition offering. It is an add-on solution which cannot be bought separately. When you buy Cloud Edition, you get Backup & Replication and Cloud backup. Also, you don't really buy Cloud Edition but rather rent it on a yearly basis (subscription fee). This makes Cloud Edition a good OPEX solution instead of a CAPEX buy-it-all-at-once solution.

Basically, what Cloud Edition does is copy backup files (VBK/VIB) to the cloud. In this case, cloud can be Amazon, Azure, HP or even local players offering OpenStack Swift storage. What is important is that Cloud Edition talks to a storage API to store bits and bytes. This means the storage side doesn't need to be Veeam aware, as it doesn't process those bytes but just stores them. So no Veeam components need to be installed at the other end, and there are already a lot of partners offering potential "Veeam backup storage space". Furthermore, since the APIs are HTTP based (mostly some form of S3), no complicated VPN has to be set up for each customer. Finally, solutions like OpenStack are multi-tenant at their core, so a partner does not need to set up different OpenStack servers for each customer.

One of the cool features that is in cloud edition is that you can also use encryption. In this case the backups will be encrypted before sending them to the cloud.

The downside is that features like WAN acceleration are not possible as Veeam cannot install any service at the other side. This means that copying might consume more bandwidth. Furthermore, an FLR can't be done in the cloud by importing the backups there; you need to copy back the whole VBK and import it back into Backup & Replication. For Amazon there is a scenario where you can use an EC2 instance and then do an extract in the cloud.

With Cloud it is also advised to use Forward Incremental instead of Reverse Incremental.

Again, Backup Copy Job and Cloud backup are more or less complementary "jobs"; however, the backup copy job requires you to have CPU, memory and storage at the "cloud side". Cloud backup does not, but then again it can't do any WAN Acceleration and does require a different license.