2015/04/09

Veeam Data Domain integration X-Rayed

With the latest release, v8, Veeam introduced Data Domain integration. The integration is based on DD Boost. But what does that actually mean? Well, it means faster backup copy jobs (and backups) towards a Data Domain.

So what's so good about the integration? First of all, Veeam supports "Distributed Segment Processing". The basic idea is that Veeam performs the Data Domain deduplication on the Veeam side. The main advantage is that Data Domain deduplication is global deduplication. If you have, for example, two backup copy jobs, each copying one Windows VM, then without DD Boost Veeam will send the OS blocks over twice, simply because they are different jobs.

With DD Boost, when the gateway server (the component that speaks DD Boost) has to store a second copy of the OS blocks, it sends pointers down the line instead, because it knows the Data Domain has already stored those duplicate blocks. The main advantage is that there is less network usage and the Data Domain no longer has to do the dedup processing itself. Hence the term Distributed (= each client) Segment (= chunks of data) Processing. Job performance might also improve significantly, because the second time those same blocks need to be stored, there is no real write occurring on the Data Domain, just a metadata update.
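To make that concrete, here is a minimal Python sketch of the principle. It is not the DD Boost API (which is proprietary); the class and function names are made up for illustration. The gateway fingerprints each segment locally and only ships segments the target has not seen yet, sending pointers for the rest.

```python
import hashlib

class DataDomainSim:
    """Toy stand-in for a Data Domain: stores unique segments keyed by fingerprint."""
    def __init__(self):
        self.segments = {}       # fingerprint -> segment data
        self.file_recipes = {}   # filename -> ordered list of fingerprints

    def has_segment(self, fp):
        return fp in self.segments

    def store_segment(self, fp, data):
        self.segments[fp] = data

    def write_recipe(self, filename, fingerprints):
        # A backup file is just an ordered list of pointers to segments.
        self.file_recipes[filename] = fingerprints


def boost_style_write(dd, filename, segments):
    """Client-side ("distributed") segment processing, conceptually:
    fingerprint locally, ship only segments the target does not already hold."""
    shipped = 0
    recipe = []
    for seg in segments:
        fp = hashlib.sha1(seg).hexdigest()
        if not dd.has_segment(fp):     # unique segment: send the data
            dd.store_segment(fp, seg)
            shipped += 1
        recipe.append(fp)              # duplicate segment: send only the pointer
    dd.write_recipe(filename, recipe)
    return shipped


dd = DataDomainSim()
os_blocks = [b"ntoskrnl" * 512, b"hal.dll " * 512]       # pretend OS data
print(boost_style_write(dd, "job1_vm1.vbk", os_blocks))  # 2 -> all segments shipped
print(boost_style_write(dd, "job2_vm2.vbk", os_blocks))  # 0 -> only pointers shipped
```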

Although most people focus on this part of the DD Boost integration, it is not my favorite part. So in this article I want to focus on "Virtual Synthetics" and what it means. To understand the benefits, let's first look at how a backup copy job works.

First off, the backup copy job or bcj doesn't copy files. It copies data and constructs new backup files at the target side. So whether you use forward incremental, reverse incremental or forever incremental on the source, the result of the bcj will always be the same. The bcj uses a strategy similar to forever incremental. So let's take a really trivial example. Imagine you created a bcj with a retention of 2 points.

On the first day, you create a full backup file.

On the second day, you create an increment file and store only the blocks that have changed. No rocket science so far.
On the third day, you create another increment. However, three points are more than the configured policy of two, so some action is required.
Just deleting files is not an option: you cannot delete the oldest backup file, because it is the full backup. This is why the backup copy job does something called a merge.
The idea is that you take the blocks from the oldest increment, read them from disk and then write them back into the original full backup file, essentially updating the changed blocks in the full backup file.
The result is that the full backup file now represents the restore point from day 2, and the number of backup files once again matches the retention policy you configured.
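A toy model of that merge, with a retention of two points (the block names and dictionaries are purely illustrative, not Veeam's actual file format), might look like this:

```python
# Toy model of the backup copy job merge (retention = 2 restore points).
# Block ids like "a" stand in for whole data blocks; this is not Veeam code.

full     = {"a": "a", "b": "b", "c": "c", "d": "d"}   # day 1: full backup (.vbk)
inc_day2 = {"a": "a'", "d": "d'"}                     # day 2: changed blocks (.vib)
inc_day3 = {"b": "b''"}                               # day 3: changed blocks (.vib)

def merge(full_file, oldest_increment):
    """Classic merge: read each changed block from the oldest increment
    and rewrite it inside the full backup file (random reads and writes)."""
    for block_id, data in oldest_increment.items():
        full_file[block_id] = data
    return full_file

# Day 3: three points exceed the retention of two, so the oldest increment
# is folded into the full. The full now represents the day 2 restore point.
full = merge(full, inc_day2)
chain = [full, inc_day3]   # back to two files, matching the policy
print(full)                # {'a': "a'", 'b': 'b', 'c': 'c', 'd': "d'"}
```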

But why is that bad for a Data Domain? Well, the Data Domain was designed for sequential writes. In fact, it was created to replace tape, which is why most dedup devices offer VTL functionality.

With tape in mind, remember those old VHS cassettes? If you recorded one soap (different episodes) on one tape in chronological order and one day you wanted to binge watch the soap, you could just put in the cassette and push play. No delay when switching episodes, because you are just streaming the tape.

However, imagine you are away and different members of the family record different series and movies on one cassette. When you want to play your specific part, you need to skip back and forward to get to it. That takes time, because the tape has to wind to that specific spot.

Binge watching, or just playing the video straight through, is what we call sequential I/O in data terms: you read the data in the chronological order you wrote it. Skipping back and forward to read (or write) just the specific part you need is called random I/O, and as with tapes, it is pretty slow. Now, if you design your device to act like tape, you can write really fast, but random I/O kills you.

Well, this is why the bcj merge is actually pretty slow on deduplication devices in general: it involves a lot of random reading from and writing to files. So how does DD Boost help here? First of all, you should understand that the Data Domain keeps metadata storing pointers to the data blocks that make up a specific file. But let's not go too deep into how a filesystem works; you have Wikipedia for that.

When DD Boost is used, Veeam will not read blocks a', d', h' and l' and then write them back to the Data Domain as it usually does. Instead, it instructs the Data Domain to point the full backup file at the blocks already on disk.
Because you are only writing pointers, this "virtual synthetic" merge is fast compared to the regular merge process.
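Here is a small sketch contrasting the two merge styles. The recipe/segment model below is my own simplification for illustration, not the real DD Boost interface or the Data Domain's internal layout.

```python
# Toy contrast between a regular merge and a virtual synthetic merge.
# A backup file is modelled as a recipe: block position -> segment id.

segment_store = {"a": b"A" * 4096, "a2": b"a" * 4096,
                 "b": b"B" * 4096, "c": b"C" * 4096,
                 "d": b"D" * 4096, "d2": b"d" * 4096}

full_recipe = {0: "a", 1: "b", 2: "c", 3: "d"}   # day 1 full backup
inc_recipe  = {0: "a2", 3: "d2"}                 # day 2 increment (changed blocks)

def regular_merge(full, inc, store):
    """Without Virtual Synthetics: the gateway reads every changed block's
    data and writes it back into the full file (random data I/O)."""
    io_bytes = 0
    for pos, seg_id in inc.items():
        data = store[seg_id]          # random read of the block data...
        io_bytes += 2 * len(data)     # ...plus a write of that same data
        full[pos] = seg_id
    return full, io_bytes

def virtual_synthetic_merge(full, inc):
    """With Virtual Synthetics: the Data Domain just repoints the full's
    recipe at segments it already holds; only metadata is updated."""
    full.update(inc)
    return full, 0

print(regular_merge(dict(full_recipe), inc_recipe, segment_store))
print(virtual_synthetic_merge(dict(full_recipe), inc_recipe))
# Both end with the full representing day 2, but the second moves no block data.
```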

So this makes the Data Domain an excellent backup copy job target, especially because you can define GFS retention on the bcj. Imagine you instruct the bcj to keep 6 weekly backups: you will actually have 7 full backups on the Data Domain, one for every week plus the active VBK.
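Purely as an illustration of that count (the file names below are hypothetical, not Veeam's real naming scheme):

```python
# Hypothetical listing: 6 sealed weekly GFS fulls kept by the bcj
# plus the active chain (one full and its increments) on the Data Domain.
gfs_weekly   = [f"backupcopy_2015-W{w:02d}.vbk" for w in range(10, 16)]  # 6 weekly fulls
active_chain = ["backupcopy.vbk", "backupcopy_D1.vib", "backupcopy_D2.vib"]

fulls = gfs_weekly + [f for f in active_chain if f.endswith(".vbk")]
print(len(fulls))   # 7 full backup files in total
```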

One thing that does not change is the restore time. Veeam restores generate fairly random I/O, especially if data has to be read from different files. Imagine you have a chain of 14 points and you want to restore something from the newest point; with the backup copy job, that can mean accessing 14 different files in the background.
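To see why, here is a rough sketch (again a toy model under my own assumptions, not Veeam's actual restore logic) of how reading the newest point of a long incremental chain fans out across the files:

```python
# Toy illustration of why restoring the newest point of a long incremental
# chain touches many files. Three files shown; a 14-point chain has 13 increments.

chain = [
    {"name": "full.vbk", "blocks": {0: "full-0", 1: "full-1", 2: "full-2", 3: "full-3"}},
    {"name": "day2.vib", "blocks": {1: "day2-1"}},
    {"name": "day3.vib", "blocks": {3: "day3-3"}},
]

def read_block(chain, block_id):
    """Find the newest version of a block: scan from the newest file
    back towards the full, one file at a time."""
    for backup_file in reversed(chain):
        if block_id in backup_file["blocks"]:
            return backup_file["blocks"][block_id]
    raise KeyError(block_id)

# A restore of the newest point pulls each block from whichever file last
# wrote it, so a full-VM restore ends up reading from (almost) every file.
print([read_block(chain, b) for b in range(4)])
# ['full-0', 'day2-1', 'full-2', 'day3-3']
```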

If you ask me, the Data Domain is an excellent bcj target. Keep a small number of restore points (for example 7 to 14) on primary storage such as JBODs, which handle random I/O well, and then tier them to a Data Domain with the backup copy job. If you choose your retention policy on the primary target correctly, 95% of restores will come from the first tier. However, if something bad happens and you need to go back a couple of months, it is acceptable that the restore takes a bit longer in exchange for the huge number of restore points you can keep. For Veeam, this is actually the preferred architecture.