2015/11/27

Veeam Application Report

As many of you know, although Veeam B&R has an agentless approach, it still makes sure that all applications are consistently flushed just before the backup starts. To do this, Veeam B&R leverages VSS. It also tries to detect which applications are installed in which VM. This data is collected so that during restore, you don't have to figure out which VM is holding which application and where exactly the application database is stored inside the VM (for example, for Exchange it will detect the path(s) leading to the EDB(s)).

Now a fellow SE colleague requested to add this "application detection" to the main GUI. He wanted to leverage the detection to sort out which VMs have which application installed. Adding it to the main GUI would however make the GUI more complex, but you can actually leverage the data via PowerShell.

So here is a sample script you can use as a starting point:
https://raw.githubusercontent.com/tdewin/veeampowershell/master/veeam-per-app-detect.ps1

It generates a nice clean report with all the VMs that have detected applications (yes, even Oracle, so it is v9-ready), grouped per application. The output should look something like shown in the screenshot below:


Enjoy!

2015/10/09

RPS 0.3.2

Just a small update (which required some re-engineering under the hood).

First of all, when you click the backup file size, you get a small pop-up window. This will tell you the uncompressed and compressed bandwidth usage over multiple intervals. It should help you understand how much processing power you need for a certain amount of input. The first two lines show the uncompressed data in bytes and bits; the next two lines show the compressed data, again in bytes and bits. The columns indicate your time window. Notice that clicking on the full file or the incremental file gives different output, so you can compare full runs vs incremental runs really easily.


The second feature is available when you click the total file size. It will give you a table overview of the output, which you can easily copy to Excel or Calc. The numbers are all in GB, so they give a predictable output.


The final feature is a very small one, but I really like it because it took literally 10 minutes to code and will remove some frustration. During a recent conf call with my colleague Johan Huttenga, I noticed he was struggling with inputting 30 TB of data. He needed to multiply 30 by 1024 to get exactly 30 TB and not 29.99 TB. So in this version, you can input 30TB, and it will automatically be converted to "30720". Same for 1PB to "1048576". The input is case insensitive, so tb, Tb, TB, pb, PB, pB, etc. should all work. For example, fill in 1TB like shown below


As soon as you press Enter, tab to another input or click the simulate button, the input will be dynamically converted.
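The conversion described above is simple to reproduce. Here is a minimal Python sketch of the idea (RPS itself is a web tool, so this is not its actual code; the function name `to_gb` is my own):

```python
import re

def to_gb(value: str) -> float:
    """Convert an input like '30TB' or '1pb' to GB using binary units.

    A plain number is assumed to already be GB. Commas are accepted
    as decimal separators, matching the '29,99' style used above.
    """
    match = re.fullmatch(r"\s*([\d.,]+)\s*(gb|tb|pb)?\s*", value, re.IGNORECASE)
    if not match:
        raise ValueError(f"cannot parse: {value!r}")
    number = float(match.group(1).replace(",", "."))
    unit = (match.group(2) or "gb").lower()
    factor = {"gb": 1, "tb": 1024, "pb": 1024 ** 2}[unit]
    return number * factor

print(to_gb("30TB"))  # 30720.0
print(to_gb("1pb"))   # 1048576.0
```

The case-insensitive regex is what makes tb, Tb, TB, pb, etc. all work.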

2015/08/25

Global backup report in Powershell

A lot of partners try to go that extra step and also manage onsite Veeam backup environments. They mostly want a report with all job statuses from all Veeam backup servers, instead of plowing through hundreds of emails sent by multiple backup servers.

Enterprise Manager allows you to do that. You can add multiple backup servers and it will give you a global overview. However, it also acts as a license manager. So if you have different customers with different licenses, you cannot add them all together in one Enterprise Manager.

A way around it would be to create some RESTful API integration. That would be the cleanest way to do it, in my humble opinion. However, if you want a quick hack, you can also do it using PowerShell. Just launch a remote session to all those backup servers and collect the data.

Now a lot of people just need a small "sample" script to get started. So here is a basic sample. It is surely not feature complete and has very poor error handling, but it can get you started.

The first part defines the instances. Granted, it would be cleaner to take the table, convert it to a CSV, and then import it at the beginning of the script. The instances table consists of objects that define the customer, the backup server, the username and the password in an encrypted form. Not sure how to get the password in an encrypted form? Just use the code at the top to generate what you need. However, make sure that the whole password doesn't have any line breaks when you copy-paste!


This results in the pre-created code


After correct copy/pasting and removing line breaks, you should get something like this


If you then run the code, it should connect to all the instances, execute some Veeam PS code, build a table and then collect it centrally. The end result? A $globaljob table, which you can then use to build a CSV report, an HTML report, one big email, etc. Hope it can be useful to somebody as a starting point!

2015/08/17

Getting the correct input into RPS

In a first article about the Restore Point Simulator (RPS), I talked about the history of RPS and why it was created. I now want to take the time to explain the correct input parameters. Although I added some tool-tips in 0.3.1, people are sometimes confused about how it all ties together.

TL;DR? The RPS GUI maps quite directly to the Veeam interface; if you already understand how Veeam works, check the screenshots to see how.

For those who didn't read the previous article, I'll repeat the formula we use at Veeam for rough estimations. Why? Because RPS is directly based on it:
Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)
This formula and RPS have three parameters in common. Data would be the first one, and it maps to "Used Size GB". To give you an example, if you have a VM with one VMDK, the used size would be the amount of blocks the VM has already written to. So if you have a thick provisioned VMDK of 50GB but only use 20GB inside the guest, the used size would be around 20GB. A more correct definition would be what the VM would actually use if it were thin provisioned. Because Veeam backs up at the block level, a thin provisioned disk is exactly what Veeam needs to process during a full backup (plus some metadata).

D or delta is the amount of data that changes between backups. In RPS this parameter is called change rate. It depends on two things: first, the frequency of backups, which in most companies is "daily"; second, the application. Don't set this parameter too low. Veeam backs up at the block level, so a small update can flag a bigger block at the VMDK level than you estimated. If you fill up a disk sequentially, like a file server, you won't notice this so much, because 10 sequential changes could flag only one block. However, if you are doing a lot of small random I/O in various locations, the number can quickly rise. So from my experience, a change rate of 5% is fairly optimistic while 10% is rather conservative. I personally prefer a more conservative approach.

Finally, C or compression, which is called "Data left after reduction". I have had tons of people discuss this parameter with me, because you can interpret it in various ways. However, look at the formula and it will all make sense. The formula basically says:
Backup Data = C * (Total Data In)
So C is a factor that should make "Total Data In" smaller: the smaller the number, the smaller the backup. The compression factor is thus a percentage that tells how much data is left after compression has done its job. If you define 40% (40/100) and consider a "data in" of 100, the backup data would be 40 = (40/100)*100. If you define 60%, you actually tell the engine that you expect worse results, because the end result would be 60 = (60/100)*100. So the lower the compression value, the better the compression.

For some people this feels counter-intuitive. If you prefer compression factors like 2x or 3x instead, you can easily convert those by taking 1/(compression factor). For example, 2x would be 1/2 = 50%. If we put that into the previous formula, we would get 50 backup data = 50%*(100 total data in). So data was compressed 2x. For 3x, you would get 1/3 or roughly 33%, which you can round to 35%. Again, if we fill this into the formula, we would get 35 = 35%*(100). Use this link if you really want to use 33%.
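The conversion between the two conventions can be sketched in a couple of lines of Python (the function names are my own, purely illustrative):

```python
def reduction_from_factor(compression_factor: float) -> float:
    """Convert a '2x'-style compression factor to the
    'data left after reduction' percentage: 100 / factor."""
    return 100.0 / compression_factor

def backup_size(data_in_gb: float, reduction_pct: float) -> float:
    """Apply the reduction percentage. The lower the percentage,
    the better the compression and the smaller the backup."""
    return data_in_gb * reduction_pct / 100.0

print(reduction_from_factor(2))    # 50.0 -> 2x compression means 50% left
print(backup_size(100, 50.0))      # 50.0 -> 100 GB in, 50 GB stored
print(reduction_from_factor(3))    # ~33.3, which you can round up to 35
```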

If you want to disable compression, put data reduction to 100%.

In the formula, two parameters are left: F & R. As discussed in the previous article, these are hard to calculate, and that is exactly what RPS does for you. So for retention points, you should just define your policy. If you need 14 daily restore points, put in 14 for "Retention points" and daily for "Interval". You will see that in some cases, you might end up with more restore points than you configured. This can be normal, as Veeam considers your retention policy but also the dependencies between fulls and incrementals. So don't try to adjust the retention points because you feel it miscalculated; instead, take a look at the example given in the previous blog article.

What also influences the number of parameters is the style parameter. This actually maps directly to the Veeam GUI. For example, "Backup Copy Job" (BCJ) should be quite easy to understand: it refers directly to a Backup Copy Job. Selecting it will also show you BCJ-specific configurations.


For the other "styles", called Incremental and Reverse, the other settings map mainly to the advanced settings of a regular job, except for "Retention Points".


If you select Incremental without any active fulls or synthetic fulls, you get a forever incremental job. It also maps directly to the GUI, more specifically to the advanced settings of a regular job. Granted, the checkboxes are under the buttons "Days" and "Months". Also, there is no global "enable" checkbox for synthetic or active fulls, but do understand that checking one of the boxes enables "Active" or "Synthetic". Finally, "Monthly" takes preference over "Weekly". In the Veeam GUI you can select only one; here you can enable checkboxes for both, but weekly will be ignored when you enable a month.


So if you want a weekly synthetic backup, it would be like shown below. I did not implement "Transform", and with good reason: to this day, I have received zero requests for it.


A monthly active backup would look like shown below. Important to know here is that this does not define GFS policies; those can only be defined in a Backup Copy Job. I once had a guy who checked only January because he wanted a "yearly full backup for archiving". He was quite surprised by the result. Basically, he had just told the engine that he wanted a yearly chain which only gets reset in January.


Finally, "Reverse" incremental can be found in the same place.


That leaves one last parameter: the growth simulation. It is a recent addition to RPS, and personally I think it is one of the coolest things added to it. Let me explain what it does and how it works. If you need to size for the following 3 years, and you know that you have a growth rate of 10% on a yearly basis, you can just add that to RPS. What it does is take your "used data" and grow it on a daily basis via: Future Used Data = Used Data * (1 + 10%) ^ (Day X/365). Thus on the last day of the simulation, after 3 years, it would put Future Used Data = Used Data * (1 + 10%) ^ (1095/365).

So let's imagine you have chosen reverse incremental, a 3-year simulation, 10% year-on-year data growth and 1000GB of data. For simplicity, let's disable compression. Calculating this manually would give you roughly 1000*(1+10%)^(1095/365) = 1000*(110%)^3 = 1000*(1.10)^3 = 1000*1.10*1.10*1.10 = 1331. Now check the full backup in the configured example. Also interesting is that the increment from 2 days ago (retention point 3) has a smaller "Future Used Data". It uses 3 years minus 2 days in the formula, thus Future Used Data = 1000*(1+10%)^(1093/365) ≈ 1330.30.
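The growth formula above is easy to verify yourself. A small Python sketch (the function name is mine, not part of RPS):

```python
def future_used_data(used_gb: float, yearly_growth: float, day: int) -> float:
    """Grow the 'used data' per day: Used * (1 + growth) ** (day / 365)."""
    return used_gb * (1 + yearly_growth) ** (day / 365)

# Full backup at the end of a 3-year simulation (day 1095 = 3 * 365):
print(future_used_data(1000, 0.10, 1095))  # 1331.0 (= 1000 * 1.1^3)
# Increment from 2 days earlier (day 1093):
print(future_used_data(1000, 0.10, 1093))  # slightly less, ≈ 1330.3
```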


The differences are small in regular jobs, but once you go GFS, the growth calculation can have a dramatic impact which is hard to calculate with Excel.


With this, I have discussed all parameters. Some might ask: what about "Quick Presets", what do they do? Well, they are just quickly preconfigured scenarios that you can use. For example, if you want to have a monthly active backup, you can click all 12 select boxes, or you can just select the matching style + monthly active full to quickly configure this scenario.

If you made it this far, thanks for reading and enjoy playing with RPS.

2015/08/05

A brief history of the Restore Point Simulator

During the development of the Restore Point Simulator, I have often encountered questions from users that led me to believe that it is not always clear how to use the tool and what it can do for you. In this blog article series, I want to take the time to explain why RPS was developed in the first place and how you can use it.

In the beginning there was nothing, just our famous formula to calculate repository space. I'll quote it here because it is still the main idea behind RPS. Many Veeam SEs had their own Excel configuration sheet to quickly spit out some numbers, some prettier than others.

Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)
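As a quick sanity check, you can compute the formula directly with the default values quoted above. A tiny Python sketch (the 1000 GB input is just an example of mine):

```python
def backup_size_gb(data_gb, c=0.5, f=1, r=14, d=0.10):
    """Backup size = C * (F*Data + R*D*Data), using the worst-case
    defaults from the formula: C=50%, F=1 full, R=14 increments, D=10%."""
    return c * (f * data_gb + r * d * data_gb)

# 1000 GB of used VM data with the defaults:
# 0.5 * (1*1000 + 14*0.1*1000) = 0.5 * 2400 = 1200 GB
print(round(backup_size_gb(1000), 1))  # 1200.0
```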

This formula has some difficulties. First of all, the (C)ompression ratio and the (D)elta are difficult parameters to estimate, but the formula does give you some hints about what we at Veeam use internally and a fairly good explanation of why these values were chosen. More difficult are F and R. These values define how many full backups and incrementals you will need. With reverse incremental / forever incremental, that is quite easy to calculate: you'll have F = 1 and R = rps - F.

However, when you talk about weekly synthetics or active fulls, the number is rather difficult to calculate. Even Veeam users do not always understand the effect of a certain policy. For example, if you configure a forward incremental with weekly full and 2 restore points (rps), you can expect up to 9 rps, because of dependencies. I had countless discussions with customers arguing that Veeam did (does) not respect their rps policy, when in fact it does its absolute best to respect it. If you run the simulation, you can actually see the dependency. In the first column (called Retention), you will see something like 3 (2) or 4 (2). This means that point 3 or 4 is kept because point (2) depends on it.

Now if you want to excellify this, you can come up with something like F = #Weeks + 1, R = (F*7*#DailyBackups - F). Imagine 14 rps with daily backups: that would be F = 2+1 = 3 and R = 3*7*1 - 3 = 21 - 3 = 18. That would be really close to what RPS says, but explaining it to people does take some time, and it is not always accurate; it is more guesstimation.
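That guesstimate can be written down directly. A Python sketch (again, this is the rough Excel-style approximation, not the RPS simulation):

```python
def guesstimate_fulls_and_increments(retention_points: int,
                                     backups_per_day: int = 1):
    """Rough worst case for forward incremental with a weekly full:
    F = #weeks + 1, R = F*7*backups_per_day - F. A guesstimate only."""
    weeks = retention_points // (7 * backups_per_day)
    f = weeks + 1
    r = f * 7 * backups_per_day - f
    return f, r

print(guesstimate_fulls_and_increments(14))  # (3, 18)
```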

Another common misconception is that a monthly full would require less space than a weekly full. While this can be the case, remember that a monthly full creates a chain of around 30 points. If you configure a policy of 14 points in forward incremental with monthly full, the worst-case scenario occurs 12 days after a second full backup is created. That is because you have 12 increments dependent on the current full, but you also need to keep the whole previous chain, because the oldest restore point is an increment that depends on the previous full backup and a chain of 30 increments. If you configured a weekly full, a chain would be at most 7 days long, so less would be stored. This can grow exponentially when you do, for example, a backup every 12 hours or even more often. However, if you configure for example 60 restore points, a monthly full backup can be cheaper than a weekly full backup. The more days' worth of restore points configured, the more likely a monthly full backup will actually consume less space.

These two examples show exactly why RPS was made. Different customer cases require different approaches. Also, it reconfirms that assumption is the mother of all mistakes. So explaining how retention works without very difficult formulas was actually my main goal when the first edition of RPS was made.

Another example is my new all-time favorite, which shows why what feels natural is not always reality. Some months ago, a partner thought a forever incremental backup chain of 365 points would be more efficient than a GFS policy with 12 fulls. This even surprised me the first time I ran it, because incremental backups feel more lightweight. I remembered from my v7 SE training that GFS should be more efficient, but just running the simulation reconfirms this.

It is true that forever incremental is much more efficient than a weekly full in terms of disk space savings. However, 30 increments quickly add up, and for long-term retention a monthly full can be more efficient. There is one caveat. With 365 increments, you do have more granularity than with 12 full monthly points in time. However, I do want to remind you that those 12 full backups are completely independent of each other. So a single bit-rot corruption would only impact one point, while in a 365 restore point chain it potentially impacts the whole chain. So I think in the majority of cases the more efficient disk usage and the independence of points is better than a very long chain of increments, but hey, it is up to each company to decide their policy.

Finally, I remember one of the major updates was adding GFS support. Calculating and explaining GFS policies is nearly impossible with Excel. Why? Imagine you configure weekly backups to keep the restore point of Sunday, and you configure monthly backups to keep the restore point of the first Sunday of the month. In this case, the backup of the first Sunday of the month could be used to satisfy the weekly backup policy as well as the monthly one. In fact, this is what Veeam does. So if you configure, for example, 12 weeklies and 3 monthlies, you would assume that the number of fulls is 12+3+1 (1 for the simple retention policy). However, this is not the case. If you configured your policy correctly so that weekly and monthly points can coincide (schedule button), you will actually get fewer points. You can see these common points again in the retention column. "10W 3M 0Q 0Y" means that the point represents the 10th weekly point but also the 3rd monthly point.

@poulpreben (if you don't follow him on Twitter, do it now) and I spent hours discussing how we could calculate this with formulas. We concluded that the only way to actually do it was to emulate what happens inside B&R on a daily basis for some time. In fact, that is what RPS does. If you configure a retention policy, it will try to predict a period of time in which the worst-case scenario should occur (most data on disk). This is why, when you configure 5 yearly backups, it takes some time to calculate, because it will run over 2000 days trying to simulate the behaviour of B&R.

So TL;DR? Don't just assume, run it through RPS. Be critical of the results of RPS (software can contain bugs), but also try to understand why something is different from what you first estimated.

2015/05/05

Veeam Job Managers, Veeam Agents, Tasks and more..

One of the most common statements I hear is that Veeam is good for SMB. Maybe with good reason. The GUI is simple and sleek. It doesn't require you to study for 2 weeks just to install and configure some jobs. This was done intentionally and makes it a good solution for SMB, but not exclusively for that segment.

A lot of green magic is going on under the hood. The application is created so that it works out of the box in smaller environments. However, when you are in a bigger environment, it might be interesting to dig a little deeper and understand the global architecture. Most of the problems I see when talking to people are just misconfigurations or a matter of not understanding how it all works together.

When I start talking about proxies and repositories, to my surprise, I still see a lot of people wondering what it all means. Hopefully after this blog post, either everything is clear, or you are so fascinated that you want to know more.

Before we think about backup, let's look at the scalability of a movie theater. If we look at people handling the ticket process, we can see a couple of people that are important.

First of all, we have the people selling tickets. They are actually executing all the hard work: a constant, repetitive process of taking money and exchanging it for a ticket. Their job is fairly simple in what they need to do, but they are quite busy.

At the gate of the movie room, there is somebody checking those tickets and letting people inside. Maybe they can handle multiple tickets, but only one person (or group) is allowed into the screening room at a time, because otherwise it will get messy. It is clear that scanning or checking a ticket is a much simpler process than selling one; fewer interactions are required. Thus the ratio of cashiers to doormen is different.

At the heart of the system is someone who regulates the whole chain, let's say the floor manager. He decides where cashiers sit and thus, in effect, which customers they are handling. He instructs the doormen when they need to come to work and when they can take a break. Most importantly, he is not doing any of the real physical work, but he is the brain of the operation. Without his instructions, nobody does anything.

Scaling out is easy. If you run a small theater, you might have only one person doing all the jobs. This of course simplifies things. However, people are not so good at multithreading. So if you need to scale out, you can just hire more cashiers and doormen. These do the hard physical work but require minimal training, because they just repeat the same job over and over.




Well, Veeam has a very similar chain of responsibility. It is not always apparent, but scale-out is possible. So how does a movie theater map to Veeam? Well, in v6, Veeam introduced Veeam Agents (VA) or Datamovers (Enterprise Scalability). These small binaries are like the cashiers and doormen. Actually, a VA has the logic for being both the doorman and the cashier at the same time. Still, these routines are pretty simple, and the Veeam Agent does not have any GUI or anything.

So what do VAs do? Well, the task or source agent reads the data from production, executes compression and VMDK-level deduplication, and sends it to the target (or job) agent. The job agent takes the data, deduplicates at the job level, and writes it to the backup file. Above it all is the job manager. It instructs task agents and job agents to work together. However, except for scheduling work, it is not really interacting a lot with the VMDK data itself. It just checks that all VMs and VMDKs configured in this specific job are successfully backed up. Don't confuse the job manager with the backup server itself. The job manager is just middle management. It is the backup service that notices that a job is ready to be scheduled. But instead of handling multiple jobs itself, it delegates this to different floor managers. The backup service has the overview of the theater and ensures that everything is running smoothly.

Already we can see a couple of interesting things:
  • A source agent handles individual VMDKs
  • The target agent takes multiple VMDKs or tasks for that specific job and writes them to one file. The reason is quite simple: writing from multiple processes to one file creates a lot of confusion about who is updating which segment of the file, especially when we look at metadata updates. Also, job-level deduplication is quite difficult if different processes are writing to the same file. If you need multiple streams, you need to run multiple jobs at the same time.
  • The Job Manager runs on the backup server
  • The VAs can run on separate servers, called proxies or repositories. However, for VMware environments, Veeam by default creates a proxy server and a repository server on the main backup server itself. This gives the impression that Veeam is not scalable, because it works out of the box. Don't be confused: if you want to run everything on one big physical server, that's fine, but the scalability is there. For Hyper-V environments, Veeam will actually put the VA on the Hyper-V host itself by default. This is what is called an "on-host" proxy.
  • I once had a customer remark that he needed a VA for each VM in his organization. This is of course not the case. It would be ideal, of course: imagine that you go to the movie theater and there is a cashier for each customer. That means you would never have to queue. However, we all know that is not how it works. When a customer enters, he queues in one of the available lines, and when it is his turn, he will be served. Even better, maybe there is somebody at the front checking the available resources and balancing the load over the available queues. The same goes for Backup & Replication: you will need a number of VAs, and Veeam will balance the load over them in a smart way.
  • Maybe not so clear, but after all the visitors went into the screening room, the door needs to be closed so that no light and noise from outside gets in. For Veeam, it is the same: the job is not over after all VMDKs have been backed up. The job agent still needs to consider retention. If you configured 14 points, but after the backup 15 points are on disk, some clean-up action is needed. Since the job agent is close to the repository, it will be the one deleting files, executing merges or building synthetic fulls.
  • A VA for Windows can be a proxy and a repository. A VA for Linux can only act as a repository. Finally, you can imagine that you cannot run the job agent on a CIFS repository, so Veeam automatically picks another server to run the job agent. Sometimes this green magic does not have 100% of the information it needs to make a rational decision. Imagine the CIFS share is on another site and the connection between both sites is limited. In this case, it makes sense to run the job agent near the CIFS share so it can do the merge or synthetic full locally. So if you have multiple sites, it might be wise to configure the gateway server manually (aka, where do I run the job agent).
Another thing that might be interesting to know is that the VA is just a simple binary. It is not loaded as a service. So how does it get started? Well, that depends on the server. For Windows, when you add a repository, the first thing you actually do is add a "managed server". Adding a managed server for Windows means Veeam will push:
  • An installer service: to install or update the transport service
  • A transport service: the service running on a Windows managed server that kickstarts the VA
You can check this yourself. If you have already configured Windows backup proxies and repositories, you can go to Backup Infrastructure. Under managed servers, you will see your proxies pop up. If you open the properties in the GUI, you can see the installer service and the transport service under ports. On the server itself, you will be able to see the Veeam services as well via services.msc.

Pushing those services is done via the administrative share. However, if due to security restrictions you are not able to do so, you can actually install "the installer service" manually. Support can help you with this.



For Linux servers, Veeam just uses SSH to upload the agent and start it dynamically. SSH is a pretty stable protocol for uploading to and managing Linux servers. This way, Veeam doesn't have to integrate with the different init.d, upstart, etc. mechanisms.

So for this part, let's put all this info into practice. On a Veeam backup server with the default proxy, I started two jobs.





Then I started a small script I wrote to monitor Veeam processes.



First of all, you will see a lot of green magic going on. I'm not going to discuss all the services now, but let me pick out some specifics.

The first one, with ID 11604, is the Veeam Backup Service, or the local branch manager. It is started at boot time (its parent is the services.exe daemon).

The next two processes are job managers. We have two jobs running, so of course the backup service has started two floor managers to handle them. You can see them with ID 87228 (Backup Job Linux) and ID 84604 (Backup Job Windows). How do I know they correspond to those jobs? Well, the command line contains random hashes. The first one is actually the job ID, which you can find in the logs. For example, at the end of the Backup Job Linux log I found:
Job has been stopped successfully. Name: [Backup Job Linux], JobId: [82062049-544e-44fc-9bb8-f3727e3464ac]
The second one is the session ID, which changes every time the job runs. You can also find these IDs in the database. For example, if you need a reference table, you can check BJobs to match job IDs to job names.

Since I'm running everything on the same server, you can also see the VAs. You will see that there are more VAs active than described above; for example, per VM there is also a VA running. But let's check the ones we discussed.

For the proxy VAs, you can check ID 85072 (SharePoint VMDK) and ID 86760 (AD VMDK). What I like is that you can also check where they are running and logging to. Also, it is important to note that even though they are running on the backup server, they have not been started by the backup server but rather by the VeeamTransportSvc daemon. So the job manager has contacted this transport service to kickstart the whole process. Finally, although the Linux job was running, it did not yet have any free proxy resources (proxy slots) to back up its VMDKs.

At the target side, you can see the target or repository agent running with ID 87744 (Backup Job Windows) and ID 86656 (Backup Job Linux). As discussed, there is only one target writer per job.

2015/04/09

Veeam Data Domain integration X-Rayed

With the latest release, v8, Veeam introduced Data Domain integration. The integration is based on DD Boost. But what does that actually mean? Well, it means we will do faster backup copy jobs (and backups) towards a DD.

So what's so good about the integration? First of all, Veeam supports "Distributed Segment Processing". The basic idea is that Veeam will do the Data Domain deduplication at the Veeam side. The main advantage is that DD deduplication is global dedup. If you have, for example, 2 backup copy jobs, each copying 1 Windows VM, then without DD Boost, Veeam will send over the OS blocks twice, simply because they are different jobs.

With DD Boost, when the gateway server (the component that talks DD Boost) has to store a second copy of the OS blocks, it sends pointers down the line because it knows that the Data Domain has already stored those duplicate blocks. The main advantage is that there is less network usage and the DD doesn't have to do any processing anymore. Hence the term Distributed (= each client) Segment (= chunks of data) Processing. The job performance might also improve significantly, because the second time those same blocks need to be stored, there is no real write occurring on the Data Domain, just a metadata update.
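The Distributed Segment Processing idea can be sketched in a few lines. This is a conceptual toy model, not the actual DD Boost protocol: the gateway keeps track of which block fingerprints the Data Domain already stores, ships only blocks it has never seen, and sends a pointer for the rest.

```python
import hashlib

# Toy model of client-side dedup: only unseen blocks travel over the wire.
def store_backup(blocks, seen_fingerprints):
    sent_data = 0      # bytes of real block data sent
    sent_pointers = 0  # duplicate blocks replaced by pointers
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp in seen_fingerprints:
            sent_pointers += 1        # DD already has it: metadata-only update
        else:
            seen_fingerprints.add(fp)
            sent_data += len(block)   # new block: real data goes down the line
    return sent_data, sent_pointers

seen = set()
os_blocks = [b"ntoskrnl" * 512, b"registry" * 512]  # two 4 KB "OS blocks"
# First job copying the Windows VM: every block is new.
print(store_backup(os_blocks, seen))   # (8192, 0)
# Second job copying the same OS blocks: pointers only, no data sent.
print(store_backup(os_blocks, seen))   # (0, 2)
```

Without DD Boost, the second job in this model would have sent the full 8192 bytes again.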

Although most people focus on this part of the DD Boost integration, it's not my favorite part. So in this article I want to focus on "Virtual Synthetics" and what it means. To understand the benefits, let's first look at how a backup copy job works.

First off, the backup copy job or bcj doesn't copy files. It copies data and constructs new backup files at the target side. So whether you use forward incremental, reverse incremental or forever incremental, the result of the bcj will always be the same. The bcj uses a strategy similar to forever incremental. So let's take a really trivial example. Imagine you created a bcj with a retention of 2 points.

The first day you will create a full backup file.

The second day, you will create an increment file and store only blocks that have changed. No rocket science so far.
The third day, you will create another increment. However, 3 points is more than the configured policy of 2, so some action is required.
Just deleting files is not an option: you cannot delete the oldest backup file because it is the full backup. This is why the backup copy job does something called a merge.
The idea is that you take the blocks from the oldest increment, read them from disk and then write them back into the original full backup file, essentially updating the changed blocks in the full backup file.
The result is that the full backup file now represents the restore point from day 2, and the number of backup files again equals the retention policy you configured.
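The day-by-day cycle above can be modeled in a few lines. This is a toy sketch of the merge logic with a retention of 2 points, not Veeam's actual implementation; backup files are modeled as dicts of block ID to block data:

```python
# Toy model of the backup copy job's forever-incremental merge.
def run_day(chain, changed_blocks, retention=2):
    chain.append(dict(changed_blocks))   # write the new increment
    while len(chain) > retention:
        full, oldest_inc = chain[0], chain[1]
        full.update(oldest_inc)          # merge: fold the oldest increment's
        del chain[1]                     # changed blocks into the full, drop it
    return chain

chain = [{"a": "a", "b": "b"}]           # day 1: full backup file
run_day(chain, {"a": "a'"})              # day 2: increment, no merge needed
run_day(chain, {"b": "b''"})             # day 3: third point forces a merge

print(len(chain))   # 2 files again, matching the retention policy
print(chain[0])     # {'a': "a'", 'b': 'b'} -- the full now equals day 2
```

Note that `full.update(oldest_inc)` is exactly the expensive part on real storage: it is a read of the increment plus in-place writes into the full backup file.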

But why is that bad for Data Domain? Well, Data Domain was designed for sequential writes. In fact, it was created to replace tape, and that is why most dedup devices have VTL functionality.

With tape in mind, remember those old VHS cassettes? If you recorded one soap (different episodes) on one tape in chronological order and one day you wanted to binge-watch the soap, you could just put in the cassette and push play. No delay when switching episodes, because you are just streaming the tape.

However, imagine you were out and different members of the family recorded different series and movies on one cassette. When you wanted to play your specific part, you needed to skip back and forward to get to it, and that took some time because the tape has to wind to that specific point.

Binging, or just playing the video, is in data terms what we call sequential I/O: you read the data in the chronological order in which it was written. Skipping back and forward to read (or write) a specific part is what is called random I/O, and as with tapes, it is pretty slow. Now if you design your device to act like tape, you can write really fast, but the random I/O kills you.

Well, this is why bcj merging is actually pretty slow on deduplication devices in general: it is a lot of random reading from and writing to files. So how does DD Boost help here? First of all, you should understand that the DD keeps metadata with pointers to the data blocks that make up a specific file. But let's not go too deep into how a filesystem works; you have Wikipedia for that.

When DD Boost is being utilized, Veeam will not read blocks a', d', h' and l' and then write them back to the DD as it usually does. Instead, it instructs the DD to point to the blocks already on disk.
Because you are just writing pointers, this "merge process" is fast compared to the regular merge process.
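A toy model of the difference: if a file on the Data Domain is essentially a list of pointers into the block store, then the virtual synthetic merge only rewrites pointers and performs no block-level I/O. This is purely illustrative, not how DD Boost is implemented internally:

```python
# A backup file modeled as offset -> pointer into the dedup block store.
full = {0: "ptr:a", 1: "ptr:d", 2: "ptr:h"}   # the full backup file
increment = {0: "ptr:a'", 2: "ptr:h'"}        # oldest increment's changed offsets
block_io_ops = 0                              # block reads/writes performed

# A regular merge would read blocks a' and h' and rewrite them into the full
# file (block_io_ops would climb). The virtual synthetic merge instead just
# repoints the full file's offsets at blocks already stored on disk:
full.update(increment)

print(full)          # {0: "ptr:a'", 1: 'ptr:d', 2: "ptr:h'"}
print(block_io_ops)  # 0 -- metadata updates only
```

The full file now describes the newer restore point without a single block having been copied.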

So this makes the Data Domain an excellent backup copy job target, especially because you can define GFS on the bcj. Imagine you instruct the bcj to keep 6 weekly backups: you will actually have 7 full backups on the DD, one for every week plus the active VBK.

One thing that does not change is the restore time. Veeam restores are pretty random, especially if you need to read from different files. Imagine you have a chain of 14 points and you want to restore something from the newest point: with the backup copy job, that could mean accessing 14 different files in the background.
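Why can the newest point touch every file in the chain? Because each block has to be fetched from the most recent file that rewrote it, and a block untouched since day 1 still lives in the full. A rough sketch with a hypothetical three-file chain:

```python
# A chain: one full plus increments, each a dict of block_id -> data.
chain = [
    {"a": "a0", "b": "b0", "c": "c0"},  # full (oldest)
    {"b": "b1"},                        # increment, day 2
    {"c": "c2"},                        # increment, day 3 (newest)
]

def restore_block(chain, block_id):
    """Restore one block of the newest point: search files newest-first."""
    files_opened = 0
    for backup_file in reversed(chain):
        files_opened += 1
        if block_id in backup_file:
            return backup_file[block_id], files_opened
    raise KeyError(block_id)

print(restore_block(chain, "c"))  # ('c2', 1)  found in the newest increment
print(restore_block(chain, "a"))  # ('a0', 3)  had to walk back to the full
```

Scale the same lookup to a 14-file chain and a restore can end up opening all 14 files, each access being a random read.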

If you ask me, the DD is an excellent bcj target. Just keep a small number of restore points (for example 7 to 14) on JBODs, which are excellent at handling random I/O, and then tier them to a DD. If you choose your policy correctly on the main target, 95% of the restores will come from the first tier. However, if something bad happens and you need to go back a couple of months, it is acceptable that the restore might take a bit longer in exchange for the huge number of restore points you can keep. For Veeam, this is actually the preferred architecture.