vCoffee : Drink coffee and check your backup jobs from your smartphone

For some months now I have had the idea to make an app for Veeam Availability Console (VAC) and/or Veeam Backup & Replication (VBR). While getting a coffee in the morning, I noticed that people spend quite a lot of time queuing or waiting for the machine to deliver the coffee.

That's why I'm glad to introduce vCoffee today. It is a small app, currently available in alpha, which you need to install manually on your Android device. The app itself is rather simple: you log in, get an overview of all jobs, and see their latest state. If required, you can click a job and start it directly from the app. As a bonus, you also get an "RPO" indicator, which tells you whether the job started in the last 24 hours. So if a job was successful but has not run for the last 5 days, this will also be indicated on the first screen.
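To make the "RPO" indicator a bit more concrete, here is a minimal sketch of the idea in Python. It is not the app's code (the app itself is JavaScript); the function names, the status labels and the sample dates are hypothetical, and only the 24-hour window comes from the description above.

```python
from datetime import datetime, timedelta

# Assumption: the RPO window is the 24 hours mentioned in the text.
RPO_WINDOW = timedelta(hours=24)


def rpo_ok(last_run, now, window=RPO_WINDOW):
    """Return True if the job started within the RPO window."""
    return now - last_run <= window


def job_status(last_result, last_run, now):
    """Combine the last job result with the RPO check into one label."""
    if last_result != "Success":
        return last_result
    return "Success" if rpo_ok(last_run, now) else "Success (RPO missed)"


# Hypothetical sample data: one job ran 3 hours ago, one 5 days ago.
now = datetime(2018, 1, 10, 8, 0)
print(job_status("Success", now - timedelta(hours=3), now))
print(job_status("Success", now - timedelta(days=5), now))
```

The point of the second check is that a green "Success" on its own says nothing about how old that success is.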

What I'm also really excited about is that it covers both the VAC REST API and the VBR REST API. This means that as a partner, you can check all your customers or tenants from the app. As a customer, you are able to monitor your VBR server via the Enterprise Manager. Do note that the REST API is part of the Enterprise Plus licensing.

One thing that is really important is that Android seems to be quite paranoid about security, which means that you cannot use self-signed certificates. For VAC, I don't think this is a big issue, but for Enterprise Manager you might have used a self-signed certificate. That's why I would also like to refer to my colleague Luca Dell'Oca. He wrote an excellent article about "Let's Encrypt". I used the article for both VAC and Enterprise Manager. For Enterprise Manager, if you have already installed it, there is an excellent article that explains how to replace the certificate.

Here is a small demo of the app. I can tell you that on my actual phone it works a lot faster, but due to the Android emulation, it looks quite slow in the demo.

The code is released under the MIT License on VeeamHub, the Veeam community which gets contributions from Veeam employees but also from external consultants and Veeam enthusiasts. This means that everybody can contribute and reuse the code as he or she likes. It also means that no responsibility will be taken and you cannot contact Veeam Support for help. Basically, this app was not developed by Veeam R&D and has not been checked by Veeam QA.

In a follow-up article, I'm planning to discuss the VAC REST API, because I was amazed how simple this really was. This is because the app itself is JavaScript code and the JSON support of the VAC REST API makes parsing the objects extremely simple.

Finally, you can download a debug build here, as long as I don't run out of bandwidth.


RPS Workspace

Many people have been using http://rps.dewin.me and honestly it gives me great pleasure that people like it and use it so often. I tried to make the tool as straightforward as possible but one thing people do not seem to understand is the line "Work Space". So on a regular basis I get the question, what the hell is "Workspace" and how is it calculated.

In the early days of RPS, it didn't have this Workspace line. However, during some discussions, some fellow SEs were concerned that there was no buffer space for:

  • Occasionally running a manual full
  • Not filling the filesystem to 100%, because that is just not best practice
  • Space that is used during the backup process itself

So the first two, I hope, are pretty clear. The third one is not always clear. So imagine that you are running a forever incremental. You configured 3 points, and that is what you will get after the backup is done. However, during the backup, the first thing that happens is that an incremental point is created. Only after the incremental backup is done does the merge process happen. That means that during that "working period", you actually have 4 restore points on disk (1 full + 3 incrementals). Thus you need to have some extra space.

That hopefully explains the why. Now the how. This one is a bit more complicated. The initial workspace was pretty simple: add the size of one additional full backup. While this is great in smaller environments, we pretty soon came to the conclusion that if you have 200TB of "full data" (all fulls together), you probably do not need 200TB of workspace. Especially because typically there is not one humongous job that covers the complete environment. You have probably split up the configuration into a couple of jobs, and those jobs are probably not all running at the exact same time.

So the workspace has some kind of bucket system where the first bucket has a higher rate than the last one. Once the first bucket is filled, it overflows to the next one. This means that the workspace does not grow linearly with the amount of used space.

Here are the buckets themselves:
0 - 10 TB = source data will be compressed and then multiplied by a factor of 1.05
10 - 20 TB = source data will be compressed and then multiplied by a factor of 0.66
20 - 100 TB = source data will be compressed and then multiplied by a factor of 0.4
100 - 500 TB = source data will be compressed and then multiplied by a factor of 0.25
500 TB+ = source data will be compressed and then multiplied by a factor of 0.10

Let me give you some examples. If you have 5TB of source data, that 5TB will fit entirely in the first bucket. Thus the calculation is rather easy. If you use a compression factor of 50% (the default), you will get:
5TB x 50/100 x 1.05 =~ 2.6 TB Workspace

If you have a source data of 50TB however, it does not fit in the first bucket. It has to split the data over 3 buckets. The first 10TB in the first bucket, the next 10TB in the second bucket and the last 30TB in the third bucket. Thus the calculation would be roughly:
10 TB x 50/100 x 1.05 + 10 TB x 50/100 x 0.66 + 30 TB x 50/100 x 0.4 =~ 5 + 3 + 6 = 14TB Workspace

You can verify that here:

Finally, if you have a big customer or you are a big customer and you have 500TB, you will see a split of 10, 10, 80 and 400. Thus the calculation would be:
10 TB x 50/100 x 1.05 + 10 TB x 50/100 x 0.66 + 80 TB x 50/100 x 0.4 + 400 TB x 50/100 x 0.25 =~ 5 + 3 + 16 + 50 = 74TB Workspace

You can verify that here:

So instead of saying that with 500TB, you will need 250TB of workspace, it is drastically lowered to 74TB. And again, that makes sense, the environment will be split up in multiple jobs, so those will not be running all at the same time and you will probably not run an active full on all of them at the same time.
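The bucket logic can also be sketched in a few lines of Python. This is an independent illustration built only from the bucket boundaries and factors listed above, not the code behind rps.dewin.me; the names BUCKETS and workspace_tb are made up for this example.

```python
# Bucket upper bounds (TB of source data) and their factors, as listed above.
BUCKETS = [
    (10, 1.05),
    (20, 0.66),
    (100, 0.40),
    (500, 0.25),
    (float("inf"), 0.10),
]


def workspace_tb(source_tb, compression_pct=50):
    """Spread the source data over the buckets and sum the weighted parts."""
    total = 0.0
    lower = 0.0
    for upper, factor in BUCKETS:
        if source_tb <= lower:
            break  # nothing left to put in this or any later bucket
        portion = min(source_tb, upper) - lower  # TB that land in this bucket
        total += portion * compression_pct / 100 * factor
        lower = upper
    return total


for tb in (5, 50, 500):
    print(tb, "TB source ->", round(workspace_tb(tb), 2), "TB workspace")
```

Note that the sketch does not round the individual terms, so for 50TB and 500TB it ends up about half a TB above the rounded 14TB and 74TB from the examples.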

For those who want to play with it and see those buckets in action, I created a small jsfiddle here:

Just change the workspaceTB and click run to update the output.


Self Service Demo with Veeam Backup for Office 365 using the REST API

Just today Veeam released the 1.5 GA version of Veeam Backup for Office 365. This version ships with a proxy-repository model, introducing scalability for the bigger shops and service providers. It also features a complete REST API and I personally love it. It means that the community has a chance to extend the product without any limitations. (For those just getting started, know that by default, the REST API is not enabled, so you should enable it in the main options. You can find the main menu in the top left corner under the "hamburger" icon.)

And just to demo how powerful it is, I already made a small demo. The demo basically allows you to start up a self-service recovery wizard, where a user can log in with his LDAP/AD credentials and then restore his own mails independently from the admin. This is quite a common request I get in the field, where admins don't really feel comfortable poking around in the end user's mailbox, even if they don't have bad intentions.

The self-service demo aka "Mail Maestro" source can be found on VeeamHub. A compiled Windows version can be found here. Besides the source code, the GitHub page also shows how you can use certificates to "secure" the connection between the end user and the server. BTW, the code only works with an on-premises Exchange server and a local LDAP connection, just because I didn't have the time to set up an Office 365 account etc. Most of the wizard will probably work; I'm just assuming that during restore, the credentials that are being used to restore (by default, the credentials that are being used to log in) might not work.

Ok, so let's try it out. When you download the compiled version, you will get the binary and the config file. Start by editing the JSON file with, for example, Notepad. I removed the "vbomailbox" argument because I will supply this on the command line.

Maybe some side notes. The LDAP server is of course a reference to the LDAP / AD server. To look up the user that you want to allow to do his self-service restore, we temporarily need to bind to it and look up his account, email address and its distinguished name. You can use a read-only user for this. The rest should be quite self-explanatory, except maybe for "LocalStop". If you enable LocalStop, you can type "stop" on the command line to cleanly close the session from the server side. The user himself will be able to stop the wizard from the portal after logging in to indicate that he is ready. Both will clean up the restore session in VBO365 (headless Veeam Explorer).

So let's go to the command line and pass the config file. Since we removed vbomailbox, Mail Maestro will complain that it does not know which user you want to use in this session. You can supply it on the command line by using -vbomailbox.

Let's supply a user that is being backed up

Great, the process is starting. Mail Maestro was able to find the user, start the headless Veeam Explorer session and find the mailbox in the backups. You can also see that it is serving on http://localhost:4123. Open the firewall port and replace localhost with the server IP to grant remote access.

So if the user logs in with his email address, he will be authenticated against LDAP and then hopefully the wizard will be quite self explanatory

Let's log in to the mailbox and delete all the mails in the inbox

Now let's restore them from Mail Maestro by clicking the green restore button next to the Email Box

... and the mails are back

When the user is done, he can stop the portal via the button in the top right corner of the portal. I noticed that if the browser window is too small, the button might not show up. Anyway you can always stop the wizard by typing "stop" on the command line

Final notes: as with many of my projects, this is just a demo. If you feel like you could use this in a production environment, please evaluate the code. This is published under the MIT license, so basically, you can do whatever you want with it at your own risk. I hope, however, that this shows how powerful the new API is and what you can do with it. I can only imagine that in the future, service providers would be able to build their own backup portal and offer Backup as a Service. In fact, I know my colleague Niels Engelen has been working on such a demo in PHP.


Adding AD/OU users to VBO365 via Powershell

For the Veeam fans out there, you must have been living under a rock if you don't know that there is a new backup product for Office 365 (Veeam Backup for Office 365, or VBO365 for short). It allows you to back up mailbox items like mail, calendar items, etc.

While 1.0 is already released, 1.5 is currently in public beta. One of the cool things it brings is scalability, which a lot of users have been asking for. However, it also brings full automation support in the form of a complete REST API and a complete PowerShell module. In this blog post I want to show you the power you get with the new PowerShell module.

Quite often I get asked how to add only a selected number of users to a job. For example, a company has 4000 mailboxes, but only wants to select a certain number of mailboxes for protection in a certain job. This makes even more sense with v1.5, since you can define multiple repositories with a different retention. So maybe for the helpdesk guys, you don't really want to keep backups too long, but for the managers, you want to keep the mails backed up for 8 years. Handpicking those users per job can be a tedious job.

With the new PowerShell module, you can automate this task. There is a new cmdlet called "Add-VBOJob" that allows you to define a new job. It takes the following parameters:

  • Organization (Get-VBOOrganization)
  • Target Repository (Get-VBORepository)
  • Mailboxes (Get-VBOOrganizationMailbox)
  • Schedule Policy (New-VBOJobSchedulePolicy)
  • Name

To see it in action, I made a sample script that queries Active Directory and gets all users in a certain OU. Then, based on those users, you can make a list of email addresses that you want to add. Armed with that list, you can use "Get-VBOOrganizationMailbox" to select the correct mailboxes.

You can find the script here. It should be quite straightforward. Here are some screenshots showing it in action

First of all, the module is in "C:\Program Files\Veeam\Backup365\Veeam.Archiver.PowerShell". So you can just execute "import-module 'C:\Program Files\Veeam\Backup365\Veeam.Archiver.PowerShell'". However, the $installpath trick in this script tries to find the installation directory even if you did not install VBO365 in the default location.

Now as you can see from the output, it found 3 users in the OU:

  • bbols@x.local
  • ppeeters@x.local
  • tbruyne@x.local
The script "builds" the email address list based on the SamAccountName, but of course if you have a different policy, you can change the example. For example, I imagine quite a few companies have something like FirstName.LastName@company.com. Btw, if you are wondering: "x.local" isn't a real DNS name, so how does that work with VBO365? Well, it seems that 1.5 will also support on-premises Exchange and hybrid deployments.

After building the email list, the script created the job

If we check the job, you will see those email addresses (mailboxes) were successfully added to the job

Well, in this case it was only 3 users in my test lab, but I can imagine that if you need to add 500 users, you will be grateful for not having to add them one by one. Also, you would be able to do this in a for loop, going over multiple OUs and creating multiple jobs. Finally, if you are going to use this in production once it is GA, I would recommend that you validate that you have the same number of users in the OU as in the job. In this example, it just simply checks all the mailboxes (Get-VBOOrganizationMailbox) and verifies whether the email address associated with a mailbox is in the initial email list. If it is, it is added to the job.
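To illustrate the selection and the recommended validation step, here is a small Python model of the logic. It does not call the PowerShell cmdlets; the mailbox dictionaries, the function names and the sample data are hypothetical stand-ins for what Active Directory and Get-VBOOrganizationMailbox would return.

```python
def build_email_list(sam_account_names, domain):
    """Build the email list from SamAccountNames, as the script does."""
    return [f"{name}@{domain}".lower() for name in sam_account_names]


def select_mailboxes(all_mailboxes, wanted_emails):
    """Keep only the mailboxes whose address is in the initial email list."""
    wanted = set(wanted_emails)
    return [mb for mb in all_mailboxes if mb["email"].lower() in wanted]


def missing_emails(wanted_emails, selected):
    """Validation: which OU users did not end up in the job?"""
    found = {mb["email"].lower() for mb in selected}
    return sorted(set(wanted_emails) - found)


# Hypothetical sample data: one OU user has no mailbox in the organization.
ou_users = ["bbols", "ppeeters", "tbruyne"]
mailboxes = [
    {"email": "bbols@x.local"},
    {"email": "ppeeters@x.local"},
    {"email": "admin@x.local"},
]

emails = build_email_list(ou_users, "x.local")
selected = select_mailboxes(mailboxes, emails)
print(len(selected), "of", len(emails), "users selected")
print("missing:", missing_emails(emails, selected))
```

If the list of missing addresses is not empty, the job would silently protect fewer users than the OU contains, which is exactly what the validation is meant to catch.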


Gathering your Veeam Agents in Veeam Backup & Replication

So with the upcoming version of Veeam Agent for Windows, you will be able to back up directly to a Backup & Replication server. Not everybody knows this, but you will also be able to back up to Veeam Backup & Replication Free Edition, provided you have a license for the Veeam Agent for Windows. This might be important for smaller shops that have only a couple of machines and do not have a Veeam Backup & Replication license.

The steps to enable this are not so difficult but without the GA product, there is no documentation so it might be difficult to figure out how it all ties together. So here are the 7 steps you need to take to get it all working. Special thanks to Clint Wyckoff who shared these instructions internally.

Step 1: Start by installing Veeam Backup & Replication

Fairly simple start. Download Veeam Backup & Replication Free Edition. Then mount the ISO to your target server, click the install button to start the installation. Basically, in this example, we did a next next next finish install. If you are doing this in production, it might actually be good to read what you are doing. Notice in the license step, I did not assign any license so the free mode will be installed.

Step 2: Enable full functionality view

The next step is where one of my partners got stuck. You need to enable "Full Functionality" view to get through the next steps. By default, you get "Free Functionality" view, which shows you only the options you can use with the Free mode. However, if you add a Veeam Agent for Windows license, you will unlock more functionality for your agent backups than is available by default in the free mode.

To enable it, go to the main menu, select view and then finally select the "Full Functionality" mode

Step 3: Add the Veeam Agent for Windows license to Backup & Replication

This might also be confusing, but you do not need to add the license during the Veeam Agent for Windows install. Rather, you add it to Veeam Backup & Replication and then, when you connect a Veeam Agent for Windows to VBR, it will acquire the license from the VBR server. This is good because you get a central place to manage the license.

Go to the main menu, but this time choose "License". In the popup, click the install license button and select the Veeam Agent for Windows license file (lic file) in the file browser. The result should be that the license is installed but the VBR server itself remains in Free mode.

Step 4: Define permissions

The next step is to define the permissions on your repository. Go to the "Backup Infrastructure" section and click the "Backup repositories" node. Then select the repository you want to assign rights to. Finally, click "Agent Permissions". In the popup window, you will then be able to assign permissions.

For this tutorial, I made a separate user called "user1", just to show you that you can do very granular permissions

Step 5: Install the client

Installing the agent on another machine should be fairly trivial. In this setup, we chose not to configure the backup during install, nor to create a recovery medium. However, I would highly recommend that you do create a recovery medium so that you can execute bare metal recoveries if needed.

Step 6: Configure the client

Once the product is installed, we can configure it. To open the control panel, go to your system tray. A new icon with a green V should have appeared. Because we did not configure anything yet, it should also have a small blue question mark on it. Right-click it and select control panel.

When the control panel appears, ignore the fact that it does not have a license (click no). Click configure backup to start the configuration. 

Finally in the backup wizard, as a target select Veeam Backup & Replication Repository. Specify the FQDN/IP and the credentials. When you click next, the permissions are checked and the license is acquired from the backup server. In the next step, you are able to select the repository.

Btw, if a user connects without permissions on the repository, the configuration wizard will refuse to go to the next step

Step 7: Run the backup

With the configuration done, you are ready to run the backup. You can see the backup job and backup from the Veeam Backup & Replication repository.

In the Veeam Agent for Windows, if you click the license tab, you will also see that the agent is licensed through Veeam Backup & Replication

So what's next?

Well, you can explore what other functionality is enabled when you back up to a free edition. One cool feature would be to "backup copy" your job to a second location. For example, in the following screenshot, I defined a repository on another drive, and then did a backup copy job to the second location.


Under the hood: How does ReFS block cloning work

In the latest version of Veeam, 9.5, there is a new feature called ReFS block cloning integration. It seems that the ReFS block cloning limitations confuse a lot of people, so I decided to look a bit under the hood. It turns out that most limitations are just basic limitations of the API itself.

To better understand how it all works, I made a small project called Refs-fclone. The idea is to give it a source file (existing) and then to duplicate that file to a target file (non-existing) with the API. Basically creating a synthetic full from an existing VBK.

It turns out that idea was not so original. During my Google quests for more information (because some parts didn't work), it appeared that there was a fellow hacker who had made the exact same tool. I must admit that I reused some of his code, so you can find his original code here

Nevertheless I finished the project, just to figure out for myself how it all works. In the end, the API is pretty "easy" I would say. I don't want to go over the complete code but highlight some important bits. If you don't understand the C++ code, just ignore it and read the text underneath it. I tried to put the important parts in bold.


Before I even got started, my initial code did not want to compile. I couldn't figure it out because I had the correct references in place. But for some reason, it could not find "FSCTL_DUPLICATE_EXTENTS_TO_FILE". So I started looking into my project settings. It turned out it was set to compile with Windows 8.1 as a target, and when I changed it to 10.0.10586.0, all of a sudden it could find all the references.

This shows an important lesson. This code is not meant to be run on Windows 2012, because it just doesn't support the API call. So many customers have been asking, will the ReFS integration work on Windows 2012, and the answer is simple: NO. At the time it was developed, the API call didn't exist. Also, you will need to have the underlying volume formatted with Windows 2016 because, again, the ReFS version in 2012 did not support this API call.

So let's look at the code. First, before you clone blocks, there are some requirements which I want to highlight in the code itself:
FILE_END_OF_FILE_INFO preallocsz = { filesz };
SetFileInformationByHandle(tgthandle, FileEndOfFileInfo, &preallocsz, sizeof(preallocsz));
This bit of code defines the end of the file. Basically it tells Windows how big the file should be. In this case, the size is filesz, which is the original file size. Why is that important? Well, to use the block clone API, we need to tell it where it should copy its data to. Basically a starting point + how much data we want to copy. But this starting point has to exist, so if we want to make a complete copy, we have to resize the target to be as big as the original.

if (filebasicinfo.FileAttributes & FILE_ATTRIBUTE_SPARSE_FILE) { // bitwise AND tests the flag; a bitwise OR would always be true
    FILE_SET_SPARSE_BUFFER sparse = { true };
    DeviceIoControl(tgthandle, FSCTL_SET_SPARSE, &sparse, sizeof(sparse), NULL, 0, dummyptr, NULL);
}
Next bit is the sparse part. The "if" statement basically checks if the source file is a sparse file, and if it is, we should make the target (tgthandle) sparse as well. So what is a sparse file? Well, basically, if a file is not a sparse file, it will allocate all the data on disk if you resize it. Even if you didn't write anything to it yet. A sparse file only allocates space when you write non-zero data somewhere. So even if it looks like it is 15GB big, it might only consume 100MB on disk, because the rest of the space is not really allocated.

Why is that important? Well again, the API requires that source and target files need to have the same setting. This code actually runs before the resizing part. The reason is simple, if you do not make it sparse, the file will allocate all the space on disk, even if we didn't write to it. Not a great way to make space-less fulls.

if (DeviceIoControl(srchandle, FSCTL_GET_INTEGRITY_INFORMATION, NULL, 0, &integinfo, sizeof(integinfo), &written, NULL)) {
    DeviceIoControl(tgthandle, FSCTL_SET_INTEGRITY_INFORMATION, &integinfo, sizeof(integinfo), NULL, 0, dummyptr, NULL);
}
Finally this bit. Basically it gets the integrity stream information from the source file and then copies it to the target file. Again, they have to be the same for the API to allow block cloning.

This shows that basically the source and target file have to be pretty much the same. This partially explains why you need to have an Active Full on your chain before block cloning starts to work. The old backup files might not have been created with ReFS in mind!

Also for integrity streams to work, we don't need to do anything fancy. We just need to tell ReFS, this file should be checked. 

The Cool Part

for (LONGLONG cpoffset = 0; cpoffset < filesz.QuadPart; cpoffset += CLONESZ) {
    // Last chunk: only clone what is left of the file (cpblocks is set up earlier in the full source)
    if ((cpoffset + cpblocks) > filesz.QuadPart) {
        cpblocks = filesz.QuadPart - cpoffset;
    }
    DUPLICATE_EXTENTS_DATA clonestruct = { srchandle };
    clonestruct.FileHandle = srchandle;
    clonestruct.ByteCount.QuadPart = cpblocks;
    clonestruct.SourceFileOffset.QuadPart = cpoffset;
    clonestruct.TargetFileOffset.QuadPart = cpoffset;
    DeviceIoControl(tgthandle, FSCTL_DUPLICATE_EXTENTS_TO_FILE, &clonestruct, sizeof(clonestruct), NULL, 0, dummyptr, NULL);
}
That's it. That is all that is required to do the real cloning. So how does it work? Well, first there is a for loop that goes over all the chunks of data of the source file. There is one limitation with the block clone API: you can only copy a chunk of 4GB at a time. In this project CLONESZ is defined as 1GB to be on the safe side.

So imagine you have a file of 3.5GB. The for loop will calculate that the first chunk starts at 0 bytes and the amount of data we want to copy is 1GB. Next time, it will calculate that the next chunk starts at 1GB and we need to copy 1GB, and so on.

However, the fourth time, it actually detects that there is only 500MB remaining, and instead of copying 1GB, we copy only what is remaining (filesize - where we are now).
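The chunk arithmetic is easy to check in isolation. The following Python sketch models only the offset and length calculation of the loop, without any Windows API call; the function name clone_chunks is made up for this example, and CLONESZ simply mirrors the 1GB value mentioned above.

```python
GB = 1024 ** 3
CLONESZ = 1 * GB  # same chunk size as described for the project


def clone_chunks(file_size, chunk_size=CLONESZ):
    """Yield (offset, length) pairs that together cover the whole file."""
    offset = 0
    while offset < file_size:
        # The last chunk only covers what is left of the file.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += chunk_size


# The 3.5GB example from the text: three full chunks and one half chunk.
for offset, length in clone_chunks(int(3.5 * GB)):
    print(f"offset {offset / GB} GB, length {length / GB} GB")
```

Because source and target offsets are identical in the real call, one (offset, length) pair per iteration is all the loop needs to fill in.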

But how do we call the API? Well, first we need to create a struct (think of it as a set of variables). The first variable references the original file. ByteCount says how much data we want to copy (mostly 1GB). Finally, the source and target file offsets are filled in with the correct starting point. Since we want duplicates, the starting point for the block clone is the same in both files.

Finally we just tell Windows to execute "FSCTL_DUPLICATE_EXTENTS_TO_FILE" on the target file, which basically invokes the API. We give it the set of variables we filled in correctly. So basically the clone API itself + filling in the variables is only 5 lines of code.

The important bit here, is that you can not just copy files on a ReFS volume and expect ReFS to do the block cloning. An application really has to tell ReFS to clone data from one file to the other and both files have to be on the same disk.

This has one advantage though. The API just clones data, even if Veeam has compressed that data or encrypted it. Since Veeam actively tells ReFS to clone the data, ReFS doesn't have to figure out what data is duplicate, it just does the job. That is a major advantage over deduplication: you can still secure and compress your files. Also, since the clone is just a simple call during the backup, it doesn't require any post-processing. And no post-processing means no excessive CPU usage or extra I/O to execute the call.

Seeing it in action

This is what E:\CP looks like before executing refs-fclone. Nothing special. An empty directory, and the ReFS volume has 23GB free

Now let's copy a VBK to E:\CP with the tool. It shows that the source file is around 15GB big and that it is cloning 1GB at a time. Interestingly enough, you see that in the last run, it just copies the remainder of the data.

This run took around 5 seconds max to execute this "copy". Seems like nothing really happened. However, if we check the result on disk, we see something interesting: 

The free disk space is still 23GB. However, we can see that a new file has been created that is 15GB+. Checksumming both files gives exactly the same result.

Why is this result significant? Well it shows that the interface to the block clone API is pretty straight forward. It also means that although it looks like Veeam is cloning the data, it is actually ReFS that manages everything under the hood. From a Veeam perspective (and also end-user perspective), the end result looks exactly like a complete full on disk. So once the block clone API call is made, there is no way to undo it or to get statistics about it. All of the complexity is hidden.

Why do we need aligned blocks?

Finally I want to share this result with you

In the beginning, I made a small file that had some random text like this. In this example, it has 10 letters in it, which means it is 10 Bytes on disk. When I tried the tool, it didn't work (as you can see), but the tool did work on Veeam backup files. 

So why doesn't it work? Well, the clone API has another important limitation. Your clone regions must match a complete set of clusters. By default the cluster size is 4KB (although for Veeam it is strongly recommended to use 64KB to avoid some issues). So if I want to make a call, the starting point has to be a multiple of 4KB. Well, 0 in a sense is a multiple of 4KB, so that's OK. However, the number of bytes you want to copy also has to be a multiple of 4KB, and 10B is clearly not. When I padded the file to be exactly 4KB (so 4096 chars in total), everything worked again.
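The alignment rule as described here boils down to two modulo checks. The Python sketch below only models that rule for illustration; it is not a complete description of every condition the real API enforces, and the function names are made up.

```python
CLUSTER = 4096  # default cluster size from the text; Veeam recommends 64KB


def is_aligned(value, cluster=CLUSTER):
    """True if value is a whole multiple of the cluster size."""
    return value % cluster == 0


def can_clone(offset, byte_count, cluster=CLUSTER):
    """Both the starting point and the length must be cluster-aligned."""
    return is_aligned(offset, cluster) and is_aligned(byte_count, cluster)


def padded_size(size, cluster=CLUSTER):
    """Round a size up to the next cluster boundary."""
    return -(-size // cluster) * cluster


print(can_clone(0, 10))    # the 10-byte text file
print(can_clone(0, 4096))  # the same file padded to exactly 4KB
print(padded_size(10))
```

With a 64KB cluster size the same checks apply, only with cluster=65536.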

This shows a very important limitation. For block cloning to work, the data has to be aligned, since you cannot copy unaligned data. Veeam backup files are by default not aligned. Thus it is required to run an active full before the block clone API can be used. To give you a visual idea of what this means: on top is a "default" Veeam backup file, at the bottom an aligned file, which is required for ReFS integration

Due to compression, data blocks are not always the same size. So to save space, they can just be appended one after the other. However, for the block clone API, we need to align the blocks. The result is that we sometimes have to pad a sector with empty data. So why do we need to align, can't we just clone more data? After all, it doesn't consume more space?

Well take for example the third block. Unaligned, it is cluster 2,3,4. Aligned it is only in cluster 3 and 4. So because of the aligned blocks, we have to clone less data. You might think, why does it matter because cloning does not take extra space? 

Well, first of all it keeps the files more manageable without filling them with junk data. If you clone clusters 2 and 4 from the unaligned file, you basically add data that is not required. Next, if you delete the original file, the data does start "using space on disk". Because of the reference, you basically tell ReFS not to delete the data blocks as long as they are referenced by some file. So the longer these chains continue, the more junk data you might have.

So this is the reason why you need an active full. A full backup has to be created with ReFS in mind, otherwise the blocks are not aligned and in this case Veeam refuses to use the API

If you want to read more about block size I do recommend this article from my colleague Luca

One more thing

Here is a fun idea. You could use the tool together with a post-process script to create GFS points on a primary chain. Although not recommended, you could for example run a script every month that "clones" the last VBK to a separate folder. The clone is instant, so it doesn't take a lot of time or much extra space. You could script your own retention or manually delete files. Clearly this is not really supported, but it would be a cool idea to keep, for example, one full VBK as a monthly full for a couple of years