2017/02/22

Under the hood: How does ReFS block cloning work?

In the latest version of Veeam Backup & Replication 9.5, there is a new feature called ReFS block cloning integration. The limitations of this integration seem to confuse a lot of people, so I decided to look a bit under the hood. It turns out that most of them are just basic limitations of the API itself.

To understand better how it all works, I made a small project called refs-fclone. The idea is to give it an existing source file and then duplicate that file to a non-existing target file with the API, basically creating a synthetic full from an existing VBK.

It turns out that the idea was not so original. During my Google quests for more information (because some parts didn't work), it appeared that a fellow hacker had already made exactly this tool. I must admit that I reused some of his code, so you can find his original code here.

Nevertheless, I finished the project, just to figure out for myself how it all works. In the end, the API is pretty "easy", I would say. I don't want to go over the complete code, but I will highlight some important bits. If you don't understand the C++ code, just ignore it and read the text underneath it. I tried to put the important parts in bold.

Prerequisites

Before I even got started, my initial code did not want to compile. I couldn't figure it out because I had the correct references in place, but for some reason it could not find "FSCTL_DUPLICATE_EXTENTS_TO_FILE". So I started looking into my project settings. It turned out it was set to compile with Windows 8.1 as a target, and when I changed it to 10.0.10586.0, all of a sudden it could find all the references.

This teaches an important lesson. This code is not meant to be run on Windows Server 2012 because that OS simply doesn't support the API call. Many customers have been asking whether the ReFS integration will work on Windows 2012, and the answer is simple: no. At the time it was developed, the API call didn't exist. You will also need to have the underlying volume formatted with Windows Server 2016 because, again, the ReFS version in 2012 does not support this API call.


So let's look at the code. First, before you clone blocks, there are some requirements which I want to highlight in the code itself:
FILE_END_OF_FILE_INFO preallocsz = { filesz };
SetFileInformationByHandle(tgthandle, FileEndOfFileInfo, &preallocsz, sizeof(preallocsz));
This bit of code defines the end of the file. Basically, it tells Windows how big the file should be. In this case, the size is filesz, which is the original file size. Why is that important? Well, to use the block clone API, we need to tell it where it should copy its data to: basically a starting point plus how much data we want to copy. But this starting point has to exist, so if we want to make a complete copy, we have to resize the target to be as big as the original.

if (filebasicinfo.FileAttributes & FILE_ATTRIBUTE_SPARSE_FILE) {
    FILE_SET_SPARSE_BUFFER sparse = { true };
    DeviceIoControl(tgthandle, FSCTL_SET_SPARSE, &sparse, sizeof(sparse), NULL, 0, dummyptr, NULL);
}
The next bit is the sparse part. The "if" statement basically checks whether the source file is a sparse file, and if it is, we make the target (tgthandle) sparse as well. So what is a sparse file? Basically, if a file is not sparse, all the data is allocated on disk when you resize it, even if you haven't written anything to it yet. A sparse file only allocates space when you write non-zero data somewhere. So even if it looks like it is 15GB big, it might only consume 100MB on disk because the rest is not really allocated.

Why is that important? Well, again, the API requires that the source and target files have the same setting. This code actually runs before the resizing part. The reason is simple: if you do not make the target sparse, the resize will allocate all the space on disk, even though we haven't written to it yet. Not a great way to make space-less fulls.

if (DeviceIoControl(srchandle, FSCTL_GET_INTEGRITY_INFORMATION, NULL, 0, &integinfo, sizeof(integinfo), &written, NULL)) {
    DeviceIoControl(tgthandle, FSCTL_SET_INTEGRITY_INFORMATION, &integinfo, sizeof(integinfo), NULL, 0, dummyptr, NULL);
}
Finally, this bit gets the integrity stream information from the source file and then copies it to the target file. Again, the settings have to be the same for block cloning to be allowed.

This shows that the source and target file basically have to be configured the same way. It partially explains why you need an active full on your chain before block cloning starts to work: the old backup files might not have been created with ReFS in mind!

Also, for integrity streams to work, we don't need to do anything fancy. We just need to tell ReFS that this file should be checked.
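On a Windows Server 2016 box you can also inspect and flip this setting from PowerShell, which is a handy way to compare the source and target files (a small sketch using the Storage module cmdlets; the path is just an example):

# Show whether integrity streams are enabled for a file on the ReFS volume
Get-FileIntegrity -FileName 'E:\CP\backup.vbk'

# Enable (or disable) integrity streams for that file
Set-FileIntegrity -FileName 'E:\CP\backup.vbk' -Enable $true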


The Cool Part

for (LONGLONG cpoffset = 0; cpoffset < filesz.QuadPart; cpoffset += CLONESZ) {
    LONGLONG cpblocks = CLONESZ;
    if ((cpoffset + cpblocks) > filesz.QuadPart) {
        cpblocks = filesz.QuadPart - cpoffset;
    }
    DUPLICATE_EXTENTS_DATA clonestruct = { srchandle };
    clonestruct.FileHandle = srchandle;
    clonestruct.ByteCount.QuadPart = cpblocks;
    clonestruct.SourceFileOffset.QuadPart = cpoffset;
    clonestruct.TargetFileOffset.QuadPart = cpoffset;
    DeviceIoControl(tgthandle, FSCTL_DUPLICATE_EXTENTS_TO_FILE, &clonestruct, sizeof(clonestruct), NULL, 0, dummyptr, NULL);
}
That's it. That is all that is required to do the real cloning. So how does it work? First, there is a for loop that goes over all the chunks of data in the source file. There is one limitation with the block clone API: you can only copy a chunk of up to 4GB at a time. In this project, CLONESZ is defined as 1GB to be on the safe side.

So imagine you have a file of 3.5GB. The for loop calculates that the first chunk starts at 0 bytes and that the amount of data we want to copy is 1GB. On the next iteration, it calculates that the next chunk starts at 1GB and that we again need to copy 1GB, and so on.

However, the fourth time, it detects that there is only 500MB remaining, and instead of copying 1GB, we copy only what is remaining (the file size minus where we are now).

But how do we call the API? Well, first we need to create a struct (think of it as a set of variables). The first field references the original file. ByteCount says how much data we want to copy (mostly 1GB). Finally, the source and target file offsets are filled in with the correct starting point. Since we want an exact duplicate, the source and target offsets are the same.

Finally, we just tell Windows to execute "FSCTL_DUPLICATE_EXTENTS_TO_FILE" against the target file, which invokes the API, and we pass it the set of variables we filled in. So the clone call itself, plus filling in the variables, is only about 5 lines of code.

The important bit here is that you cannot just copy files on a ReFS volume and expect ReFS to do block cloning. An application really has to tell ReFS to clone data from one file to the other, and both files have to be on the same volume.

This has one big advantage, though: the API just clones data, even if Veeam has compressed or encrypted that data. Since Veeam actively tells ReFS which data to clone, ReFS doesn't have to figure out which data is duplicated; it just does the job. That is a major advantage over deduplication: you can still encrypt and compress your files. Also, since the clone is just a simple call during the backup, it doesn't require any post-processing, and no post-processing means no excessive CPU usage or extra I/O to execute the call.

Seeing it in action

This is how E:\CP looks before executing refs-fclone. Nothing special: an empty directory, and the ReFS volume has 23GB free.


Now let's copy a VBK to E:\CP with the tool. It shows that the source file is around 15GB big and that it clones 1GB at a time. Interestingly enough, you can see that the last run just copies the remainder of the data.


This run took around 5 seconds at most to execute the "copy". It seems like nothing really happened. However, if we check the result on disk, we see something interesting:


The free disk space is still 23GB. However, we can see that a new file has been created that is over 15GB. Checksumming both files gives exactly the same result.
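If you want to verify this yourself, a quick way is PowerShell's Get-FileHash (the file names here are just placeholders for the original and the cloned VBK):

# Both hashes should be identical, even though no data was physically copied
Get-FileHash 'E:\original.vbk', 'E:\CP\original.vbk' -Algorithm SHA256 | Format-Table Hash, Path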

Why is this result significant? Well, it shows that the interface to the block clone API is pretty straightforward. It also means that although it looks like Veeam is cloning the data, it is actually ReFS that manages everything under the hood. From a Veeam perspective (and also an end-user perspective), the end result looks exactly like a complete full on disk. So once the block clone API call is made, there is no way to undo it or to get statistics about it; all of the complexity is hidden.

Why do we need aligned blocks?

Finally, I want to share this result with you.


In the beginning, I made a small file with some random text like this. In this example, it has 10 letters in it, which means it is 10 bytes on disk. When I tried the tool on it, it didn't work (as you can see), even though the tool did work on Veeam backup files.

So why doesn't it work? The clone API has another important limitation: your clone regions must cover a complete set of clusters. By default the cluster size is 4KB (although for Veeam it is strongly recommended to use 64KB to avoid some issues). So if I want to make a call, the starting point has to be a multiple of 4KB; 0 is, in a sense, a multiple of 4KB, so that's OK. However, the number of bytes you want to copy also has to be a multiple of 4KB, and 10 bytes clearly is not. When I padded the file to be exactly 4KB (so 4096 characters in total), everything worked again.
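To make the cluster math concrete, here is a tiny PowerShell sketch of the rounding involved (4KB is the default cluster size mentioned above; the helper function is just for illustration):

# Round a byte count up to the next cluster boundary
function Get-AlignedLength([long]$Length, [long]$ClusterSize = 4KB) {
    [long][math]::Ceiling($Length / $ClusterSize) * $ClusterSize
}

Get-AlignedLength 10      # 4096 : a 10-byte file must be padded to one full cluster
Get-AlignedLength 4096    # 4096 : already aligned, nothing to pad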

This shows a very important limitation: for block cloning to work, the data has to be aligned, since you cannot copy unaligned data. Veeam backup files are by default not aligned, so it is required to run an active full before the block clone API can be used. To give you a visual idea of what this means: on top is a "default" Veeam backup file; at the bottom is an aligned file, which is required for the ReFS integration.



Due to compression, data blocks are not always the same size, so to save space they can simply be appended one after another. However, for the block clone API we need to align the blocks, which means we sometimes have to pad a cluster with empty data. So why do we need to align; can we not just clone a bit more data? After all, it doesn't consume more space.

Well, take for example the third block. Unaligned, it spans clusters 2, 3 and 4. Aligned, it is only in clusters 3 and 4. So because the blocks are aligned, we have to clone less data. You might think: why does it matter, since cloning does not take extra space?

Well, first of all, it keeps the files more manageable without filling them with junk data. If you clone clusters 2 and 4 of the unaligned file, you basically add data that is not required. Next, once you delete the original file, that data does start "using space on disk": by referencing it, you basically tell ReFS not to delete the data blocks as long as they are referenced by some file. So the longer these chains continue, the more junk data you might accumulate.

So this is the reason why you need an active full: a full backup has to be created with ReFS in mind, otherwise the blocks are not aligned, and in that case Veeam refuses to use the API.

If you want to read more about block sizes, I recommend this article from my colleague Luca.

One more thing

Here is a fun idea: you could use the tool together with a post-process script to create GFS points on a primary chain. Although not recommended, you could for example run a script every month that "clones" the last VBK to a separate folder. The clone is instant, so it doesn't take a lot of time or much extra space. You could script your own retention or manually delete the files. Clearly this is not really supported, but it would be a cool way to keep, for example, one full VBK as a monthly full for a couple of years.
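As a sketch of what such a post-process script could look like (the paths and the refs-fclone.exe call are assumptions for this example, and both folders must of course live on the same ReFS volume):

# Hypothetical paths: adjust to your own repository and archive layout
$repo    = "E:\Backups\MyJob"
$archive = "E:\Archive\{0:yyyy-MM}" -f (Get-Date)
New-Item -ItemType Directory -Path $archive -Force | Out-Null

# Pick the most recent full backup file
$lastVbk = Get-ChildItem -Path $repo -Filter *.vbk | Sort-Object LastWriteTime | Select-Object -Last 1

# Clone it with the block clone API instead of physically copying the data
& "C:\Tools\refs-fclone.exe" $lastVbk.FullName (Join-Path $archive $lastVbk.Name)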

2016/12/15

Recovery Magic with Veeam Agent for Linux

This week the new Veeam Agent for Linux was released. It includes file-level recovery but also bare-metal recovery via a live CD. If you just want to do a bare-metal recovery, it is fairly easy to use. But you can do more than just a 1-to-1 restore: you also have the option to switch to the command line and change your recovered system before (re)booting into it.

You might wonder why. Well, because it opens up a lot of interesting opportunities. In this example, I have a CentOS 7 installation which I want to restore. However, the system was running LVM, and during the restore I decided not to restore the root as an LVM volume but rather directly to disk. Maybe the other way around would make more sense, but it is just for the fun of showing you the chrooting process.

Basically, I did a full system recovery (restore whole), but just before restoring, I selected the LVM setup, deleted it, and restored the LVM volume directly back to /dev/sda2. Here is the result:



I have a GIF here of the whole process, but browsers do not seem to like it. You can download it to see the whole setup.


Because we altered the partitions, the system will be unbootable. If you try to boot, you might see the kernel load, but it will fail because it cannot find its filesystem. Here is, for example, a screenshot of such a system that fails to boot because we did not correct for the changes described below. Again, this is only needed when you change the partition setup drastically; if you do a straight restore, you can just reboot the system without any manual edits.



Once restored, I went back to the main menu and selected "Switch to command line".


Once we are there, we need a couple of things. Basically, we will mount our new system and chroot into it. You can start by checking whether your disk was correctly restored with "fdisk -l /dev/sda", for example. It shows you the layout, which makes the next commands easier. Execute the following commands, but do adapt them to your system (you might have a different layout than I do). Make sure to mount your root filesystem before any other filesystem.

mkdir -p /chroot
mount /dev/sda2 /chroot
mount /dev/sda1 /chroot/boot
mount -t proc none /chroot/proc
mount -o bind /dev /chroot/dev
mount -t sysfs sys /chroot/sys
chroot /chroot

The output should be your system's shell.


OK, so we are in the shell. For CentOS 7 we have to do two things: first change /etc/fstab, and second update the grub2 config. Fstab is quite straightforward. Use "vi /etc/fstab" to edit the file, then update the line that mounts your root "/". In my case, I had to change "/dev/mapper/centos-root" to "/dev/sda2".


Now we have to update grub2 (and this is why we actually need the chroot). Use "vi /etc/default/grub" to edit the default grub config, then remove rd.lvm.lv=centos/root. Here are before and after screenshots. If you are going the other way, you might have to add LVM detection instead.

Before:


After:

Now we still need to apply the config by running "grub2-mkconfig -o /boot/grub2/grub.cfg".


Now exit the chroot by typing "exit". Then unmount all the (pseudo)filesystems we mounted earlier and you are ready to reboot. You can use "mount" without arguments to check the mounted filesystems. Make sure to unmount "/chroot" last.

umount /chroot/proc
umount /chroot/sys
umount /chroot/dev
umount /chroot/boot
umount /chroot



Now reboot and see your system transformed. You can type "exit" to go back to the "GUI" interface and reboot from there, or just type "reboot" directly from the command line.


2016/09/22

Figuring out SureBackup and a remote virtual lab

The Idea

If you want to set up a SureBackup job, the most difficult part is setting up the virtual lab. In the past, great articles have been written about how to set them up, but a common challenge is that the backup server and the virtual lab router have to be in the same network. In this article, I wanted to take the time to write out a small experiment I did the other day, to see if I could easily get around this. The question pops up once in a while, and now at least I can tell you that it is possible.


In this example, the virtual lab is called "remotevlab". A Linux appliance with the same name has been created, which sits in the same network as the backup server. It routes requests from the backup server to a bubble network called "remotevlab VM Network", which mimics the production network and reuses the same IP range. To allow you to communicate with that segment, the appliance uses masquerading. In my example, I used a masquerade range of 192.168.5.x, so if I want to access the AD server, I contact 192.168.5.103, and the router translates that to 192.168.1.103 as the packet passes through.

For those who have already set up virtual labs, this is probably not rocket science. However, for this scheme to work, the backup server needs to know that it should send IP packets for the 192.168.5.x range to the remotevlab router. So when you start a SureBackup job, it automatically creates a static route on the backup server. When the backup server and the remotevlab router are in the same production network, all is good.

However, when they are in different networks, it suddenly doesn't work anymore. That is because your gateway is probably not aware of the 192.168.5.x segment, so when a packet is sent to that router, it just drops it or forwards it to its own default gateway (which in turn might drop it). One way to resolve the issue is to create those static routes in the uplink router(s), but network admins are not per se the infrastructure admins, and most of the time they are quite reluctant to add static routes to routers the infra admins do not control (most of the time they are quite reluctant to execute any infra admin request at all, but on a sunny day they might consider opening some ports). So let's look at the following experiment.


In my small home lab, I don't really have two locations connected via MPLS or separate routers. So to emulate it, I created an "internalnet" network that is not connected to a physical NIC, and I connected the remotevlab there (the production side of the virtual lab router). In this network, I use a small range, 192.168.4.x.

The Connection Broker

OK, so far so good. Now we need a way for the v95backup server to talk to the remotevlab. To do this, a small Linux VM was created with CentOS 7 minimal. It has two virtual network adapters; I call them eno1 and eno3, but these are just truncated names, as you will see in the config. eno1 has an IP assigned in a production range. In this case it is the same range as the v95backup server, but you will soon see that this doesn't have to be the case. The other adapter, eno3, is connected to the same network as the remotevlab, and this is by design: in fact, it acts as the default gateway for that segment. Here are some copies of the configuration:

eno1:
# [root@vlabcon network-scripts]# cat ifcfg-eno16780032
TYPE=Ethernet
BOOTPROTO=static
IPADDR=192.168.1.199
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=8.8.8.8
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16780032
UUID=1874c74a-6882-435f-a465-f5fb11c60901
DEVICE=eno16780032
ONBOOT=yes
eno3:
# [root@vlabcon network-scripts]# cat ifcfg-eno33559296
TYPE=Ethernet
BOOTPROTO=static
IPADDR=192.168.4.1
NETMASK=255.255.255.0
DEFROUTE=no
PEERDNS=no
PEERROUTES=no
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=no
IPV6_FAILURE_FATAL=no
NAME=eno33559296
DEVICE=eno33559296
ONBOOT=yes
You will also need to set up routing (forwarding) and a static route, so that the appliance is aware of the masqueraded networks. This is fairly simple: create a route script
#[root@vlabcon network-scripts]# cat route-eno33559296
192.168.5.0/24 via 192.168.4.2 dev eno33559296
And change the corresponding kernel parameter. You can check with sysctl -a whether the parameter net.ipv4.ip_forward is already set to forwarding (on a clean install it should not be).
# enable ip forwarding via a drop-in file in /etc/sysctl.d/
echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/90-forward.conf
sysctl -p /etc/sysctl.d/90-forward.conf
# check with sysctl -a | grep net.ipv4.ip_f
So basically we have set up yet another router. But how do we reach the appliance's networks without having to add static routes to the uplink routers? Well, we can use a layer 2 VPN. Any VPN software will do, but in this example I chose PPTP. You might argue that it is not that secure, but it is not really about security here; it is just about getting a tunnel. Plus, I'm not really a network expert, and PPTPD seemed extremely easy to set up. Finally, because the protocol is quite universal, you don't have to install any VPN software on the backup server: it is built into Windows. I followed this tutorial: https://www.digitalocean.com/community/tutorials/how-to-setup-your-own-vpn-with-pptp . Although it was written for CentOS 6, most of it can be applied to CentOS 7.

The first thing we need to do is install PPTPD. It is hosted in the EPEL repository, so you might need to add that repository if you have not done so yet.
rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Then install the software
yum install pptpd
Now the first thing you need to do is assign addresses to the ppp0 adapter and to the clients logging in. To do so, configure localip (ppp0) and remoteip (clients) in /etc/pptpd.conf
#added at the end of /etc/pptpd.conf
localip 192.168.3.1
remoteip 192.168.3.2-9
The next step is to make a client login. By default it uses a plaintext password. Again, since it is not really about security (we're not trying to build tunnels over the internet here), that is quite OK. You set the logins up in /etc/ppp/chap-secrets: "surebackup" is the login, "allyourbase" the password, "pptpd" is just the default server name, and * means that any client IP can use this login. So if you want, you can still add a bit of security by specifying only the backup server's IP.
#added at the end of /etc/ppp/chap-secrets
surebackup pptpd allyourbase *
I did not add the DNS config to /etc/ppp/options.pptpd, as we don't really need it. So now the only thing left to do is to start the service and configure it to start at boot.
systemctl enable pptpd
systemctl restart pptpd

v95Backup configuration

With the server side done, we can now head over to the backup server. You can just add a new VPN connection and set the type to PPTP.


So the connection is called robo1 and uses PPTP. Specify the username surebackup and the password allyourbase. I also changed the adapter settings: by default, the PPTP connection creates a default route, which means that you will no longer be able to connect to other networks once you are connected to the appliance. To fix that, you can disable this behavior.


In the adapter settings > Networking tab > IPv4 > Advanced, uncheck "Use default gateway on remote network". I also turned the automatic metric off and entered the value 5. Now, because you disabled the default gateway, the backup server no longer uses this connection except for the 192.168.3.x range, so it will not be able to talk to the vlab router. To fix that, add a persistent route so that you can reach the remotevlab router.
route -p add 192.168.4.0 mask 255.255.255.0 192.168.3.1 metric 3 if 24
It should be straightforward, except for "if 24". Basically, this says: route it over interface 24, which in this example is the robo1 interface, as shown below (use "route print" to discover your interface number).
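As an aside, the client side can also be scripted instead of clicking through the GUI. Here is a rough sketch with the built-in VpnClient cmdlets (the connection name, server address and route are the ones from this lab; treat the exact parameters as assumptions and verify them on your Windows version):

# Create the PPTP connection with split tunneling, so no default gateway is pushed over the VPN
Add-VpnConnection -Name "robo1" -ServerAddress "192.168.1.199" -TunnelType Pptp -SplitTunneling -Force

# Dial it once so the interface exists, then add the persistent route from above
# (without "if", Windows picks the interface based on the 192.168.3.1 gateway)
rasdial robo1 surebackup allyourbase
route -p add 192.168.4.0 mask 255.255.255.0 192.168.3.1 metric 3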


Now you of course have to make sure that the connection is up when you start the SureBackup test. One way to do this is by scheduling a task that regularly checks the connection and redials it if it has failed. For my test, I just disabled the SureBackup schedule and made a small PowerShell script that dials the connection and then starts the job:

asnp veeampssnapin
rasdial robo1 surebackup allyourbase
get-vsbjob -name "surebackup job 2" | start-vsbjob -runasync

You can see a strange scheduling time; that is because I configured the task to run 10 minutes later and then restarted the backup server, just to see if it would work when nobody is logged in. Very importantly, as with other tasks, make sure that you have the correct rights to start the vsbjob. I configured the task to run with administrator rights.
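If you want the tunnel to stay up permanently instead of dialing it per job, a small watchdog along these lines could be scheduled every few minutes (just a sketch; it checks rasdial's output for the connection name used above and redials when it is missing):

# Redial the robo1 PPTP connection if rasdial no longer lists it as connected
if (-not ((rasdial) -match "robo1")) {
    rasdial robo1 surebackup allyourbase
}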



The result: it just works. Here are some screenshots:


The virtual lab configuration. You can see that it is connected to the internalnet. It is very important to point to the connection broker (192.168.4.1) as the default gateway.


The vSphere network

Robo1 connected


The routing table on the backup server. Here you can see the static route for 192.168.4.x going to the PPTP connection. What is even nicer is that, because we defined the 192.168.4.x route, when SureBackup adds the 192.168.5.x route, Windows routes it correctly over the PPTP interface (192.168.3.2) thanks to the persistent static route.


Finally, a successful test.


Conclusion

The lab setup works, and the setup is relatively easy. If you made an OVF or live CD of the connection broker, it would be pretty easy to duplicate this setup across multiple locations. You might want to consider smaller IP ranges, though.

PPTP might not be the best protocol, so other VPN solutions could be considered. For example, you could remove another subnet if you bridged the VPN port to the internal network directly, or if you could create a stretched layer 2 connection. However, my test was more about seeing what needs to be done to get this working. What I liked the most is that PPTP has good compatibility between Windows and Linux, and I didn't need to install special software on the backup server.

Another use case is that you could allow other laptops in the network to access the virtual lab for testing. If they don't really need internet access (otherwise you need to set up the correct masquerading/DNS in iptables/pptpd), they could just connect to the lab with a predefined VPN connection in Windows, even if they are not in the same segment as the backup server (something those network admins would also really appreciate).

Appendix: Hardening with iptables

For a bit more hardening and to document the ports, I also enabled iptables (instead of firewalld). On my install, firewalld was not installed/configured, but you might need to remove it. Check out http://www.faqforge.com/linux/how-to-use-iptables-on-centos-7/

I based the iptables configuration on the Arch Linux documentation found here: https://wiki.archlinux.org/index.php/PPTP_server#iptables_firewall_configuration

First we need to install the service and enable it at boot
yum install iptables-services
systemctl enable iptables
Then I modified the config. Here is a dump of it:
#[root@vlabcon ~]# cat /etc/sysconfig/iptables
# sample configuration for iptables service
# you can edit this manually or use system-config-firewall
# please do not ask us to add additional ports/services to this default configuration
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 1723 -j ACCEPT
-A INPUT -p 47 -j ACCEPT
-A FORWARD -i ppp+ -o eno33559296 -j ACCEPT
-A FORWARD -o ppp+ -i eno33559296 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-net-unreachable
COMMIT

Basically, I added two input rules: for the PPTP connection, open up TCP port 1723, and you also need to allow the GRE protocol (-p 47). This shows a weakness of PPTP: you need to ask your firewall guys to open these up and, more importantly, it will probably not survive NAT/PAT. The good thing is that the overhead should be minimal, although this is not so important for these simple tests. To allow routing to occur, forwarding must be allowed between the ppp connections and the eno3 interface. Simply start iptables with
systemctl start iptables
If everything is configured correctly, the setup should still work, but people who can reach the connection broker cannot necessarily get to the virtual lab: they first need to make the PPTP connection.

Notice that no masquerading has been set up towards the remotevlab router (as in the Arch Linux doc). That is because the remotevlab router uses the connection broker as its default gateway, so when it replies, it always sends the traffic back via the connection broker.

2016/06/10

Veeam repository in an LXC container

In the past, I have always wanted to investigate the concept of a Linux repository with extra safety, like a chroot. However, I never took the time to work on an example. With the rise of Docker, I was thinking: could I make a repository out of a Docker container? After some research, I understood that Docker adds a whole storage abstraction layer which I really didn't need. Next to that, Docker containers run in privileged mode. So I decided to try out LXC and go a bit deeper into container technology.

Before we start: this is just an experimental setup in a lab, so do proper testing before you put this in production. It is just an idea, not a "reference architecture", so I imagine improvements can be made. Here is a diagram of the setup which we will discuss in this blog.


The first thing you might notice is that we will set up a bridge between the host Ethernet card and the containers' virtual network ports. In some LXC setups, NAT is preferred between the real world and a disconnected virtual bridge. I guess that works for web servers, but in this example Veeam needs to be able to connect to each container via SSH, and you might need to open up multiple ports. When you bridge your containers' virtual network ports and your outside Ethernet port, you basically create a software switch. This gives you the advantage that you can assign an individual IP to every container, as if it were a standalone machine.

Next to that, we will make a separate LVM volume group. You might notice that the root of every tenant is colored green. That's because we will use lxc-clone with snapshot functionality: set up the root (container) once, and then clone it via an LVM snapshot so you can instantly have a new container/tenant. Finally, an LVM volume with the "_repo" suffix is assigned to each individual container and mounted under /mnt. This is where the backups themselves are stored, separated from the root system.

The first thing is of course to install Debian. I'm not going to show that, as it is basically following the wizard, which is straightforward. I do want to say that I assigned 5GiB for the root, but it turns out that after all the configuration I only use 1.5GiB. So if you want to save some GBs, you could assign, for example, only 3GiB.

One important note: the volume group for storing the container roots needs to have a different name than the container itself in order for lxc-clone to work correctly; I ran into an issue where cloning did not work because of this. So, for example, call the volume group "vg_tenstore" and the containers/logical volumes "tenant". During the initial install, only set up the volume group; the logical volumes will be made by LXC during configuration.

After the install, I just installed drivers and updates by executing the following. If you are not running it in VMware, you of course do not need the tools. You might also go lightweight by not installing the dkms version.
apt-get update
apt-get upgrade
apt-get install open-vm-tools-dkms
apt-get install openssh-server
reboot
After the system has rebooted, you are able to start an SSH session to it. Now let's install the LXC software. (based on https://wiki.debian.org/LXC)
apt-get install lxc bridge-utils libvirt-bin debootstrap
Once that is done, let's set up the bridge. For this, edit /etc/network/interfaces. Here is a copy of my configuration:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
allow-hotplug eth0
#auto eth0
#iface eth0 inet static
#       address 192.168.204.7
#       netmask 255.255.255.0
#       network 192.168.204.0
#       broadcast 192.168.204.255
#       gateway 192.168.204.1
#       # dns-* options are implemented by the resolvconf package, if installed
#       dns-nameservers 192.168.204.2
#       dns-search t.lab
auto br0
iface br0 inet static
        bridge_fd 0
        bridge_stp off
        bridge_maxwait 0
        bridge_ports eth0
        address 192.168.204.7
        netmask 255.255.255.0
        broadcast 192.168.204.255
        gateway 192.168.204.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 192.168.204.2
        dns-search t.lab
Notice that I kept the old configuration in comments. You can see that the whole address configuration is assigned to the bridge (br0), so the bridge literally gets the host IP. After modifying the file, I restarted the OS, just to check whether the network settings would stick, but I guess you can also restart the networking via "/etc/init.d/networking restart". The result should be something like this:



OK, so that was rather easy. Now let's add some lines to the default config so that every container will be connected to this bridge by default. To do so, edit /etc/lxc/default.conf and add
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
In fact, if you look at the previous screenshot, you can see that there are already two containers running, as two "vethXXXXX" interfaces already exist.

Now let's set up the root for our container. If you forgot your volume group name, use "lvm vgdisplay" to display all your volume groups. Then execute the create command
lxc-create -n tenant -t debian -B lvm --lvname tenant --vgname vg_tenstore
This will create a new container called "tenant" based on the Debian template. A template is a preconfigured script to make a certain flavor of container. In this case, I used the Debian flavor, but you can find other flavors in "/usr/share/lxc/templates". The backing store is LVM, and the logical volume that will be created is called "tenant" and will be created in the volume group "vg_tenstore". Keep the name of the container and the logical volume the same, as mentioned before.

Now that the container has been created, you can actually edit its environment before starting it. I did a couple of edits so that everything boots smoothly. First, mount the filesystem
mount /dev/vg_tenstore/tenant /mnt
Then I edited the networking file /mnt/etc/network/interfaces and set a static IP
#....not the whole file
iface eth0 inet static
address 192.168.204.6
netmask 255.255.255.0
gateway 192.168.204.1
dns-nameservers 192.168.204.2
Then I edited /mnt/etc/resolv.conf
search t.lab
nameserver 192.168.204.2
Finally I made a huge security hole by allowing root login via SSH in /mnt/etc/ssh/sshd_config
PermitRootLogin yes
You might want to avoid this, but I wanted something quick and dirty for testing. Now unmount the filesystem and let's boot the parent container. I used -d to daemonize it so that you don't end up in an interactive console you can't escape from.
umount /dev/vg_tenstore/tenant
lxc-start --name tenant -d
Once it has started, use lxc-attach to attach directly to the console as root and execute passwd to set up a password. You can exit by typing exit after you have set the password.
lxc-attach --name=tenant
passwd
exit
Now test your password by going into the console. While you are there, you can check whether everything is OK and then halt the container.
lxc-console --name tenant
#test what you want after login
halt
You can also halt a container from the host by using "lxc-stop -n tenant". Now that this is done, we can actually create a clone. I made a small wrapper script, but you can of course run the commands manually. First I made a file "maketenant" and used "chmod +x maketenant" to make it executable. Here is the content of the script
tenant=$1
ipend=$2

if [ ! -z "$tenant" -a "$tenant" != "tenstore" -a ! -z "$ipend" ]; then
        trepo=$tenant"_repo"
        echo "Creating $tenant"
        lxc-clone -s tenant $tenant
        lvm lvcreate -L 10g -n"$trepo" vg_tenstore
        mkfs.ext4 "/dev/vg_tenstore/$trepo"
        echo "/dev/vg_tenstore/$trepo mnt ext4 defaults 0 0" >> /var/lib/lxc/$tenant/fstab
        mount "/dev/vg_tenstore/$tenant" /mnt
        sed -i "s/address [0-9.]*/address 192.168.204.$ipend/" /mnt/etc/network/interfaces
        umount /mnt
        echo "lxc.start.auto = 1" >> /var/lib/lxc/$tenant/config
        echo "Starting"
        lxc-start --name $tenant -d
        #check if repo is mounted
        echo "Check if repo is mounted"
        lxc-attach --name $tenant -- df -h | grep repo
        echo "Check ip"
        lxc-attach --name $tenant -- cat /etc/network/interfaces | grep "address"
else
        echo "Supply tenant name"
fi
OK, so what does it do?
  • lxc-clone -s tenant tenantname: makes a clone of our container based on an (LVM) snapshot
  • lvm lvcreate -L 10g -n "tenantname_repo" vg_tenstore: creates a new logical volume for the tenant to store its backups in
  • mkfs.ext4 /dev/vg_tenstore/tenantname_repo: formats that volume with ext4
  • echo "/dev/vg_tenstore/tenantname_repo mnt ext4 defaults 0 0" >> /var/lib/lxc/tenantname/fstab: tells LXC to mount the volume in the container under /mnt
  • We then mount the filesystem of the new tenant again to change the IP
    • sed -i "s/address [0-9.]*/address 192.168.204.$ipend/" /mnt/etc/network/interfaces: replaces the IP with a unique one for the tenant
  • echo "lxc.start.auto = 1" >> /var/lib/lxc/tenantname/config: tells LXC that we want to start the container at boot time
Finally, we start the container and check that the repository was indeed mounted and the IP was correctly edited. The result:

We have a new tenant (tenant2) up and running. Let's check the logical volumes. You can see that tenant2 is running from a snapshot and that a new logical volume was made as its repository.


And the container is properly bridged to our br0


Now you can add it to Veeam based on its IP. If you click Populate, you will only see what you assigned to the container.


You can then select the /mnt mount point.


And you are all set up. Let's do a backup copy job to it


And let's check the result with "lxc-attach --name tenant2 -- find /mnt/"


That's it; you can now make more containers to have separate chroots. You can also think about additional security like iptables or AppArmor, but since the container is not privileged, it should already be a much stronger separation than just separate users/"/home" directories with sudo rights.

2016/06/06

RPS 0.3.3

It took some time to release this version because it packs a lot of changes which hopefully make the tool more useful. This release focuses on more "export" functionality.

The first major change is that the "Simulate" button has been moved down. This makes more sense, as you will probably first fill in the info and then run the simulation. But the main reason it was moved is the additional export functionality: you will now see a couple of checkboxes next to the Simulate button.


The first checkbox is the "export" functionality. When you check it and run the simulation, an input field will appear with an URL. If you click somewhere in the field, the complete text should be selected, which you can than easily copy with for example ctrl+c. When you reuse the URL, your simulation will automatically execute with all the previous inputs. This way you can share your simulation without having to screenshot everything. Make sure to push the Simulate button before copy pasting as this will refresh the URL field.

But what if you still want a clean screenshot of the result? I cannot tell you how many screenshots of the RPS output I have already seen in mails, documentation, etc. However, screenshotting the output can be challenging. First of all, you need specific software to cut it out. Next, if the simulation is longer, you have to screenshot multiple times and then concatenate the results. So in this release I'm introducing "canvas rendering". This renders the result in a hidden HTML5 canvas and then replaces a visible image with a copy of the HTML5 output. The result should be a cleaner output that you can use in documentation. I also opted to reduce the amount of info on the output, as dates etc. make little sense when a partner wants to send a result to an end customer.



If you are running Firefox or Chrome, you will be able to save the picture by clicking the Download link. The advantage is that the link will suggest a formatted name with the current date and time. However, if you are not using one of those two (like IE or Safari), you should still be able to right-click it and save it as a picture. The reason the download link does not work in every browser is that I'm using an unofficial "download" attribute. Let's hope that in the future Edge, IE, Safari, etc. will also support it.


Another new feature is support for the active full options that were added in Veeam v9. This feature required some testing to check whether the results made sense. Hopefully I got it right, and I would love to hear your feedback!

Finally, a very small request that has been implemented is replica support. In this first version, I only added support for VMware, but this might change in the future.


2016/05/10

BytesPerSec from WMI Raw class via Powershell

If you have ever tried to query data from WMI, you might have noticed that there are preformatted and raw data classes. Recently I tried to make do with the preformatted data, but after a while I saw it was pretty inaccurate for monitoring, as it only shows you that specific moment. Especially for disk writes, which seem to be buffered, meaning you get huge spikes and huge dips because buffers are emptied all at once.

Well, I tried the raw classes and couldn't make sense of them at first. After my Google kung-fu did not seem to yield any results (mostly people asking the same question), I tried to make sense of the data via the good old trial-and-error method, to see if I could squeeze some reasonable results out of it.

The biggest issue with raw classes is that you take samples, and the values are just augmented with new values over time by the system. So you get a huge number that doesn't mean anything on its own. What you need to do is take two samples; let's call them "old" and "new". Your real value over the interval would be
(new val - old val) / (new timestamp - old timestamp)

Well with "BytesPerSec", I could not get it to work until I realised, bytes per sec is already written in an interval. So for "BytesPerSec", it seems you have to look at the "Timestamp_Sys100NS". To convert this to seconds, you multipy it with "*1e-7". (google "1*1e-7 seconds to nanoseconds" to understand why). So what you get is :
(New BytesPerSec - Old BytesPerSec) /((new Timestamp_Sys100NS - old Timestamp_Sys100NS)*1e-7)
That seems strange, because BytesPerSec is already in seconds. On a 1-second interval, you would not need to divide, because the difference between the timestamps would be around 1 anyway. However, consider a 5-second sample interval. In this case, the system seems to add 5 samples of "bytes per second", and by dividing by 5 you get an average over the interval. Well, it seems to be more complex than that: if you put the sample interval at 100ms, the formula still seems to work, which basically tells me the system is constantly adding to the number but scaling it to "bytes per second". For example, in the script below, I sleep 900ms because that allows PowerShell to do around 100ms of querying/calculations.

Now, my method of discovery is not very scientific (I could not correlate it to any documentation), but it does seem to add up if you compare it live to Task Manager. So below is a link to a sample PowerShell script you can use to check the values. Although I'm writing a small program in C#, I can only recommend PowerShell for playing with WMI, as it allows you to experiment without having to recompile all the time and to discover the values via, for example, PowerShell ISE.

https://github.com/tdewin/veeampowershell/blob/master/wmi_raw_data_bytespersec.ps1

The script queries info from "Win32_PerfRawData_PerfDisk_PhysicalDisk" and "Win32_PerfRawData_Tcpip_NetworkInterface". If you are looking for a fitting class, you can actually use this oneliner:
$lookfor = "network";gwmi -List | ? { $_.name -imatch $lookfor } | select name,properties | fl
Adjust $lookfor to whatever you are actually looking for.


Update: 
There is acutally a a "scientific method" to do it. More playing and googling turns up interesting results.

First, look up your class and counter. Let's say I want "DiskBytesPersec" from "Win32_PerfRawData_PerfDisk_PhysicalDisk". You would google it, and hopefully you get to this class page:

https://msdn.microsoft.com/en-us/library/aa394308%28v=vs.85%29.aspx

This would tell you about the counter:
DiskBytesPerSec
Data type: uint64
Access type: Read-only
Qualifiers: CounterType (272696576) , DefaultScale (-4) , PerfDetail (200)

If you click around enough, you end up on this page:
https://msdn.microsoft.com/en-us/library/aa389383%28v=vs.85%29.aspx

That CounterType, 272696576, is actually PERF_COUNTER_BULK_COUNT. If you google "PERF_COUNTER_BULK_COUNT", you might end up here:

https://msdn.microsoft.com/en-us/library/ms804018.aspx

This would tell you to use the following formula:
(N1 - N0) / ((D1 - D0) / F), where the numerator (N) represents the number of operations performed during the last sample interval, the denominator (D) represents the number of ticks elapsed during the last sample interval, and the variable F is the frequency of the ticks.

This counter type shows the average number of operations completed during each second of the sample interval. Counters of this type measure time in ticks of the system clock. The variable F represents the number of ticks per second. The value of F is factored into the equation so that the result is displayed in seconds. This counter type is the same as the PERF_COUNTER_COUNTER type, but it uses larger fields to accommodate larger values.

This might still not be so trivial, but the numerator should be fairly clear: it is the same value we used before, namely newvalue - oldvalue. The denominator is ((D1 - D0) / F), which would be (newticks - oldticks) / frequency. This translates to:
($new.Timestamp_PerfTime - $old.Timestamp_PerfTime)/($new.Frequency_PerfTime).

Interestingly enough, "$new.Frequency_PerfTime" is always the same because it is the frequency of the high-resolution performance timer, so it basically tells you how many ticks pass per second. Timestamp_PerfTime is, I guess, how many ticks have already passed. So by subtracting the old value from the new one, you get the number of ticks that elapsed between your samples. If you divide that by the frequency, you get how many "seconds" have passed (this can be a float). That means you don't have to convert to nanoseconds, and you can use the formula directly like this:
$time = ($new.Timestamp_PerfTime - $old.Timestamp_PerfTime)/($new.Frequency_PerfTime)

So the total formula would be
$dbs = $new.DiskBytesPersec - $old.DiskBytesPersec
$time = ($new.Timestamp_PerfTime - $old.Timestamp_PerfTime)/($new.Frequency_PerfTime)
$cookeddbs = $dbs/$time

Running the method mentioned in the script and this method gives you almost the same results, but I guess the tiny differences have to do with rounding. Anyway, the method from this update should be the most accurate, as this is what Microsoft describes using themselves for cooking up the data. It should also give you a more "stable" way to calculate other values, instead of trial and error.
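To make that concrete, here is a small self-contained sketch of the "cooked" calculation (I'm assuming the "_Total" physical disk instance here; the 2-second sleep is arbitrary):

# Take two raw samples of the _Total physical disk instance
$old = Get-WmiObject -Class Win32_PerfRawData_PerfDisk_PhysicalDisk -Filter "Name='_Total'"
Start-Sleep -Seconds 2
$new = Get-WmiObject -Class Win32_PerfRawData_PerfDisk_PhysicalDisk -Filter "Name='_Total'"

# PERF_COUNTER_BULK_COUNT: (N1 - N0) / ((D1 - D0) / F)
$dbs  = $new.DiskBytesPersec - $old.DiskBytesPersec
$time = ($new.Timestamp_PerfTime - $old.Timestamp_PerfTime) / $new.Frequency_PerfTime
"{0:N2} MB/s" -f (($dbs / $time) / 1MB)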

2016/04/13

Veeam RESTful API via Powershell

In this blog post I'll show you how you can play around with the Veeam RESTful API via PowerShell. The post shows how to find a job and start it. You might wonder why you would do such a thing. Well, in my case, it is to showcase the interaction with the API (line by line of code), very similar to what you would do with wget or curl. If you want an interactive way of playing with the API, know that you can always replace /api with /web/#/api/ (for example http://localhost:9399/web/#/api/) to get an interactive browser. However, via PowerShell you get the real sense that you are interacting with the API, and all methods used here should be portable to any other language. That is why I have not chosen to use "Invoke-RestMethod", but rather a raw HTTP call.


So the first thing (which might not be required) is to ignore the self-signed certificate. If you access the API via the FQDN on the server itself, the certificate should be trusted, but that would make my code less generic.
add-type @"
    using System.Net;
    using System.Security.Cryptography.X509Certificates;
    public class TrustAllCertsPolicy : ICertificatePolicy {
        public bool CheckValidationResult(
            ServicePoint srvPoint, X509Certificate certificate,
            WebRequest request, int certificateProblem) {
            return true;
        }
    }
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
With that code executed, you have told .NET to trust every certificate. The next step is to get the API version:
$r_api = Invoke-WebRequest -Method Get -Uri "https://localhost:9398/api/"
$r_api_xml = [xml]$r_api.Content
$r_api_links = @($r_api_xml.EnterpriseManager.SupportedVersions.SupportedVersion | ? { $_.Name -eq "v1_2" })[0].Links
With the first request, we basically do a GET request to the API page. The Veeam REST API uses XML instead of JSON, so we can just convert the content itself to XML. Once that is done, we can browse the XML. The cool thing about PowerShell is that it allows you to browse the structure with autocompletion. Just execute $r_api_xml and you will get the root element. By adding a dot and the child element, you can see what's underneath this node. You can repeat this process to "explore" the XML (or you can just print out $r_api.Content without conversion to see the plain XML).

Under the root container (EnterpriseManager), we have a list of all supported versions. By applying a filter, we get the v1_2 (v9) API version. This one has one Link, which indicates how you can log on:
PS C:\Users\Timothy> $r_api_links.Link | fl

Href : https://localhost:9398/api/sessionMngr/?v=v1_2
Type : LogonSession
Rel  : Create
So the Href shows the link we have to follow. The Type tells us what kind of resource is behind the link, and finally Rel tells us which action, and thus which HTTP method, we should use: Create means we need to do a POST.

Most of the time:
  • the GET method is used if you want to retrieve details but don't want to perform a real action
  • the POST method is used if you want to perform a real action
  • the PUT method is used if you want to update something
  • the DELETE method is used if you want to destroy something
When in doubt, check the manual: https://helpcenter.veeam.com/backup/rest/requests.html
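Since every call after logging in is just a method, a URI and the session header, you could wrap that pattern in a tiny helper (not part of the original script, just a convenience sketch; it uses the session ID that is extracted after logging in below):

# Generic wrapper: call a Veeam REST URI with a given method and session ID, return parsed XML
function Invoke-VeeamRest($Method, $Uri, $SessionId) {
    $r = Invoke-WebRequest -Method $Method -Uri $Uri -Headers @{"X-RestSvcSessionId" = $SessionId}
    return [xml]$r.Content
}

# Example once you have $sessionid: (Invoke-VeeamRest Get $joblink.Href $sessionid).EntityReferences.Ref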

OK, for authentication we have to do something special: we need to send the credentials via basic authentication. This is a pure HTTP standard, so I'll show you two ways to do it.
$r_login = Invoke-WebRequest -method Post -Uri $r_api_links.Link.Href -Credential (Get-Credential -Message "Basic Auth" -UserName "rest")

#even more raw

$auth = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("mylogin:myadvancedpassword"))
$r_login = Invoke-WebRequest -method Post -Uri $r_api_links.Link.Href -Headers @{"Authorization"=$auth}
The first method uses PowerShell's built-in functionality for doing basic authentication. The second method actually shows what is really going on in the HTTP request: "username:password" is encoded as Base64, then "Basic " and this encoded string are concatenated, and the result is set in the Authorization header of the request.

The result is that (if we logged on successfully) we get a logon session, which has links to almost all the main resources. Before we go any further, we need to analyze the response a bit.
if ($r_login.StatusCode -lt 400) {
When you do a call, you can check the StatusCode or return code. You are expecting a number between 200 and 204, which means success. If you want to know the exact meaning of the HTTP return codes in the Veeam REST API: https://helpcenter.veeam.com/backup/rest/http_response_codes.html

The next thing is to extract the REST session ID. Instead of sending the username and password with every request, you send this header to authenticate. The header is returned after you have successfully logged in.
    #get session id which we need to do subsequent request
    $sessionheadername = "X-RestSvcSessionId"
    $sessionid = $r_login.Headers[$sessionheadername]
Now that we have the session ID extracted, let's do something useful:
    #content
    $r_login_xml = [xml]$r_login.Content
    $r_login_links = $r_login_xml.LogonSession.Links.Link
    $joblink = $r_login_links | ? { $_.Type -eq "JobReferenceList" }
First, we take the logon session, convert it to XML and browse the links. We are looking for the link of type "JobReferenceList". Let's follow this link; in the process, don't forget to set the session ID in your header.
    #get jobs with id we have
    $r_jobs = Invoke-WebRequest -Method Get -Headers @{$sessionheadername=$sessionid} -Uri $joblink.Href
    $r_jobs_xml = [xml]$r_jobs.Content
    $r_job = $r_jobs_xml.EntityReferences.Ref | ? { $_.Name -Match "myjob" }
    $r_job_alt = $r_job.Links.Link | ? { $_.Rel -eq "Alternate" }
The first line is just getting and converting the XML. The page which we requested is a list of all the jobs in reference format. The reference format is a compact way of representing the object that you requested, basically showing the name and the ID of the job plus some links. If you add "?format=Entity" to such a request (for an object or an object list), you get the full details instead.

So why the reference representation? Well, it is a pretty similar concept to the GUI. If you open the Backup & Replication GUI and select the job list, you don't get the complete details of all the jobs; that would be kind of overwhelming. But when you click a specific job and try to edit it, you get all the details. Similarly, if you want to build an overview of all the jobs, you wouldn't want the API to give you all the unnecessary details, as this would make the "processing time" of the request much bigger (downloading the data, parsing it, extracting what you need, and so on).

In the third line, what we do is look for the job (or rather the reference to a job) whose name matches "myjob". We then take a look at the links of this job and look for the alternate link. Basically, this is the job URL plus "?format=Entity", which gives the complete details of the job. Here is the output of $r_job_alt:
PS C:\Users\Timothy> $r_job_alt | fl
Href : https://localhost:9398/api/jobs/f7d731be-53f7-40ca-9c45-cbdaf29e2d99?format=Entity
Name : myjob
Type : Job
Rel  : Alternate
Now ask for the details of the job
    $r_job_entity = Invoke-WebRequest -Method Get -Headers @{$sessionheadername=$sessionid} -Uri $r_job_alt.Href
    $r_job_entity_xml = [xml]$r_job_entity.Content
    $r_job_start = $r_job_entity_xml.Job.Links.Link | ? { $_.Rel -eq "Start" }
By now, the first two lines should be well understood. In the third line, we are looking for a link on this object with Rel "Start". This is basically the method we need to execute to start the job. Start is a real action, and if you look it up, you will see that you need a POST to call it.
 #start the job
    $r_start = Invoke-WebRequest -Method Post -Headers @{$sessionheadername=$sessionid} -Uri $r_job_start.Href
    $r_start_xml =  [xml]$r_start.Content

    #check of command is succesfully delegated
    while ( $r_start_xml.Task.State -eq "Running") {
        $r_start = Invoke-WebRequest -Method Get -Headers @{$sessionheadername=$sessionid} -Uri $r_start_xml.Task.Href
        $r_start_xml =  [xml]$r_start.Content
        write-host $r_start_xml.Task.State
        Start-Sleep -Seconds 1
    }
    write-host $r_start_xml.Task.Result
OK, so that is a bunch of code, but I still wanted to post it. First, we follow the start method and parse the XML. The result is actually a "Task". A Task represents a process that is running on the RESTful API, which you can refresh to check the actual status of that process. What is important is that this is the REST process, not the backup server process. That means that if a task is finished for REST, it doesn't necessarily mean that the action on the backup server is finished; what is finished is that the API has passed your command to the backup server.

In this example, while the task is in the "Running" state, we refresh the task, write out the State, sleep one second, and then do the whole thing again. Once it is finished, we write out the Task.Result. Again, if this task is Finished, it does not mean the job is finished, but rather that the backup server has (hopefully successfully) started the job.

Finally, we need to log out. That is rather easy: in the logon session, you will find the URL to do so. Since you are logging out, or rather deleting your session, you need to use the "DELETE" method. You can verify that by checking the relationship (Rel) of the link.
    $logofflink = $r_login_xml.LogonSession.Links.Link | ? { $_.type -match "LogonSession" }
    Invoke-WebRequest -Method Delete -Headers @{$sessionheadername=$sessionid} -Uri $logofflink.Href
I have uploaded the complete code here:
https://github.com/tdewin/veeampowershell/blob/master/restdemo.ps1 
There is some more code in that example that does the actual follow-up on the process, but you can skip or analyze that code as you want.