Timo's Techie Corner: 2015-05-03

One of the most common statements I hear is that Veeam is good for SMB. Maybe with good reason. The GUI is simple and sleek. It doesn't require you to study for two weeks just to install and configure some jobs. This was done intentionally and makes it a good solution for SMB, but not exclusively for that segment.

A lot of green magic is going on under the hood. The application is created so that it works out of the box in smaller environments. However, when you are in a bigger environment, it might be interesting to dig a little deeper and understand the global architecture. Most of the problems I see when talking to people are just misconfigurations or a misunderstanding of how it all works together.

When I start talking about proxies and repositories, to my surprise, I still see a lot of people wondering what it all means. Hopefully, after this blog post, either everything is clear or you are so fascinated that you want to know more.

Before we think about backup, let's look at the scalability of a movie theater. If we look at people handling the ticket process, we can see a couple of people that are important.

First of all, we have the people who sell tickets. They do the actual hard work: a constant, repetitive process of taking money and exchanging it for a ticket. Their job is fairly simple in what they need to do, but they are quite busy.

At the gate of the movie room, there is somebody checking those tickets and letting people inside the room. Maybe they can handle multiple tickets, but only one person (or group) is allowed into the screening room at a time, because otherwise it gets messy. Clearly, scanning or checking a ticket is a much simpler process than selling one. Fewer interactions are required. Thus the ratio of cashiers to doormen is different.

At the heart of the system is someone who regulates the whole chain, let's say the floor manager. He decides where cashiers are sitting and thus, in effect, which customers they are handling. He instructs the doormen when they need to come to work and when they can take a break. Most importantly, he is not doing any of the real physical work, but he is the brain of the operation. Without his instructions, nobody does anything.

Scaling out is easy. If you run a small theater, you might have only one person doing all the jobs. This of course simplifies things. However, people are not so good at multithreading. So if you need to scale out, you can just hire more cashiers and doormen. These do the hard physical work but require minimal training because they just repeat the same job over and over.

Well, Veeam has a very similar chain of responsibility. The roles are not always apparent, but scale-out is possible. So how does a movie theater map to Veeam? Well, in v6, Veeam introduced Veeam Agents (VA) or Datamovers (Enterprise Scalability). These small binaries are like the cashiers and doormen. Actually, a VA has the logic to be both the doorman and the cashier at the same time. Still, these routines are pretty simple, and the Veeam Agent does not have a GUI or anything.

So what do VAs do? Well, the task or source agent will read the data from production, execute compression and VMDK-level deduplication, and send it to the target or job agent. The job agent takes the data, deduplicates at the job level and writes it to the backup file. Above it all is the job manager. It instructs task agents and job agents to work together. However, except for scheduling work, it does not really interact a lot with the VMDK data itself. It just checks that all VMs and VMDKs configured in this specific job are successfully backed up. Don't confuse the job manager with the backup server itself. The job manager is just middle management. It is the backup service that notices that a job is ready to be scheduled. But instead of handling multiple jobs itself, it delegates them to different floor managers (the job managers). The backup service, however, keeps the overview of the whole theater and ensures that everything is running smoothly.
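To make the division of labour concrete, here is a toy model in Python. It is only a sketch of the idea: the block size, the SHA-256 hashing, the in-memory "backup file" and all class and function names are my own assumptions, not how the real agents are implemented.

```python
import hashlib
import zlib


def source_agent(disk_blocks):
    """Task (source) agent: reads one disk, dedupes within it, compresses."""
    seen = set()
    for block in disk_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in seen:
            # Block already sent for this disk: ship only a reference.
            yield digest, None
        else:
            seen.add(digest)
            yield digest, zlib.compress(block)


class TargetAgent:
    """Job (target) agent: the single writer, dedupes across the whole job."""

    def __init__(self):
        self.store = {}   # digest -> compressed block (stands in for the backup file)
        self.layout = {}  # disk name -> ordered list of digests

    def receive(self, disk_name, stream):
        digests = self.layout.setdefault(disk_name, [])
        for digest, payload in stream:
            if payload is not None and digest not in self.store:
                self.store[digest] = payload
            digests.append(digest)

    def restore(self, disk_name):
        return [zlib.decompress(self.store[d]) for d in self.layout[disk_name]]


def job_manager(disks):
    """Job manager: wires agents together and checks every disk was handled."""
    target = TargetAgent()
    for name, blocks in disks.items():
        target.receive(name, source_agent(blocks))
    assert set(target.layout) == set(disks)
    return target


# Made-up sample data: two small "disks" that share one block.
disks = {
    "ad.vmdk": [b"A" * 1024, b"B" * 1024, b"A" * 1024],
    "sharepoint.vmdk": [b"A" * 1024, b"C" * 1024],
}
target = job_manager(disks)
print(len(target.store))  # 3 unique blocks stored for 5 blocks read
```

Note that the job manager in this sketch never touches block contents itself. It only hands streams to the agents and verifies that every configured disk ended up in the backup.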

Already we can see a couple of interesting things:

A source agent handles individual VMDKs.
The target agent takes multiple VMDKs or tasks for that specific job and writes them to one file. The reason is quite simple. Writing from multiple processes to one file can create a lot of confusion about who is updating which segment of the file, especially if we look at metadata updates. Also, job-level deduplication is quite difficult if different processes are writing to the same file. In case you need multiple streams, you need to run multiple jobs at the same time.
The Job Manager runs on the backup server.
The VAs can run on separate servers, called proxies or repositories. However, for VMware environments, by default Veeam creates a proxy server and a repository server on the main backup server itself. Because everything works out of the box on one machine, this gives the impression that Veeam is not scalable. Don't be confused: if you want to run everything on one big physical server, that's fine. However, the scalability is there. For Hyper-V environments, Veeam will actually put the VA on the Hyper-V host itself by default. This is what is called an "on-host" proxy.
I once got the remark from a customer that he needed a VA for each VM he had in his organization. This is of course not the case. It would be ideal, of course. Imagine that you go to the movie theater and that for each customer there is a cashier. That means you would never have to queue. However, we all know that is not how it works. When a customer enters, he queues in one of the available lines, and when it is his turn, he is served. Even better, maybe there is somebody at the front checking the available resources and then balancing the load over the available queues. The same goes for Backup & Replication: you need a certain number of VAs, and Veeam will balance the load over them in a smart way.
Maybe not so clear, but after all visitors have gone into the screening room, the door needs to be closed so that no light and noise from outside gets in. For Veeam, it is the same: the job is not over after all VMDKs have been backed up. The job agent still needs to consider retention. If you configured 14 points, but after the backup 15 points are on disk, some cleanup action is needed. Since the job agent is close to the repository, it will be the one deleting the files and executing merges or synthetic fulls.
A VA for Windows can be a proxy and a repository. A VA for Linux can only act as a repository. Finally, you can imagine that you cannot run the job agent on a CIFS repository. So Veeam automatically picks another server to run the job agent. Sometimes this green magic does not have all the information it needs to make a rational decision. Imagine the CIFS share is on another site, and the connection between both sites is limited. In this case, it makes sense to run the job agent near the CIFS share so it can do the merge or synthetic full locally. So if you have multiple sites, it might be wise to configure the gateway server manually (in other words, decide where the job agent runs).
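The retention check the job agent does after the last VMDK is written can be sketched in a few lines of Python. This is a deliberately simplified model: the function name and the restore point labels are hypothetical, and a real backup chain cannot always just drop its oldest file, which is exactly why merges and synthetic fulls exist.

```python
def apply_retention(restore_points, keep):
    """Split points (ordered oldest to newest) into (kept, removed)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    excess = max(0, len(restore_points) - keep)
    return restore_points[excess:], restore_points[:excess]


# 14 points configured, 15 on disk after tonight's run.
points = [f"point-{n:02d}" for n in range(1, 16)]
kept, removed = apply_retention(points, keep=14)
print(len(kept), removed)  # 14 ['point-01']
```

Whatever the actual cleanup looks like, it touches the backup files, so it makes sense that it runs where the job agent runs: next to the repository.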

Another thing that might be interesting to know is that the VA is just a simple binary. It is not loaded as a service. So how does it get started? Well, that depends on the server. For Windows, when you add a repository, the first thing you actually do is add a "managed server". Adding a managed server for Windows means Veeam will push:

An installer service: to install or update the transport service
A transport service: the service running on a Windows managed server that kickstarts the VA

You can also check this. If you already configured Windows backup proxies and repositories before, you can go to Backup Infrastructure. Under managed servers, you will see your proxies pop up. If you open up the GUI, you can see the installer service and the transport service listed under ports. On the server itself, you can see the Veeam services as well via services.msc.

Pushing those services is done via the administrative share. However, if security configurations prevent this, you can manually install the installer service. Support can help you with this.

For Linux servers, Veeam just uses SSH to upload the agent and start it dynamically. SSH is a pretty stable protocol for uploading to and managing Linux servers. This way, Veeam doesn't have to integrate with the different init.d, upstart, etc. mechanisms.
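The "upload, then start" pattern can be illustrated with the standard scp and ssh command-line tools. This is not Veeam's implementation, just a sketch of the pattern: the host, user, file names and the helper function are all hypothetical, and the code only builds the command lines without running them.

```python
import shlex


def build_ssh_commands(host, user, local_agent, remote_path):
    """Return two command lines: upload the binary, then start it."""
    target = f"{user}@{host}"
    upload = ["scp", local_agent, f"{target}:{remote_path}"]
    start = ["ssh", target, shlex.quote(remote_path)]
    return upload, start


# Hypothetical host, account and paths, for illustration only.
upload, start = build_ssh_commands(
    "repo01.example.com", "backup", "./agent", "/tmp/agent"
)
print(shlex.join(upload))
print(shlex.join(start))
```

Nothing is registered with the init system in this pattern. The binary only lives as long as the session that started it, which matches the "started dynamically" behaviour described above.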

So for this part, let's put all this info into practice. On a Veeam Backup Server with the default proxy, I started two jobs.

Then I started a small script I wrote to monitor Veeam processes.
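The script itself is not reproduced here, but a comparable monitor could look roughly like the Python sketch below. It assumes the process list was exported as JSON with the PowerShell query shown in the code (Win32_Process does expose ProcessId, ParentProcessId, Name and CommandLine). The function name and the sample process names are placeholders of mine, not real output.

```python
import json

# PowerShell query whose JSON output veeam_process_tree() expects.
POWERSHELL_QUERY = (
    "Get-CimInstance Win32_Process | "
    "Select-Object ProcessId, ParentProcessId, Name, CommandLine | "
    "ConvertTo-Json"
)


def veeam_process_tree(raw_json, prefix="veeam"):
    """Return one row per process whose name starts with prefix."""
    records = json.loads(raw_json)
    if isinstance(records, dict):  # ConvertTo-Json emits a bare object for one result
        records = [records]
    names = {r["ProcessId"]: r["Name"] for r in records}
    rows = []
    for r in records:
        if r["Name"].lower().startswith(prefix):
            rows.append({
                "pid": r["ProcessId"],
                "name": r["Name"],
                "parent": names.get(r["ParentProcessId"], "?"),
                "cmd": r.get("CommandLine") or "",
            })
    return sorted(rows, key=lambda row: row["pid"])


# Placeholder sample; the process names are invented for the example.
SAMPLE = json.dumps([
    {"ProcessId": 700, "ParentProcessId": 4, "Name": "services.exe", "CommandLine": None},
    {"ProcessId": 11604, "ParentProcessId": 700, "Name": "VeeamSampleService.exe", "CommandLine": "service"},
    {"ProcessId": 87228, "ParentProcessId": 11604, "Name": "VeeamSampleManager.exe", "CommandLine": "manager"},
])
for row in veeam_process_tree(SAMPLE):
    print(row["pid"], row["name"], "<-", row["parent"], "|", row["cmd"])
```

On a real backup server you would feed the function the output of `powershell -NoProfile -Command` followed by the query above. The parent column is the interesting part, because it shows which service started which process.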

First of all, you will see a lot of green magic going on. I'm not going to discuss all the services now but maybe pick out some specifics.

The first one has ID 11604. This is the Veeam Backup Service, or the local branch manager. It is started at boot time (its parent is the services.exe daemon).

The next two processes are job managers. We have two jobs running, so of course the backup service has started two floor managers to handle them. You can see them with ID 87228 (Backup Job Linux) and 84604 (Backup Job Windows). How do I know they correspond to those jobs? Well, in the command line, there are what look like random hashes. The first one is actually the Job ID, which you can find in the logs. For example, at the end of the Backup Job Linux log I found:
Job has been stopped successfully. Name: [Backup Job Linux], JobId: [82062049-544e-44fc-9bb8-f3727e3464ac]
The second one is the Session ID, which changes every time the job runs. You can also find these IDs in the database. For example, if you need a reference table, you can check BJobs to match job IDs to job names.
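If you want to do this matching without reading logs by hand, a small Python sketch could pull the name and ID out of the log line above and the two IDs out of a command line. The log line is the one quoted above. The sample command line and its session ID are invented, since the exact command line format is not shown here.

```python
import re

GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
LOG_LINE = re.compile(r"Name: \[(?P<name>[^\]]+)\], JobId: \[(?P<job_id>[^\]]+)\]")


def parse_job_line(line):
    """Return (job name, job ID) from a 'Job has been stopped' log line."""
    match = LOG_LINE.search(line)
    return (match["name"], match["job_id"]) if match else None


def ids_from_command_line(command_line):
    """Return (job ID, session ID): the first two GUIDs on the command line."""
    found = GUID.findall(command_line)
    return (found[0], found[1]) if len(found) >= 2 else None


log = ("Job has been stopped successfully. Name: [Backup Job Linux], "
       "JobId: [82062049-544e-44fc-9bb8-f3727e3464ac]")
# Invented command line; only the job ID comes from the log above.
cmd = ("manager.exe 82062049-544e-44fc-9bb8-f3727e3464ac "
       "00000000-0000-0000-0000-000000000001")

name, job_id = parse_job_line(log)
print(name, ids_from_command_line(cmd)[0] == job_id)  # Backup Job Linux True
```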

Since I'm running everything on the same server, you can also see the VAs. You will see that there are more VAs active than described above. For example, there is also a VA running per VM. But let's check the ones we discussed.

For proxy VAs, you can check ID 85072 (sharepoint vmdk) and 86760 (ad vmdk). What I like is that you can also see where they are running and where they are logging to. It is also important that, even though they are running on the backup server, they have not been started by the backup service but rather by the VeeamTransportSVC daemon. So the job manager has contacted this transport service to kickstart the whole process. Finally, although the Linux job was running, there were no free resources (proxy slots) to back up its VMDKs.
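Why the Linux job sits idle is easy to see in a small queueing sketch. This Python model assumes a simple first-come, first-served rule and two proxy slots. Both the slot count and the durations are assumptions for illustration, and Veeam's real load balancing is smarter than this.

```python
import heapq


def schedule(tasks, slots):
    """tasks: (name, duration) in arrival order. Return {name: (start, end)}."""
    if slots < 1:
        raise ValueError("need at least one slot")
    free_at = [0] * slots  # min-heap: the time at which each slot becomes free
    heapq.heapify(free_at)
    plan = {}
    for name, duration in tasks:
        start = heapq.heappop(free_at)  # earliest available slot
        plan[name] = (start, start + duration)
        heapq.heappush(free_at, start + duration)
    return plan


# Made-up durations in minutes; linux.vmdk is a hypothetical disk name.
tasks = [("sharepoint.vmdk", 30), ("ad.vmdk", 20), ("linux.vmdk", 10)]
plan = schedule(tasks, slots=2)
print(plan["linux.vmdk"])  # (20, 30): it waits until ad.vmdk frees a slot
```

Adding a proxy is the equivalent of calling `schedule(tasks, slots=3)`: the third disk then starts at minute 0 instead of waiting.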

At the target side, you can see the target agent or repository agent running with ID 87744 (Backup Job Windows) and ID 86656 (Backup Job Linux). As discussed, there is only one target writer for every job.

2015/05/05

Veeam Job Managers, Veeam Agents, Tasks and more..