2015/11/30

Extending Surebackup with custom scripts : Sharepoint

Often when I visit customers, I ask them about their restore tests. The most common answer? "We test the backups when we do the actual restores." To the question why they don't test more frequently, the most common answer is "time and resources".

A couple of months ago, I visited a customer that had tried to do a restore from backup. It failed: B&R was able to restore the backup, but the data inside turned out to be corrupt. The SQL server refused to mount the database. Exploring multiple restore points gave the same issue. It was a strange case because all backup files were consistent (no storage corruption) and the backup job did not have any failed states. The conclusion was Changed Block Tracking corruption. In light of the recent bugs in CBT, I want to emphasize again how critical it is to validate your backups. Had the customer tested his backups with, for example, the SQL test script included in v8, he might have caught the error before the actual restore failed.

This shows another thing I want to highlight. Surebackup is a framework, but your "verification" is only as good as your test. By default, Surebackup application tests are just port scans. This tells you that the service has started (it was able to bind to the port and it is answering), but it doesn't tell you anything about how well the service is performing. For example, the SQL service / instance could start, but some databases might not have been mounted inside the instance.

Few people visit this topic, but you can actually extend the framework. The fact that it supports Powershell makes it quite simple to write more extensive tests.

So here is a small test for Sharepoint. I hacked it together today, so please reread the whole script to "Surebackup" my coding skills. It is rather basic, but you could actually use it for any kind of webservice. It simply reads the content of a txt file in a certain site. If the content matches a predefined value (the core check is sketched right after this list), you know that:
a) The database was mounted inside the instance
b) Sharepoint is able to talk to the instance and query it
c) The webservice is responding to requests
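In essence, the check boils down to the sketch below (simplified, with illustrative values for the server, account and file; the real script is linked a bit further down). Remember that Surebackup decides pass or fail based on the exit code: 0 means success, anything else is a failure.

param(
    $server = "192.168.1.100",                       # overridden by Surebackup via -server %vm_ip%
    $username = "mydomain\spreadonly",               # illustrative read-only account
    $plainpass = "mypassword",
    $file = "/Shared%20Documents/contesttest.txt",   # the txt file uploaded earlier
    $expected = "sharepoint is working succesfully"  # the predefined content
)
# download the file with explicit credentials and compare the content
$wc = New-Object System.Net.WebClient
$wc.Credentials = New-Object System.Net.NetworkCredential($username, $plainpass)
try {
    $content = $wc.DownloadString("http://$server$file")
} catch {
    Write-Host "Download failed: $_"
    exit 2
}
if ($content -match $expected) {
    Write-Host "Match found, Sharepoint answered correctly"
    exit 0
}
Write-Host "No match, got: $content"
exit 1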

So how do you get started? Well, first upload a txt file with some content. In my case, I uploaded the file contesttest.txt with the content "sharepoint is working succesfully" as shown below:


You can right-click the link and copy its location. Test if you can really access it this way, as shown below.


Now get the powershell goodness from https://github.com/tdewin/veeampowershell/blob/master/suresharepoint.ps1 and put it somewhere on the backup server. Then edit the file.



First of all, you can see that everything can be passed as a parameter (e.g. on the command line, use -server "ip" to change the IP address). Change the username and plaintext password to the user that will be used to authenticate against Sharepoint. Preferably use an account with read-only rights and not the administrator as in my screenshot; this way you are sure it doesn't break anything ;).

You might wonder: do I need to provide the password in plaintext? No, you don't have to. You can also follow this procedure, but it might make things more complex. Instead of plaintext passwords, you can use Powershell encrypted passwords, but understand that if you want to decrypt the password, you need to be the same user as the one that encrypted it (the whole point of encrypting it, right?). When Surebackup runs, the script is actually being run by the backup service. So the account that is used to decrypt the password is the service account used to run this service (as shown in the screenshot below).
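A quick demo of that user dependency (plain Powershell, nothing Veeam-specific; the account name is made up):

# encrypt: serialize a password with DPAPI, tied to the CURRENT Windows user
$encrypted = Read-Host -AsSecureString -Prompt "Sharepoint password" | ConvertFrom-SecureString
# decrypt: this only succeeds when running as the SAME user that encrypted it
$secure = ConvertTo-SecureString $encrypted
$cred = New-Object System.Management.Automation.PSCredential ("mydomain\spreadonly", $secure)

Run the decrypt part as another user and ConvertTo-SecureString will simply throw an error.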



If this is not Local System account but a service account, you can use the following cmd script to create an encrypted password:
https://github.com/tdewin/veeampowershell/blob/master/encryptedpasstoclip/encryptedpasstoclip.cmd

Change the username in the bat file, run it, enter the password for the service account and finally enter the password for the account you want to use to authenticate to Sharepoint. The result should be that an encrypted password is put on your clipboard. Replace the whole password statement in the file, for example:
$pass = "01000000d08c9ddf0115d1118c7a00c04fc297eb01000000c9b320ead0059d409978380353923e8000000000020000000000106600000001000020000000b1816dffef13bc70672b55dfcee25a41488d5bb395ae28242b70afeb90938db9000000000e8000000002000020000000bd7da1d0d06893bed8b035c411c34f181b000aa9f0e4f46658eb3efe3e73c06840000000948652774f7f82848ba3065af8193c23fe25b773cea3ecf65957bdc12cdcc71868a82ba11d0475e65b321056a900d0571a05184b89132c0f21452642033c918340000000e8fcabb194c06c78ad01ee2192b73bf7ba799630adfedb6091dc1a629dc9d5a2a6025a64fcf74fe8a89d4a579a54c3538928ee0d22a57f22f6e50da240deaa62"
If you got this far (or you skipped the whole password encryption part because your backup server is a Fort Knox anyway), we can now configure the script. Go to Veeam B&R and configure the test script as shown below in the application group (or in the linked jobs part):



Notice that I also configured "Arguments" as "-server %vm_ip%". This will pass the sandbox IP to the script directly.

Before you actually start up Surebackup, you can test the script against your production environment. If it doesn't work against production, it will probably also fail against your lab environment. In case you configured an encrypted password with another account, you can temporarily override it with the following command (in case you did not, you can just run the script as script.ps1 -server )
PS C:\Users\demo2> C:\scripts\suresp.ps1 -server -password (read-host -assecurestring -prompt "pass" | convertfrom-securestring)


Now, if everything is green and you got a match, run Surebackup and validate that you get the same output in your lab.


If it failed, you can actually check the logs for the output the script gave. Go to "%programdata%\Veeam\Backup\". It should contain a folder named after your job. In this folder, there should be a log called Job.. You can open it with notepad.


Scroll all the way down in the log and look for "[console]"
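If you don't feel like scrolling, something like the one-liner below digs those lines out directly (the "Job.*" file name pattern is an assumption; adjust it to what you actually see in the folder):

Get-ChildItem "$env:ProgramData\Veeam\Backup" -Recurse -Filter "Job.*" | Select-String -SimpleMatch "[console]" | Select-Object -ExpandProperty Line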


This should give you the output of the console. In this case, everything was ok!

2015/11/27

Veeam Application Report

As many of you know, although Veeam B&R has an agentless approach, it still makes sure that all applications are consistently flushed just before the backup starts. To do this, Veeam B&R leverages VSS. It also tries to detect which applications are installed in which VM. This data is collected so that during restore, you don't have to figure out which VM is holding what application and where exactly the application database is stored inside the VM (for Exchange, for example, it will detect the path(s) leading to the EDB(s)).

Now a fellow SE colleague requested to add this "application detection" to the main GUI. They wanted to leverage the detection to sort out which VMs have what application installed. Adding it to the main GUI would however make it more complex, but you can actually already leverage the data via Powershell.

So here is a sample script you can use as a starting point:
https://raw.githubusercontent.com/tdewin/veeampowershell/master/veeam-per-app-detect.ps1

It generates a nice clean report with all the VMs that have detected applications (yes, even Oracle, so it is v9 ready), grouped per application. The output should look something like the screenshot below:


Enjoy!

2015/10/09

RPS 0.3.2

Just a small update (which required some re-engineering under the hood).

First of all, when you click the backup file size, you get a small pop-up window. This will tell you the uncompressed and compressed bandwidth usage over multiple intervals. It should help you understand how much processing power you need for a certain amount of input. The first two lines are the uncompressed data in bytes and bits; the second two lines are the compressed data in bytes and bits. The columns indicate your time window. Notice that clicking on the full file or the incremental file gives different output, so you can compare full runs vs incremental runs really easily.


The second feature is available when you click the total file size. It will give you a table overview of the output, which you can easily copy to Excel or Calc. The numbers are all in GB, so they give a predictable output.


The final feature is a very small one, but I really like it because it took literally 10 minutes to code and will remove some frustration. During a recent conf call with my colleague Johan Huttenga, I noticed he was struggling with inputting 30TB of data. He needed to calculate it by multiplying 1024 by 30 to get exactly 30TB and not 29.99TB. So in this version, you can input 30TB and it will be automatically converted to "30720". Same for 1PB to "1048576". The input is case insensitive, so tb, Tb, TB, pb, PB, pB, etc. should all work. For example, fill in 1TB like shown below.


As soon as you push enter, tab to another input or click the simulate button, the input will be dynamically converted.
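For the curious, the conversion itself is nothing magical. In Powershell terms it comes down to something like this (a rough equivalent, not the actual RPS JavaScript):

function Convert-SizeToGB ([string]$value) {
    # strip the suffix and multiply by the matching power of 1024
    if ($value -match '^([\d.,]+)\s*tb$') { return [double]($matches[1] -replace ',','.') * 1024 }
    if ($value -match '^([\d.,]+)\s*pb$') { return [double]($matches[1] -replace ',','.') * 1024 * 1024 }
    return [double]$value
}
Convert-SizeToGB "30TB"   # 30720
Convert-SizeToGB "1pb"    # 1048576 (-match is case insensitive by default)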

2015/08/25

Global backup report in Powershell

A lot of partners try to go that extra step and also manage onsite Veeam backup environments. They mostly want one report with all job statuses from all Veeam backup servers, instead of plowing through hundreds of emails sent by multiple backup servers.

Enterprise Manager allows you to do that. You can add multiple backup servers and it will give you a global overview. However, it also acts as a license manager. So if you have different customers with different licenses, you cannot add them all together in one Enterprise Manager.

A way around it would be to create some RESTful API integration. That would be the cleanest way to do it, in my humble opinion. However, if you want a quick hack, you can also do it with Powershell. Just launch a remote session to all those backup servers and collect the data.

Now, a lot of people just need a small "sample" script to get started. So here is a basic sample. It is surely not feature complete and has very poor error handling, but it can get you started.

The first part defines the instances. Granted, it would be cleaner to take the table, convert it to a csv, and then import it at the beginning of the script. The instances table consists of objects that define the customer, the backup server, the username and the password in an encrypted form. Not sure how to get the password in an encrypted form? Just use the code at the top to generate what you need. However, make sure that the whole password doesn't have any line breaks when you copy-paste!


Resulting in the pre-created code


After correct copy/pasting and removing line breaks, you should get something like this


If you then run the code, it should connect to all the instances, execute some Veeam PS code, build a table and then collect it centrally. The end result? A $globaljob table, which you can then use to build a csv report, an html report, one big email, etc. I hope it can be useful to somebody as a starting point!
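Stripped of all the plumbing, the idea looks roughly like the sketch below (server names, accounts and the encrypted passwords are made up; it assumes PS remoting is enabled and the Veeam snap-in is installed on every backup server):

# one object per customer/backup server; EncPass is a DPAPI string generated as shown above
$instances = @(
    [pscustomobject]@{Customer="ACME";    Server="vbr1.acme.local";    User="acme\veeamro";    EncPass="0100..."}
    [pscustomobject]@{Customer="Initech"; Server="vbr1.initech.local"; User="initech\veeamro"; EncPass="0100..."}
)
$globaljob = foreach ($i in $instances) {
    $cred = New-Object System.Management.Automation.PSCredential ($i.User, (ConvertTo-SecureString $i.EncPass))
    Invoke-Command -ComputerName $i.Server -Credential $cred -ScriptBlock {
        Add-PSSnapin VeeamPSSnapin
        Get-VBRJob | ForEach-Object { New-Object psobject -Property @{Job = $_.Name; LastResult = $_.GetLastResult()} }
    } | Select-Object @{n="Customer";e={$i.Customer}}, Job, LastResult
}
$globaljob | Format-Table -AutoSize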

2015/08/17

Getting the correct input into RPS

In a first article about the Restore Point Simulator (RPS), I talked about the history of RPS and why it was created. I now want to take the time to explain the correct input parameters. Although I added some tool-tips in 0.3.1, people are sometimes confused about how it all ties together.

TL;DR? The RPS GUI maps quite directly to the Veeam interface; if you already understand how Veeam works, check the screenshots to see how.

For those that didn't read the previous article, I'll repeat the formula here which we use at Veeam to do rough estimations. Why? Because RPS is directly based on it:
Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)
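Plugging the quoted defaults into the formula for, say, a job with 1024GB of used data gives a quick feel for it (plain arithmetic in Powershell):

$Data = 1024    # GB actually used by the VMs in the job
$C = 0.5        # 50% left after compression/dedupe
$F = 1          # one full (forever incremental)
$R = 14         # 14 increments
$D = 0.10       # 10% change between cycles
$C * ($F * $Data + $R * $D * $Data)   # = 1228.8 GB of backup data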
This formula and RPS have three parameters in common. Data would be the first one, and it maps to "Used Size GB". To give you an example, if you have a VM with one VMDK, the used size would be the amount of blocks the VM has already written to. So if you have a thick provisioned VMDK of 50GB but you only use 20GB inside the guest, the used size would be around 20GB. A more correct definition would be what the VM would actually consume if it were thin provisioned. Because Veeam backs up at the block level, a thin provisioned disk is exactly what Veeam needs to process during a full backup (plus some metadata).

D or delta is the amount of data that changes between backups. In RPS the parameter is called change rate. It depends on two things: first, the frequency of backups, which in most companies is "daily"; second, the application. Don't set this parameter too low. Veeam backs up at the block level, so a small update can flag a bigger block at the VMDK level than you estimated. If you fill up a disk sequentially, like a file server, you won't notice this so much, because 10 sequential changes could flag only one block. However, if you are doing a lot of small random I/O in various locations, the number can quickly rise. So from my experience, a change rate of 5% is fairly optimistic while 10% is rather conservative. I personally prefer a more conservative approach.

Finally there is C, or compression, which is called "Data left after reduction". I have had tons of people discussing this parameter with me because you can interpret it in various ways. However, look at the formula and it will all make sense. The formula basically says:
Backup Data = C * (Total Data In)
So C is a factor that should make "Total Data In" smaller: the smaller the number, the smaller the backup. The compression value is thus a percentage that tells how much data is left after the reduction has done its job. If you define 40% (40/100) and consider a "data in" of 100, the backup data would be 40 = (40/100)*100. If you define 60%, you actually tell the engine that you expect worse results, because the end result would be 60 = (60/100)*100. So the lower the compression value, the better the compression.

For some people this feels counter-intuitive. If you prefer compression factors like 2x or 3x instead, you can easily convert those by taking 1/(compression factor). For example, 2x would be 1/2 = 50%. If we put that in the previous formula, we get 50 backup data = 50%*(100 total data in), so the data was indeed compressed 2x. For 3x, you would get 1/3 or 33%, which you can approximate as 35%. Again, if we fill this into the formula, you get 35 = 35%*(100). Use this link if you really want to use 33%.
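If that still feels backwards, this tiny conversion table might help:

foreach ($factor in 2, 3, 4) {
    "{0}x compression = {1:N0}% data left after reduction" -f $factor, (100 / $factor)
}
# 2x compression = 50% data left after reduction
# 3x compression = 33% data left after reduction
# 4x compression = 25% data left after reduction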

If you want to disable compression, set "Data left after reduction" to 100%.

In the formula there are two parameters left, F & R. As discussed in the previous article, these are hard to calculate, and that is exactly what RPS does for you. So for retention points, you should just define your policy. If you need 14 daily restore points, put in 14 for "Retention points" and daily for "Interval". You will see that in some cases you might end up with more restore points than you configured. This can be normal, as Veeam considers your retention policy but also the dependencies between fulls and incrementals. So don't try to adjust the retention points because you feel it miscalculated; instead, take a look at the example given in the previous blog article.

What also influences the number of parameters is the style parameter. This actually maps directly to the Veeam GUI. For example, "Backup Copy Job" (BCJ) should be quite easy to understand: it refers directly to a Backup Copy Job. Selecting it should also show you the BCJ-specific configurations.


For the other "styles", called Incremental and Reverse, the settings map mainly to the advanced settings of a regular job, except for "Retention Points".


If you select Incremental without any active fulls or synthetic fulls, you get a forever incremental job. It also maps directly to the GUI, more specifically to the advanced settings of a regular job. Granted, the checkboxes are under the buttons "Days" and "Months". Also, there is no global "enable" checkbox for synthetic or active fulls, but do understand that checking one of the boxes enables "Active" or "Synthetic". Finally, "Monthly" has preference over "Weekly". In the Veeam GUI you can select only one; here you can enable checkboxes for both, but weekly will be ignored when you enable a month.


So if you want a weekly synthetic backup, it would be like shown below. I did not implement "Transform", and with good reason: to this day, I have received zero requests for it.


A monthly active backup would look like shown below. Important here is to know that this does not define GFS policies; those can only be defined in a Backup Copy Job. I once had a guy that checked only January because he wanted to have a "yearly full backup for archiving". He was quite surprised by the result. Basically, he had just told the engine that he wanted a yearly chain which only gets reset in January.


Finally "Reverse" incremental, can be found in the same place.


That leaves only one parameter, and that is the growth simulation. It is a recent addition to RPS, and personally I think it is one of the coolest things added to it. Let me explain what it does and how it works. If you need to size for the following 3 years, and you know that you have a growth rate of 10% on a yearly basis, you can just add that to RPS. What it does is take your "used data" and grow it on a daily basis via: Future Used Data = Used Data * (1 + 10%) ^ (Day X / 365). Thus on the last day of the simulation, after 3 years, it would put Future Used Data = Used Data * (1 + 10%) ^ (1095/365).

So let's imagine you have chosen reverse incremental, a 3 year simulation, 10% year-over-year data growth and 1000GB of data. For simplicity, let's disable compression. Calculating this manually would give you roughly 1000*(1+10%)^(1095/365) = 1000*(110%)^3 = 1000*(1.10)^3 = 1000*1.10*1.10*1.10 = 1331. Now check the full backup in the configured example. Also interesting is that the increment from 2 days ago (retention point 3) has a smaller "Future Used Data" set. It uses 3 years minus 2 days in the formula, thus Future Used Data = 1000*(1+10%)^(1093/365) =~ 1330.30.
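You can verify those numbers yourself in two lines:

$used = 1000; $growth = 0.10
$used * [math]::Pow(1 + $growth, 1095 / 365)   # full after 3 years: 1331
$used * [math]::Pow(1 + $growth, 1093 / 365)   # point from 2 days earlier: ~1330.3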


The differences are small in regular jobs but once you go GFS, the time growth calculation can have a dramatic impact which is hard to calculate with excel.


With this, I have discussed all parameters. Some might ask: what about "Quick Presets", what do they do? Well, they are just quickly preconfigured scenarios that you can use. For example, if you want to have a monthly active backup, you can click all 12 select boxes, or you can just select the matching style + monthly active full preset to quickly configure this scenario.

If you made it this far, thanks for reading and enjoy playing with RPS.

2015/08/05

A brief history of the Restore Point Simulator

During the development of the Restore Point Simulator, I have often encountered questions from users that led me to believe that it is not always clear how to use the tool and what it can do for you. In this blog article series, I want to take the time to explain why RPS was developed in the first place and how you can use it.

In the beginning there was nothing, just our famous formula to calculate repository space. I'll quote it here because it is still the main idea behind RPS. Many Veeam SEs had their own Excel configuration sheet to quickly spit out some numbers, some prettier than others.

Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)

This formula has some difficulties. First of all, the (C)ompression ratio and the (D)elta are difficult parameters to estimate, although the quote does give you some hints about what we at Veeam use internally and a fairly good explanation of why these values were chosen. More difficult are F and R. These values define how many full backups and how many incrementals you will need. With reverse incremental / forever incremental, that is quite easy to calculate: you'll have F = 1 and R = rps - F.

However, when you talk about weekly synthetics or active fulls, the number is rather difficult to calculate. Even Veeam users do not always understand the effect of a certain policy. For example, if you configure a forward incremental with weekly fulls and 2 restore points (rps), you can end up with up to 9 rps because of dependencies. I had countless discussions with customers arguing that Veeam did (does) not respect their rps policy, when in fact it does its absolute best to respect it. If you run the simulation, you can actually see the dependency. In the first column (called Retention), you will see something like 3 (2) or 4 (2). This means that point 3 or 4 is kept because point (2) depends on it.

Now if you want to excellify this, you can come up with something like F = #Weeks + 1, R = (F*7*#DailyBackups - F). Imagine 14 rps with daily backups: that would be F = 2+1 = 3 and R = 3*7*1 - 3 = 21 - 3 = 18. That would be really close to what RPS says, but explaining it to people takes some time, and it is not always accurate; it's more of a guesstimation.
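In code, that guesstimate looks like this (my rough approximation, not what RPS actually runs):

$rps = 14; $dailyBackups = 1
$weeks = [math]::Ceiling($rps / (7 * $dailyBackups))   # 2 weeks to cover 14 points
$F = $weeks + 1                                        # 3 full backups
$R = $F * 7 * $dailyBackups - $F                       # 18 increments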

Another common misconception is that a monthly full backup requires less space than a weekly full backup. While this can be the case, remember that a monthly full creates a chain of 30 points. If you configure a policy of 14 points in forward incremental with monthly fulls, the worst case scenario occurs 12 days after a second full backup is created. This is because you have 12 increments dependent on the current full, but you need to keep the whole previous chain, because the oldest restore point is an increment that depends on the previous full backup and a chain of 30 increments. If you had configured weekly fulls, a chain would be at most 7 days, so less would be stored. This can grow exponentially when you do, for example, a backup every 12 hours or even more often. However, if you configure for example 60 restore points, a monthly full backup can be cheaper than a weekly full backup. The more days' worth of restore points configured, the more likely a monthly full backup will actually consume less space.

These two examples show exactly why RPS was made. Different customer cases require different approaches. Also, it reconfirms that assumption is the mother of all mistakes. Explaining how retention works without very difficult formulas was my main goal when the first edition of RPS was made.

Another example is my new all-time favorite; it shows why what feels natural is not always reality. Some months ago, a partner thought a forever incremental backup chain of 365 points would be more efficient than a GFS policy with 12 fulls. This surprised even me the first time I ran it, because incremental backups feel more lightweight. I remembered from my v7 SE training that GFS should be more efficient, but just running the simulation reconfirms this.

It is true that forever incremental versus weekly fulls is much more efficient in terms of disk space savings. However, 30 increments quickly add up, and for long-term retention a monthly full could be more efficient. There is one caveat: with 365 increments, you do have more granularity than with 12 monthly full points in time. However, I do want to remind you that those 12 full backups are completely independent of each other. A single bit rot corruption would only impact one point, while in a 365 restore point chain it potentially impacts the whole chain. So I think in the majority of cases the more efficient disk usage and the independence of points is better than a very long chain of increments, but hey, it is up to a company to decide their policy.

Finally, I remember one of the major updates was adding GFS support. Calculating and explaining GFS policies is nearly impossible with Excel. Why? Imagine you configure weekly backups to keep the restore points of Sunday, and you configure monthly backups to keep the restore points of the first Sunday of the month. In this case, the backup of the first Sunday of the month can be used to satisfy the weekly backup policy as well as the monthly one. In fact, this is what Veeam does. So if you configure, for example, 12 weeklies and 3 monthlies, you would assume that the number of fulls is 12+3+1 (1 for the simple retention policy). However, this is not the case. If you configured your policy correctly so that weekly and monthly points can coincide (schedule button), you will actually get fewer points. You can see these common points again in the retention column: "10W 3M 0Q 0Y" means that the point represents the 10th weekly point but also the 3rd monthly point.

@poulpreben (if you don't follow him on twitter, do it now) and I spent hours discussing how we could calculate this with formulas. We concluded that the only way to actually do it was to emulate what happens inside B&R on a daily basis for some period of time. In fact, that is what RPS does. If you configure a retention policy, it will try to predict a period of time in which the worst case scenario should occur (most data on disk). This is why, when you configure 5 yearly backups, it takes some time to calculate: it will run over 2000 days trying to simulate the behaviour of B&R.

So TL;DR? Don't just assume, run it through RPS. Be critical of the results of RPS (software can contain bugs), but also try to understand why something is different from what you first estimated.

2015/05/05

Veeam Job Managers, Veeam Agents, Tasks and more..

One of the most common statements I hear is that Veeam is good for SMB. Maybe with good reason: the GUI is simple and sleek, and it doesn't require you to study for 2 weeks just to install it and configure some jobs. This was done intentionally and makes it a good solution for SMB, but not exclusively for that segment.

A lot of green magic is going on under the hood. The application is designed to work out of the box in smaller environments. However, when you are in a bigger environment, it might be interesting to dig a little deeper and understand the global architecture. Most of the problems I see when talking to people are just misconfigurations or not understanding how it all works together.

When I start talking about proxies and repositories, to my surprise, I still see a lot of people wondering what it all means. Hopefully after this blog post, either everything is clear, or you are so fascinated that you want to know more.

Before we think about backup, let's look at the scalability of a movie theater. If we look at the people handling the ticket process, we can see a couple of roles that are important.

First of all, we have the people selling tickets. They are executing all the hard work: a constant, repetitive process of taking money and exchanging it for a ticket. Their job is fairly simple in what they need to do, but they are quite busy.

At the gate of the movie room, there is somebody checking those tickets and letting people inside. Maybe they can handle multiple tickets, but only one person (or group) is allowed into the screening room at a time, because otherwise it would get messy. It is clear that scanning or checking a ticket is a much simpler process than selling one; fewer interactions are required. Thus the ratio of cashiers to doormen is different.

At the heart of the system is someone that regulates the whole chain, let's say the floor manager. He decides where cashiers are sitting, and thus in effect which customers they are handling. He instructs the doormen when they need to come to work and when they can take a break. Most importantly, he is not doing any of the real physical work, but he is the brain of the operation. Without him instructing, nobody does anything.

Scaling out is easy. If you run a small theater, you might have only one person doing all the jobs. This of course simplifies things. However, people are not so good at multithreading. So if you need to scale out, you can just hire more cashiers and doormen. They do the hard physical work but require minimal training, because they just repeat the same job over and over.




Well, Veeam has a very similar chain of responsibility. The roles are not always apparent, but scale-out is possible. So how does a movie theater map to Veeam? Well, in v6, Veeam introduced Veeam Agents (VA) or Datamovers (Enterprise Scalability). These small binaries are like the cashiers and doormen. In fact, a VA has the logic for both being the doorman and being the cashier at the same time. Still, these routines are pretty simple and the Veeam Agent does not have any GUI or anything.

So what do VAs do? Well, the task or source agent reads the data from production, executes compression and VMDK-level deduplication, and sends it to the target or job agent. The job agent takes the data, deduplicates it at the job level and writes it to the backup file. Above it all is the job manager. It instructs task agents and job agents to work together. However, except for scheduling work, it is not really interacting a lot with the VMDK data itself. It just checks that all VMs and VMDKs configured in this specific job are successfully backed up. Don't confuse the job manager with the backup server itself; the job manager is just middle management. It is the backup service that notices that a job is ready to be scheduled. But instead of handling multiple jobs itself, it delegates this to different floor managers, while it keeps the overview of the theater and ensures that everything is running smoothly.

Already we can see a couple of interesting things:
  • A source agent handles individual VMDKs
  • The target agent takes multiple VMDKs or tasks for that specific job and writes them to one file. The reason is quite simple: writing from multiple processes to one file would create a lot of confusion about who is updating which segment of the file, especially when we look at metadata updates. Also, job-level deduplication is quite difficult if different processes are writing to the same file. In case you need multiple streams, you need to start multiple jobs at the same time.
  • The Job Manager runs on the backup server
  • The VAs can run on separate servers, called proxies or repositories. However, for VMware environments, Veeam by default creates a proxy server and a repository server on the main backup server itself. This gives the impression that Veeam is not scalable, because it works out of the box. Don't be confused: if you want to run everything on one big physical server, that's fine, but the scalability is there. For Hyper-V environments, Veeam will actually put the VA on the Hyper-V host itself by default. This is what is called an "on-host" proxy.
  • I once got a remark from a customer that he needed a VA for each VM in his organization. This is of course not the case. It would be ideal, of course: imagine that you go to the movie theater and there is a cashier for each customer, so you never ever have to queue. However, we all know that is not how it works. When a customer enters, he queues in one of the available lines, and when it is his turn, he will be served. Even better, maybe there is somebody at the front checking the available resources and balancing the load over the available queues. It's the same for Backup & Replication: you will need a certain number of VAs, and Veeam will balance the load over them in a smart way.
  • Maybe not so clear, but after all the visitors have gone into the screening room, the door needs to be closed so that no light or noise from outside gets in. For Veeam it is the same: the job is not over after all VMDKs have been backed up. The job agent still needs to consider retention. If you configured 14 points, but after the backup 15 points are on disk, some cleanup action is needed. Since the job agent is close to the repository, it will be the one deleting files and executing merges or synthetic fulls.
  • A VA on Windows can be a proxy and a repository. A VA on Linux can only act as a repository. Finally, you can imagine that you cannot run the job agent on a CIFS repository, so Veeam automatically picks another server to run the job agent. Sometimes this green magic does not have 100% of the information it needs to make a rational decision. Imagine the CIFS share is on another site, and the connection between both sites is limited. In this case it makes sense to run the job agent near the CIFS share so it can do the merge or synthetic full locally. So if you have multiple sites, it might be wise to configure the gateway server manually (aka, where do I run the job agent).
Another thing that might be interesting to know is that the VA is just a simple binary; it is not loaded as a service. So how does it get started? Well, that depends on the server. For Windows, when you add a repository, the first thing you actually do is add a "managed server". Adding a managed server for Windows means Veeam will push:
  • An installer service: used to install or update the transport service
  • A transport service: the service running on a Windows managed server that kickstarts the VA
You can also check this. If you already configured Windows backup proxies and repositories before, you can go to Backup Infrastructure. Under managed servers, you will see your proxies pop up. If you open one up in the GUI, you can see, under ports, the installer service and the transport service. On the server itself, you will be able to see the Veeam services as well via services.msc.
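On the managed server itself, a quick way to confirm both services landed:

Get-Service -DisplayName "Veeam*" | Format-Table Status, Name, DisplayName -AutoSize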

Pushing those services is done via the administrative share. However, if due to security restrictions you are not able to do so, you can actually manually install the installer service. Support can help you with this.



For Linux servers, Veeam just uses SSH to upload the agent and start it dynamically. SSH is a pretty stable protocol for uploading to and managing Linux servers. This way, Veeam doesn't have to integrate with the different init.d, upstart, etc. mechanisms.

So for this part, let's put all this info into practice. On a Veeam backup server with the default proxy, I started two jobs.





Then I started a small script I wrote to monitor Veeam processes.
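The script itself is not that important; in essence it is a fancy loop around something like this one-liner (the hashes discussed below live in the CommandLine column):

Get-WmiObject Win32_Process -Filter "Name LIKE 'Veeam%'" | Select-Object ProcessId, ParentProcessId, Name, CommandLine | Format-Table -AutoSize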



First of all, you will see a lot of green magic going on. I'm not going to discuss all the services now, but let me pick out some specifics.

The first one is the process with ID 11604. This is the Veeam Backup Service, or the local branch manager. It is started at boot time (its parent is the services.exe daemon).

The next two processes are job managers. We have two jobs running, so of course the backup service has started two floor managers to handle them. You can see them with ID 87228 (Backup Job Linux) and 84604 (Backup Job Windows). How do I know they correspond to those jobs? Well, in the command line there are random hashes. The first one is actually the Job ID, which you can find in the logs. For example, at the end of the Backup Job Linux log I found
Job has been stopped successfully. Name: [Backup Job Linux], JobId: [82062049-544e-44fc-9bb8-f3727e3464ac]
The second one is the Session ID, which changes every time the job runs. You can also find these IDs in the database. For example, if you need a reference table, you can check BJobs for the matching job ID vs job name.
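Or skip the database altogether and ask Powershell for the reference table directly:

Add-PSSnapin VeeamPSSnapin
Get-VBRJob | Select-Object Name, Id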

Since I'm running everything on the same server, you can also see the VAs. You will see that there are more VAs active than described above; for example, per VM there is also a VA running. But let's check the ones we discussed.

For proxy VAs, you can check ID 85072 (sharepoint vmdk) and 86760 (ad vmdk). What I like is that you can also check where they are running and logging to. Also, it is important to note that even though they are running on the backup server, they have not been started by the backup server but rather by the VeeamTransportSVC daemon. So the job manager has contacted this transport service to kickstart the whole process. Finally, although the Linux job was running, it didn't have any free resources (proxy slots) yet to back up its VMDKs.

At the target side you can see the target agent or repository agent running with ID 87744 (Backup Windows Job) and ID 86656 (Backup Linux Job). As discussed, there is only one target writer for every job.