2016/01/14

Extending Surebackup in v9

Now that everybody has posted their favorite new features in Veeam v9, I want to take the time to highlight one particular feature: the credential manager in Surebackup. This extra tab can be found when you configure your VM in the application group.


So why this extra tab? Well, read my Surebackup Sharepoint validation script article and you will instantly see the biggest problem: storing your credentials in a secure way takes up 60% of the whole blog article. This is because in v8, all scripts are started by the backup service and thus inherit its account and permissions.

Enter v9, where the credentials tab is added. My first assumption was that all scripts would run under the configured account. That turned out to be incorrect. The script is still started under the backup service account, but the network credentials are changed. This has one big advantage: even if your backup server is not in the domain, you can still use these credentials. Think of it as using "runas /netonly" to start up an application (this is how Product Management explained it to me). The credentials are only applied when connecting to a remote server.
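If you have never used it, this is what such a start looks like on the command line (the account name is just an example):

runas /netonly /user:lab\svc_surebackup powershell.exe

The process itself keeps running under your local account, but every connection to a remote server authenticates with the credentials you supplied, which is exactly the v9 behavior described above.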

So for the fun of it, I have already looked into some example scripts. They might not all be that stable and it is better to adapt them to your liking, but they should give you an idea of where to start.

First of all, you can find an updated version of the Sharepoint script. The only parameters to set are:
  • -server [yourserver, %vm_ip% for example]
  • -path [path to the content you want to check, by default: /Shared%20Documents/contenttest.txt]
  • -content [the content to check, by default "working succesfully"]
If you then set up an account that can access the web service, the script will authenticate successfully with the network credentials, download the file and match the content. The real magic? "$web.UseDefaultCredentials = $true"
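If you just want the gist without opening the link, the core of the script boils down to something like this. Consider it a minimal sketch of the idea rather than the actual script (it assumes plain http and reuses the defaults listed above):

param(
    [string]$server = "localhost",
    [string]$path = "/Shared%20Documents/contenttest.txt",
    [string]$content = "working succesfully"
)
# the WebClient picks up the network credentials that Surebackup injected
$web = New-Object System.Net.WebClient
$web.UseDefaultCredentials = $true
try {
    $downloaded = $web.DownloadString("http://$server$path")
} catch {
    Write-Host "download failed: $_"
    exit 1
}
# Surebackup treats a non-zero exit code as a failed test
if ($downloaded -match $content) {
    Write-Host "content matched, test succeeded"
    exit 0
}
Write-Host "content did not match"
exit 1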

But the fun doesn't stop there. I also tried to make a SQL script. You only need to pass:
  • -server [yourserver, %vm_ip% for example]
  • -instance [by default MSSQLSERVER]
It will log on to the instance, issue a "use" against every database and query its tables. Finally, it checks the state of the databases in "sys.databases". The "use" makes sure that SQL Server actually tries to mount the database. But the cool thing is, you can easily alter the example to execute a full-blown SQL query and then check if the output satisfies your needs. The real magic? "Server=$instancefull;Integrated Security=True;"
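Again a stripped-down sketch of the idea rather than the full example script; the essence is the integrated security connection string and the "sys.databases" check:

param(
    [string]$server = "localhost",
    [string]$instance = "MSSQLSERVER"
)
# the default instance is addressed by the server name alone
$instancefull = if ($instance -eq "MSSQLSERVER") { $server } else { "$server\$instance" }
$conn = New-Object System.Data.SqlClient.SqlConnection
# no username or password here: the injected network credentials are used
$conn.ConnectionString = "Server=$instancefull;Integrated Security=True;"
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT name, state_desc FROM sys.databases"
$reader = $cmd.ExecuteReader()
$failed = $false
while ($reader.Read()) {
    Write-Host ("{0} : {1}" -f $reader["state_desc"].ToString().ToLower(), $reader["name"])
    if ($reader["state_desc"] -ne "ONLINE") { $failed = $true }
}
$reader.Close()
$conn.Close()
if ($failed) { exit 1 } else { exit 0 }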

I also added a template for just any plain Powershell script. This might look trivial (it doesn't do anything but log on and write the hostname), but I spent some time figuring out that you need "-Authentication Negotiate" and that there is no need to set up SSL. However, do check if the firewall allows remote connections from outside the domain if you want to use this one.
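The essence of that template, again as a sketch (remember the firewall remark if the backup server is outside the domain; depending on your setup you may also need to add the host to the WinRM TrustedHosts list):

param([string]$server = "localhost")
try {
    # -Authentication Negotiate is the part that took me time to figure out; no SSL setup needed
    $session = New-PSSession -ComputerName $server -Authentication Negotiate -ErrorAction Stop
} catch {
    Write-Host "could not connect: $_"
    exit 1
}
$remotename = Invoke-Command -Session $session -ScriptBlock { hostname }
Write-Host "logged on, remote hostname is $remotename"
Remove-PSSession $session
exit 0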

So no more excuses for writing those extensive application test scripts!

A final tip: if you are customizing these examples, you can do a Powershell "write-host" at any time. The output can be found in the matching Surebackup log. By default in:
  %programdata%\veeam\Backup\\Job..log

For example, for the SQL script, you would find something like:
[08.01.2016 15:57:59]  Info     [SureBackup] [SQLandSP] [ScriptTests] [Console] online : master

2015/11/30

Extending Surebackup with custom scripts : Sharepoint

Often when I visit customers, I ask them about their restore tests. The most common answer? "We test the backups when we do the actual restores." To the question why they don't test more frequently, the most common answer is "time and resources".

A couple of months ago, I visited a customer that tried to do a restore from backup. It failed: B&R was able to restore the backup, but the data inside was corrupt. The SQL server refused to mount the database. Exploring multiple restore points gave the same issue. It was a strange case because all backup files were consistent (no storage corruption) and the backup job did not report any failed states. The conclusion was Changed Block Tracking corruption. In light of the recent bugs in CBT, I want to emphasize again how critical it is to validate your backups. Had the customer tested his backups with, for example, the SQL test script included in v8, they might have caught the error before the actual restore failed.

This shows another thing I want to highlight. Surebackup is a framework, but your "verification" is only as good as your test. By default, Surebackup application tests are just port scans. This tells you that the service has started (it was able to bind to the port and it is answering), but it doesn't tell you anything about how well the service is performing. For example, the SQL service / instance could start, but maybe some databases could not be mounted inside the instance.

Few people visit this topic, but you can actually extend the framework. The fact that it supports Powershell makes it quite simple to write more extensive tests.

So here is a small test for Sharepoint. I hacked it together today, so please reread the whole script to "Surebackup" my coding skills. It is rather basic, but you could actually use it for any kind of web service. It simply reads the content of a txt file in a certain site. If the content matches a predefined value, you know that
a) The database was mounted inside the instance
b) Sharepoint is able to talk to the instance and query it
c) The webservice is responding to requests

So how do you get started? Well, first upload a txt file with some content. In my case, I uploaded the file contenttest.txt with the content "sharepoint is working succesfully" as shown below:


You can right click the link and copy its location. Test if you can really access it this way, as shown below


Now get the powershell goodness from https://github.com/tdewin/veeampowershell/blob/master/suresharepoint.ps1 and put it somewhere on the backup server. Then edit the file.



First of all, you can see that everything can be passed as a parameter (e.g. on the command line, use -server "ip" to change the IP address). Change the username and plaintext password to the user that will be used to authenticate against Sharepoint. Preferably an account with read-only rights and not the administrator as in my screenshot; this way you are sure it doesn't break anything ;).

You might wonder: do I need to provide the password in plaintext? No, you don't have to, although the alternative might make things more complex. Instead of plaintext passwords, you can use Powershell encrypted passwords, but understand that if you want to decrypt the password, you need to be the same user as the one that encrypted it (the whole point of encrypting it, right?). When Surebackup runs, the scripts are actually started by the backup service. So the account that is used to decrypt the password is the service account this service runs under (as shown in the screenshot below)



If this is not the Local System account but a service account, you can use the following cmd script to create an encrypted password:
https://github.com/tdewin/veeampowershell/blob/master/encryptedpasstoclip/encryptedpasstoclip.cmd

Change the username in the cmd file, run it, enter the password of the service account and finally enter the password of the account you want to use to authenticate to Sharepoint. The result is that an encrypted password is put on your clipboard. Replace the whole password statement in the script with it, for example:
$pass = "01000000d08c9ddf0115d1118c7a00c04fc297eb01000000c9b320ead0059d409978380353923e8000000000020000000000106600000001000020000000b1816dffef13bc70672b55dfcee25a41488d5bb395ae28242b70afeb90938db9000000000e8000000002000020000000bd7da1d0d06893bed8b035c411c34f181b000aa9f0e4f46658eb3efe3e73c06840000000948652774f7f82848ba3065af8193c23fe25b773cea3ecf65957bdc12cdcc71868a82ba11d0475e65b321056a900d0571a05184b89132c0f21452642033c918340000000e8fcabb194c06c78ad01ee2192b73bf7ba799630adfedb6091dc1a629dc9d5a2a6025a64fcf74fe8a89d4a579a54c3538928ee0d22a57f22f6e50da240deaa62"
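For reference, what the cmd script does under the hood is plain Powershell DPAPI handling. A minimal sketch of the round trip (the account name is just an example):

# encrypt: run this as the account that will later decrypt it (the backup service account)
read-host -assecurestring -prompt "pass" | convertfrom-securestring
# decrypt inside the script: only the same user can turn the string back into a credential
$secure = $pass | ConvertTo-SecureString
$cred = New-Object System.Management.Automation.PSCredential("DOMAIN\spreader", $secure)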
If you got this far (or you skipped the whole password encryption part because your backup server is a Fort Knox anyway), we can now configure the script. Go to Veeam B&R and configure the test script as shown below in the application group (or in the linked jobs part):



Notice that I also configured "Arguments" as "-server %vm_ip%". This will pass the sandbox IP to the script directly.

Before you actually start up Surebackup, you can test the script against your production environment. If it doesn't work against production, it will probably fail against your lab environment as well. In case you configured an encrypted password with another account, you can temporarily override it with the following command (in case you did not, you can just run script.ps1 -server <ip>):
PS C:\Users\demo2> C:\scripts\suresp.ps1 -server <ip> -password (read-host -assecurestring -prompt "pass" | convertfrom-securestring)


Now, if everything is green and you got a match, run Surebackup and validate that you get the same output in your lab


If it failed, you can check the logs for the output the script gave. Go to "%programdata%\Veeam\Backup\". It should contain a folder named after your job. In this folder, there should be a log called Job.. You can open it with notepad


Scroll all the way down in the log and look for "[console]"


This should give you the output of the console. In this case, everything was ok!

2015/11/27

Veeam Application Report

As many of you know, although Veeam B&R has an agentless approach, it still makes sure that all applications are consistently flushed just before the backup starts. To do this, Veeam B&R leverages VSS. One thing it also does is try to detect which applications are installed in which VM. This data is collected so that during a restore, you don't have to figure out which VM is holding which application and where exactly the application database is stored inside the VM (for example, for Exchange, it will detect the path(s) leading to the EDB(s)).

Now a fellow SE colleague requested to add this "application detection" to the main GUI. They wanted to leverage the detection to sort out which VMs have which application installed. Adding it to the main GUI would however make it more complex, but you can actually already leverage the data via Powershell.

So here is a sample script you can use as a starting point:
https://raw.githubusercontent.com/tdewin/veeampowershell/master/veeam-per-app-detect.ps1

It generates a nice clean report with all the VMs that have detected applications (yes, even Oracle, so it is v9 ready), grouped per application. The output should look something like the screenshot below:


Enjoy!

2015/10/09

RPS 0.3.2

Just a small update (which required some re-engineering under the hood).

First of all, when you click the backup file size, you get a small pop-up window. This will tell you the uncompressed and compressed bandwidth usage over multiple intervals. It should help you understand how much processing power you need for a certain amount of input. The first two lines are the uncompressed data in bytes and bits; the second two lines are the compressed data in bytes and bits. The columns indicate your time window. Notice that clicking on the full file or on an incremental file gives different output, so you can easily compare full runs with incremental runs.


The second feature is available when you click the total file size. It will give you a table overview of the output, which you can easily copy to Excel or Calc. The numbers are all in GB so they give a predictable output.


The final feature is a very small one, but I really like it because it took literally 10 minutes to code and will remove some frustration. During a recent conf call with my colleague Johan Huttenga, I noticed he was struggling with inputting 30TB of data. He needed to multiply 30 by 1024 to get exactly 30TB and not 29,99TB. So in this version, you can input 30TB and it will be automatically converted to "30720". Same for 1PB to "1048576". The input is case insensitive, so tb, Tb, TB, pb, PB, pB, etc. should all work. For example, fill in 1TB as shown below


As soon as you push enter, tab to another input or click the simulate button, the input will be dynamically converted.
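RPS itself is JavaScript, but the conversion logic boils down to something like this Powershell sketch (the function name is made up, and this is only an approximation of what RPS does internally):

function Convert-ToGB([string]$value) {
    # -match is case insensitive by default, so tb, Tb, TB, pb, ... all work
    if ($value -match '^(?<num>[\d\.]+)\s*(?<unit>tb|pb)$') {
        $factor = if ($Matches.unit -eq 'pb') { 1024 * 1024 } else { 1024 }
        return [double]$Matches.num * $factor
    }
    return $value
}
Convert-ToGB "30TB"   # 30720
Convert-ToGB "1pb"    # 1048576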

2015/08/25

Global backup report in Powershell

A lot of partners try to go the extra step and also manage onsite Veeam backup environments. They mostly want one report with the status of all jobs from all Veeam backup servers, instead of plowing through hundreds of emails sent by multiple backup servers.

Enterprise Manager allows you to do that. You can add multiple backup servers and it will give you a global overview. However, it also acts as a license manager. So if you have different customers with different licenses, you cannot add them all together in one Enterprise Manager.

A way around it would be to create some RESTful API integration. That would be the cleanest way to do it, in my humble opinion. However, if you want a quick hack, you can also do it with Powershell: just launch a remote session to all those backup servers and collect the data.

Now a lot of people just need a small sample script to get started. So here is a basic sample. It is surely not feature complete and has very poor error handling, but it can get you started.

So the first part defines the instances. Granted, it would be cleaner to put the table in a csv and import it at the beginning of the script. The instances table consists of objects that define the customer, the backup server, the username and the password in encrypted form. Not sure how to get the password in encrypted form? Just use the code at the top to generate what you need. However, make sure that the whole password doesn't contain any line breaks when you copy-paste!


Resulting in the pre-created code


After correct copy/pasting and removing line breaks, you should get something like this


If you then run the code, it should connect to all the instances, execute some Veeam PS code, build a table and collect everything centrally. The end result? A $globaljob table, which you can then use to build a csv report, an html report, one big email, etc. Hope it can be useful to somebody as a starting point!
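Since the screenshots are hard to copy from, here is a condensed sketch of the structure (server, user and property names are examples, and it assumes the Veeam snap-in is installed on every backup server):

# one object per customer/backup server; the password is encrypted as described above
$instances = @(
    @{Customer = "CustomerA"; Server = "veeam.customera.local"; User = "CUSTOMERA\svcveeam"; EncPwd = "<encrypted password>"}
)
$globaljob = @()
foreach ($i in $instances) {
    $cred = New-Object System.Management.Automation.PSCredential($i.User, ($i.EncPwd | ConvertTo-SecureString))
    $globaljob += Invoke-Command -ComputerName $i.Server -Credential $cred -ScriptBlock {
        Add-PSSnapin VeeamPSSnapin
        Get-VBRJob | Select-Object Name, @{n="LastResult";e={$_.GetLastResult()}}
    } | Select-Object @{n="Customer";e={$i.Customer}}, Name, LastResult
}
# $globaljob now holds all jobs from all servers; turn it into csv, html or one big mail as you like
$globaljob | Format-Table -AutoSize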

2015/08/17

Getting the correct input into RPS

In a first article about the Restore Point Simulator (RPS), I talked about the history of RPS and why it was created. I now want to take the time to explain the correct input parameters. Although I added some tool-tips in 0.3.1, people are sometimes confused about how it all ties together.

TL;DR? The RPS GUI maps quite directly to the Veeam interface. If you already understand how Veeam works, just check the screenshots to see how they relate.

For those that didn't read the previous article, I'll repeat the formula here which we use at Veeam to do rough estimations. Why? Because RPS is directly based on it:
Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)
This formula and RPS have 3 parameters in common. Data is the first one and it maps to "Used Size GB". To give you an example: if you have a VM with one VMDK, the used size would be the amount of blocks the VM has already written to. So if you have a thick provisioned VMDK of 50GB but you only use 20GB inside the guest, the used size would be around 20GB. A more correct definition would be what the VM would actually use if it were thin provisioned. Because Veeam backs up at the block level, a thin provisioned disk is exactly what Veeam needs to process during a full backup (plus some metadata).

D or delta is the amount of data that changes between backups. In RPS the parameter is called change rate. It depends on 2 factors. First of all, the frequency of backups, which in most companies is daily. Secondly, the application. Don't set this parameter too low. Veeam backs up at the block level: a small update can flag a bigger block at the VMDK level than you estimated. If you fill up a disk sequentially, like a file server, you won't notice this so much because 10 sequential changes could flag only 1 block. However, if you are doing a lot of small random I/O in various locations, the number can quickly rise. So from my experience, a change rate of 5% is fairly optimistic while 10% is rather conservative. I personally prefer the more conservative approach.

Finally there is C or compression, which is called "Data left after reduction". I have had tons of people discuss this parameter with me because you can interpret it in various ways. However, look at the formula and it will all make sense. The formula basically says:
Backup Data = C * (Total Data In)
So C is a factor that should make "Total Data In" smaller: the smaller the number, the smaller the backup. The compression value is thus the percentage of data that is left after compression has done its job. If you define 40% (40/100) and consider a "data in" of 100, the backup data would be 40 = (40/100)*100. If you define 60%, you actually tell the engine that you expect worse results, because the end result would be 60 = (60/100)*100. So the lower the compression value, the better the compression.

For some people this feels counter-intuitive. If you prefer compression factors like 2x or 3x instead, you can easily convert those by taking 1/(compression factor). For example, 2x would be 1/2 = 50%. If we put that in the previous formula, we get 50 backup data = 50%*(100 total data in), so the data was indeed compressed 2x. For 3x, you would get 1/3 =~ 33%, which I round up to 35%. Again, filled into the formula, you get 35 = 35%*(100). Use this link if you really want to use 33%.
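Or as a trivial helper if you want to let Powershell do the conversion (the function name is made up):

# convert an "Nx" style compression factor to the RPS "data left after reduction" percentage
function Convert-FactorToPercent([double]$factor) {
    [math]::Round(100 / $factor)
}
Convert-FactorToPercent 2   # 50, so fill in 50%
Convert-FactorToPercent 3   # 33, or round up to a more conservative 35%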

If you want to disable compression, put data reduction to 100%.

In the formula there are 2 parameters left, F & R. As discussed in the previous article, these are hard to calculate, and that is exactly what RPS does for you. So for retention points, you should just define your policy. If you need 14 daily restore points, put in 14 for "Retention points" and daily for "Interval". You will see that in some cases you might end up with more restore points than you configured. This can be normal, as Veeam considers your retention policy but also the dependencies between fulls and incrementals. So don't try to adjust the retention points because you feel it miscalculated; instead, take a look at the example given in the previous blog article.

What also influences the number of parameters is the style parameter. This maps directly to the Veeam GUI. For example, "Backup Copy Job" (BCJ) should be quite easy to understand: it refers directly to a Backup Copy Job. Selecting it will also show you BCJ specific configurations


For the other "styles" called Incremental and Reverse, the other setting map mainly to the advanced settings of a regular job except for "Retention Points"


If you select Incremental without any active fulls or synthetic fulls, you get a forever incremental job. It also maps directly to the GUI, more specifically to the advanced settings of a regular job. Granted, the checkboxes are under the buttons "Days" and "Months". Also, there is no global "enable" checkbox for synthetic or active fulls, but do understand that checking one of the boxes enables "Active" or "Synthetic". Finally, "Monthly" takes preference over "Weekly". In the Veeam GUI you can select only one; here you can enable checkboxes for both, but weekly will be ignored when you enable a month.


So if you want a weekly synthetic backup, it would be as shown below. I did not implement "Transform", and with good reason: to date, I have received 0 requests for it.


A monthly active backup would look like shown below. Important to know: this does not define GFS policies; those can only be defined in a Backup Copy Job. I once had a guy that checked only January because he wanted a "yearly full backup for archiving". He was quite surprised by the result. Basically, he had told the engine that he wanted one yearly chain which only gets reset in January.


Finally "Reverse" incremental, can be found in the same place.


That leaves only 1 parameter: the growth simulation. It is a recent addition to RPS, and personally I think it is one of the coolest things added to it. Let me explain what it does and how it works. If you need to size for the following 3 years, and you know that you have a growth rate of 10% on a yearly basis, you can just add that to RPS. What it does is take your "used data" and grow it on a daily basis via: Future Used Data = Used Data * (1 + 10%)^(Day X/365). Thus on the last day of the simulation -after 3 years- it would use Future Used Data = Used Data * (1 + 10%)^(1095/365).

So let's imagine you have chosen reverse incremental, a 3 year simulation, 10% year-on-year data growth and 1000GB of data. For simplicity, let's disable compression. Calculating this manually would give you roughly 1000*(1+10%)^(1095/365) = 1000*(110%)^3 = 1000*(1,10)^3 = 1000*1,10*1,10*1,10 = 1331. Now check the full backup in the configured example. Also interesting is that the increment from 2 days ago (retention point 3) has a smaller "Future Used Data", as it uses 3y - 2 days in the formula, thus Future Used Data = 1000*(1+10%)^(1093/365) =~ 1330,30.
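You can verify those numbers in any Powershell prompt:

$used = 1000; $growth = 0.10
# the full backup on the last day of the 3 year simulation
$used * [math]::Pow(1 + $growth, 1095/365)   # 1331
# the increment from 2 days earlier
$used * [math]::Pow(1 + $growth, 1093/365)   # ~1330,3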


The differences are small in regular jobs, but once you go GFS, the growth calculation over time can have a dramatic impact which is hard to reproduce in Excel.


With this, I have discussed all parameters. Some might ask: what about "Quick Presets", what do they do? Well, they are just quickly preconfigured scenarios that you can use. For example, if you want a monthly active backup, you can click all 12 select boxes, or you can just select the matching style + monthly active full preset to quickly configure this scenario

If you made it this far, thanks for reading and enjoy playing with RPS.

2015/08/05

A brief history of the Restore Point Simulator

During the development of the Restore Point Simulator, I have often encountered questions from users that led me to believe that it is not always clear how to use the tool and what it can do for you. In this blog article series, I want to take the time to explain why RPS was developed in the first place and how you can use it.

In the beginning there was nothing, just our famous formula to calculate repository space. I'll quote it here because it is still the main idea behind RPS. Many Veeam SEs had their own Excel configuration sheet to quickly spit out some numbers, some prettier than others.

Backup size = C * (F*Data + R*D*Data)
Data = sum of processed VMs size by the specific job (actually used, not provisioned)
C = average compression/dedupe ratio (depends on too many factors, compression and dedupe can be very high, but we use 50% - worst case)
F = number of full backups in retention policy (1, unless backup mode with periodic fulls is used)
R = number of rollbacks (or increments) according to retention policy (14 by default)
D = average amount of VM disk changes between cycles in percent (we use 10% right now, but will change it to 5% in v5 based on feedback... reportedly for most VMs it is just 1-2%, but active Exchange and SQL can be up to 10-20% due to transaction logs activity - so 5% seems to be good average)

This formula has some difficulties. First of all, the (C)ompression ratio and the (D)elta are difficult parameters to estimate, although the quote gives you some hints about what we at Veeam use internally and a fairly good explanation of why these values were chosen. But more difficult are F and R. These values define how many full backups and how many incrementals you will need. With reverse incremental / forever incremental, that is quite easy to calculate: you'll have F = 1 and R = rps - F.

However, when you talk about weekly synthetics or active fulls, the number is rather difficult to calculate. Even Veeam users do not always understand the effect of a certain policy. For example, if you configure a forward incremental with weekly fulls and 2 restore points (rps), you can end up with up to 9 rps on disk because of dependencies. I had countless discussions with customers arguing that Veeam did (does) not respect their rps policy, when in fact it does its absolute best to respect it. If you run the simulation, you can actually see the dependency. In the first column (called Retention), you will see something like 3 (2) or 4 (2). This means that point 3 or 4 is kept because point (2) depends on it.

Now if you want to excellify this, you can come up with something like F = #Weeks + 1 and R = (F*7*#DailyBackups - F). Imagine 14 rps with daily backups: that would be F = 2+1 = 3 and R = 3*7*1 - 3 = 21 - 3 = 18. That would be really close to what RPS says, but explaining it to people takes some time and it is not always accurate; it is more of a guesstimate.

Another common misconception is that a monthly full would require less space than a weekly full. While this can be the case, remember that a monthly full creates a chain of around 30 points. If you configure a policy of 14 points in forward incremental with monthly fulls, the worst case scenario occurs 12 days after a second full backup is created. This is because you have 12 increments dependent on the current full, but you need to keep the whole previous chain as well, because the oldest restore point is an increment that depends on the previous full backup and a chain of 30 increments. If you had configured weekly fulls, a chain would be at most 7 days, so less would be stored. This can grow exponentially when you, for example, take a backup every 12 hours or even more frequently. However, if you configure, say, 60 restore points, a monthly full backup can be cheaper than a weekly full backup. The more days' worth of restore points configured, the more likely a monthly full backup will actually consume less space.

These 2 examples show exactly why RPS was made. Different customer cases require different approaches. Also, it reconfirms that assumption is the mother of all mistakes. Explaining how retention works without very difficult formulas was actually my main goal when the first edition of RPS was made.

Another example is my new all-time favorite, because it shows that what feels natural is not always reality. Some months ago, a partner thought a forever incremental backup chain of 365 points would be more efficient than a GFS policy with 12 fulls. This even surprised me the first time I ran it, because incremental backups feel more lightweight. I remembered from my v7 SE training that GFS should be more efficient, but just running the simulation reconfirms this.

It is true that forever incremental is much more efficient than weekly fulls in terms of disk space savings. However, 30 incrementals a month quickly add up, and for long-term retention a monthly full could be more efficient. There is one caveat: with 365 increments, you do have more granularity than with 12 monthly full points in time. However, I do want to remind you that those 12 full backups are completely independent of each other. So a single bit rot corruption would only impact one point, while in a 365 restore point chain it potentially impacts the whole chain. So I think in the majority of cases the more efficient disk usage and the independence of points beat a very long chain of increments, but hey, it is up to each company to decide their policy.

Finally, I remember that one of the major updates was adding GFS support. Calculating and explaining GFS policies is nearly impossible in Excel. Why? Imagine you configure weekly backups to keep the restore point of Sunday, and you configure monthly backups to keep the restore point of the first Sunday of the month. In this case, the backup of the first Sunday of the month can satisfy the weekly policy as well as the monthly policy. In fact, this is what Veeam does. So if you configure, for example, 12 weeklies and 3 monthlies, you would assume that the number of fulls is 12+3+1 (1 for the simple retention policy). However, this is not the case. If you configure your policy correctly so that weekly and monthly points can coincide (schedule button), you will actually get fewer points. You can see these common points again in the retention column: "10W 3M 0Q 0Y" means that the point represents the 10th weekly point but also the 3rd monthly point.

@poulpreben (if you don't follow him on twitter, do it now) and I spent hours discussing how we could calculate this with formulas. We concluded that the only way to actually do it was to emulate what happens inside B&R on a daily basis over a period of time. In fact, that is what RPS does. If you configure a retention policy, it will try to predict a period of time in which the worst case scenario should occur (most data on disk). This is why, when you configure 5 yearly backups, it takes some time to calculate: it will run over 2000 days trying to simulate the behaviour of B&R.
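To give you a feeling for what such an emulation looks like, here is a toy version for forward incremental with weekly fulls. It is a gross simplification of what RPS really does, but it reproduces the "2 restore points can mean up to 9 points on disk" example from above:

$retention = 2                 # configured restore points
$chains = ,@()                 # each chain is a full plus its increments, stored as day numbers
foreach ($day in 1..21) {
    if ($day % 7 -eq 1 -and $chains[-1].Count -gt 0) { $chains += ,@() }  # a weekly full starts a new chain
    $chains[-1] += $day
    $onDisk = ($chains | ForEach-Object { $_.Count } | Measure-Object -Sum).Sum
    Write-Host ("day {0,2}: {1} points on disk" -f $day, $onDisk)
    # an old chain can only be dropped once the newer chains alone satisfy the retention policy
    while ($chains.Count -gt 1 -and ($onDisk - $chains[0].Count) -ge $retention) {
        $onDisk -= $chains[0].Count
        $chains = $chains[1..($chains.Count - 1)]
    }
}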

So TL;DR? Don't just assume, run it through RPS. Be critical of the results of RPS (software can contain bugs), but also try to understand why something is different from what you first estimated.