2017/11/23

RPS Workspace

Many people have been using http://rps.dewin.me, and honestly it gives me great pleasure that people like it and use it so often. I tried to make the tool as straightforward as possible, but one thing people do not seem to understand is the line "Work Space". So on a regular basis I get the question: what the hell is "Workspace", and how is it calculated?

In the early days, RPS didn't have this Workspace line. However, during some discussions, some fellow SEs were concerned that there was no buffer space for:

  • Occasionally running a manual full
  • Not filling the filesystem to 100%, because that is just not best practice
  • Space that is used during the backup process itself


So the first two, I hope, are pretty clear. The third one is not always clear. So imagine that you are running a forever incremental. You configured 3 restore points, and that is what you will have once the backup is done. However, during the backup, the first thing that happens is that a new incremental point is created. Only after that incremental backup is done does the merge process happen. That means that during that "working period", you actually have 4 restore points on disk (1 full + 3 incrementals). Thus you need to have some extra space.
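
To make the counting concrete, here is a minimal Python sketch of that working period. The function names and the example sizes (a 2 TB full with 0.25 TB incrementals) are made up for illustration and are not taken from RPS; it also ignores any temporary space the merge itself might need.

```python
def peak_restore_points(retention_points: int) -> int:
    """Restore points on disk while a forever-incremental job is running.

    The new incremental is written before the oldest one is merged into
    the full, so the chain is briefly one point longer than configured.
    """
    return retention_points + 1


def peak_chain_size_tb(full_tb: float, incremental_tb: float, retention_points: int) -> float:
    """Rough peak disk usage: 1 full plus every incremental present mid-job."""
    incrementals_during_job = peak_restore_points(retention_points) - 1
    return full_tb + incrementals_during_job * incremental_tb


print(peak_restore_points(3))            # 4 (1 full + 3 incrementals)
print(peak_chain_size_tb(2.0, 0.25, 3))  # 2.75
```

With 3 configured points you end up back at 1 full + 2 incrementals after the merge, so the third incremental is exactly the extra space the job needs while it runs.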

That hopefully explains the why. Now the how. This one is a bit more complicated. The initial workspace was pretty simple: reserve room for one additional full backup. While this works great in smaller environments, we pretty soon came to the conclusion that if you have 200TB of "full data" (all fulls together), you probably do not need 200TB of workspace, especially because there typically is not one humongous job that covers the complete environment. You have probably split up the configuration into a couple of jobs, and those jobs are probably not all running at exactly the same time.

So the workspace uses a kind of bucket system where the first bucket has a higher rate than the last one. Once the first bucket is filled, it overflows into the next one. This means that the workspace does not grow linearly with the amount of used space.

Here are the buckets themselves:
0 - 10 TB = source data will be compressed and then multiplied by a factor of 1.05
10 - 20 TB = source data will be compressed and then multiplied by a factor of 0.66
20 - 100 TB = source data will be compressed and then multiplied by a factor of 0.4
100 - 500 TB = source data will be compressed and then multiplied by a factor of 0.25
500 TB+ = source data will be compressed and then multiplied by a factor of 0.10
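
The buckets can be sketched in a few lines of Python. This is my own illustration of the rule as described here, not the actual RPS code; the names BUCKETS and workspace_tb are hypothetical, and the 50% default compression is the one mentioned in this post.

```python
# (upper bound of the bucket in TB, workspace factor), taken from the list above.
BUCKETS = [
    (10, 1.05),
    (20, 0.66),
    (100, 0.40),
    (500, 0.25),
    (float("inf"), 0.10),
]


def workspace_tb(source_tb: float, compression_pct: float = 50) -> float:
    """Sum the workspace contributed by each bucket the source data reaches."""
    total = 0.0
    lower = 0.0
    for upper, factor in BUCKETS:
        if source_tb <= lower:
            break  # nothing overflowed into this bucket
        in_bucket = min(source_tb, upper) - lower
        total += in_bucket * compression_pct / 100 * factor
        lower = upper
    return total


for size in (5, 50, 500):
    print(f"{size} TB source -> {workspace_tb(size):.2f} TB workspace")
```

Each bucket only sees the part of the source data that falls between its lower and upper bound, which is what makes the total grow more slowly as the environment gets bigger.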

Let me give you some examples. If you have 5TB of source data, that 5TB will fit entirely in the first bucket. Thus the calculation is rather easy. If you use a compression factor of 50% (the default), you will get:
5TB x 50/100 x 1.05 =~ 2.6 TB Workspace

If you have 50TB of source data, however, it does not fit in the first bucket. The data has to be split over 3 buckets: the first 10TB in the first bucket, the next 10TB in the second bucket, and the last 30TB in the third bucket. Thus the calculation would be roughly:
10 TB x 50/100 x 1.05 + 10 TB x 50/100 x 0.66 + 30 TB x 50/100 x 0.4 =~ 5 + 3 + 6 = 14TB Workspace
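
If the splitting itself is the confusing part, this small Python sketch shows only that step: how many TB end up in each bucket. The name split_over_buckets is hypothetical, and the bounds are simply the ones from the bucket list; it is an illustration, not the tool's code.

```python
# Bucket upper bounds in TB, as listed in this post; the last bucket is open-ended.
BUCKET_UPPER_BOUNDS_TB = [10, 20, 100, 500, float("inf")]


def split_over_buckets(source_tb):
    """Return how many TB of source data land in each bucket, in order."""
    amounts = []
    lower = 0
    for upper in BUCKET_UPPER_BOUNDS_TB:
        if source_tb <= lower:
            break  # nothing left to overflow into this bucket
        amounts.append(min(source_tb, upper) - lower)
        lower = upper
    return amounts


print(split_over_buckets(5))   # [5]
print(split_over_buckets(50))  # [10, 10, 30]
```

Every amount in the result is then compressed and multiplied by the factor of its own bucket.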

You can verify that here:
http://rps.dewin.me/?m=1&s=51200&r=14&c=50&d=10&i=D&dgr=10&dgy=1&dg=0&e

Finally, if you have a big customer, or you are a big customer, and you have 500TB, you will see a split of 10, 10, 80 and 400. Thus the calculation would be:
10 TB x 50/100 x 1.05 + 10 TB x 50/100 x 0.66 + 80 TB x 50/100 x 0.4 + 400 TB x 50/100 x 0.25 =~ 5 + 3 + 16 + 50 = 74TB Workspace

You can verify that here:
http://rps.dewin.me/?m=1&s=512000&r=14&c=50&d=10&i=D&dgr=10&dgy=1&dg=0&e

So instead of saying that with 500TB you will need 250TB of workspace, it is drastically lowered to 74TB. And again, that makes sense: the environment will be split up into multiple jobs, those jobs will not all be running at the same time, and you will probably not run an active full on all of them at the same time.
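
To see how the gap between the two approaches widens, here is a rough Python comparison of the original "one extra full" rule against the buckets. Both function names are made up for this sketch, and it assumes the default 50% compression; treat the output as an approximation of what the tool does, not as its exact numbers.

```python
# (lower bound TB, upper bound TB, factor), taken from the bucket list in this post.
BUCKETS = [
    (0, 10, 1.05),
    (10, 20, 0.66),
    (20, 100, 0.40),
    (100, 500, 0.25),
    (500, float("inf"), 0.10),
]


def tiered_workspace_tb(source_tb, compression_pct=50):
    """Workspace with buckets: each slice of source data uses its own factor."""
    return sum(
        max(0, min(source_tb, hi) - lo) * compression_pct / 100 * factor
        for lo, hi, factor in BUCKETS
    )


def flat_workspace_tb(source_tb, compression_pct=50):
    """The original approach: reserve room for one extra compressed full."""
    return source_tb * compression_pct / 100


for size in (5, 50, 500, 1000):
    flat = flat_workspace_tb(size)
    tiered = tiered_workspace_tb(size)
    print(f"{size:>5} TB source: flat {flat:7.2f} TB, tiered {tiered:7.2f} TB ({tiered / flat:.0%} of flat)")
```

Note that for a small environment the first bucket's factor of 1.05 makes the tiered value slightly larger than a single compressed full; the savings only show up once the data overflows into the later buckets.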

For those who want to play with it and see those buckets in action, I created a small jsfiddle here:
https://jsfiddle.net/btyzvxen/2/

Just change workspaceTB and click Run to update the output.