Tuesday, July 24, 2007

Staging folders

Do not confuse staging folders with prestaging. They are completely different things!

I got the text below from my friend Thomas Bittner; it was written by his team for the APS (All Purpose Server) guide.

Staging folders isolate the files being replicated from ongoing changes on the file system, and they amortize the cost of compression and of computing RDC hashes across multiple replication partners.
Here is some background on current staging space management. Three values are important:

· Staging size in MB (configured per-replicated folder in AD)
· Staging low watermark percentage (configured per-server via WMI, applies to all replicated folders on the server)
· Staging high watermark percentage (configured per-server via WMI, applies to all replicated folders on the server)

DFS Replication will do roughly the following when trying to stage a file:
· Request a reservation for staging space for the file based on an estimate of the file size.
· If the currently used staging space is less than the configured staging size, the file is allowed to stage regardless of the reservation amount. This allows large files to replicate and not get stuck with the familiar “huge file” replication blocker on FRS. The reservation amount is accounted for in the used staging space.
· After staging completes, DFS Replication fixes up the reservation amount by using the actual used amount. Note that, because of compression, the staged size can differ from the estimate.
· If the used staging space is higher than the high watermark, staging space cleanup is triggered. Cleanup continues until usage drops to the low watermark or there are no more files that are candidates for cleanup (that is, all files in staging are actively being used). Note that cleanup is scoped per replicated folder.
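
The staging steps above can be sketched as a toy model in Python. This is not DFS Replication code. The class name StagingArea, its methods, the 90%/60% watermark values, the oldest-first cleanup order, and the choice to refuse a file once used space has reached the configured size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StagingArea:
    """Toy model of the staging space of one replicated folder (sizes in MB)."""
    quota_mb: float          # configured staging size
    high_pct: float = 90.0   # high watermark, illustrative value
    low_pct: float = 60.0    # low watermark, illustrative value
    files: dict = field(default_factory=dict)  # name -> {"size_mb", "in_use"}

    def used_mb(self):
        return sum(entry["size_mb"] for entry in self.files.values())

    def stage(self, name, estimated_mb, actual_mb):
        # Only current usage is compared with the configured size, so a file
        # larger than the whole staging size can still be staged.
        if self.used_mb() >= self.quota_mb:
            return False  # assumption: the caller has to retry later
        # The reservation (the estimate) counts as used space right away.
        self.files[name] = {"size_mb": estimated_mb, "in_use": True}
        # After staging, the reservation is fixed up to the actual size.
        self.files[name]["size_mb"] = actual_mb
        self.cleanup()
        return True

    def release(self, name):
        # Mark a staged file as no longer actively used by any partner.
        self.files[name]["in_use"] = False

    def cleanup(self):
        high = self.quota_mb * self.high_pct / 100
        low = self.quota_mb * self.low_pct / 100
        if self.used_mb() <= high:
            return  # cleanup is only triggered above the high watermark
        # Assumption: the oldest staged files are removed first.
        for name in list(self.files):
            if self.used_mb() <= low:
                break
            if not self.files[name]["in_use"]:
                del self.files[name]
```

With a 100 MB staging size, staging a file that ends up at 300 MB still succeeds as long as usage was below 100 MB beforehand. Cleanup then removes only files that have been released, so usage can stay above the high watermark while every staged file is still in use.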

There are several factors that affect the size of staging. Without going into theories, here are some rules of thumb:
· It is desirable to set the staging folder to be as large as the available space allows and comparable to the size of the replicated folder. Hence, if the size of the replicated folder is 24.5 GB, then ideally the staging folder should be of comparable size. Note that this amortizes the cost of staging and hash calculation over all connections. It is also a best practice to locate the staging folder on a different spindle to prevent disk contention.
· If staging cannot be set comparable to the size of the replicated folder, then reduce the staging size by 20%. Depending on how well the data compresses, staging files will be 30-50% of the original file size.
· Note that the two recommendations above are particularly important if all the data is preexisting and DFS Replication must process all content at the same time during initial replication. On the other hand, if the replicated folder is relatively empty and gradually grows over time, the recommendation is to determine the projected size of the replicated folder and size the staging folder accordingly.
· If the size of the staging folder cannot be set proportional to the size of the replicated folder, then increase the size of the staging folder to at least the combined size of the five largest files in the replicated folder.
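
As a rough illustration, the rules of thumb above can be turned into a small Python helper. The function names largest_file_sizes and suggest_staging_bytes are hypothetical, and the sketch rests on two readings of the text that are assumptions: "reduce the size by 20%" is taken to mean 80% of the replicated folder size, and "the five largest files" is taken to mean their combined size.

```python
import heapq
import os


def largest_file_sizes(root, count=5):
    """Return the sizes in bytes of the `count` largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, filename)))
            except OSError:
                continue  # skip files that vanish or cannot be read
    return heapq.nlargest(count, sizes)


def suggest_staging_bytes(folder_bytes, largest_sizes, available_bytes):
    """Pick a staging size following the rules of thumb, in order."""
    floor = sum(largest_sizes)  # combined size of the largest files
    # Rule 1: comparable to the replicated folder, if the space is there.
    if folder_bytes <= available_bytes:
        return max(folder_bytes, floor)
    # Rule 2: otherwise 20% smaller (assumed reading of the guidance).
    reduced = folder_bytes * 8 // 10
    if reduced <= available_bytes:
        return max(reduced, floor)
    # Rule 3: otherwise fall back to the five-largest-files minimum.
    return floor
```

A typical call would pass the replicated folder's total size, the result of largest_file_sizes for that folder, and the free space on the staging volume. The AD setting is in MB, so the returned byte count still needs converting.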
