In a few older applications that rely on file-system-based databases, like old Usenet news spool servers, it was a good idea to match the storage volume's cluster size to the smallest file size if possible. For that old Usenet server, 1 KB clusters were a good fit for all the sub-1 KB posts. This made the best use of the space, though it made the file system hellacious to manage when things went wrong.
Now fast-forward to today. I have a new 20 TB volume to put a dedupe store on, and NTFS on Server 2008 R2 has a minimum cluster size of 8 KB for a volume that size. I can make the clusters larger, though. Is there any benefit to matching the dedupe store volume's cluster size to the data stream size, to maximize deduplication and minimize storage waste? For instance, a 32 KB cluster size paired with a 32 KB data stream size?
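To put a rough number on the storage-waste side of the question, here's a quick back-of-the-envelope sketch (the function name and the example sizes are my own illustration, not anything from a tech note) of the slack space a given cluster size costs per file:

```python
import math

def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated on disk but left unused, for a single file."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size - file_size

# A 33,000-byte stream on 32 KB clusters needs two clusters:
print(slack_bytes(33_000, 32 * 1024))  # 32,536 bytes of slack
# The same stream on 8 KB clusters needs five clusters:
print(slack_bytes(33_000, 8 * 1024))   # 7,960 bytes of slack
```

On average each file wastes about half a cluster, so bigger clusters mean more slack per file; that's the trade-off against any dedupe-alignment benefit from matching the cluster size to the stream size.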
(By the way, there's no OS selection for Server 2008 or 2008 R2. It says "2008 (not applicable).")
The volume is on an FC SAN, so stripe size isn't something I can control. I can set the cluster size, though, when I format the volume.
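For reference, setting the cluster size happens at format time; a minimal sketch with Windows' built-in tools (the drive letter E: is just a placeholder for the SAN volume):

```shell
:: Format the volume as NTFS with 64 KB clusters (/A sets the allocation unit).
:: WARNING: this destroys any existing data on the volume.
format E: /FS:NTFS /A:64K /Q

:: Confirm the result afterwards; look for "Bytes Per Cluster" in the output.
fsutil fsinfo ntfsinfo E:
```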
The tech note didn't mention data stream size, however. If the cluster size is going to be 64 KB, would it make sense to change the stream size to match?
I've always wondered how large a deduplicated block is, or whether that's configurable at all.