Post by Stephen Foskett (thank you)
The next version of Microsoft Windows Server includes integrated data deduplication technology. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft’s deduplication technology work?
Introducing Windows 8 Deduplication
Let’s make one thing clear right from the start: Microsoft started from a clean sheet and invented its own deduplication technology. This is not a licensed, cloned, or copied feature as far as I can tell. There are some clever aspects to it, along with a few head-scratchers for folks like me who’ve seen lots of different deduplication approaches.
Microsoft’s deduplication is layered onto NTFS in Windows 8, and will be a feature add-on for Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster-aware, and fully crash-consistent on all operations. This is a pretty neat trick: As is typical for Microsoft, deduplication will be a simple, transparent feature.
Now let’s talk for a moment about what Windows 8 deduplication is not.
- It is not a desktop feature. It is server-only, like so many of Microsoft’s storage developments. But perhaps we might see it deployed in low-end or home servers in the future.
- It is not supported on boot or system volumes.
- Although it should work just fine on removable drives, deduplication requires NTFS, so you can forget about FAT or exFAT. And of course the connected system must be running a server edition of Windows 8.
- Although deduplication does not work with Cluster Shared Volumes (CSV), it is supported in Hyper-V configurations that do not use CSV.
- Finally, deduplication does not function on encrypted files, files with extended attributes, tiny (less than 64 kB) files, or reparse points.
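To make those restrictions concrete, here is a rough Python sketch of the eligibility rules as listed above. It is not Microsoft’s code and does not call any Windows API: `VolumeInfo`, `FileInfo`, and every field name are hypothetical stand-ins invented for illustration, and I am assuming “64 kB” means 64 × 1024 bytes.

```python
from dataclasses import dataclass

# Assumption: the "tiny file" cutoff of 64 kB is read as 64 * 1024 bytes.
MIN_SIZE_BYTES = 64 * 1024


@dataclass
class VolumeInfo:
    """Hypothetical volume description, not a Windows structure."""
    filesystem: str
    is_boot_or_system: bool = False
    is_cluster_shared: bool = False


@dataclass
class FileInfo:
    """Hypothetical file metadata, not a Windows structure."""
    size_bytes: int
    is_encrypted: bool = False
    has_extended_attributes: bool = False
    is_reparse_point: bool = False


def volume_supported(vol: VolumeInfo) -> bool:
    """NTFS only, and never a boot/system volume or a CSV."""
    return (
        vol.filesystem.upper() == "NTFS"
        and not vol.is_boot_or_system
        and not vol.is_cluster_shared
    )


def skip_reasons(f: FileInfo) -> list:
    """Return the reasons a file would be left alone (empty list if none)."""
    reasons = []
    if f.is_encrypted:
        reasons.append("encrypted")
    if f.has_extended_attributes:
        reasons.append("extended attributes")
    if f.size_bytes < MIN_SIZE_BYTES:
        reasons.append("smaller than 64 kB")
    if f.is_reparse_point:
        reasons.append("reparse point")
    return reasons


def is_candidate(vol: VolumeInfo, f: FileInfo) -> bool:
    """A file qualifies only if both the volume and the file pass."""
    return volume_supported(vol) and not skip_reasons(f)
```

For example, `is_candidate(VolumeInfo("NTFS"), FileInfo(size_bytes=10 * 1024 * 1024))` returns `True`, while the same file on an exFAT volume, or an encrypted file on NTFS, is rejected.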
Read on here for the technical details