Domino and Disk Fragmentation


Domino is by nature an I/O intensive application. While R8 drastically improved I/O, there is still a lot of writing going on. A lot of Domino servers run on Windows and use NTFS as their file system. NTFS isn't exactly designed to handle large files that constantly change in size (this is why an RDBMS usually pre-allocates the table space to be used - and does so on a fresh machine). Now just imagine you had to ask your admin to pre-allocate the 2GB mail file quota for every user, regardless of current need (your disk sales guy would send you to heaven for that). So an NSF only takes up its actual size. It's kind of being between a rock and a hard place. Enter performance improvement strategies. An old myth is that compact -C will improve performance. While it makes the NSF neat and tidy inside, it splatters the file segments all over the disk. Adam Osborne explains it nicely in a blog entry. There is a video about it. So what should you do:
  1. Learn! Know the Domino Server Performance Troubleshooting Cookbook inside out. Watch Andrew's presentation on Domino performance. Ask Auntie Google.
  2. Move your temporary files onto another disk (I would love to hear how that works out if you use a RAM disk or a solid state disk). This has two effects: I/O is better distributed and the temporary files don't contribute to the data disk's fragmentation. You need to set two notes.ini parameters: NOTES_TEMPDIR and View_Rebuild_Dir (see the sketch after this list).
  3. Build a high-performance Domino server (remember: RAID10 looks good) and tune it well.
  4. If Windows is your server operating system: defrag, defrag, defrag! There is an excellent presentation (don't mind the fonts) by Albert Buendia of the Spanish Lotus User Group SLUG about it. Realistically you have two choices: the free (and "find-your-own-support") DominoDefrag on OpenNTF (managed by Andrew Luder, Australia) and the commercial DefragNSF from the Australian IBM Business Partner Preemptive Consulting (run by Adam Osborne, Australia). Defrag.exe on Windows isn't an option for actually defragmenting (only analysing) your 24x7 box.
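A minimal notes.ini sketch for item 2 (the drive letter and paths are placeholders for whatever dedicated disk, RAM disk or SSD you choose):
    NOTES_TEMPDIR=T:\notes\temp
    View_Rebuild_Dir=T:\notes\rebuild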
It would be interesting to see a fragmentation shootout between NTFS, JFS and EXT4 (anybody listening @ developerworks?)

Posted by Stephan H. Wissel on 12 January 2010 | Comments (5) | categories: Show-N-Tell Thursday

Comments

  1. posted by Philip Storry on Tuesday 12 January 2010 AD:
    A defragmentation shootout between NTFS/JFS/EXT4 would be difficult.

    Mostly because the tools for EXT4 defragmentation are still in development, and whilst JFS architecturally supports defragmentation, the Linux implementation never ported the tools for it - so unless you can mount the drive with an OS/2 system, it's not really going to happen.

    However, I'd probably put my money on EXT4 fragmenting the least, JFS sitting in the middle, and NTFS being worst.

    The delayed allocation in EXT4 allows growing files to be better allocated space. I don't think JFS does delayed allocation, but it is fairly smart about how it allocates space anyway (it has to be, as it supports sparse and dense files).

    One thing that's worth noting is that application usage - and writing your application well - is the best way to avoid fragmentation. For example, if your application is a web browser and downloads a 100MB file, then creating the temporary file before starting the download and telling the filesystem it'll be 100MB large will help it avoid fragmentation as it allocates space (see the sketch at the end of this comment).
    If you create a sparse file on NTFS like that and then begin your download, it doesn't fragment much. But if you just create a new file and keep growing it, it will probably fragment.
    It gets worse with concurrency: create two new sparse files and begin each download concurrently, and they'll be minimally fragmented. But create two normal files and begin the downloads concurrently, and NTFS will usually end up "intertwining" the two files, generating almost as many fragments as there are blocks allocated.

    Given that, the best way to avoid fragmentation for Domino would be a variant of Nathan's suggestion here: { Link }
    Except that you don't allocate the space, you pre-allocate as a sparse file. Make it an option that requires a -c compact, and we're done.
    Actually, make it an option on templates too. "New DB minimum sparse file size". Then it'd be really rocking.

    Sadly, sparse files still fragment - they're not much more than a warning to the allocator. Nathan's idea of actual minimum sizes might be better, depending on the filesystem. We'd have to experiment...
    :-(
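
    A rough sketch of the pre-sizing idea above (file name and size are illustrative; os.posix_fallocate actually reserves the blocks on POSIX systems, while the fallback truncate only creates a sparse region, i.e. just a hint to the allocator):

        # Pre-size the output file before writing, so the filesystem can try to
        # reserve contiguous space up front instead of growing the file piecemeal.
        import os

        TARGET_SIZE = 100 * 1024 * 1024  # expected final size, e.g. a 100MB download

        with open("download.tmp", "wb") as f:
            try:
                # POSIX systems: ask the filesystem to actually reserve the blocks.
                os.posix_fallocate(f.fileno(), 0, TARGET_SIZE)
            except (AttributeError, OSError):
                # Elsewhere (e.g. Windows/NTFS): this only creates a sparse file,
                # which is little more than a hint to the allocator.
                f.truncate(TARGET_SIZE)
            # ... then write the real data into the pre-sized file as it arrives ...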
  2. posted by Albert Buendia on Tuesday 12 January 2010 AD:
    Australia is the key!

    :-D
  3. posted by Stephan H. Wissel on Tuesday 12 January 2010 AD:
    @Philip Thx for the elaborate comment. Actually I wasn't thinking about JFS on Linux, but about JFS on AIX, where it is the default file system and AFAIK the defrag tools are available.
    I concur with you that the sparse file option would be good specifically for shared apps. (Someone in Australia listening?)
    :-) stw
  4. posted by Philip Storry on Thursday 14 January 2010 AD:
    Ah, yes, AIX.

    Kind of silly of me to forget that you could test it on AIX! In my defence, I only have access to Linux and Windows boxes... ;-)

  5. posted by Nico57 on Tuesday 01 February 2011 AD:
    The problem really lies in Domino's utterly stupid disk allocator.

    Whenever an NSF file needs more storage, its size is extended by... 64KB!
    Yeah, that's it, really, *** sixty four kilobytes ***!
    Of course if it needs more than this, it will get 64KB more, and again, and again, until there's enough space to fit your new document or view index in.

    The problem is, allocating 64KB+64KB+64KB... 100 times, is really not the same as allocating 6400KB.
    In one case, you're asking for one hundred small disk blocks, while in the other you're asking for just one large block.
    Being stupid vs asking for what you need.
    And you get what you ask for: the filesystem itself can't really do that much about guessing what the application intends to do with allocated space.
    (Experience seems to show Domino even does a file sync every 64KB, in order to maximize its fragmentation odds even more.)

    Of course it gets even worse when you're writing a lot of small documents, or to a lot of databases concurrently, or both, like, uh, on a mail server, which is probably what Domino still is most used for today.
    As a wild guess, I'd say my Domino mail servers generate a new fragment for every 5-10 new mails that arrive, and I could still be optimistic about it.

    So 64KB is Domino's basic "chunk" size, and it's been so for about 15 years.
    During that time, disk storage capacity has grown 1000-fold.
    Domino chunk size: x 1.
    Disk storage: x 1000.
    What else needs to be said?

    Unfortunately, a lot of smart people don't have a grasp of disk fragmentation problems, there's a lot of misinformation on the topic, and it's not something that's easy to explain to others either.
    I've raised the issue a number of times, on the Domino forums, or at face to face meetings with French IBMers, but nothing ever came out. Not being a native English speaker does not help. :(

    Most of Domino's disk fragmentation problems could be fixed by spending 10 minutes in the source code.
    Just replace 64KB with, say, 4MB (a quick way to compare the two growth patterns is sketched at the end of this comment).
    Suddenly, your servers generate 20-50 times fewer file fragments a day, I/O performance increases dramatically, and memory caches get some fresh air!
    That would be the single largest enhancement to Domino performance in its whole existence!
    There's probably more to be done, e.g. we still need a smart compact option that does in-place compaction, but still deallocates free space when there's too much of it, while keeping some of it, just a few MB.
    But still Domino performance would get to levels never seen before.
    Or we can go on spending time and energy working around this for years to come.

    Well, anyway. Here's my experienced sysadmin advice to developers: as a rule of thumb, if your application has disk fragmentation problems, then it's doing something stupid with disk allocation in the first place.

    (Also, I've yet to see a disk defragmenter that does a decent job at working with large files, even though that's the easiest kind of file to deal with!
    Almost every single defrag program I know of will try hard to lay out each of your files as a single block of data.
    That's total nonsense! Reading 100x 100MB fragments is in no way slower than reading a single 10GB block.
    Suppose your perfectly defragmented NSF file is laid out as a single "fragment" at the beginning of the day, and you have a few dozen fragments at the end of the day after Domino has trashed it all over.
    The right thing to do would be to leave the very large first fragment alone and pack all the remaining sub-100MB frags into a single block. That's not THAT smart, this is just not being stupid. Still, no defragmenter on earth will do that.)
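
    A quick way to compare the two growth patterns described above (file names, sizes and the use of Linux's filefrag are illustrative; actual fragment counts depend on the filesystem and on whatever else is writing at the same time):

        # Rough sketch: grow one file in 64KB appends (syncing each step, as Domino
        # appears to do) and another in 4MB appends, then let filefrag report how
        # many extents each file ended up with.
        import os
        import subprocess

        TOTAL = 64 * 1024 * 1024  # 64MB test files

        def grow_file(path, chunk, total):
            buf = b"\0" * chunk
            with open(path, "wb") as f:
                written = 0
                while written < total:
                    f.write(buf)
                    f.flush()
                    os.fsync(f.fileno())  # force each extension out to disk
                    written += chunk

        for name, chunk in (("grow_64k.dat", 64 * 1024), ("grow_4m.dat", 4 * 1024 * 1024)):
            grow_file(name, chunk, TOTAL)
            subprocess.run(["filefrag", name], check=False)  # e2fsprogs, Linux only

    The difference shows up most clearly when several files grow concurrently, as on a busy mail server.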