Gentoo Archives: gentoo-user

From: Steve <gentoo_sjh@×××××××.uk>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Solid state disks...
Date: Sun, 22 Feb 2009 13:48:40
Message-Id: 49A157BA.6010903@shic.co.uk
I'm playing around with an application that requires me to manage a
large (multi-gigabyte to terabyte), bespoke, frequently updated data
structure in real time... the key concerns are durability and
efficiency.  While a traditional approach might be to employ an
expensive DBMS on expensive hardware... I'm looking to be more
innovative.  I want to achieve big-iron-beating performance on a
shoestring budget... and I'm optimistic, since the problem domain doesn't
translate well to traditional RDBMS approaches.

An obvious alternative to a DBMS is to use the file system directly...
in principle this could work - but it would be a laborious process
fraught with potential pitfalls with respect to atomicity of updates,
transactional recovery (in case of a fail-stop while processing a large
update), etc.  Another issue is that, to build an efficient and reliable
implementation, it becomes necessary to second-guess details of how
file systems are implemented... this vastly complicates any
implementation and might render it unacceptably fragile (subject to
unexpected deviations in behaviour as it is moved between hardware and
OS versions, etc.).
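As an illustration of the file-system details involved, the usual POSIX idiom for all-or-nothing replacement of a file is: write a temporary file in the same directory, fsync it, rename it over the original, then fsync the directory. The sketch below assumes Linux semantics (rename within one directory is atomic, and fsync works on a directory descriptor); `atomic_write` is a hypothetical helper name, not an existing API.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so that a reader (or a crash)
    sees either the old contents or the new contents, never a partial mix.

    Assumes a POSIX file system where rename within a directory is atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as `path`,
    # otherwise os.replace cannot rename it atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write it to the device
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        os.unlink(tmp_path)         # leave no stray temporary file behind
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Rewriting whole files like this is impractical for a terabyte-scale structure, so at best it suits small pieces such as a root or pointer file, with the bulk data updated by other means.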

I've recently discovered that SSDs are becoming more affordable... and
this might present new options.  There were major hurdles in attempting
to establish a strategy for interacting with hard-disk block devices...
including, but not limited to, significant difficulty in establishing
the extent to which locality of reference affected performance.  Another
worry was that it might be difficult to establish that a write had
actually completed (i.e. that the data was reliably and durably stored,
not just that responsibility for recording it now rested exclusively
with the drive).  My hope is that SSD technology simplifies some of these
concerns - giving a clear model of access performance on which an
efficient and reliable implementation could be built.
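The distinction drawn above, between the kernel accepting a write and the data being on stable storage, corresponds in POSIX to the difference between a plain write and a following fdatasync. A minimal sketch, assuming Linux and a regular file; `durable_pwrite` is a hypothetical name. A successful return means the kernel has written the data out and, where the file system issues cache-flush commands, asked the drive to empty its volatile cache; whether a particular SSD really persists data before acknowledging that flush is a hardware property software cannot verify by itself.

```python
import os


def durable_pwrite(fd: int, data: bytes, offset: int) -> None:
    """Write `data` at `offset`, returning only after fdatasync succeeds."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.pwrite may write fewer bytes than requested; loop until done.
        written += os.pwrite(fd, view[written:], offset + written)
    # fdatasync flushes the file data (plus metadata needed to read it back,
    # such as a changed file size) and, if the file system issues cache
    # flushes, asks the drive to flush its volatile write cache.
    os.fdatasync(fd)
```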

I'd like to hear from anyone who has experience configuring SSDs
for use with (Gentoo) Linux - and especially from anyone who has
investigated performance issues.  I've read that SSDs typically have a
64 KiB block size... this would work fine for me (though I understand
that it is a significant impediment to high performance with existing
file systems).  I'd be interested to know if anyone has done performance
analysis of SSDs at the device level under Linux... and I'm curious whether
there is more to interacting with them than establishing the block size
from manufacturer data, then reading/writing appropriately many bytes
from block devices... and/or flushing appropriately aligned and sized
blocks of memory-mapped data.  For example, is there an interface to
query an SSD about its block size?  I'd particularly like to know whether
I can rely upon my data being durably stored on an SSD when a flush/write
returns.
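On the block-size question: Linux kernels newer than this post expose the sizes a drive reports under `/sys/block/<dev>/queue/`, though these are typically logical/physical sector sizes (often 512 or 4096 bytes), not necessarily the flash erase-block size. The sketch below reads those attributes and, for memory-mapped data, flushes a dirty range widened to whole aligned units. `queue_limits`, `flush_range` and the device name are hypothetical; the sketch assumes Linux, Python 3.9+ (for `math.lcm`) and that `mmap.flush` performs a synchronous msync.

```python
import math
import mmap
from pathlib import Path

# Attributes under /sys/block/<dev>/queue/ on newer Linux kernels.
QUEUE_ATTRS = (
    "logical_block_size",   # smallest unit the device can address
    "physical_block_size",  # smallest unit written without read-modify-write
    "minimum_io_size",      # preferred minimum request size
    "optimal_io_size",      # preferred request size (0 if not reported)
    "rotational",           # 0 if the device says it is non-rotational
)


def queue_limits(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the I/O size hints the kernel exposes for `device` (e.g. "sda").

    Attributes absent on this kernel are skipped.  These are what the drive
    reports, which need not match its internal flash erase-block size.
    """
    queue = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        attr = queue / name
        if attr.exists():
            limits[name] = int(attr.read_text().strip())
    return limits


def flush_range(mm: mmap.mmap, start: int, end: int, block: int) -> tuple:
    """Flush bytes [start, end) of a shared file mapping, widened outwards
    to units that are multiples of both `block` and the page size
    (mmap.flush requires a page-aligned offset).

    Returns the (offset, length) actually flushed.
    """
    unit = math.lcm(block, mmap.PAGESIZE)
    offset = (start // unit) * unit        # round start down
    stop = -(-end // unit) * unit          # round end up (ceiling division)
    stop = min(stop, len(mm))              # never run past the mapping
    mm.flush(offset, stop - offset)
    return offset, stop - offset
```

For example, `queue_limits("sda")["physical_block_size"]` (with a hypothetical device `sda`) could supply the `block` argument to `flush_range`.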

In a practical sense, I'd like to experiment with some SSD hardware -
but there seems to be a lot to choose from.  For development purposes,
I'd not need more than, say, 32 GB - and I'm not all that fussed about
absolute performance - as long as the relative performance of various
interactions would scale proportionally were I to move to more
expensive SSDs in future.  I'm interested in any practical
anecdotes (or hard statistical data) about the relative merits of
various interfaces for SSDs - and in whether RAID needs to be taken
into account when establishing a performance model.
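One hypothetical way to gather the relative figures asked about is a small micro-benchmark run on each candidate device: time synchronous block-sized overwrites visited sequentially versus in shuffled order. This sketch writes through a file on whatever file system holds `path`, so it measures file system plus device rather than the raw device; `sync_write_latency` and the sizes used are illustrative assumptions, and the numbers are only meaningful relative to one another on the same machine.

```python
import os
import random
import time


def sync_write_latency(path: str, block: int, count: int, sequential: bool) -> float:
    """Average seconds per `block`-byte write followed by fdatasync, over
    `count` block-aligned offsets visited in order or in shuffled order."""
    payload = os.urandom(block)
    offsets = [i * block for i in range(count)]
    if not sequential:
        random.shuffle(offsets)            # same blocks, random visiting order
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Pre-write the whole region so the timed loop overwrites blocks
        # that are already allocated instead of timing file growth.
        os.pwrite(fd, bytes(block * count), 0)
        os.fdatasync(fd)
        t0 = time.perf_counter()
        for off in offsets:
            os.pwrite(fd, payload, off)
            os.fdatasync(fd)               # pay the durability cost each time
        elapsed = time.perf_counter() - t0
    finally:
        os.close(fd)
    return elapsed / count
```

Comparing the ratio of the shuffled to the sequential figure across devices (or a RAID set versus a single drive) would give a rough indication of how much locality of reference matters on each.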

Any feedback would be appreciated... especially from any Gentooist who
is interested in SSD performance/reliability/configuration.