I'm playing around with an application that requires me to manage a
large (multi-gigabyte to terabyte), bespoke, frequently-updating data
structure in real-time... the key concerns are durability and
efficiency. While a traditional approach might be to employ an
expensive DBMS on expensive hardware... I'm looking to be more
innovative. I want to achieve big-iron-beating performance on a
shoestring budget... and I'm optimistic since the problem domain doesn't
translate well to traditional RDBMS approaches.
An obvious alternative to a DBMS is to use the file-system directly...
in principle this could work - but it would be a laborious process
fraught with potential pitfalls with respect to atomicity of updates,
transactional recovery (in case of a fail-stop while processing a large
update) etc. Another issue is that, in order to establish an efficient
and reliable implementation, it becomes necessary to second-guess
details about the implementation of file-systems... this vastly
complicates any implementation and might render it unacceptably fragile
(subject to unexpected deviations in behaviour as the implementation is
moved between hardware/OS-versions etc.).
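
To illustrate the kind of care the file-system route demands, here is a
minimal Python sketch of one commonly used pattern for replacing a file
atomically: write a temporary file, fsync it, rename it over the target,
then fsync the directory. The helper name atomic_write is hypothetical,
and the sketch assumes a POSIX file-system where a rename within one
directory is atomic. It covers whole-file replacement only, not partial
updates to a large structure.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory (hence the same
    # file-system) so the final rename never crosses a mount point.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the kernel to push the contents out
        os.replace(tmp_path, path)  # rename over the target
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

For a multi-gigabyte structure, rewriting whole files on every update
would likely be far too costly, which is part of why this route looks
so laborious.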
I've recently discovered that SSDs are becoming more affordable... and
this might present new options. There were major hurdles in attempting
to establish a strategy to interact with hard-disk block devices...
including, but not limited to, a significant difficulty in establishing
the extent to which locality of reference affected performance. Another
worry was that it might be difficult to establish that a write had
actually completed (i.e. that the data was reliably and durably stored -
not just that the responsibility for recording the data was now
exclusively with the drive). My hope is that SSD technology simplifies
some of these concerns - allowing a clear model for access performance
that should allow an efficient and reliable implementation.
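
As a point of reference for the "has the write actually completed"
worry, here is a minimal Python sketch of the usual POSIX approach:
write, then call fsync() before treating the data as stored. The helper
name durable_write is hypothetical. Note the assumption baked in: fsync()
returning only tells you the kernel has done its part; whether the drive
then honours a cache-flush request is exactly the hardware-dependent
question raised above.

```python
import os


def durable_write(path: str, offset: int, data: bytes) -> int:
    """Write `data` at `offset`; return only after fsync() has returned."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)
        # fsync asks the kernel to write out its cached pages for this file.
        # On typical Linux setups this also requests a cache flush from the
        # device, but whether the drive honours it depends on the hardware.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

The same calls work on a block device node (given sufficient
permissions), which is closer to the device-level access being
considered here.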
I'd like to hear from anyone who has experience with configuring SSDs
for use with (Gentoo) Linux - and especially from anyone who's
investigated performance issues. I've read that SSDs typically have a
64 KiB block size... this would work fine for me (though I understand
that it is a significant impediment to high performance with existing
file systems). I'd be interested to know if anyone has done performance
analysis of SSDs at the device level under Linux... and I'm curious
whether there is more to interacting with them than establishing the
block size from manufacturer data - then reading/writing appropriately
many bytes from block devices... and/or flushing appropriately aligned
and sized blocks of memory-mapped data. For example, is there an
interface to quiz an SSD about its block-size? I'd also like to
establish whether I can rely upon my data being durably stored on an SSD
when a flush/write returns.
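
On the question of quizzing a device about its block size: Linux does
expose some I/O size hints per block device under
/sys/block/<device>/queue/. The Python sketch below simply reads a few of
them. The function name queue_limits and the sysfs_root parameter are
hypothetical conveniences, and "sda" is only an example device name.
These values are whatever the drive advertises to the kernel; they do not
necessarily reveal the flash erase-block size.

```python
import os


def queue_limits(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Read the I/O size hints Linux exposes for a block device, e.g. 'sda'."""
    names = (
        "logical_block_size",   # smallest addressable unit, in bytes
        "physical_block_size",  # physical sector size reported by the device
        "minimum_io_size",      # smallest preferred I/O size
        "optimal_io_size",      # preferred I/O size; 0 if not reported
        "rotational",           # 1 for spinning disks, 0 otherwise
    )
    limits = {}
    for name in names:
        attr = os.path.join(sysfs_root, device, "queue", name)
        try:
            with open(attr) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            limits[name] = None  # attribute missing or unreadable here
    return limits


if __name__ == "__main__":
    print(queue_limits("sda"))
```

Any sizes obtained this way are a starting point for experiments rather
than a substitute for measuring the drive itself.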
In a practical sense, I'd like to experiment with some SSD hardware -
but there seems to be a lot to choose from. For development purposes,
I'd not need more than, say, 32GB - and I'm not all that fussed about
absolute performance - as long as the relative performance of various
interactions scales proportionally were I to move to more expensive
SSDs in future. I'm interested in any practical anecdotes (or hard
statistical data) about the relative merits of various interfaces for
SSDs - and in whether RAID needs to be taken into account when
establishing a performance model.
Any feedback would be appreciated... especially from any gentooist who
is interested in SSD performance/reliability/configuration.