On Fri, May 22, 2020 at 12:47 PM antlists <antlists@××××××××××××.uk> wrote:
>
> What puzzles me (or rather, it doesn't, it's just cost cutting), is why
> you need a *dedicated* cache zone anyway.
>
> Stick a left-shift register between the LBA track and the hard drive,
> and by switching this on you write to tracks 2,4,6,8,10... and it's a
> CMR zone. Switch the register off and it's an SMR zone writing to all
> tracks.
|
Disclaimer: I'm not a filesystem/DB design expert.
|
Well, I'm sure the zones aren't just 2 tracks wide, but that is easy
enough to work around. I don't see what this gets you, though. If
you're doing sequential writes, you can do them anywhere as long as
they're sequential within any particular SMR zone. If you're
overwriting data, then it doesn't matter how you've mapped it: with a
static mapping like this, you're still going to end up with writes
landing in the middle of an SMR zone.
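
A toy model of the point above, offered as a rough sketch under stated assumptions: a zone of 8 physical tracks (real zones are far wider), and writing a shingled track damages the track after it. The names physical_track and tracks_to_rewrite are hypothetical, not anything a drive actually exposes. With the shift enabled, the unused odd tracks act as guard bands; with it disabled, overwriting one track means rewriting everything after it in the zone.

```python
# Toy model (assumption): a zone of TRACKS_PER_ZONE physical tracks where
# writing a shingled track damages the track after it.
TRACKS_PER_ZONE = 8  # hypothetical; real zones are far wider


def physical_track(logical: int, cmr_mode: bool) -> int:
    """Static mapping: with the shift enabled, logical track n goes to 2n."""
    return logical << 1 if cmr_mode else logical


def tracks_to_rewrite(logical: int, cmr_mode: bool) -> list[int]:
    """Physical tracks that must be written to overwrite one logical track."""
    phys = physical_track(logical, cmr_mode)
    if cmr_mode:
        # The unused odd tracks act as guard bands: only the target is touched.
        return [phys]
    # Shingled: the target track plus every later track in the same zone.
    zone_end = (phys // TRACKS_PER_ZONE + 1) * TRACKS_PER_ZONE
    return list(range(phys, zone_end))
```

For example, tracks_to_rewrite(3, cmr_mode=False) returns [3, 4, 5, 6, 7], while tracks_to_rewrite(3, cmr_mode=True) returns [6]. The static mapping only avoids the rewrite by giving up half the tracks.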
|
> The other thing is, why can't you just stream writes to an SMR zone,
> especially if we try and localise writes so, let's say, all LBAs in Gig 1
> go to the same zone ... okay - if we run out of zones to re-shingle to,
> then the drive is going to grind to a halt, but it will be much less
> likely to crash into that barrier in the first place.
|
I'm not 100% following you, but if you're suggesting remapping all
blocks so that all writes are always sequential, like some kind of
log-based filesystem, your biggest problem here is going to be
metadata. Logical blocks are only 512 bytes, so there are a LOT of
them. You can't just freely remap them all, because then you're going
to end up with an unmanageable amount of metadata.
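
Some rough arithmetic shows the scale of the problem. Every figure here is an assumption chosen for illustration: an 8 TB drive, 512-byte logical blocks, a hypothetical 8-byte table entry, and 256 MiB zones as the coarse-grained alternative.

```python
# Rough mapping-table sizes; every figure here is an illustrative assumption.
DRIVE_BYTES = 8 * 10**12    # an "8 TB" drive
LOGICAL_BLOCK = 512         # 512-byte logical blocks
ZONE_BYTES = 256 * 2**20    # 256 MiB zones, as the coarse alternative
ENTRY_BYTES = 8             # one 64-bit physical address per table entry

per_block_entries = DRIVE_BYTES // LOGICAL_BLOCK
per_zone_entries = -(-DRIVE_BYTES // ZONE_BYTES)  # ceiling division

print(f"per-block map: {per_block_entries:,} entries, "
      f"{per_block_entries * ENTRY_BYTES / 10**9:.0f} GB")
print(f"per-zone map:  {per_zone_entries:,} entries, "
      f"{per_zone_entries * ENTRY_BYTES / 10**3:.0f} kB")
```

Under these assumptions a flat per-block table needs 15,625,000,000 entries (about 125 GB, far more than a drive's onboard cache), whereas one entry per zone needs 29,803 entries (about 238 kB).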
|
I'm sure they are doing something like that within the cache area,
which is fine for short bursts of writes, but at some point you need
to restructure that data so that blocks are contiguous, or otherwise
follow some kind of pattern, so that you don't have to literally
remap every single block. Now, the blocks could still reside in
different locations, so maybe some sequential group of blocks is
remapped as a unit, but if you write to one block in the middle of a
group, you still need to read and rewrite all of those blocks
somewhere. Maybe you could use a COW mechanism like ZFS's to reduce
this somewhat, but you still need to manage blocks in larger groups
so that you don't have a ton of metadata.
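
A minimal sketch of group-granularity remapping, assuming hypothetical 1 MiB groups (2048 blocks of 512 bytes) and in-memory dicts standing in for the platters. The class GroupMappedStore is made up for illustration. It keeps one map entry per group and writes a patched copy of the whole group to a fresh location, COW-style, which is exactly the read/rewrite cost described above.

```python
# Hypothetical model: the drive remaps fixed-size groups of blocks, keeping one
# table entry per group. Changing one block means rewriting its whole group.
GROUP_BLOCKS = 2048          # assumption: 1 MiB groups of 512-byte blocks
ZERO_BLOCK = b"\0" * 512


class GroupMappedStore:
    def __init__(self) -> None:
        self.group_map: dict[int, int] = {}        # logical group -> physical group
        self.media: dict[int, list[bytes]] = {}    # physical group -> block contents
        self.next_free = 0                         # append point for new groups
        self.blocks_written = 0                    # total blocks physically written

    def write_block(self, lba: int, data: bytes) -> None:
        group, offset = divmod(lba, GROUP_BLOCKS)
        old = self.group_map.get(group)
        # Read-modify-write: copy the whole group and patch a single block.
        blocks = list(self.media[old]) if old is not None else [ZERO_BLOCK] * GROUP_BLOCKS
        blocks[offset] = data
        # Write the patched group at the append point instead of in place.
        self.media[self.next_free] = blocks
        self.group_map[group] = self.next_free
        self.next_free += 1
        self.blocks_written += GROUP_BLOCKS
        if old is not None:
            del self.media[old]  # the stale copy is now free space

    def read_block(self, lba: int) -> bytes:
        group, offset = divmod(lba, GROUP_BLOCKS)
        phys = self.group_map.get(group)
        return self.media[phys][offset] if phys is not None else ZERO_BLOCK
```

After a single write_block() call, blocks_written is already 2048: 1 MiB written to change 512 bytes. Larger groups shrink the map but make this amplification worse, which is the trade-off the drive has to pick.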
|
With host-managed SMR this is much less of a problem, because the host
can use extents etc. to reduce the metadata, since it already needs to
map all this stuff into larger structures like files/records/etc. The
host is already trying to avoid having to track individual blocks, so
it is counterproductive to reintroduce that problem at the block
layer.
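
To illustrate why extents help, here is a small sketch (the function name to_extents is made up for this example) that collapses a file's block numbers into (start, length) runs. A file that was written contiguously needs one extent rather than one entry per block.

```python
from collections.abc import Iterable


def to_extents(blocks: Iterable[int]) -> list[tuple[int, int]]:
    """Collapse block numbers into (start, length) runs of consecutive blocks."""
    extents: list[tuple[int, int]] = []
    for b in sorted(set(blocks)):  # dedupe so a repeated block can't split a run
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # b extends the current run
        else:
            extents.append((b, 1))  # gap: start a new run
    return extents
```

For instance, to_extents(range(1000, 3048)) gives [(1000, 2048)], and to_extents([7, 3, 4, 5, 10, 11]) gives [(3, 3), (7, 1), (10, 2)].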
|
Really, the simplest host-managed SMR solution is something like f2fs
or some other log-based filesystem that ensures all writes to the disk
are sequential. The downside of flash-oriented filesystems is that
they are free to disregard fragmentation, which is fine on flash, but
you can't disregard it on an SMR drive, because random I/O performance
on a spinning disk is terrible.
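
A rough sketch of what "all writes are sequential" means for a host-managed zone. Host-managed zones keep a write pointer and only accept writes at it; the Zone and LogAllocator classes below are hypothetical stand-ins, not f2fs's actual allocator. Overwriting a logical block appends a new copy and leaves the old one as garbage for a cleaner to reclaim, which is also where the fragmentation concern comes from.

```python
class Zone:
    """Toy host-managed zone: writes must land exactly at the write pointer."""

    def __init__(self, start: int, size: int) -> None:
        self.start, self.size = start, size
        self.wp = start  # write pointer: the only LBA that may be written next

    def write(self, lba: int, nblocks: int) -> None:
        if lba != self.wp:
            raise ValueError(f"write at {lba}, but write pointer is {self.wp}")
        if self.wp + nblocks > self.start + self.size:
            raise ValueError("write would overflow the zone")
        self.wp += nblocks

    def reset(self) -> None:
        self.wp = self.start  # discard the zone's contents and rewind


class LogAllocator:
    """Log-structured placement: every write is appended, never done in place."""

    def __init__(self, zones: list[Zone]) -> None:
        self.zones = zones
        self.cur = 0
        self.where: dict[int, int] = {}  # logical block -> current physical LBA

    def write(self, logical: int) -> int:
        zone = self.zones[self.cur]
        if zone.wp == zone.start + zone.size:
            # Zone full: move to the next one (raises IndexError when none are
            # left; a real filesystem would run garbage collection instead).
            self.cur += 1
            zone = self.zones[self.cur]
        phys = zone.wp
        zone.write(phys, 1)
        self.where[logical] = phys  # any older copy is now garbage
        return phys
```

With two 4-block zones, writing logical blocks 10, 11, 10, 12, 13 places them at physical 0, 1, 2, 3, 4: the second write to block 10 goes to 2, and block 13 spills into the next zone.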
|
> Even better, if we have two independent heads, we could presumably
> stream updates using one head, and re-shingle with the other. But that's
> more cost ...
|
Well, sure, or if you're doing things host-managed, then you stick the
journal on an SSD and do the writes to the SMR drive
opportunistically. You're basically describing a system where you
have independent drives for the journal and the data areas. Adding an
extra head to a disk (or just having two disks) greatly improves
performance, especially if you're constantly alternating between two
regions.
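
A hedged sketch of the SSD-journal idea: writes are absorbed in a journal (a dict standing in for the SSD), and a flush, run whenever the SMR drive is idle, applies them one zone at a time. The class name JournaledSMR and the tiny 4-block zone size are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Optional

ZONE_BLOCKS = 4  # hypothetical zone size in blocks, kept tiny for readability


class JournaledSMR:
    """Toy model: random writes go to a fast journal (standing in for an SSD);
    a later flush applies them to the SMR area one zone at a time."""

    def __init__(self) -> None:
        self.journal: dict[int, bytes] = {}  # LBA -> newest data, on the "SSD"
        self.disk: dict[int, bytes] = {}     # LBA -> data, on the "SMR drive"
        self.zone_rewrites = 0               # zones rewritten by flushes so far

    def write(self, lba: int, data: bytes) -> None:
        self.journal[lba] = data  # repeated writes to one LBA collapse to one entry

    def read(self, lba: int) -> Optional[bytes]:
        # The journal holds the newest copy, so consult it first.
        return self.journal.get(lba, self.disk.get(lba))

    def flush(self) -> None:
        """Run when the SMR drive is idle: one rewrite per touched zone."""
        by_zone: defaultdict = defaultdict(dict)
        for lba, data in self.journal.items():
            by_zone[lba // ZONE_BLOCKS][lba] = data
        for zone in sorted(by_zone):
            # A real drive would read the zone, patch it and rewrite it sequentially.
            self.disk.update(by_zone[zone])
            self.zone_rewrites += 1
        self.journal.clear()
```

Writing LBAs 1, 2, 1, 3, 9 and 10 and then calling flush() touches only zones 0 and 2, so zone_rewrites ends up as 2, where rewriting the affected zone on every write would have cost six zone rewrites.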
|
--
Rich