From: Michael <confabulate@kintzios.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Seagate hard drives with dual actuators.
Date: Sat, 16 Nov 2024 19:47:02 +0000
Message-ID: <1836185.3VsfAaAtOV@rogueboard>
In-Reply-To: <CAGfcS_nEcdt6vcGWWmU-pT4rneJtALi9bNSD2OK5dt8kh-WB+Q@mail.gmail.com>
On Saturday 16 November 2024 14:36:02 GMT Rich Freeman wrote:
> On Sat, Nov 16, 2024 at 6:02 AM Michael <confabulate@kintzios.com> wrote:
> > I assume (simplistically) with DM-SMRs the
> > discard-garbage collection is managed wholly by the onboard drive
> > controller, while with HM-SMRs the OS will signal the drive to start
> > trimming when the workload is low in order to distribute the timing
> > overheads to the system's idle time.
>
> I'll admit I haven't looked into the details as I have no need for SMR
> and there aren't any good FOSS solutions for using it that I'm aware
> of (just a few that might be slightly less terrible). However, this
> doesn't seem correct for two reasons:
>
> First, I'm not sure why HM-SMR would even need a discard function.
> The discard command is used to tell a drive that a block is safe to
> overwrite without preservation. A host-managed SMR drive doesn't need
> to know what data is disposable and what data is not. It simply needs
> to write data when the host instructs it to do so, destroying other
> data in the process, and it is the host's job to not destroy anything
> it cares about. If a write requires a prior read, then the host needs
> to first do the read, then adjust the written data appropriately so
> that nothing is lost.
As I understand it from reading various articles, the constraint is the same
on both HM-SMR and the more common DM-SMR: when a random block within a band
changes, the whole band has to be rewritten sequentially. What differs with
HM-SMR is that the host is meant to take over the management of random writes
and submit them to the drive as sequential whole-band streams, to be committed
without a read-modify-write penalty inside the drive. I suppose that having
the host read the whole band from the drive, modify it and submit it back as
one sequential write will be faster than letting the drive manage this
operation internally and filling up its internal cache. This will not absolve
the drive firmware from having to manage its own trim operations and the
impact metadata changes could have on the drive, but some timing optimisation
is perhaps reasonable. I can't recall where I read this bit - perhaps some
presentation on XFS or ext4 - not sure.
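To make the host-side read-modify-write concrete, here is a minimal toy
sketch in Python. It is only my illustration of the idea, not how any real
filesystem or the kernel's zoned block layer does it: the Zone class, its
write pointer and host_update_block() are all made-up names, and a "band" is
just a Python list.

```python
class Zone:
    """Toy model of one sequential-write-only band (hypothetical)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = [None] * num_blocks
        self.write_pointer = 0  # index of the next block that may be written

    def append(self, data):
        """Write one block at the write pointer; no random writes allowed."""
        if self.write_pointer >= self.num_blocks:
            raise IOError("zone full")
        self.blocks[self.write_pointer] = data
        self.write_pointer += 1

    def read_all(self):
        """Return a copy of everything written so far."""
        return self.blocks[:self.write_pointer]

    def reset(self):
        """Rewind the write pointer; the old contents are gone."""
        self.blocks = [None] * self.num_blocks
        self.write_pointer = 0


def host_update_block(zone, index, new_data):
    """Change one block by having the host rewrite the whole band."""
    band = zone.read_all()   # 1. read the whole band into host memory
    band[index] = new_data   # 2. modify the single block that changed
    zone.reset()             # 3. rewind the zone
    for block in band:       # 4. stream the band back sequentially
        zone.append(block)
    return len(band)         # blocks written for a one-block change
```

With an 8-block zone, updating block 3 costs 8 sequential block writes. The
sketch is not crash-safe: a host that cares about its data would presumably
write the new band to a spare zone before resetting the old one.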
> Second, there is no reason that any drive of any kind (SMR or SSD)
> NEEDS to do discard/trim operations when the drive is idle, because
> discard/trim is entirely a metadata operation that doesn't require IO
> with the drive data itself. Now, some drives might CHOOSE to
> implement it that way, but they don't have to. On an SSD, a discard
> command does not mean that the drive needs to erase or move any data
> at all. It just means that if there is a subsequent erase that would
> impact that block, it isn't necessary to first read the data and
> re-write it afterwards. A discard could be implemented entirely in
> non-volatile metadata storage, such as with a bitmap. For a DM-SMR
> using flash for this purpose would make a lot of sense - you wouldn't
> need much of it.
I don't know if SMR drives use flash to record their STL (shingle translation
layer) state and the allocation of data between their persistent cache and
shingled storage space. I would think yes, or at least they ought to. Without
metadata written to separate media, even a small random write can only take
place atomically if a whole SMR band is read, modified in memory, written to a
new temporary location and finally written over the original SMR band.
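A rough back-of-the-envelope calculation of what that sequence costs, as a
Python sketch. The 256 MiB band and the 4 KiB write are figures I assumed
purely for illustration, not taken from any datasheet, and
band_rewrite_cost() is a made-up helper.

```python
KIB = 1024
MIB = 1024 * KIB


def band_rewrite_cost(band_bytes, write_bytes):
    """I/O needed for one small update done as a full band rewrite.

    Steps modelled: read the band, write the modified copy to a temporary
    location, then write it again over the original band.
    """
    bytes_read = band_bytes
    bytes_written = 2 * band_bytes  # temporary copy + final copy
    amplification = bytes_written / write_bytes
    return bytes_read, bytes_written, amplification


# Assumed sizes: 256 MiB band, 4 KiB random write.
read, written, amp = band_rewrite_cost(256 * MIB, 4 * KIB)
print(f"read {read // MIB} MiB, wrote {written // MIB} MiB, "
      f"write amplification {amp:.0f}x")
# prints: read 256 MiB, wrote 512 MiB, write amplification 131072x
```

Under those assumptions a single 4 KiB change moves three quarters of a GiB
through the heads, which is why I would expect the drive to stage such writes
in a persistent cache and batch them up.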
> This is probably why you have endless arguing online about whether
> discard/trim is helpful for SSDs. It completely depends on how the
> drive implements the command. The drives I've owned can discard
> blocks without any impact on IO, but I've heard some have a terrible
> impact on IO. It is just like how you can complete the same sort
> operation in seconds or hours depending on how dumb your sorting
> algorithm is.
I have an old OCZ SSD which would increase IO latency to many seconds, if not
minutes, whenever trim was running, to the point where users started
complaining I had 'broken' their PC. As if I would do such a thing. LOL!
Never mind trying to write anything; even reading from the disk would take
ages and the drive IO LED on the case stayed on for many minutes while trim
was running. I reformatted with btrfs, overprovisioned enough spare capacity
and cut the trim cron job back to once a month, which stopped them
complaining. I don't know if the firmware was trying to write zeros to the
drive deterministically, instead of just de-allocating the trimmed blocks.
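The difference between the two behaviours I am guessing at can be sketched
with a toy model in Python. ToyDrive and its methods are hypothetical and say
nothing about what the OCZ firmware really did; the point is only that a
metadata-only discard costs no media writes, while zeroing costs one per
trimmed block.

```python
class ToyDrive:
    """Hypothetical drive with an allocation bitmap kept as metadata."""

    def __init__(self, num_blocks):
        self.data = [b"\x00"] * num_blocks
        self.valid = [False] * num_blocks  # allocation bitmap (metadata)
        self.media_writes = 0              # writes that touch the media

    def write(self, lba, payload):
        self.data[lba] = payload
        self.valid[lba] = True
        self.media_writes += 1

    def discard_metadata_only(self, start, count):
        """De-allocate blocks by flipping bits; the data is left alone."""
        for lba in range(start, start + count):
            self.valid[lba] = False

    def discard_by_zeroing(self, start, count):
        """De-allocate blocks and also overwrite each one with zeros."""
        for lba in range(start, start + count):
            self.data[lba] = b"\x00"
            self.valid[lba] = False
            self.media_writes += 1

    def read(self, lba):
        """De-allocated blocks read back as zeros either way."""
        return self.data[lba] if self.valid[lba] else b"\x00"
```

Both variants return zeros for a trimmed block afterwards, so the host cannot
tell them apart by reading; only the time spent writing differs.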
> In any case, to really take advantage of SMR the OS needs to
> understand exactly how to structure its writes so as to not take a
> penalty, and that requires information about the implementation of the
> storage that isn't visible in a DM-SMR.
Yes, I think all the OS can do is try to minimise random writes and, from what
I have read, an SMR-friendlier fs will attempt to do this.
> Sure, some designs will do
> better on SMR even without this information, but I don't think they'll
> ever be all that efficient. It is no different from putting f2fs on a
> flash drive with a brain-dead discard implementation - even if the OS
> does all its discards in nice consolidated contiguous operations it
> doesn't mean that the drive will handle that in milliseconds instead
> of just blocking all IO for an hour - sure, the drive COULD do the
> operation quickly, but that doesn't mean that the firmware designers
> didn't just ignore the simplest use case in favor of just optimizing
> around the assumption that NTFS is the only filesystem in the world.
For all I know, consumer-grade USB sticks with their cheap controller chips
use no wear levelling at all:
https://support-en.sandisk.com/app/answers/detailweb/a_id/25185/~/learn-about-trim-support-for-usb-flash%2C-memory-cards%2C-and-ssd-on-windows-and
Consequently, all a flash-friendly fs can do is perhaps compress and write in
batched mode to minimise write ops.
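By "compress and write in batched mode" I mean something along these lines,
sketched in Python. BatchedWriter is a made-up class and the threshold is an
arbitrary assumption; this is not how f2fs or any other filesystem actually
does it, just the general idea of turning many small writes into a few larger
compressed ones.

```python
import zlib


class BatchedWriter:
    """Hypothetical buffer that coalesces small writes before flushing."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # bytes buffered before flush
        self.pending = []
        self.pending_bytes = 0
        self.device_writes = []  # each entry is one write op sent to flash

    def write(self, payload):
        """Buffer a small write; flush once enough has accumulated."""
        self.pending.append(payload)
        self.pending_bytes += len(payload)
        if self.pending_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Compress everything buffered and send it as a single write op."""
        if not self.pending:
            return
        blob = zlib.compress(b"".join(self.pending))
        self.device_writes.append(blob)
        self.pending = []
        self.pending_bytes = 0
```

With a 4096-byte threshold, one hundred 64-byte writes followed by a final
flush() reach the device as two write ops instead of a hundred.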
I can see how an SMR drive would be a suitable solution for storing media
files, but I don't know whether the close proximity of the shingled bands
would cause leakage between them and eventually lead to data loss. I haven't
seen any reliability reports on this technology.