From: Frank Steinmetzger <Warp_7@gmx.de>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Package compile failures with "internal compiler error: Segmentation fault".
Date: Fri, 6 Sep 2024 23:41:33 +0200
Message-ID: <Ztt3DeXRicmN1iCF@tp>
In-Reply-To: <15276605.tv2OnDr8pf@rogueboard>
On Fri, Sep 06, 2024 at 01:21:20PM +0100, Michael wrote:
> > > find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
> > >
> > > then to compare with a backup of the same directory you could run:
> > >
> > > md5sum -c digest.log | grep FAILED
I had a quick look at the manpage: with md5sum -c --quiet, the OK lines are
not printed, so you can omit the grep part.
> > > Someone more knowledgeable should be able to knock out some clever python
> > > script to do the same at speed.
And that is exactly what I have written for myself over the last 11 years. I
call it dh (short for dirhash). As I described in the previous mail, I use
it to create one hash file per directory. But it also supports one hash
file per data file and – a rather new feature – one hash file at the root of
a tree. Have a look here: https://github.com/felf/dh
Clone the repo or simply download the one file and put it into your path.
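To illustrate the "one hash file per directory" idea, here is a minimal
Python sketch. It is not taken from dh's source: the file name
checksums.md5 and the function names are assumptions made up for this
example, and the output simply follows the usual md5sum line layout
(digest, two spaces, file name).

```python
import hashlib
import os

HASH_FILE = "checksums.md5"  # hypothetical name, not necessarily what dh uses


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_dir_hashes(directory):
    """Write one md5sum-style hash file covering the regular files in `directory`.

    Subdirectories and the hash file itself are left out. Returns the list
    of file names that were hashed.
    """
    names = sorted(
        name
        for name in os.listdir(directory)
        if name != HASH_FILE and os.path.isfile(os.path.join(directory, name))
    )
    with open(os.path.join(directory, HASH_FILE), "w", encoding="utf-8") as out:
        for name in names:
            out.write(f"{file_md5(os.path.join(directory, name))}  {name}\n")
    return names
```

For ordinary file names, a file written this way should also be checkable
with md5sum -c from inside that directory.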
> > I'll be honest here, on two points. I'd really like to be able to do
> > this but I have no idea where to or how to even start. My setup for
> > series type videos. In a parent directory, where I'd like a tool to
> > start, is about 600 directories. On a few occasions, there is another
> > directory inside that one. That directory under the parent is the name
> > of the series.
By default, my tool ignores directories which have subdirectories. It
only hashes files in dirs that have no subdirs (leaves in the tree). But
this can be overridden with the -f option.
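The leaf-only selection can be sketched with os.walk. This is an
illustration under assumptions, not dh's actual code: dirs_to_hash and its
include_non_leaf parameter are hypothetical names, with the flag standing
in for what -f is described as doing.

```python
import os


def dirs_to_hash(root, include_non_leaf=False):
    """Yield the directories under `root` whose files should be hashed.

    By default only leaf directories (those without subdirectories) are
    yielded; include_non_leaf=True also yields directories that contain
    subdirectories.
    """
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        if include_non_leaf or not dirnames:
            yield dirpath
```

With the TV_Series example, the default would yield "Adam-12 (1968)",
"Airwolf (1984)" and "77 Sunset Strip (1958)/torrent", but not
"77 Sunset Strip (1958)" itself, because that one has a subdirectory.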
My tool also has an option to skip a number of directories and to process
only a certain number of directories.
> > Sometimes I have a sub directory that has temp files;
> > new files I have yet to rename, considering replacing in the main series
> > directory etc. I wouldn't mind having a file with a checksum for each
> > video in the top directory, and even one in the sub directory. As a
> > example.
> >
> > TV_Series/
> >
> > ├── 77 Sunset Strip (1958)
> > │ └── torrent
> > ├── Adam-12 (1968)
> > ├── Airwolf (1984)
So with my tool you would do
$ dh -f -F all TV_Series
`-F all` causes a checksum file to be created for each data file.
> > What
> > I'd like, a program that would generate checksums for each file under
> > say 77 Sunset and it could skip or include the directory under it.
Unfortunately I don’t yet have a feature that skips specific directories.
I could add a feature that looks for a marker file and then skips that
directory (and its subdirs).
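Such a marker-file skip could look roughly like the following sketch. It
describes a feature that does not exist in dh; the marker name .dh-skip
and the function name are purely hypothetical.

```python
import os

MARKER = ".dh-skip"  # hypothetical marker file name


def walk_without_marked(root, marker=MARKER):
    """Yield (directory, file names) pairs, skipping marked directories.

    A directory that contains the marker file is skipped together with
    everything below it.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if marker in filenames:
            dirnames[:] = []  # prune in place so os.walk does not descend
            continue
        dirnames.sort()
        yield dirpath, sorted(filenames)
```

Dropping an empty .dh-skip file into "77 Sunset Strip (1958)/torrent"
would then keep that directory out of the run without affecting its
parent.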
> > Might be best if I could switch it on or off. Obviously, I may not want
> > to do this for my whole system. I'd like to be able to target
> > directories. I have another large directory, lets say not a series but
> > sometimes has remakes, that I'd also like to do. It is kinda set up
> > like the above, parent directory with a directory underneath and on
> > occasion one more under that.
>
> As an example, let's assume you have the following fs tree:
>
> VIDEO
> ├──TV_Series/
> | ├── 77 Sunset Strip (1958)
> | │ └── torrent
> | ├── Adam-12 (1968)
> | ├── Airwolf (1984)
> |
> ├──Documentaries
> ├──Films
> ├──etc.
>
> You could run:
>
> $ find VIDEO -type f -print0 | xargs -0 md5sum > digest.log
>
> The file digest.log will contain md5sum hashes of each of your files within
> the VIDEO directory and its subdirectories.
>
> To check if any of these files have changed, become corrupted, etc. you can
> run:
>
> $ md5sum -c digest.log | grep FAILED
>
> If you want to compare the contents of the same VIDEO directory on a back up,
> you can copy the same digest file with its hashes over to the backup top
> directory and run again:
>
> $ md5sum -c digest.log | grep FAILED
My tool does this as well. ;-)
In check mode, it recurses, looks for hash files and, if it finds them,
checks all hashes. There is also an option to check only paths and
filenames, not hashes. This makes it possible to quickly find files that
have been renamed or deleted since the hash file was created.
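A check pass of this kind can be sketched as follows. Again, this is an
assumption-laden illustration rather than dh's code: the hash file name
checksums.md5, the md5sum-style line layout and the paths_only parameter
are all made up for the example.

```python
import hashlib
import os

HASH_FILE = "checksums.md5"  # hypothetical hash file name


def check_tree(root, paths_only=False):
    """Check every hash file found under `root`.

    Returns (missing, corrupted): files listed in a hash file that no
    longer exist, and files whose current MD5 differs from the recorded
    one. With paths_only=True no file contents are read.
    """
    missing, corrupted = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        if HASH_FILE not in filenames:
            continue
        with open(os.path.join(dirpath, HASH_FILE), encoding="utf-8") as listing:
            for line in listing:
                expected, sep, name = line.rstrip("\n").partition("  ")
                if not sep:
                    continue  # blank or malformed line
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    missing.append(path)
                    continue
                if paths_only:
                    continue
                digest = hashlib.md5()
                with open(path, "rb") as handle:
                    for chunk in iter(lambda: handle.read(1 << 20), b""):
                        digest.update(chunk)
                if digest.hexdigest() != expected:
                    corrupted.append(path)
    return missing, corrupted
```

The paths-only mode is fast because it only looks at directory entries; a
renamed file shows up as missing under its old name.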
> > One thing I worry about is not just memory problems, drive failure but
> > also just some random error or even bit rot. Some of these files are
> > rarely changed or even touched. I'd like a way to detect problems and
> > there may even be a software tool that does this with some setup,
> > reminds me of Kbackup where you can select what to backup or leave out
> > on a directory or even individual file level.
Well, that could be covered with ZFS, especially with a redundant pool so
it can repair itself. Without redundancy it will only detect the bitrot,
but not be able to fix it.
> > Right now, I suspect my backup copy is likely better than my main copy.
The problem is: if they differ, how do you know which one is good apart from
watching one from start to finish? You could use vbindiff to first find the
part that changed. That will at least tell you where the difference is, so
you could seek to roughly that position in the video.
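If only the position of the first difference is needed, a few lines of
Python can report it without an interactive viewer. This is a rough sketch
and not a replacement for vbindiff; first_difference is a hypothetical
helper name.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference between two files.

    Returns None if the files are identical. If one file is a prefix of
    the other, the offset where the shorter file ends is returned.
    """
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                for index, (byte_a, byte_b) in enumerate(zip(chunk_a, chunk_b)):
                    if byte_a != byte_b:
                        return offset + index
                # All common bytes match, so the chunks differ only in length.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without a difference
            offset += len(chunk_a)
```

Dividing the returned offset by the file size gives only a rough idea of
where to seek, since the bitrate of a video is usually not constant.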
> This should work in rsync terms:
>
> rsync -v --checksum --delete --recursive --dry-run SOURCE/ DESTINATION
>
> It will output a list of files which have been deleted from the SOURCE and
> will need to be deleted at the DESTINATION directory.
If you want to see changed *and* deleted files in one run, better use -i
(--itemize-changes) instead of -v.
--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.
If two processes are running concurrently,
the less important will take processor time away from the more important one.