On Monday 25 September 2006 22:55, Robert Persson <ireneshusband@×××××.com>
wrote about '[gentoo-user] I have 146,000 files in lost+found. How do I
sort them?':
> Am I likely to find many usable files in that /lost+found directory?

Maybe. I recently tried to recover a corrupted ext3 /boot partition and was
unable to pull anything useful out of lost+found that was larger than a
symlink. :( If a number of files NOT in lost+found were corrupt, it's
likely most of the files in lost+found are corrupt as well.

That said, /boot data is generally easy to replace, so I put no effort into
recovering files that were corrupted. If the data is valuable, it might
be worth spending some time sorting those out.

> If I can, how can I best sift through them?

Carefully. :)

> Is there a utility, or
> something I could drop into a simple bash script, that would look at the
> first few bytes of the file and, say, identify it as a jpeg or an xml
> file, so that it could be given an appropriate file extension, deleted
> or moved?

As the other poster mentioned, the file utility is useful for identifying
the type of a file. Keep in mind, though, that it only looks at the first few
bytes of the file; if there's corruption later on, file won't notice.
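
For what it's worth, here is a rough sketch of the kind of script that could
be built around file. It asks file for each regular file's MIME type and
moves the file into a directory named after that type. The function name
sort_by_mime and the directory layout are made up for illustration, and it
assumes a version of file that understands --mime-type.

```shell
# Hypothetical helper: sort_by_mime SRC DEST
# Moves each regular file in SRC into DEST/<type>/<subtype>/ according to
# the MIME type reported by file(1), e.g. DEST/image/jpeg/.
sort_by_mime() {
    local src=$1 dest=$2 f mime
    for f in "$src"/*; do
        # Only handle regular files; skip symlinks, directories, devices.
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        mime=$(file -b --mime-type -- "$f") || continue
        mkdir -p -- "$dest/$mime"
        mv -- "$f" "$dest/$mime/"
    done
}
```

Calling it as sort_by_mime /lost+found /some/sorted/dir (both paths are
placeholders) would leave JPEGs under image/jpeg, plain text under
text/plain, and so on. It says nothing about whether the rest of each file
is intact.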

> Or is there one that could distinguish a text file from a
> binary?

Of course, file does this to some extent: a MIME type of text/* is
generally text, while anything else is binary. But file's output (by
default) isn't a simple "binary" or "text" string.
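
If a plain "text" or "binary" answer is all that's wanted, the MIME type can
be reduced to one with a case statement. A minimal sketch, assuming file
supports --mime-type; text_or_binary is a hypothetical name:

```shell
# Hypothetical helper: text_or_binary FILE
# Prints "text" if file(1) reports a text/* MIME type, "binary" otherwise.
text_or_binary() {
    case $(file -b --mime-type -- "$1") in
        text/*) echo text ;;
        *)      echo binary ;;
    esac
}
```

Some textual formats are reported under application/* by some versions of
file, so this will call a few text files binary.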

Some of the GNU utilities that are meant for text files will complain
before operating on a binary file, so you could possibly use those for this
task. (I'm thinking of less and grep.) In particular,
grep '[^[:print:]]' should return true when run against a file that
contains non-printable characters (like control characters or NUL, and,
depending on locale, non-7-bit-clean characters).
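
As a sketch of the grep approach: the function below succeeds when a file
contains a byte that is neither printable nor whitespace. Adding [:space:]
to the bracket expression is a tweak to the pattern above, so that tabs in
otherwise ordinary text don't count. LC_ALL=C is an assumption to keep the
check byte-oriented, and has_nonprintable is a made-up name.

```shell
# Hypothetical helper: has_nonprintable FILE
# Exit status 0 if FILE contains a byte that is neither printable nor
# whitespace in the C locale, non-zero otherwise.
has_nonprintable() {
    LC_ALL=C grep -q '[^[:print:][:space:]]' -- "$1"
}
```

It could be used as: if has_nonprintable "$f"; then echo "$f looks binary"; fi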

> Are there any other strategies I could use to sift through these files
> (assuming it would be worth doing)?

Well, before you write some sort of bash script around file to rename
stuff, you'll probably want to remove anything that is clearly trash, like
device nodes or 0-length files. Something like:
find lost+found \( \! \( -type f -o -type d -o -type l \) -o -empty \) -delete
should work if you are using GNU find.
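
Since -delete can't be undone, it may be worth previewing what would go
first. The sketch below wraps the same tests in a function that only prints.
The name list_trash is made up, and -mindepth 1 is my addition to keep the
top-level directory itself out of the list. The outer \( ... \) group matters
because find's implicit "and" binds tighter than -o; without it the action
would only apply to the -empty test.

```shell
# Hypothetical helper: list_trash DIR
# Prints everything under DIR that is not a regular file, directory, or
# symlink (device nodes, FIFOs, sockets), plus anything that is empty.
# Assumes a find that supports -mindepth and -empty (GNU find does).
list_trash() {
    find "$1" -mindepth 1 \
        \( \! \( -type f -o -type d -o -type l \) -o -empty \) -print
}
```

Once the output of list_trash lost+found looks right, the same expression
with -delete in place of -print does the actual removal.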

--
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh