Brandon Vargo <brandon.vargo@×××××.com> writes:

> do. When I go to find code that I have written, I do not remember
> variable names, lines of code, etc that I can match with a regular
> expression. Thus, that kind of search is pointless for me. I remember
> what the code does, the project for which I wrote the code, and
> approximately where the code is located within the project. I remember
> function calls for libraries that I probably used. If I cannot find what
> I am looking for, I use grep on the name of a function call I remember,
> or I have a ctags file containing all the information I need about
> function definitions.

Again, thanks for a thorough answer... just a note on the above
comment.

I often find myself searching for a technique, NOT for variable names or
subroutine names, because who knows what I might call things in any
particular script.

For example, I was once shown how to compile an element of @ARGV into a
regular expression in Perl, in one step:

  my $what_re = qr/@{[shift]}/;

I liked that and have used it many times, but only recently have I been
able to remember at a moment's notice how to write it.
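The one-liner works because qr// interpolates like a double-quoted string. `@{[ EXPR ]}` is the usual way to interpolate the result of an arbitrary expression there. `[shift]` builds an anonymous array holding the value shifted off @ARGV, and `@{ }` dereferences it in place. At file scope a bare `shift` works on @ARGV, but inside a subroutine it works on @_. Below is a minimal sketch comparing the one-step idiom with the more explicit two-step form. The sample argument h.sh is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical invocation: script.pl 'h.sh'
# (two copies here so that both forms below have an argument to shift)
@ARGV = ('h.sh', 'h.sh');

# One step: [shift] makes an anonymous array holding the shifted value,
# and @{ ... } interpolates its contents into the pattern.
my $what_re = qr/@{[shift]}/;

# The same thing in two explicit steps.
my $pattern  = shift;
my $plain_re = qr/$pattern/;

for my $word (qw(hash bash hush)) {
    printf "%-4s  one-step: %-5s  two-step: %s\n", $word,
        ($word =~ $what_re  ? 'match' : 'no'),
        ($word =~ $plain_re ? 'match' : 'no');
}
```

Both forms produce the same compiled pattern. For h.sh, each reports a match for hash and hush and none for bash.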

I used to use `grep -r' or `egrep -r' as you've mentioned; now I use my
own Perl script (written since I posted the original query) that uses
regexes and File::Find. The user supplies the regex and the approximate
location to begin the search on the command line.

In my case that location would be an NFS share, /projects/reader/perl,
which is kept in my environment as $perlp.

So

  script.pl 'qr/.*?@' $perlp

will find a number of examples of that particular technique.
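A script of that kind can be quite short. What follows is a minimal sketch, not the script described above. The name findre.pl and the `file:line: text` output format are assumptions. It takes a regex and a starting directory, in the same order as the command above, walks the tree with File::Find, and collects every matching line. Because the pattern is interpolated into qr//, the user gets full Perl regex syntax. The single quotes on the shell command line keep the shell from expanding `*`, `?` and `$`.

```perl
#!/usr/bin/perl
# findre.pl (hypothetical name): print lines matching a regex under a directory.
# Usage: findre.pl REGEX START_DIR
use strict;
use warnings;
use File::Find;

# Walk $start_dir and return "file:line: text" for every line matching $re.
sub search_tree {
    my ($re, $start_dir) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # $File::Find::name stays a usable path
            wanted   => sub {
                my $file = $File::Find::name;
                return unless -f $file && -r _;     # plain, readable files only
                open my $fh, '<', $file or return;  # skip files we cannot open
                while (my $line = <$fh>) {
                    push @hits, "$file:$.: $line" if $line =~ $re;
                }
                close $fh;
            },
        },
        $start_dir,
    );
    return @hits;
}

if (@ARGV == 2) {
    my ($pattern, $dir) = @ARGV;
    my $re = qr/$pattern/;    # full Perl regex syntax, as typed by the user
    print for search_tree($re, $dir);
}
```

With 'qr/.*?@' as the pattern, it lists lines where `qr/` is later followed by `@` on the same line, such as the one-step @ARGV idiom.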

What prompted my query here was looking for a way to search several
thousand HTML pages that make up a collection of Perl books on CD.

These are two of the O'Reilly Perl CD books. (I spent $150 on the first
one, and I think the second was a little cheaper; it was years ago.) The
books on CD have built-in search tools, but those only work on Windows
and aren't up to much anyway.

I've since copied the data from the CDs onto an OpenSolaris ZFS server
and access it over NFS.

I was attempting to use `webglimpse'
(http://webglimpse.net/download.php) for the task, hence the interest
in indexing. But I suspect that a technique I read about, but have
forgotten how to code, would be best searched for with regular
expressions, since the search would come long after I've forgotten
which section, or even which book, I read about it in.

The tool I've written can be made to strip HTML if necessary and to
include (by regex) only certain kinds of filenames, but it uses no
index, so it is pretty slow... still very useful, though, and fully
Perl-regex capable.
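Both options can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual code. The filename pattern (.htm/.html) and the helper names are hypothetical. The regex-based tag stripper is also deliberately crude: it cannot cope with a tag split across separately read lines, comments containing `>`, and so on. A real parser such as HTML::Parser from CPAN is the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical filename filter: search only .htm / .html files.
my $name_re = qr/\.html?\z/i;

sub wanted_file {
    my ($path) = @_;
    return $path =~ $name_re;
}

# Crude tag stripper for grepping purposes: drops <...> tags that begin and
# end within $text, then decodes a handful of common entities in one pass.
my %entity = (lt => '<', gt => '>', amp => '&', quot => '"', nbsp => ' ');

sub strip_html {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    $text =~ s/&(lt|gt|amp|quot|nbsp);/$entity{$1}/g;
    return $text;
}

for my $path ('perlnut/index/idx_p.htm', 'perlnut/figs/cover.gif') {
    print "$path: ", (wanted_file($path) ? 'search' : 'skip'), "\n";
}
print strip_html('<a href="ch04_08.htm">4.8.2. Dereferencing</a> &amp; more'), "\n";
```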

It returns up to 4 lines of context per hit (the 2 lines above the line
with the hit, the hit itself, and 1 line below, where possible), along
with the line numbers and the absolute filename where the hit was found.
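Context of that shape (2 lines before, the hit, 1 line after) can be produced in a single pass by keeping a small buffer of recent lines. This is a minimal sketch that assumes the "line number, space, text" format of the sample output. The sub name and its arguments are hypothetical. For comparison, grep's `-B 2 -A 1` options (GNU grep, among others) give the same window.

```perl
use strict;
use warnings;

# Return the hits in $fh matching $re, each preceded by up to $before lines
# and followed by up to $after lines, formatted as "lineno text".
# Lines already emitted as after-context are not repeated as before-context.
sub context_hits {
    my ($fh, $re, $before, $after) = @_;
    my @window;        # [lineno, text] pairs seen but not yet emitted
    my @out;
    my $pending = 0;   # after-context lines still owed to the last hit
    my $lineno  = 0;
    while (my $line = <$fh>) {
        $lineno++;
        chomp $line;
        if ($line =~ $re) {
            push @out, map { "$_->[0] $_->[1]" } @window;
            push @out, "$lineno $line";
            @window  = ();
            $pending = $after;
        }
        elsif ($pending > 0) {
            push @out, "$lineno $line";
            $pending--;
        }
        else {
            push @window, [ $lineno, $line ];
            shift @window while @window > $before;
        }
    }
    return @out;
}

# Demo on an in-memory "file".
my $page = join '', map { "$_\n" } qw(alpha beta gamma hash delta epsilon);
open my $fh, '<', \$page or die "cannot open in-memory file: $!";
print "$_\n" for context_hits($fh, qr/hash/, 2, 1);
```

For the demo input, this prints lines 2 to 5 (beta, gamma, hash, delta), each prefixed with its line number.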

Here is an example search being timed:
------- --------- ---=--- --------- --------
(I purposely picked something that would be found many times)

  time ./pgrep3 /var/www/localhost/htdocs/lcweb/cdbk+/AllPerl/ hash

(So above we are searching a collection from the O'Reilly CD books for
the term `hash'.)

(Just one example of the thousands of lines returned)
[...]

/var/www/localhost/htdocs/lcweb/cdbk+/AllPerl/perlnut/index/idx_p.htm
135 dereferencing with : [104]4.8.2. Dereferencing
136 modulus operator : [105]4.5.3. Arithmetic Operators
137 prototype symbol (hash) : [106]4.7.5. Prototypes
138 %= (assignment) operator : [107]4.5.6. Assignment Operators
---

[...]

Total files searched: 522
Total lines searched: 431689
real 1m48.344s
user 1m25.234s
sys 0m14.336s

------- --------- ---=--- --------- --------
Almost 2 minutes (108 seconds of wall-clock time) to search 431689
lines, or roughly 4,000 lines per second.

So it is slow, maybe even very slow, compared with tools that use an
indexed search.

I don't really mind the slowness, but of course it would not scale
much beyond the scope of my use. I do like the precise search
capability and the generous context. All of the above is also possible
with grep, egrep and friends, of course, but only with quite a lot more
command-line manipulation and piping.

I'm currently working on using something like this basic search script
to return URLs linking to the pages and lines found, and on working the
whole thing into something that can be used from a web browser.

Something pretty similar to webglimpse, I guess, but without the
benefit of indexing.
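One way to turn a hit into a link is to map the absolute filename back to a URL by swapping the web server's document root for its base URL. In this minimal sketch, the document root /var/www/localhost/htdocs comes from the example path. The base URL http://localhost and the helper name are assumptions. The helper returns nothing for files outside the document root.

```perl
use strict;
use warnings;

# Assumptions: the pages are served from this document root at this base URL.
my $doc_root = '/var/www/localhost/htdocs';
my $base_url = 'http://localhost';

# Map an absolute filename under $doc_root to a URL, percent-encoding any
# character that is not safe to leave as-is in a URL path.
sub file_to_url {
    my ($file) = @_;
    return unless index($file, "$doc_root/") == 0;    # outside the web tree
    my $path = substr $file, length $doc_root;         # keeps the leading "/"
    $path =~ s{([^A-Za-z0-9/._~+-])}{sprintf '%%%02X', ord $1}ge;
    return "$base_url$path";
}

# Example: an HTML link for one hit, with the line number shown beside it.
my $file = '/var/www/localhost/htdocs/lcweb/cdbk+/AllPerl/perlnut/index/idx_p.htm';
my $url  = file_to_url($file);
print qq{<a href="$url">$file</a> line 137\n};
```

Linking straight to a particular line would need anchors in the pages themselves, which static HTML pages like these probably lack. The sketch therefore links to the page and leaves the line number as plain text.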

Also, webglimpse relies on glimpse, which is not capable of full regex
search but does have a rich mixture of regex, regex-like and Boolean
query capabilities.