Gentoo Archives: gentoo-portage-dev

From:	Brian Dolbec <dolsen@g.o>
To:	gentoo-portage-dev@l.g.o
Subject:	Re: [gentoo-portage-dev] [PATCH] fuzzy search: weigh category similarity independently (bug 623648)
Date:	Sun, 09 Jul 2017 14:16:02
Message-Id:	`20170709071601.3fa165bd.dolsen@gentoo.org`
In Reply to:	[gentoo-portage-dev] [PATCH] fuzzy search: weigh category similarity independently (bug 623648) by Zac Medico

1	On Sat, 8 Jul 2017 13:03:58 -0700
2	Zac Medico <zmedico@g.o> wrote:
3
4	> Weigh the similarity of category and package names independently,
5	> in order to avoid matching lots of irrelevant packages in the same
6	> category when the package name is much shorter than the category
7	> name.
8	>
9	> X-Gentoo-bug: 623648
10	> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=623648
11	> ---
12	> pym/_emerge/search.py \| 24 +++++++++++++++++++++---
13	> 1 file changed, 21 insertions(+), 3 deletions(-)
14	>
15	> diff --git a/pym/_emerge/search.py b/pym/_emerge/search.py
16	> index 20a0c026e..dc91ad315 100644
17	> --- a/pym/_emerge/search.py
18	> +++ b/pym/_emerge/search.py
19	> @@ -264,15 +264,33 @@ class search(object):
20	> if self.fuzzy:
21	> fuzzy = True
22	> cutoff =
23	> float(self.search_similarity) / 100
24	> - seq_match = difflib.SequenceMatcher()
25	> -
26	> seq_match.set_seq2(self.searchkey.lower())
27	> + if match_category:
28	> + # Weigh the similarity of
29	> category and package
30	> + # names independently, in
31	> order to avoid matching
32	> + # lots of irrelevant
33	> packages in the same category
34	> + # when the package name is
35	> much shorter than the
36	> + # category name.
37	> + part_split = portage.catsplit
38	> + else:
39	> + part_split = lambda
40	> match_string: (match_string,)
41	> - def fuzzy_search(match_string):
42	> + part_matchers = []
43	> + for part in
44	> part_split(self.searchkey):
45	> + seq_match =
46	> difflib.SequenceMatcher()
47	> +
48	> seq_match.set_seq2(part.lower())
49	> +
50	> part_matchers.append(seq_match) +
51	> + def fuzzy_search_part(seq_match,
52	> match_string): seq_match.set_seq1(match_string.lower())
53	> return
54	> (seq_match.real_quick_ratio() >= cutoff and seq_match.quick_ratio()
55	> >= cutoff and seq_match.ratio() >= cutoff)
56	>
57	> + def fuzzy_search(match_string):
58	> + return
59	> all(fuzzy_search_part(seq_match, part)
60	> + for seq_match, part
61	> in zip(
62	> + part_matchers,
63	> part_split(match_string))) +
64	> for package in self._cp_all():
65	> self._spinner_update()
66	>
67
68
69	looks fine
70
71	--
72	Brian Dolbec <dolsen>

Replies

Subject	Author
Re: [gentoo-portage-dev] [PATCH] fuzzy search: weigh category similarity independently (bug 623648)	Zac Medico <zmedico@g.o>

Report Message

Find on MARC Find on Google Groups