Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-portage-dev@l.g.o
Cc: Zac Medico <zmedico@g.o>
Subject: [gentoo-portage-dev] [PATCH] fuzzy search: weigh category similarity independently (bug 623648)
Date: Sat, 08 Jul 2017 20:07:25
Message-Id: 20170708200358.32204-1-zmedico@gentoo.org

Weigh the similarity of category and package names independently,
in order to avoid matching lots of irrelevant packages in the same
category when the package name is much shorter than the category
name.

X-Gentoo-bug: 623648
X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=623648
---
 pym/_emerge/search.py | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/pym/_emerge/search.py b/pym/_emerge/search.py
index 20a0c026e..dc91ad315 100644
--- a/pym/_emerge/search.py
+++ b/pym/_emerge/search.py
@@ -264,15 +264,33 @@ class search(object):
 		if self.fuzzy:
 			fuzzy = True
 			cutoff = float(self.search_similarity) / 100
-			seq_match = difflib.SequenceMatcher()
-			seq_match.set_seq2(self.searchkey.lower())
+			if match_category:
+				# Weigh the similarity of category and package
+				# names independently, in order to avoid matching
+				# lots of irrelevant packages in the same category
+				# when the package name is much shorter than the
+				# category name.
+				part_split = portage.catsplit
+			else:
+				part_split = lambda match_string: (match_string,)
 
-			def fuzzy_search(match_string):
+			part_matchers = []
+			for part in part_split(self.searchkey):
+				seq_match = difflib.SequenceMatcher()
+				seq_match.set_seq2(part.lower())
+				part_matchers.append(seq_match)
+
+			def fuzzy_search_part(seq_match, match_string):
 				seq_match.set_seq1(match_string.lower())
 				return (seq_match.real_quick_ratio() >= cutoff and
 					seq_match.quick_ratio() >= cutoff and
 					seq_match.ratio() >= cutoff)
 
+			def fuzzy_search(match_string):
+				return all(fuzzy_search_part(seq_match, part)
+					for seq_match, part in zip(
+					part_matchers, part_split(match_string)))
+
 		for package in self._cp_all():
 			self._spinner_update()
 
--
2.13.0
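
For anyone who wants to see the effect outside of portage, here is a minimal standalone sketch of the idea, not the patched code itself. The helper names (`similar`, `split_category`, `match_whole`, `match_parts`) are hypothetical, `split_category` is only a stand-in that splits on the first "/" rather than the real portage.catsplit, and the 0.8 cutoff is chosen purely for illustration.

```python
import difflib


def similar(query, candidate, cutoff):
    """Apply the same three-stage check as the patch: the two cheap
    upper bounds run first, so ratio() is only computed when needed."""
    seq_match = difflib.SequenceMatcher()
    seq_match.set_seq2(query.lower())
    seq_match.set_seq1(candidate.lower())
    return (seq_match.real_quick_ratio() >= cutoff and
            seq_match.quick_ratio() >= cutoff and
            seq_match.ratio() >= cutoff)


def split_category(cp):
    """Hypothetical stand-in for portage.catsplit (assumption: split
    "category/package" on the first slash)."""
    return cp.split("/", 1)


def match_whole(query, candidate, cutoff):
    """Old behavior: compare the full category/package strings."""
    return similar(query, candidate, cutoff)


def match_parts(query, candidate, cutoff):
    """New behavior: category and package must each pass the cutoff."""
    return all(similar(q, c, cutoff) for q, c in
               zip(split_category(query), split_category(candidate)))


if __name__ == "__main__":
    cutoff = 0.8
    # The shared 11-character "dev-python/" prefix dominates the
    # whole-string score, so an unrelated package still matches.
    print(match_whole("dev-python/six", "dev-python/pip", cutoff))
    # Scored separately, "six" vs "pip" falls well below the cutoff.
    print(match_parts("dev-python/six", "dev-python/pip", cutoff))
    # A genuine typo in the package name still matches.
    print(match_parts("dev-python/reqests", "dev-python/requests", cutoff))
```

With these inputs the whole-string comparison accepts dev-python/pip for the query dev-python/six, while the per-part comparison rejects it and still accepts the misspelled dev-python/reqests.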
