Gentoo Archives: gentoo-commits

From: "Michał Górny" <mgorny@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] repo/proj/guru:master commit in: dev-libs/fsst/
Date: Thu, 29 Apr 2021 08:23:38
Message-Id: 1619666694.f42f8ed5e0eb29a8d633471921aebf9db4f9bb08.mgorny@gentoo
commit: f42f8ed5e0eb29a8d633471921aebf9db4f9bb08
Author: Alessandro Barbieri <lssndrbarbieri <AT> gmail <DOT> com>
AuthorDate: Thu Apr 29 03:21:10 2021 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Thu Apr 29 03:24:54 2021 +0000
URL: https://gitweb.gentoo.org/repo/proj/guru.git/commit/?id=f42f8ed5

dev-libs/fsst: new package

Package-Manager: Portage-3.0.18, Repoman-3.0.3
Signed-off-by: Alessandro Barbieri <lssndrbarbieri <AT> gmail.com>

 dev-libs/fsst/Manifest                  |  1 +
 dev-libs/fsst/fsst-0_pre20200830.ebuild | 29 +++++++++++++++++++++++++++++
 dev-libs/fsst/metadata.xml              | 22 ++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/dev-libs/fsst/Manifest b/dev-libs/fsst/Manifest
new file mode 100644
index 000000000..417198ca8
--- /dev/null
+++ b/dev-libs/fsst/Manifest
@@ -0,0 +1 @@
+DIST fsst-0_pre20200830.tar.gz 32289281 BLAKE2B 21184f7d80193ebcc279f38b8fdc2be563a65a7296ce226c8ae4da19cbd946b1bb412c5f4c661e3ad0405b03b57f83b4257ecf78f9642fb09a9eccd56616a8b1 SHA512 9dd416d0a711a6c38e8e0d8b445f328e5826096293dc1f1152ae3e67470d2f8f1d9df2bb88815f1178b67c8cd0ad130f9fa9b59a9547bcc272d37782c239d7b7

diff --git a/dev-libs/fsst/fsst-0_pre20200830.ebuild b/dev-libs/fsst/fsst-0_pre20200830.ebuild
new file mode 100644
index 000000000..6c49b03ba
--- /dev/null
+++ b/dev-libs/fsst/fsst-0_pre20200830.ebuild
@@ -0,0 +1,29 @@
+# Copyright 2021 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+EAPI=7
+
+inherit cmake
+
+COMMIT="fffb613071cb44319c0d6b743a8d6eafc2ed2ad7"
+DESCRIPTION="Fast Static Symbol Table: fast text compression that allows random access"
+HOMEPAGE="https://github.com/cwida/fsst"
+SRC_URI="https://github.com/cwida/fsst/archive/${COMMIT}.tar.gz -> ${P}.tar.gz"
+
+LICENSE="MIT"
+SLOT="0"
+KEYWORDS="~amd64"
+
+BDEPEND="app-admin/chrpath"
+RDEPEND="${DEPEND}"
+
+S="${WORKDIR}/${PN}-${COMMIT}"
+
+src_install() {
+	chrpath -d "${BUILD_DIR}/fsst" || die
+
+	doheader fsst.h libfsst.hpp
+	dolib.so "${BUILD_DIR}/libfsst.so"
+	dobin "${BUILD_DIR}/fsst"
+	dodoc -r README.md fsst-presentation* fsstcompression.pdf
+}

diff --git a/dev-libs/fsst/metadata.xml b/dev-libs/fsst/metadata.xml
new file mode 100644
index 000000000..2f7e6891c
--- /dev/null
+++ b/dev-libs/fsst/metadata.xml
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE pkgmetadata SYSTEM 'http://www.gentoo.org/dtd/metadata.dtd'>
+<pkgmetadata>
+	<longdescription lang="en">
+FSST: Fast Static Symbol Table compression
+See the PVLDB paper: https://github.com/cwida/fsst/raw/master/fsstcompression.pdf
+
+FSST is a compression scheme focused on string/text data: it can compress strings drawn from distributions with many different values (i.e., where dictionary compression does not work well). It allows *random access* to compressed data: it is not block-based, so individual strings can be decompressed without touching the surrounding data in a compressed block. Compared to, e.g., LZ4 (which is block-based), FSST achieves similar compression and decompression speed, and a better compression ratio.
+
+FSST encodes strings using a symbol table, but it works on pieces of the string: it maps "symbols" (1-8 byte sequences) onto "codes" (single bytes). FSST can also represent a byte as an exception (code 255 followed by the original byte). Hence, compression transforms a sequence of bytes into a (hopefully shorter) sequence of codes or escaped bytes. These shorter byte sequences can again be treated as strings and fit into whatever part of your program manipulates strings. An optional zero-terminated mode (as used for C strings) is also supported.
+
+FSST ensures that strings that are equal are also equal in their compressed form. This means equality comparisons can be performed without decompressing the strings.
+
+FSST compression is quite useful in database systems and data file formats. For example, it allows fine-grained decompression of values when selection predicates are pushed down into a scan operator. Very often, FSST even allows decompression of string data to be postponed. This means hash tables (in joins and aggregations) become smaller, and network communication (in distributed query processing) is reduced. All of this without requiring many structural changes to existing systems: after all, FSST-compressed strings are still strings.
+
+The implementation of FSST is quite portable, uses CMake, and has been verified to work on 64-bit x86 computers running Linux, macOS, and Windows.
+	</longdescription>
+	<upstream>
+		<bugs-to>https://github.com/cwida/fsst/issues</bugs-to>
+		<remote-id type="github">cwida/fsst</remote-id>
+	</upstream>
+</pkgmetadata>
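
The longdescription in metadata.xml explains FSST's core idea: frequent 1-8 byte symbols become one-byte codes, and any other byte is escaped as code 255 followed by the literal byte. Below is a minimal, self-contained C sketch of that encode/decode scheme under stated assumptions. It is not the libfsst API installed by this ebuild (fsst.h, libfsst.hpp). The table SYMBOLS and the functions toy_compress and toy_decompress are hypothetical names for illustration only. The table is hand-picked, whereas real FSST builds its symbol table from a sample of the input and uses a much faster lookup than the linear scan used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ESCAPE 255 /* code 255: the next byte is a literal (escaped) byte */

/* Hypothetical, hand-picked symbol table for illustration only.
   Codes are the array indices (0..NSYMBOLS-1), all below ESCAPE. */
static const char *const SYMBOLS[] = {"http://", "www.", ".com", "gentoo", "org", "/"};
static const size_t NSYMBOLS = sizeof SYMBOLS / sizeof SYMBOLS[0];

/* Greedy longest-match encoder; returns the number of bytes written.
   out must hold at least 2 * len bytes (worst case: every byte escaped). */
static size_t toy_compress(const unsigned char *in, size_t len, unsigned char *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        size_t best = NSYMBOLS, best_len = 0;
        for (size_t c = 0; c < NSYMBOLS; c++) {
            size_t sl = strlen(SYMBOLS[c]);
            if (sl > best_len && sl <= len - i && memcmp(in + i, SYMBOLS[c], sl) == 0) {
                best = c;
                best_len = sl;
            }
        }
        if (best < NSYMBOLS) {      /* a symbol matched: emit its one-byte code */
            out[o++] = (unsigned char)best;
            i += best_len;
        } else {                    /* no symbol matched: escape the raw byte */
            out[o++] = ESCAPE;
            out[o++] = in[i++];
        }
    }
    return o;
}

/* Decoder: each code expands to its symbol; ESCAPE copies the next byte.
   Assumes well-formed input as produced by toy_compress. */
static size_t toy_decompress(const unsigned char *in, size_t len, unsigned char *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char code = in[i++];
        if (code == ESCAPE) {
            out[o++] = in[i++];
        } else {
            size_t sl = strlen(SYMBOLS[code]);
            memcpy(out + o, SYMBOLS[code], sl);
            o += sl;
        }
    }
    return o;
}
```

With this table, "http://www.gentoo.org/" (22 bytes) encodes to 7 bytes: codes for "http://", "www." and "gentoo", an escaped '.', then codes for "org" and "/". Because the encoder is deterministic for a fixed table, equal inputs always produce equal compressed bytes, which is the property that lets equality checks run directly on compressed strings.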