Gentoo Archives: gentoo-portage-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-portage-dev@l.g.o
Cc: "Michał Górny" <mgorny@g.o>
Subject: [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module
Date: Sun, 05 Apr 2015 10:08:49
Message-Id: 1428228511-22133-1-git-send-email-mgorny@gentoo.org
1 The squashdelta module provides syncing via SquashFS snapshots. For the
2 initial sync, a complete snapshot is fetched and placed in
3 /var/cache/portage/squashfs. On subsequent sync operations, deltas are
4 fetched from the mirror and used to reconstruct the newest snapshot.
5
6 The distfile fetching logic is reused to fetch the remote files
7 and verify their checksums. Additionally, the sha512sum.txt file should
8 be OpenPGP-verified after fetching, but this is currently unimplemented.
9
10 After fetching, Portage tries to (re-)mount the SquashFS at the repository
11 location.
12 ---
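A note for reviewers, not part of the commit: the update decision described
above (full snapshot on initial sync, delta when one matches, full snapshot
as a fallback) can be sketched roughly as below. This is only an
illustration under assumptions; plan_sync and its return values are
hypothetical names and do not appear in the patch, and only the file naming
scheme is taken from the README.

```python
def plan_sync(repo_name, old_version, new_version, available_files):
    """Return a hypothetical (action, filename) pair for the next sync step.

    old_version is None when no local snapshot exists (initial sync).
    available_files is the set of file names listed in sha512sum.txt.
    """
    snapshot = '%s-%s.sqfs' % (repo_name, new_version)
    if old_version is None:
        # initial sync: fetch the complete snapshot
        return ('full', snapshot)
    if old_version == new_version:
        # already up to date: only re-verify the snapshot we have
        return ('verify', snapshot)
    delta = '%s-%s-%s.sqdelta' % (repo_name, old_version, new_version)
    if delta in available_files:
        # a delta for exactly this version pair is hosted
        return ('delta', delta)
    # outdated beyond the available deltas: fall back to a full snapshot
    return ('full', snapshot)
```

For instance, with a local 20150404 snapshot and a mirror listing
gentoo-20150404-20150405.sqdelta, the sketch picks the delta; with an older
local snapshot it falls back to the full image.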
13 cnf/repos.conf | 4 +
14 pym/portage/sync/modules/squashdelta/README | 124 +++++++++++++
15 pym/portage/sync/modules/squashdelta/__init__.py | 37 ++++
16 .../sync/modules/squashdelta/squashdelta.py | 192 +++++++++++++++++++++
17 4 files changed, 357 insertions(+)
18 create mode 100644 pym/portage/sync/modules/squashdelta/README
19 create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py
20 create mode 100644 pym/portage/sync/modules/squashdelta/squashdelta.py
21
22 diff --git a/cnf/repos.conf b/cnf/repos.conf
23 index 1ca98ca..062fc0d 100644
24 --- a/cnf/repos.conf
25 +++ b/cnf/repos.conf
26 @@ -6,3 +6,7 @@ location = /usr/portage
27 sync-type = rsync
28 sync-uri = rsync://rsync.gentoo.org/gentoo-portage
29 auto-sync = yes
30 +
31 +# for daily squashfs snapshots
32 +#sync-type = squashdelta
33 +#sync-uri = mirror://gentoo/../snapshots/squashfs
34 diff --git a/pym/portage/sync/modules/squashdelta/README b/pym/portage/sync/modules/squashdelta/README
35 new file mode 100644
36 index 0000000..994ae6d
37 --- /dev/null
38 +++ b/pym/portage/sync/modules/squashdelta/README
39 @@ -0,0 +1,124 @@
40 +==================
41 + squashdelta-sync
42 +==================
43 +
44 +Introduction
45 +============
46 +
47 +Squashdelta-sync provides the squashfs syncing module for Portage.
48 +When used as sync-type for the repository, it fetches the complete
49 +repository snapshot on initial sync, and then uses squashdeltas to
50 +efficiently update it.
51 +
52 +While initially intended for the daily snapshot of the Gentoo
53 +repository, the module is designed with flexibility in mind. It can be
54 +used to sync any repository, without enforcing any specific snapshotting
55 +interval or versioning rules. However, each snapshot version identifier
56 +must be unique within the scope of the repository.
57 +
58 +
59 +Technical hosting details
60 +=========================
61 +
62 +The snapshot hosting needs to provide the following files:
63 +
64 +1. the current (newest) full SquashFS snapshot of the repository,
65 +   and optionally M past snapshots,
66 +
67 +2. the deltas from N past snapshots to the current snapshot,
68 +
69 +3. a ``sha512sum.txt`` file containing SHA-512 checksums of all hosted
70 +   files, optionally OpenPGP-signed.
71 +
72 +The following naming schemes are used for the snapshots and deltas,
73 +respectively::
74 +
75 + ${repo_name}-${version}.sqfs
76 + ${repo_name}-${old_version}-${new_version}.sqdelta
77 +
78 +where:
79 +
80 +* ``${repo_name}`` is the repository name (as specified
81 +  in ``repos.conf``),
82 +* ``${version}`` specifies the snapshot version,
83 +* ``${old_version}`` specifies the snapshot version which the delta
84 +  updates from,
85 +* ``${new_version}`` specifies the snapshot version which the delta
86 +  updates to.
87 +
88 +The version can be an arbitrary string. It does not need to be
89 +incremental; however, each version must be unique within the repository.
90 +For example, the version can be a date, a revision number or a commit
91 +hash.
92 +
93 +The ``sha512sum.txt`` uses the format used by the GNU coreutils
94 +``sha512sum`` program. That is, it contains one or more lines consisting
95 +of a hexadecimal SHA-512 checksum followed by whitespace and
96 +a filename. Lines not matching that format should be ignored.
97 +
98 +Optionally, the ``sha512sum.txt`` may be OpenPGP-signed. In that case,
99 +the file conforms to the ASCII-armored OpenPGP message format, with
100 +the checksums being stored in the message body.
101 +
102 +The ``sha512sum.txt`` also needs to contain a line with the following
103 +string::
104 +
105 + Current: ${repo_name}-${version}
106 +
107 +stating the current (newest) snapshot version. If snapshots for multiple
108 +repositories are provided in the same directory (using the same
109 +``sha512sum.txt`` file), this line can occur multiple times or list
110 +multiple snapshots, whitespace-separated. In order not to introduce
111 +stray lines in the file, it is recommended to embed this information
112 +in the OpenPGP comment field.
113 +
114 +An example script generating daily deltas for a repository can be found
115 +in the squashdelta-daily-gen_ repository.
116 +
117 +.. _squashdelta-daily-gen: https://bitbucket.org/mgorny/squashdelta-daily-gen
118 +
119 +
120 +Technical syncing details
121 +=========================
122 +
123 +When performing a sync, the script first fetches the ``sha512sum.txt``
124 +and processes it in order to determine the list of files available
125 +on the mirror. It should be noted that the script will never use
126 +a snapshot or delta that is not listed there. If the file is
127 +OpenPGP-signed, the signature is verified.
128 +
129 +The script scans the ``sha512sum.txt`` for a line containing
130 +the following string (case-insensitive)::
131 +
132 + Current:
133 +
134 +The text following this string is split on whitespace, and the resulting
135 +tokens are parsed as snapshot names. The one matching the current
136 +repository name is used to determine the current (newest) snapshot
137 +version.
138 +
139 +Afterwards, the script scans the local cache directory for the following
140 +symlink::
141 +
142 + ${repo_name}-current.sqfs
143 +
144 +If the symlink exists, the file it points to is assumed to be
145 +the current (newest) local snapshot. Otherwise, the script assumes
146 +initial sync.
147 +
148 +On initial sync, the script fetches the newest snapshot from the mirror
149 +and places it inside the cache directory. The snapshot checksum is
150 +verified using ``sha512sum.txt`` and the ``${repo_name}-current.sqfs``
151 +symlink is created.
152 +
153 +On update, the script scans the file list for a delta transforming
154 +the current local snapshot to the newest remote snapshot. If such
155 +a delta is found, it is fetched, verified and applied to obtain
156 +the new snapshot. Afterwards, the resulting snapshot checksum is
157 +verified and the ``${repo_name}-current.sqfs`` symlink is updated.
158 +
159 +If no delta matches the version pair, it is assumed that the system is
160 +outdated beyond available deltas and a new snapshot is fetched instead
161 +(as on initial sync).
162 +
163 +.. vim:ft=rst
164 diff --git a/pym/portage/sync/modules/squashdelta/__init__.py b/pym/portage/sync/modules/squashdelta/__init__.py
165 new file mode 100644
166 index 0000000..1a17dea
167 --- /dev/null
168 +++ b/pym/portage/sync/modules/squashdelta/__init__.py
169 @@ -0,0 +1,37 @@
170 +# vim:fileencoding=utf-8:noet
171 +# (c) 2015 Michał Górny <mgorny@g.o>
172 +# Distributed under the terms of the GNU General Public License v2
173 +
174 +from portage.sync.config_checks import CheckSyncConfig
175 +
176 +
177 +DEFAULT_CACHE_LOCATION = '/var/cache/portage/squashfs'
178 +
179 +
180 +class CheckSquashDeltaConfig(CheckSyncConfig):
181 +	def __init__(self, repo, logger):
182 +		CheckSyncConfig.__init__(self, repo, logger)
183 +		self.checks.append('check_cache_location')
184 +
185 +	def check_cache_location(self):
186 +		# TODO: make it configurable when Portage is fixed to support
187 +		# arbitrary config variables
188 +		pass
189 +
190 +
191 +module_spec = {
192 +	'name': 'squashdelta',
193 +	'description': 'Syncing SquashFS images using SquashDeltas',
194 +	'provides': {
195 +		'squashdelta-module': {
196 +			'name': "squashdelta",
197 +			'class': "SquashDeltaSync",
198 +			'description': 'Syncing SquashFS images using SquashDeltas',
199 +			'functions': ['sync', 'new', 'exists'],
200 +			'func_desc': {
201 +				'sync': 'Performs the sync of the repository',
202 +			},
203 +			'validate_config': CheckSquashDeltaConfig,
204 +		}
205 +	}
206 +}
207 diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py b/pym/portage/sync/modules/squashdelta/squashdelta.py
208 new file mode 100644
209 index 0000000..a0dfc46
210 --- /dev/null
211 +++ b/pym/portage/sync/modules/squashdelta/squashdelta.py
212 @@ -0,0 +1,192 @@
213 +# vim:fileencoding=utf-8:noet
214 +# (c) 2015 Michał Górny <mgorny@g.o>
215 +# Distributed under the terms of the GNU General Public License v2
216 +
217 +import errno
218 +import io
219 +import logging
220 +import os
221 +import os.path
222 +import re
223 +
224 +import portage
225 +from portage.package.ebuild.fetch import fetch
226 +from portage.sync.syncbase import SyncBase
227 +
228 +from . import DEFAULT_CACHE_LOCATION
229 +
230 +
231 +class SquashDeltaSync(SyncBase):
232 +	short_desc = "Repository syncing using SquashFS deltas"
233 +
234 +	@staticmethod
235 +	def name():
236 +		return "SquashDeltaSync"
237 +
238 +	def __init__(self):
239 +		super(SquashDeltaSync, self).__init__(
240 +				'squashmerge', 'dev-util/squashmerge')
241 +
242 +	def sync(self, **kwargs):
243 +		self._kwargs(kwargs)
244 +		my_settings = portage.config(clone = self.settings)
245 +		cache_location = DEFAULT_CACHE_LOCATION
246 +
247 +		# override fetching location
248 +		my_settings['DISTDIR'] = cache_location
249 +
250 +		# make sure we append paths correctly
251 +		base_uri = self.repo.sync_uri
252 +		if not base_uri.endswith('/'):
253 +			base_uri += '/'
254 +
255 +		def my_fetch(fn, **kwargs):
256 +			kwargs['try_mirrors'] = 0
257 +			return fetch([base_uri + fn], my_settings, **kwargs)
258 +
259 +		# fetch sha512sum.txt
260 +		sha512_path = os.path.join(cache_location, 'sha512sum.txt')
261 +		try:
262 +			os.unlink(sha512_path)
263 +		except OSError:
264 +			pass
265 +		if not my_fetch('sha512sum.txt'):
266 +			return (1, False)
267 +
268 +		if 'webrsync-gpg' in my_settings.features:
269 +			# TODO: GPG signature verification
270 +			pass
271 +
272 +		# sha512sum.txt parsing
273 +		with io.open(sha512_path, 'r', encoding='utf8') as f:
274 +			data = f.readlines()
275 +
276 +		repo_re = re.compile(re.escape(self.repo.name) + '-(.*)$')
277 +		# current tag
278 +		current_re = re.compile('current:', re.IGNORECASE)
279 +		# checksum
280 +		checksum_re = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
281 +
282 +		def iter_snapshots(lines):
283 +			for l in lines:
284 +				m = current_re.search(l)
285 +				if m:
286 +					for s in l[m.end():].split():
287 +						yield s
288 +
289 +		def iter_checksums(lines):
290 +			for l in lines:
291 +				m = checksum_re.match(l)
292 +				if m:
293 +					yield (m.group(2), {
294 +						'size': None,
295 +						'SHA512': m.group(1),
296 +					})
297 +
298 +		# look for current indicator
299 +		for s in iter_snapshots(data):
300 +			m = repo_re.match(s)
301 +			if m:
302 +				new_snapshot = m.group(0) + '.sqfs'
303 +				new_version = m.group(1)
304 +				break
305 +		else:
306 +			logging.error('Unable to find current snapshot in sha512sum.txt')
307 +			return (1, False)
308 +		new_path = os.path.join(cache_location, new_snapshot)
309 +
310 +		# get digests
311 +		my_digests = dict(iter_checksums(data))
312 +
313 +		# try to find a local snapshot
314 +		old_version = None
315 +		current_path = os.path.join(cache_location,
316 +				self.repo.name + '-current.sqfs')
317 +		try:
318 +			old_snapshot = os.readlink(current_path)
319 +		except OSError:
320 +			pass
321 +		else:
322 +			m = repo_re.match(old_snapshot)
323 +			if m and old_snapshot.endswith('.sqfs'):
324 +				old_version = m.group(1)[:-5]
325 +				old_path = os.path.join(cache_location, old_snapshot)
326 +
327 +		if old_version is not None:
328 +			if old_version == new_version:
329 +				logging.info('Snapshot up-to-date, verifying integrity.')
330 +			else:
331 +				# attempt to update
332 +				delta_path = None
333 +				expected_delta = '%s-%s-%s.sqdelta' % (
334 +						self.repo.name, old_version, new_version)
335 +				if expected_delta not in my_digests:
336 +					logging.warning('No delta for %s->%s, fetching new snapshot.'
337 +							% (old_version, new_version))
338 +				else:
339 +					delta_path = os.path.join(cache_location, expected_delta)
340 +
341 +					if not my_fetch(expected_delta, digests = my_digests):
342 +						return (4, False)
343 +					if not self.has_bin:
344 +						return (5, False)
345 +
346 +					ret = portage.process.spawn([self.bin_command,
347 +							old_path, delta_path, new_path], **self.spawn_kwargs)
348 +					if ret != os.EX_OK:
349 +						logging.error('Merging the delta failed')
350 +						return (6, False)
351 +
352 +					# pass-through to verification and cleanup
353 +
354 +		# fetch full snapshot or verify the one we have
355 +		if not my_fetch(new_snapshot, digests = my_digests):
356 +			return (2, False)
357 +
358 +		# create/update -current symlink
359 +		# using external ln for two reasons:
360 +		# 1. clean --force (unlike python's unlink+symlink)
361 +		# 2. easy userpriv (otherwise we'd have to lchown())
362 +		ret = portage.process.spawn(['ln', '-s', '-f', new_snapshot, current_path],
363 +				**self.spawn_kwargs)
364 +		if ret != os.EX_OK:
365 +			logging.error('Unable to set -current symlink')
366 +			return (3, False)
367 +
368 +		# remove old snapshot
369 +		if old_version is not None and old_version != new_version:
370 +			try:
371 +				os.unlink(old_path)
372 +			except OSError as e:
373 +				logging.warning('Unable to unlink old snapshot: ' + str(e))
374 +			if delta_path is not None:
375 +				try:
376 +					os.unlink(delta_path)
377 +				except OSError as e:
378 +					logging.warning('Unable to unlink old delta: ' + str(e))
379 +		try:
380 +			os.unlink(sha512_path)
381 +		except OSError as e:
382 +			logging.warning('Unable to unlink sha512sum.txt: ' + str(e))
383 +
384 +		mount_cmd = ['mount', current_path, self.repo.location]
385 +		can_mount = True
386 +		if os.path.ismount(self.repo.location):
387 +			# need to umount old snapshot
388 +			ret = portage.process.spawn(['umount', '-l', self.repo.location])
389 +			if ret != os.EX_OK:
390 +				logging.warning('Unable to unmount old SquashFS after update')
391 +				can_mount = False
392 +		else:
393 +			try:
394 +				os.makedirs(self.repo.location)
395 +			except OSError as e:
396 +				if e.errno != errno.EEXIST:
397 +					raise
398 +
399 +		if can_mount:
400 +			ret = portage.process.spawn(mount_cmd)
401 +			if ret != os.EX_OK:
402 +				logging.warning('Unable to (re-)mount SquashFS after update')
403 +
404 +		return (0, True)
405 --
406 2.3.5
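
As a standalone illustration of the sha512sum.txt format described in the
README above, the parsing step can be sketched outside of Portage roughly as
follows. This is a simplified sketch under assumptions, not the code from
the patch: parse_sha512sum is a hypothetical helper name, it takes the file
contents as a string, and it skips checksum lines when looking for the
"Current:" tag.

```python
import re

# one hexadecimal SHA-512 checksum, whitespace, then a file name
CHECKSUM_RE = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
# the "Current:" tag is matched case-insensitively, anywhere in a line
CURRENT_RE = re.compile(r'current:', re.IGNORECASE)


def parse_sha512sum(text, repo_name):
    """Return (current_version, digests) parsed from sha512sum.txt contents.

    current_version is None if no "Current:" entry matches repo_name.
    digests maps file names to lowercase hexadecimal SHA-512 checksums.
    """
    repo_re = re.compile(re.escape(repo_name) + r'-(.*)$')
    digests = {}
    current_version = None
    for line in text.splitlines():
        m = CHECKSUM_RE.match(line)
        if m:
            digests[m.group(2)] = m.group(1).lower()
            continue
        m = CURRENT_RE.search(line)
        if m and current_version is None:
            # tokens after the tag are whitespace-separated snapshot names
            for token in line[m.end():].split():
                tm = repo_re.match(token)
                if tm:
                    current_version = tm.group(1)
                    break
    # lines matching neither pattern (e.g. OpenPGP armor) are ignored
    return current_version, digests
```

Lines that match neither pattern are skipped, which is what lets the same
parser cope with an ASCII-armored, OpenPGP-signed file. Signature
verification itself is not covered by this sketch.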
