The squashdelta module provides syncing via SquashFS snapshots. For the
initial sync, a complete snapshot is fetched and placed in
/var/cache/portage/squashfs. On subsequent sync operations, deltas are
fetched from the mirror and used to reconstruct the newest snapshot.

The distfile fetching logic is reused to fetch the remote files
and verify their checksums. Additionally, the sha512sum.txt file should
be OpenPGP-verified after fetching, but this is currently unimplemented.

After fetching, Portage tries to (re-)mount the SquashFS at the
repository location.
---
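For reviewers, the sha512sum.txt handling described in the README can be
sketched roughly as follows. This is a minimal standalone illustration, not
the code in this patch: the function name parse_sha512sum, its return shape
and the sample data are assumptions made for the example.

```python
import re

# Patterns modelled on the README's description; the function name and
# return shape are illustrative assumptions, not part of the patch.
CHECKSUM_RE = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
CURRENT_RE = re.compile(r'current:', re.IGNORECASE)


def parse_sha512sum(lines, repo_name):
    """Return (newest_version_or_None, {filename: sha512_hex})."""
    digests = {}
    newest = None
    prefix = repo_name + '-'
    for line in lines:
        line = line.rstrip('\n')
        m = CHECKSUM_RE.match(line)
        if m:
            digests[m.group(2)] = m.group(1).lower()
            continue
        m = CURRENT_RE.search(line)
        if m and newest is None:
            # Tokens after "Current:" are whitespace-separated snapshot names.
            for token in line[m.end():].split():
                if token.startswith(prefix):
                    newest = token[len(prefix):]
                    break
    return newest, digests


# Made-up sample input: one "Current:" line, two checksum lines, one stray line.
sample = [
    'Comment: Current: other-42 gentoo-20150405\n',
    'a' * 128 + '  gentoo-20150405.sqfs\n',
    'B' * 128 + '  gentoo-20150404-20150405.sqdelta\n',
    'this line is ignored\n',
]
newest, digests = parse_sha512sum(sample, 'gentoo')
print(newest)
print(sorted(digests))
```

Lines that match neither pattern are skipped, which is what lets the same
parser cope with an OpenPGP-armored file.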
 cnf/repos.conf                                   |   4 +
 pym/portage/sync/modules/squashdelta/README      | 124 +++++++++++++
 pym/portage/sync/modules/squashdelta/__init__.py |  37 ++++
 .../sync/modules/squashdelta/squashdelta.py      | 192 +++++++++++++++++++++
 4 files changed, 357 insertions(+)
 create mode 100644 pym/portage/sync/modules/squashdelta/README
 create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py
 create mode 100644 pym/portage/sync/modules/squashdelta/squashdelta.py

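The choice between applying a delta and fetching a full snapshot, as laid out
in the README's syncing section, might be summarized like this. It is a
hedged sketch only: plan_sync, its action labels and the sample versions are
hypothetical names introduced for illustration and do not appear in the patch.

```python
def plan_sync(repo_name, old_version, new_version, listed_files):
    """Pick the next action; returns an (action, filename) tuple."""
    snapshot = '%s-%s.sqfs' % (repo_name, new_version)
    if old_version is None:
        # No -current symlink yet: treat as initial sync.
        return ('fetch-snapshot', snapshot)
    if old_version == new_version:
        # Already up to date: only re-verify the local snapshot.
        return ('verify', snapshot)
    delta = '%s-%s-%s.sqdelta' % (repo_name, old_version, new_version)
    if delta in listed_files:
        # Only files listed in sha512sum.txt are ever used.
        return ('apply-delta', delta)
    # Outdated beyond the available deltas: fall back to a full snapshot.
    return ('fetch-snapshot', snapshot)


listed = {'gentoo-20150405.sqfs', 'gentoo-20150404-20150405.sqdelta'}
for old in (None, '20150405', '20150404', '20150301'):
    print(old, plan_sync('gentoo', old, '20150405', listed))
```

Keeping the decision separate from fetching makes the fallback path (no
matching delta) easy to check in isolation.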
diff --git a/cnf/repos.conf b/cnf/repos.conf
index 1ca98ca..062fc0d 100644
--- a/cnf/repos.conf
+++ b/cnf/repos.conf
@@ -6,3 +6,7 @@ location = /usr/portage
 sync-type = rsync
 sync-uri = rsync://rsync.gentoo.org/gentoo-portage
 auto-sync = yes
+
+# for daily squashfs snapshots
+#sync-type = squashdelta
+#sync-uri = mirror://gentoo/../snapshots/squashfs
diff --git a/pym/portage/sync/modules/squashdelta/README b/pym/portage/sync/modules/squashdelta/README
new file mode 100644
index 0000000..994ae6d
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/README
@@ -0,0 +1,124 @@
+==================
+ squashdelta-sync
+==================
+
+Introduction
+============
+
+Squashdelta-sync provides the squashfs syncing module for Portage.
+When used as the sync-type for a repository, it fetches the complete
+repository snapshot on initial sync, and then uses squashdeltas to
+efficiently update it.
+
+While initially intended for the daily snapshot of the Gentoo
+repository, the module is designed with flexibility in mind. It can be
+used to sync any repository, without enforcing any specific snapshotting
+interval or versioning rules. However, each snapshot version identifier
+must be unique within the repository.
+
+
+Technical hosting details
+=========================
+
+The snapshot hosting needs to provide the following files:
+
+1. the current (newest) full SquashFS snapshot of the repository,
+   and optionally M past snapshots,
+
+2. the deltas from N past snapshots to the current snapshot,
+
+3. a ``sha512sum.txt`` file containing SHA-512 checksums of all hosted
+   files, optionally OpenPGP-signed.
+
+The following naming schemes are used for the snapshots and deltas,
+respectively::
+
+	${repo_name}-${version}.sqfs
+	${repo_name}-${old_version}-${new_version}.sqdelta
+
+where:
+
+* ``${repo_name}`` is the repository name (as specified
+  in ``repos.conf``),
+* ``${version}`` specifies the snapshot version,
+* ``${old_version}`` specifies the snapshot version which the delta
+  updates from,
+* ``${new_version}`` specifies the snapshot version which the delta
+  updates to.
+
+The version can be an arbitrary string. It does not need to be incremental;
+however, each version must be unique within the repository.
+For example, the version can be a date, a revision number or a commit
+hash.
+
+The ``sha512sum.txt`` uses the format used by the GNU coreutils
+``sha512sum`` program. That is, it contains one or more lines consisting
+of a hexadecimal SHA-512 checksum followed by whitespace, followed by
+a filename. Lines not matching that format should be ignored.
+
+Optionally, the ``sha512sum.txt`` may be OpenPGP-signed. In that case,
+the file conforms to the ASCII-armored OpenPGP message format, with
+the checksums being stored in the message body.
+
+Additionally, the ``sha512sum.txt`` needs to contain a line
+containing the following string::
+
+	Current: ${repo_name}-${version}
+
+This line states the current (newest) snapshot version. If snapshots for
+multiple repositories are provided in the same directory (using the same
+``sha512sum.txt`` file), this line can occur multiple times or list
+multiple snapshots, whitespace-separated. In order not to introduce
+stray lines in the file, it is recommended to embed this information
+in the OpenPGP comment field.
+
+An example script generating daily deltas for a repository can be found
+in the squashdelta-daily-gen_ repository.
+
+.. _squashdelta-daily-gen: https://bitbucket.org/mgorny/squashdelta-daily-gen
+
+
+Technical syncing details
+=========================
+
+When performing a sync, the script first fetches the ``sha512sum.txt``
+and processes it in order to determine the list of files available
+on the mirror. It should be noted that the script will never use
+a snapshot or delta that is not listed there. If the file is
+OpenPGP-signed, the signature should be verified (not yet implemented).
+
+The script scans the ``sha512sum.txt`` for a line containing
+the following string (case-insensitive)::
+
+	Current:
+
+The text following this string is split on whitespace, and the resulting
+tokens are parsed as snapshot names. The one matching the current
+repository name is used to determine the current (newest) snapshot
+version.
+
+Afterwards, the script scans the local cache directory for the following
+symlink::
+
+	${repo_name}-current.sqfs
+
+If the symlink exists, the file it points to is assumed to be
+the current (newest) local snapshot. Otherwise, the script assumes
+an initial sync.
+
+On initial sync, the script fetches the newest snapshot from the mirror
+and places it inside the cache directory. The snapshot checksum is
+verified using ``sha512sum.txt`` and the ``${repo_name}-current.sqfs``
+symlink is created.
+
+On update, the script scans the file list for a delta transforming
+the current local snapshot to the newest remote snapshot. If such
+a delta is found, it is fetched, verified and applied to obtain
+the new snapshot. Afterwards, the resulting snapshot checksum is
+verified and the ``${repo_name}-current.sqfs`` symlink is updated.
+
+If no delta matches the version pair, it is assumed that the system is
+outdated beyond available deltas and a new snapshot is fetched instead
+(as on initial sync).
+
+.. vim:ft=rst
diff --git a/pym/portage/sync/modules/squashdelta/__init__.py b/pym/portage/sync/modules/squashdelta/__init__.py
new file mode 100644
index 0000000..1a17dea
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/__init__.py
@@ -0,0 +1,37 @@
+# vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny <mgorny@g.o>
+# Distributed under the terms of the GNU General Public License v2
+
+from portage.sync.config_checks import CheckSyncConfig
+
+
+DEFAULT_CACHE_LOCATION = '/var/cache/portage/squashfs'
+
+
+class CheckSquashDeltaConfig(CheckSyncConfig):
+	def __init__(self, repo, logger):
+		CheckSyncConfig.__init__(self, repo, logger)
+		self.checks.append('check_cache_location')
+
+	def check_cache_location(self):
+		# TODO: make it configurable when Portage is fixed to support
+		# arbitrary config variables
+		pass
+
+
+module_spec = {
+	'name': 'squashdelta',
+	'description': 'Syncing SquashFS images using SquashDeltas',
+	'provides': {
+		'squashdelta-module': {
+			'name': "squashdelta",
+			'class': "SquashDeltaSync",
+			'description': 'Syncing SquashFS images using SquashDeltas',
+			'functions': ['sync', 'new', 'exists'],
+			'func_desc': {
+				'sync': 'Performs the sync of the repository',
+			},
+			'validate_config': CheckSquashDeltaConfig,
+		}
+	}
+}
diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py b/pym/portage/sync/modules/squashdelta/squashdelta.py
new file mode 100644
index 0000000..a0dfc46
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/squashdelta.py
@@ -0,0 +1,192 @@
+# vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny <mgorny@g.o>
+# Distributed under the terms of the GNU General Public License v2
+
+import errno
+import io
+import logging
+import os
+import os.path
+import re
+
+import portage
+from portage.package.ebuild.fetch import fetch
+from portage.sync.syncbase import SyncBase
+
+from . import DEFAULT_CACHE_LOCATION
+
+
+class SquashDeltaSync(SyncBase):
+	short_desc = "Repository syncing using SquashFS deltas"
+
+	@staticmethod
+	def name():
+		return "SquashDeltaSync"
+
+	def __init__(self):
+		super(SquashDeltaSync, self).__init__(
+				'squashmerge', 'dev-util/squashmerge')
+
+	def sync(self, **kwargs):
+		self._kwargs(kwargs)
+		my_settings = portage.config(clone = self.settings)
+		cache_location = DEFAULT_CACHE_LOCATION
+
+		# override fetching location
+		my_settings['DISTDIR'] = cache_location
+
+		# make sure we append paths correctly
+		base_uri = self.repo.sync_uri
+		if not base_uri.endswith('/'):
+			base_uri += '/'
+
+		def my_fetch(fn, **kwargs):
+			kwargs['try_mirrors'] = 0
+			return fetch([base_uri + fn], my_settings, **kwargs)
+
+		# fetch sha512sum.txt
+		sha512_path = os.path.join(cache_location, 'sha512sum.txt')
+		try:
+			os.unlink(sha512_path)
+		except OSError:
+			pass
+		if not my_fetch('sha512sum.txt'):
+			return (1, False)
+
+		if 'webrsync-gpg' in my_settings.features:
+			# TODO: GPG signature verification
+			pass
+
+		# sha512sum.txt parsing
+		with io.open(sha512_path, 'r', encoding='utf8') as f:
+			data = f.readlines()
+
+		repo_re = re.compile(re.escape(self.repo.name) + '-(.*)$')
+		# current tag
+		current_re = re.compile('current:', re.IGNORECASE)
+		# checksum
+		checksum_re = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
+
+		def iter_snapshots(lines):
+			for l in lines:
+				m = current_re.search(l)
+				if m:
+					for s in l[m.end():].split():
+						yield s
+
+		def iter_checksums(lines):
+			for l in lines:
+				m = checksum_re.match(l)
+				if m:
+					yield (m.group(2), {
+						'size': None,
+						'SHA512': m.group(1),
+					})
+
+		# look for current indicator
+		for s in iter_snapshots(data):
+			m = repo_re.match(s)
+			if m:
+				new_snapshot = m.group(0) + '.sqfs'
+				new_version = m.group(1)
+				break
+		else:
+			logging.error('Unable to find current snapshot in sha512sum.txt')
+			return (1, False)
+		new_path = os.path.join(cache_location, new_snapshot)
+
+		# get digests
+		my_digests = dict(iter_checksums(data))
+
+		# try to find a local snapshot
+		old_version = None
+		current_path = os.path.join(cache_location,
+				self.repo.name + '-current.sqfs')
+		try:
+			old_snapshot = os.readlink(current_path)
+		except OSError:
+			pass
+		else:
+			m = repo_re.match(old_snapshot)
+			if m and old_snapshot.endswith('.sqfs'):
+				old_version = m.group(1)[:-5]
+				old_path = os.path.join(cache_location, old_snapshot)
+
+		if old_version is not None:
+			if old_version == new_version:
+				logging.info('Snapshot up-to-date, verifying integrity.')
+			else:
+				# attempt to update
+				delta_path = None
+				expected_delta = '%s-%s-%s.sqdelta' % (
+						self.repo.name, old_version, new_version)
+				if expected_delta not in my_digests:
+					logging.warning('No delta for %s->%s, fetching new snapshot.'
+							% (old_version, new_version))
+				else:
+					delta_path = os.path.join(cache_location, expected_delta)
+
+					if not my_fetch(expected_delta, digests = my_digests):
+						return (4, False)
+					if not self.has_bin:
+						return (5, False)
+
+					ret = portage.process.spawn([self.bin_command,
+							old_path, delta_path, new_path], **self.spawn_kwargs)
+					if ret != os.EX_OK:
+						logging.error('Merging the delta failed')
+						return (6, False)
+
+				# pass-through to verification and cleanup
+
+		# fetch full snapshot or verify the one we have
+		if not my_fetch(new_snapshot, digests = my_digests):
+			return (2, False)
+
+		# create/update -current symlink
+		# using external ln for two reasons:
+		# 1. clean --force (unlike python's unlink+symlink)
+		# 2. easy userpriv (otherwise we'd have to lchown())
+		ret = portage.process.spawn(['ln', '-s', '-f', new_snapshot, current_path],
+				**self.spawn_kwargs)
+		if ret != os.EX_OK:
+			logging.error('Unable to set -current symlink')
+			return (3, False)
+
+		# remove old snapshot
+		if old_version is not None and old_version != new_version:
+			try:
+				os.unlink(old_path)
+			except OSError as e:
+				logging.warning('Unable to unlink old snapshot: ' + str(e))
+			if delta_path is not None:
+				try:
+					os.unlink(delta_path)
+				except OSError as e:
+					logging.warning('Unable to unlink old delta: ' + str(e))
+		try:
+			os.unlink(sha512_path)
+		except OSError as e:
+			logging.warning('Unable to unlink sha512sum.txt: ' + str(e))
+
+		mount_cmd = ['mount', current_path, self.repo.location]
+		can_mount = True
+		if os.path.ismount(self.repo.location):
+			# need to umount old snapshot
+			ret = portage.process.spawn(['umount', '-l', self.repo.location])
+			if ret != os.EX_OK:
+				logging.warning('Unable to unmount old SquashFS after update')
+				can_mount = False
+		else:
+			try:
+				os.makedirs(self.repo.location)
+			except OSError as e:
+				if e.errno != errno.EEXIST:
+					raise
+
+		if can_mount:
+			ret = portage.process.spawn(mount_cmd)
+			if ret != os.EX_OK:
+				logging.warning('Unable to (re-)mount SquashFS after update')
+
+		return (0, True)
-- 
2.3.5