Gentoo Archives: gentoo-user

From: "Poison BL." <poisonbl@×××××.com>
To: gentoo-user <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] replacement for ftp?
Date: Sun, 30 Apr 2017 04:18:11
Message-Id: CAOTuDKqBcbz8uUVGgoZqnayNkic0tN4cHXU2Wqcqox_YMw+7NA@mail.gmail.com
In Reply to: Re: [gentoo-user] replacement for ftp? by lee
On Sat, Apr 29, 2017 at 9:11 PM, lee <lee@××××××××.de> wrote:
>
> "Poison BL." <poisonbl@×××××.com> writes:
> > Half petabyte datasets aren't really something I'd personally *ever* trust
> > ftp with in the first place.
>
> Why not? (12GB are nowhere close to half a petabyte ...)

Ah... I completely misread that "or over 50k files in 12GB" as 50k files
*at* 12GB each... which works out to 0.6 PB, incidentally.
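
For the record, the back-of-the-envelope arithmetic behind that figure, as a quick Python sketch (assuming decimal units, i.e. 1 PB = 10^6 GB; the variable names are just illustrative):

```python
# 50k files at 12 GB each, expressed in petabytes (decimal units assumed).
FILES = 50_000
GB_PER_FILE = 12
GB_PER_PB = 1_000_000  # 1 PB = 10**6 GB

total_pb = FILES * GB_PER_FILE / GB_PER_PB
print(total_pb)  # 0.6
```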

> The data would come in from suppliers. There isn't really anything
> going on atm but fetching data once a month which can be like 100MB or
> 12GB or more. That's because ppl don't use ftp ...

Really, if you're pulling it in from third party suppliers, you tend to be
tied to whatever they offer as a method of pulling it from them (or of them
pushing it out to you), unless you're in the unique position to dictate the
decision for them. From there, assuming you can push your choice of product
on them, it becomes a question of how often the same dataset will need to be
updated from the same sources, how much it changes between updates, how
secure it needs to be in transit, how much you need to be able to trust
that the source is still legitimately who you think it is, and how much
verification you need that nothing got corrupted during the transfer. Generic
FTP has been losing favor over time because it was built at a time when
many of those questions weren't really at the top of the list of concerns.

SFTP (or SCP), as long as keys are handled properly, allows for pretty
solid certainty that a) both ends of the connection are who they say they
are, b) those two ends are the only ones reading the data in transit, and
c) the data that was sent is the same as what was received (a side benefit
of the integrity checking SSH does on top of the encryption). Rsync over SSH
gives the same set of benefits, reduces the bandwidth used for updating the
dataset (when it's the same dataset, at least), and will also verify that
the data on both ends (as it exists on disk) matches. If you're particularly
lucky, the data might even be the kind that benefits from the in-line
compression you can turn on with SSH, too, cutting down the actual amount of
bandwidth you burn through for each transfer.
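
To make that a bit more concrete, here's a rough Python sketch of an rsync-over-SSH pull plus an independent checksum comparison afterwards. The host and paths are made up, and the manifest check is just one way to double-check a transfer yourself; it isn't something rsync or SFTP requires. It assumes Python 3.9 or newer.

```python
import hashlib
from pathlib import Path


def rsync_command(source: str, dest: str) -> list[str]:
    """Build (but do not run) an rsync-over-SSH invocation."""
    # --archive keeps permissions/times, --compress enables in-line
    # compression, --partial keeps partially transferred files so an
    # interrupted transfer can pick up where it left off.
    return ["rsync", "--archive", "--compress", "--partial",
            "-e", "ssh", source, dest]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(sent: dict[str, str], received: dict[str, str]) -> list[str]:
    """Return paths missing on either side or whose digests differ."""
    return sorted(
        name
        for name in sent.keys() | received.keys()
        if sent.get(name) != received.get(name)
    )
```

You'd hand something like `rsync_command("supplier.example.com:/export/data/", "/srv/incoming/")` to `subprocess.run`, have the supplier ship a manifest built the same way on their end, and treat a non-empty `differing_files(...)` result as a reason to re-fetch.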

If your suppliers all have *nix based systems available, those are also
standard tools that they'll have on hand. If they're strictly Windows
shops, SCP/SFTP are still readily available, though they aren't built into
the OS... rsync gets a bit trickier.

> > How often does it need moved in/out of your facility, and is there no way
> > to break up the processing into smaller chunks than a 0.6PB mass of files?
> > Distribute out the smaller pieces with rsync, scp, or the like, operate on
> > them, and pull back in the results, rather than trying to shift around the
> > entire set. There's a reason Amazon will send a physical truck to a site to
> > import large datasets into glacier... ;)
>
> Amazon has trucks? Perhaps they do in other countries. Here, amazon is
> just another web shop. They might have some delivery vans, but I've
> never seen one, so I doubt it. And why would anyone give them their
> data? There's no telling what they would do with it.

Amazon's also one of the best known cloud computing suppliers on the planet
(AWS = Amazon Web Services). They have everything from pure compute
offerings to cloud storage geared towards *large* data archival. The latter
offering is named "Glacier", and they offer a service for importing data
into it (usually just the "first pass"; incremental changes are generally
done over the wire) that consists of a shipping truck with a rather nifty
storage system in the back of it that they hook right into your network.
You fill it with data, and then they drive it back to one of their data
centers to load it into place.
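
As for why a truck can beat the wire at that scale, a rough estimate in Python. The 1 Gbit/s link speed is purely an assumption for illustration, and protocol overhead is ignored, so real transfers would be slower.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


LINK = 1e9  # assumed 1 Gbit/s link, fully saturated, no overhead

# The 0.6 PB I misread the dataset as (decimal units):
print(round(transfer_seconds(0.6e15, LINK) / 86_400, 1))  # 55.6 (days)

# The actual 12 GB monthly fetch:
print(transfer_seconds(12e9, LINK))  # 96.0 (seconds)
```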

--
Poison [BLX]
Joshua M. Murphy
