On Sat, Apr 29, 2017 at 9:11 PM, lee <lee@××××××××.de> wrote:

>
> "Poison BL." <poisonbl@×××××.com> writes:
> > Half petabyte datasets aren't really something I'd personally *ever*
> > trust ftp with in the first place.
>
> Why not? (12GB are nowhere close to half a petabyte ...)

Ah... I completely misread that "or over 50k files in 12GB" as 50k files
*at* 12GB each... which works out to 0.6 PB, incidentally.

> The data would come in from suppliers. There isn't really anything
> going on atm but fetching data once a month which can be like 100MB or
> 12GB or more. That's because ppl don't use ftp ...
|
Really, if you're pulling it in from third party suppliers, you tend to be
tied to whatever they offer as a method of pulling it from them (or of them
pushing it out to you), unless you're in the unique position of being able
to dictate the decision for them. From there, assuming you can push your
choice of product on them, it becomes a question of how often the same
dataset will need to be updated from the same sources, how much it changes
between updates, how secure it needs to be in transit, how much you need to
be able to trust that the source is still legitimately who you think it is,
and how much verification you need that there wasn't any corruption during
the transfer. Generic FTP has been losing favor over time because it was
built in an era when many of those questions weren't at the top of anyone's
list of concerns.
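For the corruption question specifically, even plain FTP can be shored up a
bit if the supplier publishes checksums alongside the data. A minimal sketch
(the filenames are made up for illustration; the `echo` stands in for the
actual download):

```shell
# Supplier publishes a checksum file next to the dataset; you verify it
# on your side after the transfer completes.
echo "monthly data" > dataset.tar             # stand-in for the real file
sha256sum dataset.tar > dataset.tar.sha256    # supplier side
sha256sum -c dataset.tar.sha256               # your side, after download
```

That catches corruption in transit, though of course it says nothing about
who you were talking to, which is where the SSH-based tools come in.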
|
SFTP (or SCP) (as long as keys are handled properly) allows for pretty
solid certainty that a) both ends of the connection are who they say they
are, b) those two ends are the only ones reading the data in transit, and
c) the data that was sent is the same as the data that was received (simply
as a side benefit of the encryption/decryption). Rsync over SSH gives the
same set of benefits, reduces the bandwidth used for updating the dataset
(when it's the same dataset, at least), and will also verify that the data
on both ends (as it exists on disk) matches. If you're particularly lucky,
the data might even be just the right kind to benefit from the in-line
compression you can turn on with SSH, too, cutting down the actual amount
of bandwidth you burn through for each transfer.
|
If your suppliers all have *nix based systems available, those are also
standard tools that they'll have on hand. If they're strictly Windows
shops, SCP/SFTP are still readily available, though they aren't built into
the OS... rsync gets a bit trickier.
|
> > How often does it need moved in/out of your facility, and is there no way
> > to break up the processing into smaller chunks than a 0.6PB mass of files?
> > Distribute out the smaller pieces with rsync, scp, or the like, operate on
> > them, and pull back in the results, rather than trying to shift around the
> > entire set. There's a reason Amazon will send a physical truck to a site to
> > import large datasets into glacier... ;)
>
> Amazon has trucks? Perhaps they do in other countries. Here, amazon is
> just another web shop. They might have some delivery vans, but I've
> never seen one, so I doubt it. And why would anyone give them their
> data? There's no telling what they would do with it.
|
Amazon's also one of the best known cloud computing suppliers on the planet
(AWS = Amazon Web Services). They have everything from pure compute
offerings to cloud storage geared towards *large* data archival. The latter
offering is named "glacier", and they offer a service for the import of
data into it (usually for the "first pass"; incremental changes are
generally done over the wire) that consists of a shipping truck with a
rather nifty storage system in the back that they hook right into your
network. You fill it with data, and then they drive it back to one of
their data centers to load it into place.
|
--
Poison [BLX]
Joshua M. Murphy