kashani wrote:
> A. Khattri wrote:
>> On Wed, 19 Apr 2006, kashani wrote:
>>
>>> I'm not sold on the Google approach.
>>>
>>> Assuming someone were to build nine data servers we're talking roughly
>>> $3k per server (dual CPU, 4GB RAM, RAID 5 SATA) or $30k with shipping
>>> and tax.
>>
>> Actually Google use the cheapest hardware they can find, buy in bulk, and
>> they assume stuff will fail so they plan accordingly. I very much doubt
>> they spend $3K per server...
>>
>
> Anyone trying to build the same with a purchase of fewer than 100 servers
> is not going to spend much less.
>
> 2 x 2GB RAM       = $1000
> 2 x CPU           = $500 or so
> RAID card         = $300
> Drives            = $100 each
> 1U chassis/MB/etc = $500
>
> IIRC they drop the drives and shove everything into RAM. Which is fine
> when you have a limited data set and enough machines to shove it into
> RAM. Originally Google did have a limited data set. It was only after
> the infrastructure reached a critical size that they began Google mail
> and other large storage things, and by then they'd had four years to
> work out the operational kinks.
>
> I think it unlikely that someone with a single storage set has enough
> money or time to pay for a few PhDs to write a custom filesystem, a few
> hundred servers (10TB/4GB), and the datacenter monkeys necessary to
> replace gear constantly... oddly Google has both of these in spades. And
> if you read the whitepaper the smallest Google data cluster is nineteen
> servers, aka $40-60k for schleps like us, aka the cost of a SAN that
> uses less power and burns fewer switch ports.
>
> This infatuation with the Google stuff is useless when very few people
> (i.e. none of us) have the in-house infrastructure or the available cash
> to handle it. Unless someone has actually built their own mini Google and
> wants to tell us all about it with nice numbers like total cost, source
> code, transactions per second, cost per GB of user data, throughput, and
> other data points.
>
> kashani

Sorry, I'll have to pipe up about that... There's at least one guy here
using MogileFS, which is basically a Google approach :) I'm using it as
well.

There are a few nice things about distributing your files around:

- You don't necessarily need to dedicate machines to it. I have a
  hundred and change diskless boot webservers. I add two hard drives to a
  box, fire up the mogstored daemon, and put it back into the webserver pool.
- You're overspeccing. Why would each box need 4G of RAM? Whatever
  NAS/SAN you buy will *not* have nearly that much cache. If you're going
  to live without, live without :) A single dual-core chip or less could
  work too.
- Something like MogileFS uses node-level redundancy (NAID? Someone
  bothered giving it a name...), so there's no point in buying RAID cards.
  Use onboard SATA/PATA or the cheapest cards you can buy that will give
  decent throughput ($50 or less instead of $300).
- If it floats your boat, go ahead and get those 3U Supermicro cases
  with 16 drive bays. Just use a couple of cheap 4+ port SATA hot-swap
  capable controllers instead of 2x$500+ battery-backed RAID controllers.
  Get a bunch of them as slimmed down as possible. Hell, just fuse some
  steel together, tool an MB/PSU to it, and stack a ton of drives in front
  of some huge fans. Save 2/3rds of the case cost.
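
For anyone wondering what "node-level redundancy" buys you over a RAID
card, here's a toy sketch of the idea. This is *not* MogileFS code; the
function names, host names, and the copy count of 2 are all made up for
illustration. The point is only that each file's copies land on different
boxes, so losing a whole box (not just a disk) doesn't lose the file.

```python
import random

def place_replicas(hosts, copies, rng=random):
    """Pick `copies` distinct hosts to hold one file (hypothetical helper)."""
    if copies > len(hosts):
        raise ValueError("not enough hosts for the requested copy count")
    # sample() never returns the same host twice, so no two copies share a box
    return rng.sample(hosts, copies)

def survives(placement, dead_hosts):
    """A file is still readable if at least one host holding it is alive."""
    return any(host not in dead_hosts for host in placement)

# Made-up pool of six webservers that also carry storage drives.
hosts = [f"web{i:03d}" for i in range(1, 7)]
placement = place_replicas(hosts, copies=2, rng=random.Random(0))

one_box_down = {placement[0]}
print(placement, "survives one dead box:", survives(placement, one_box_down))
```

Lose both boxes holding a file and it's gone, of course, which is why
you'd re-replicate after a failure rather than wait around.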

My case might not be enough like yours, but I can easily add terabytes
of space for *just* the cost of the drives (and the relatively small
extra power draw per box I add drives to). There are a few other services
too though... Mogile needs a central DB (which you'd have two of,
right?) and tracker services. Again, cheap is fine. Maybe you have a DB
with some free resources... I just can't imagine someone selling me a
NAS/SAN that's this scalable and this cheap.
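
To put rough numbers on "cheap", here's a back-of-the-envelope sketch
using the parts prices kashani quoted plus the $50 controller figure from
my list. The four drives per dedicated box is purely my assumption (the
quoted list only says $100 each), and I'm ignoring power, the DB, and the
trackers entirely, so treat it as a rough comparison and not a budget.

```python
# Prices from the quoted parts list (USD).
QUOTED_PARTS = {"ram": 1000, "cpus": 500, "raid_card": 300, "chassis_mb": 500}
DRIVE_PRICE = 100

def box_cost(parts, drives):
    """Total cost of one box: fixed parts plus its drives."""
    return sum(parts.values()) + drives * DRIVE_PRICE

DEDICATED_DRIVES = 4   # ASSUMPTION: drive count is not given in the thread
WEBSERVER_DRIVES = 2   # "I add two hard drives to a box"

# Dedicated storage box as quoted.
dedicated = box_cost(QUOTED_PARTS, DEDICATED_DRIVES)

# Same box with a cheap $50 controller in place of the $300 RAID card.
trimmed = box_cost(dict(QUOTED_PARTS, raid_card=50), DEDICATED_DRIVES)

# Existing webserver: the box is already paid for, so only drives count.
piggyback = box_cost({}, WEBSERVER_DRIVES)

for name, cost, drives in [
    ("dedicated", dedicated, DEDICATED_DRIVES),
    ("cheap controller", trimmed, DEDICATED_DRIVES),
    ("existing webserver", piggyback, WEBSERVER_DRIVES),
]:
    print(f"{name:20s} ${cost:5d} total, ${cost / drives:7.2f} per drive")
```

Under those assumptions the dedicated box works out to several times the
per-drive cost of just dropping drives into machines you already run.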

have fun,
-Dormando
--
gentoo-server@g.o mailing list