Gentoo Archives: gentoo-cluster

From: Jos Houtman <jos@×××××.nl>
To: gentoo-cluster@l.g.o
Subject: [gentoo-cluster] cluster or distributed queue, general question
Date: Thu, 10 Jan 2008 13:59:58
Message-Id: AD5924F6A589EF419765D39834C6BAFE699FBA@hyves1.exchange.cysonet.com
1 List,
2  
3 For my master's thesis I took up a project that requires mapping a number of statically defined parallel jobs onto a more dynamic environment that allows better scaling.
4 The situation as described below led me to believe that a cluster or distributed queue (DrQueue?) solution is necessary. For the situation see [situation] at the end of this email.
5  
6 Because I am new to this field, I ask for a bit of your time to help me get my bearings on current work in the field and on good documentation.
7  
8 To be able to see if there are any suitable (or near enough) environments, I made a list of capabilities that this environment should have:
9 * Dynamic load balancing, either by process migration or stopping jobs and starting them somewhere else.
10 * Dynamic decision on the degree of parallelism, according to the dataset that needs to be processed (growing/shrinking).
11 * Failover of the jobs when node failure happens.
12 * The guarantee that a job runs only once in the cluster, even during node failure.
13 * Limiting jobs to a class of nodes (a subset of all nodes).
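
One way to approximate the "runs only once" requirement is a lease stored in a shared database: a node may run a job only while it holds an unexpired lease for it. The following is a minimal sketch, not a recommendation of a specific product. The table name `job_lease`, the function names, and the use of SQLite are all assumptions made for illustration. A lease also only narrows the window for double execution; a node that stalls past its lease expiry can still overlap with its successor.

```python
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per job, naming the current lease holder.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_lease ("
        "job TEXT PRIMARY KEY, owner TEXT NOT NULL, expires REAL NOT NULL)"
    )


def try_acquire(conn: sqlite3.Connection, job: str, owner: str,
                now: float, ttl: float = 30.0) -> bool:
    """Take or renew the lease on `job`; return True if `owner` holds it."""
    with conn:  # commit on success, roll back on error
        # Renew our own lease, or take over one that has expired.
        cur = conn.execute(
            "UPDATE job_lease SET owner = ?, expires = ? "
            "WHERE job = ? AND (owner = ? OR expires <= ?)",
            (owner, now + ttl, job, owner, now),
        )
        if cur.rowcount == 1:
            return True
        # No row was updated: either the job has no lease row yet, or
        # another node holds a live lease. The insert succeeds only in
        # the first case because `job` is the primary key.
        cur = conn.execute(
            "INSERT OR IGNORE INTO job_lease (job, owner, expires) "
            "VALUES (?, ?, ?)",
            (job, owner, now + ttl),
        )
        return cur.rowcount == 1
```

A node would call `try_acquire` before each chunk of work and stop the job as soon as the call returns False. Failover then follows from lease expiry: when the holder dies, another node's next attempt succeeds once `ttl` has passed.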
14  
15 Do you know of any projects that have these capabilities?
16 HPC clustering seems to come close, but I am unsure about the dynamic degree of parallelism: isn't that fixed at job submission?
17 And when using process migration, do IP connections also migrate? In other words, will database connections stay intact during process migration?
18  
19 I also hope you have some favorite resources on the subject, especially on methods that can be used for these capabilities.
20  
21  
22 [situation]
23 Over the years we have created a small number of jobs (about 40) that support the main function of our business (an online social community). Typical jobs include data aggregation, queue processing, automated email notifications, and video/photo rendering. A common factor is that all these scripts need database connections.
24  
25 For scalability reasons most of these jobs are parallelized; sometimes the dataset is partitioned and processing is done in manageable chunks. Each job basically runs in a while(true) loop with a short sleep after each chunk is processed, so as not to overwhelm the machines once all data has been processed. Some jobs, though, cannot be split and therefore cannot run in parallel, since this would cause data corruption.
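
The job structure described above can be sketched roughly as follows. This is an illustrative outline only: the names `partition`, `run_worker`, `fetch_chunk` and `process` are hypothetical, the modulo split is just one possible partitioning scheme, and the `max_iterations` and `sleep` parameters exist only so the loop can be exercised in a test (the real jobs loop forever).

```python
import time
from typing import Callable, List, Optional, Sequence


def partition(items: Sequence[int], n_workers: int,
              worker_index: int) -> List[int]:
    # Assumed scheme: worker k takes every item whose position is
    # congruent to k modulo the number of workers.
    return [x for i, x in enumerate(items) if i % n_workers == worker_index]


def run_worker(fetch_chunk: Callable[[], List[int]],
               process: Callable[[int], None],
               idle_sleep: float = 1.0,
               max_iterations: Optional[int] = None,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Process chunks in a loop, sleeping briefly after each one."""
    processed = 0
    iterations = 0
    # With max_iterations=None this is the while(true) loop from the text.
    while max_iterations is None or iterations < max_iterations:
        chunk = fetch_chunk()  # may be empty when there is no work left
        for item in chunk:
            process(item)
            processed += 1
        # Pause after every chunk so an idle job does not spin on the
        # database once all data has been processed.
        sleep(idle_sleep)
        iterations += 1
    return processed
```

In this form the degree of parallelism is the `n_workers` value baked into each instance, which is exactly what makes the static configuration hard to adjust as the dataset grows or shrinks.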
26  
27 To run these jobs we have about 10 nodes, configured statically through a configuration file. The configuration defines how many instances of each job need to run, and sometimes even where they run (crude load balancing). Because of our growing volume of users, we need to identify which jobs cannot keep up and adjust the configuration accordingly. This is a cumbersome task that has grown out of habit and leads to inefficient use of resources (human and machine alike).
28  
29 With regards,
30
31 Jos Houtman
32 System administrator Hyves.nl
33 email: jos@×××××.nl
34
35
36 --
37 gentoo-cluster@l.g.o mailing list

Replies

Subject Author
Re: [gentoo-cluster] cluster or distributed queue, general question Panagiotis Christopoulos <pxrist@×××××.com>
Re: [gentoo-cluster] cluster or distributed queue, general question "Robin H. Johnson" <robbat2@g.o>