Joost knows far more about databases than I do, so I mostly commented on
the workflow part.

On 2016-05-20 22:36, waltdnes@××××××××.org wrote:
> Yes, I did RTFM at https://wiki.gentoo.org/wiki/PostgreSQL/QuickStart
> and that's part of my problem. <G> I figured it would be a simple
> search and replace "9.3" ==> "9.5" in the wiki, but...
>
> 1) The wiki recommends...
> PG_INITDB_OPTS="--locale=en_US.UTF-8"
>
> ...but I get...
>
>> The database cluster will be initialized with locale "en_US.iso88591".
>> initdb: "en_US.UTF8" is not a valid server encoding name
> "locale -a" returns...
> C
> POSIX
> en_US
> en_US.iso88591
> en_US.utf8
>
> 2) The wiki says...
>> This time the focus is upon the files in the PGDATA directory,
>> /etc/postgresql-9.3 , instead with primary focus on the
>> postgresql.conf and pg_hba.conf files.
> "ls /etc/postgresql-9.5/" returns...
> postgresql.conf psqlrc
>
> but postgresql seems to want them in /var/lib instead...
>
>> mv: cannot stat '/var/lib/postgresql/9.5/data/pg_hba.conf': No such
>> file or directory
>> mv: cannot stat '/var/lib/postgresql/9.5/data/pg_ident.conf': No
>> such file or directory
>> mv: cannot stat '/var/lib/postgresql/9.5/data/postgresql.conf':
>> No such file or directory
> Can somebody please confirm the correct way to go?

I have never run PostgreSQL on Gentoo (hopefully soon :D), but on
Debian-derived and RPM-based distros, PGDATA is always somewhere
in /var. /etc seems wrong.

>
> Why I want postgresql... I've been keeping a bunch of data in a
> spreadsheet, and it's gotten too large. The spreadsheet locks up my
> system when I try to update it. I've used "top" and watched as
> gnumeric's memory consumption grows to eat all available ram. It locks
> up the system so I can't even ssh in. This is on an X86_64 with 8 gigs
> of RAM! Fortunately, "magic-sysrq" allows a relatively clean shutdown.
> While we're at it, is there a way for gnumeric to pull in data directly
> from postgresql? ODBC? I'm aware of copying from postgresql to a CSV
> file and importing that, but it's rather clunky.

`equery uses gnumeric' lists the `libgda' flag, which should pull in
database support. I've never used it, so I don't know whether it works
or how well. What is in this spreadsheet? If it is financial
stuff, you can use GnuCash, which supports using a database as a backend.

>
> My main problem is that columns of several thousand rows are functions
> based on other columns of several thousand rows. For the time-being,
> I've split up the spreadsheet into a few pieces, but a database is the
> best solution. If I could run the calculations in the database, and
> pull in the final results as static numbers for graphing, that would
> greatly reduce the strain on the spreadsheet. Or is it possible to
> graph directly from postgresql?

Here are my recommendations, in order of "least code" to "most code" (I
don't think PostgreSQL supports graphing):

1. Write some SQL scripts that compute the data you need and output CSV,
then import to Gnumeric and do the plots.
2. Write Python script(s) that run SQL commands and plot the data with
matplotlib.
3. Write a webapp so you don't have to run scripts by hand - the plots
are generated by opening a web page.
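To make option 1 a bit more concrete, here is a rough sketch of "let the database compute the derived column, then dump CSV". Everything specific in it is made up for illustration: the `readings` table, its columns, and the `export_query_to_csv` helper are hypothetical, and I am using Python's built-in sqlite3 as a stand-in so the example runs anywhere. With PostgreSQL you would pass in a connection from a driver such as psycopg2 instead; the rest should look much the same, but I have not tested it against your setup.

```python
import csv
import io
import sqlite3  # stand-in only; a PostgreSQL driver would supply the connection


def export_query_to_csv(conn, query, out):
    """Run `query` on a DB-API connection and write the rows to `out` as CSV."""
    cur = conn.cursor()
    cur.execute(query)
    writer = csv.writer(out)
    # cursor.description holds one entry per result column; item 0 is its name.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    cur.close()


# Hypothetical table standing in for the spreadsheet columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, a REAL, b REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 2.0, 3.0), (2, 4.0, 5.0), (3, 6.0, 7.0)],
)

# The derived column is computed by the database, not by the spreadsheet.
QUERY = "SELECT day, a, b, a * b AS product FROM readings ORDER BY day"

buf = io.StringIO()
export_query_to_csv(conn, QUERY, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Writing to an open file instead of the `io.StringIO` buffer gives a CSV of static numbers that Gnumeric can import for graphing.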

Depending on how much automation you want vs. how much time you want to
spend writing/debugging code, hopefully one of those helps. I help
researchers use an HPC cluster; some are very savvy programmers, some are
not. For "big data" projects, some will throw raw data into a
Hadoop cluster and happily do all their work using Hadoop, while others
will put in raw data, clean it up, and then pull it out and use MATLAB,
Stata, R, etc. You just need to find the workflow that works best
for you. I personally would choose option 3, as it involves the least
amount of running scripts over and over, but to each his own.

I have actual free time now (done with school, finally), so I might be
able to help prototype if you would like as well.

Alec