List Archive: gentoo-soc
Headers:
To: gentoo-soc@g.o
From: Sebastian Pipping <webmaster@...>
Subject: Gentoo stats gathering vs. privacy protection (was Re: About "Create and release a Gentoo stats server/client")
Date: Tue, 07 Apr 2009 06:00:37 +0200
Hello again!


A few fresh thoughts on data privacy protection and future Gentoo stats
gathering from talking to an expert earlier today.  If it doesn't make
sense, it's my fault not his ;-)

I'm sure you will have additions and corrections to this.
Go ahead, I want you to.


== Simplified overview ==
The data we intend to collect is bound to machines.  If every machine
submitting information sends some unique identifier, we know that a
machine with that identifier has the properties submitted.  As we won't
store IPs, nobody with the collected data will be able to trace a
submitted machine config back to the location of the machine, the name
of the admin or the like.
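
To make that concrete, here is a minimal sketch of such a server-side
store.  The names (StatsStore, new_machine_id, the property keys) are
made up for illustration and are not an existing design; the only
points it tries to show are that data is keyed by a random machine ID
and that the sender's IP is never written anywhere.

```python
import uuid


def new_machine_id() -> str:
    """Random ID generated once on the client; it encodes nothing about the host."""
    return str(uuid.uuid4())


class StatsStore:
    """Hypothetical server-side store keyed by machine ID only."""

    def __init__(self):
        self._by_machine = {}

    def submit(self, machine_id, properties, remote_ip):
        # remote_ip is visible to the server during the request,
        # but it is deliberately not stored anywhere.
        # A later submission with the same ID replaces the earlier one.
        self._by_machine[machine_id] = dict(properties)

    def host_count(self):
        """Number of distinct machines that have submitted data."""
        return len(self._by_machine)

    def properties_of(self, machine_id):
        return dict(self._by_machine[machine_id])
```

Submitting twice with the same ID refreshes the stored data instead of
counting the machine twice.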


== Exceptions ==
The point where that generality breaks is where the data you submit can
be linked with information from other sources.  If any of the data you
submit occurs "rarely enough" in the wild, it can allow mapping some
machine config back to information about the machine's location or its
users.

Example:
  If you're running the only 645-core CPU ever produced at home, the
  guy who sold it to you can map your data back to your name and
  address, provided he's able to find the bill for it.


== Counter-measures ==
To reduce this rare-configuration issue one could list only those
pieces of information that have reached a certain minimum count, say 25
occurrences.

Example:
  "Gentoo" would not be showing up in the Operating System
  section unless at least 25 submissions with OS "Gentoo" have
  been received.
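
A minimal sketch of that rule, assuming the server simply has a list of
submitted values for one section (the function name and the sample
data are made up for illustration):

```python
from collections import Counter

MIN_OCCURRENCES = 25  # the example threshold from above


def publishable_counts(values, minimum=MIN_OCCURRENCES):
    """Return only those values that were submitted at least `minimum` times."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= minimum}


# Made-up sample data: 24 Gentoo submissions stay hidden, the 25th reveals them.
submissions = ["Fedora"] * 40 + ["Gentoo"] * 24
hidden = publishable_counts(submissions)       # Gentoo not listed yet
submissions.append("Gentoo")
visible = publishable_counts(submissions)      # Gentoo now listed with count 25
```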

Smolt is effectively doing something like that: my Gentoo submission
is not shown, though my data shows up in other places on their page.
The lowest count of any OS listed on that page is 27, because only the
top 30 entries are listed.  If I submit 28 fake data sets, Gentoo should
show up in the list.  Anybody else could do that too and thereby find my
Gentoo-OS entry, as he can identify his own fake entries easily.
All he'll get, though, is my machine ID, if it's exposed, but that's it.
By writing about it here I add extra information that allows you to
resolve that entry back to my person.  So that's another example of
linking with other information.
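
The padding trick can be sketched as follows.  This is only an
illustration with made-up numbers and a top-2 list instead of a top-30
one; it is not how Smolt is actually implemented.

```python
from collections import Counter


def top_n(values, n):
    """Public listing: the n most frequent values with their counts."""
    return Counter(values).most_common(n)


def real_entries(listing, value, own_fakes):
    """What someone who padded `value` with `own_fakes` entries learns."""
    count = dict(listing).get(value)
    return None if count is None else count - own_fakes


# Made-up data: one real Gentoo submission, hidden by a top-2 listing.
real = ["Fedora"] * 5 + ["Ubuntu"] * 3 + ["Gentoo"]
before = top_n(real, 2)

# Three fake Gentoo submissions push it past the last listed entry.
padded = real + ["Gentoo"] * 3
after = top_n(padded, 2)
leaked = real_entries(after, "Gentoo", own_fakes=3)
```

The same padding would presumably work against a fixed minimum as well,
so a threshold makes this harder rather than impossible.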


== Conclusions ==
People who have super-secret custom setups that nobody must know about
will not want to give us their data.  That's okay.  All other people
with average desktop machines won't have any rare data to submit.


== Proposal ==
- Use machine IDs on the server side to estimate the set of hosts
  involved and to refresh the data for a machine later
- State what data we gather and what we do with it
- Make stat submitters ask themselves whether their setup has
  anything top-secret to it.  An informative and not overly
  frightening message is very important at that point.
- Allow them to configure away that data from what they submit

They keep what they want to keep, we get what we want to get.
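
As a rough sketch of the "configure away" part, assuming the client
collects its data as simple key/value pairs and the user keeps a list
of keys to withhold (all names and keys here are hypothetical):

```python
def filter_submission(collected, excluded_keys):
    """Drop every key the user has chosen not to submit."""
    excluded = set(excluded_keys)
    return {key: value for key, value in collected.items() if key not in excluded}


# Hypothetical data gathered on the client before anything is sent.
collected = {
    "os": "Gentoo",
    "arch": "amd64",
    "cpu_model": "one-of-a-kind 645-core CPU",
}

# The user decides that the CPU model is too rare to share.
to_submit = filter_submission(collected, excluded_keys=["cpu_model"])
```

Filtering on the client means the withheld data never leaves the
machine in the first place.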


What do you think?



Sebastian




Updated Jun 17, 2009

Copyright 2001-2007 Gentoo Foundation, Inc. Questions, Comments? Email www@gentoo.org.