Hi, I'm JT and these are my thoughts on community, content management, Plain Black, and WebGUI.
A new feature has been added in 7.7.6-beta that will allow you to participate in a worldwide survey of WebGUI sites. The data will be used to determine which features are most used in WebGUI, and how WebGUI sites grow. This will help developers make WebGUI faster, easier to use, and all-around better. But it may be useful to you as well, whether you want to know how your site stacks up against other sites, or you want to see how many other sites are deployed on the same operating system as yours.
This video shows you how easy it is to enable this new feature:
A summary of all the data collected will be published to webgui.org/stats.
I hope all of you will choose to participate in this survey after you upgrade to WebGUI 7.7. For those of you concerned about privacy, don't worry, we've got you covered. None of the data sent to the central repository will contain anything identifiable about your site or you. It is all generic data like how many users you have and how many assets you have; no site name, company name, email address, or any other personal information is sent in the survey data.
> For those of you concerned about privacy, don't worry, we've got you covered.
I'm sorry, but you will have to do better than that to convince me to turn this feature on. If I'm going to give you any potentially private data I will want to know in advance what exactly you are going to do with it.
Of course you write this:
> None of the data sent to the central repository will contain anything identifiable about your site or you. It is all generic data like how many users you have, and how many assets you have, but no site name, company name, email address, or any other personal information is sent in the survey data.
What I would like to know is: will there be a way to check if the data received is actually the data that will be presented?
What exactly are you trying to achieve and how?
When you change something in the stats gathering code, will you promise to let me know in advance and will you automatically (by way of updatescript) disable the sending of data so that I will have to re-enable it after review of the changes?
> The data will be used to determine what features are most used in WebGUI, and how WebGUI sites grow.
What do you expect to see and when and where are these figures going to be presented?
Will there be real-time access to this data somewhere?
Don't get me wrong, I do like the idea of having community stats, I just want to be absolutely sure that my privacy is assured.
Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization
The stats page on webgui.org still says this:
Nothing to see here just yet. Still collecting initial data.
Does that mean that no data has been received up till now?
How many sites are reporting stats now?
Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization
> What I would like to know is: will there be a way to check if the data received is actually the data that will be presented?
I'm not sure what you mean. I don't plan on displaying specific entries. This data is all about trends over time.
> What exactly are you trying to achieve and how?
> When you change something in the stats gathering code, will you promise to let me know in advance and will you automatically (by way of updatescript) disable the sending of data so that I will have to re-enable it after review of the changes?
> What do you expect to see and when and where are these figures going to be presented?
I already said both of those things in the original post.
> Will there be a real time access to this data somewhere?
What do you mean by realtime access?
> Don't get me wrong, I do like the idea of having community stats, I just want to be absolutely sure that my privacy is assured.
I think the best way we can assure you of your privacy is to remind you that WebGUI is free, open-source software. You can open up the code and see exactly what is sent and how it is sent.
See lib/WebGUI/Workflow/Activity/SendWebguiStats.pm
If you don't trust JT, you can trust me, or any of the other people who will help WebGUI by enabling this and will be checking the code just like I did.
Just one more benefit of free software: The developers can't get away with violating users' privacy.
Doug, I trust you or any of the people who develop WebGUI.
Going by the source you pointed me to, this is what is currently submitted (in WebGUI 7.7):
Now going into paranoia mode:
This is being sent over plain HTTP (unencrypted) to www.webgui.org.
In theory, someone in the middle could find out your exact Apache version, Perl version, and OS, and use that data to apply a known exploit and hack your system.
/paranoia
Actually, I think I would enable stats if this list stayed the same and it were sent over HTTPS :)
Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization
First of all, thanks: that makes things a lot clearer for me.
By realtime access I mean: can the community have the raw data submitted? Or are you only keeping counters? Perhaps this will become clear when the stats page has something to view on it.
> How much more reassured can you be than that the code for both the collection of the data and the submission of the data is open source and reviewable by anyone at any time?
If the data itself is open source, that would help a lot. You could, for example, make the raw data downloadable as a CSV file.
If you (as a submitter of data) can see the stats grow day by day, that would help too.
Are you logging which IP addresses are submitting data?
How do you plan to deal with deliberate stats poisoning?
Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization
> By realtime access I mean: can the community have the raw data submitted? Or are you only keeping counters? Perhaps this will become clear when the stats page has something to view on it.
> How much more reassured can you be than that the code for both the collection of the data and the submission of the data is open source and reviewable by anyone at any time?
> If the data itself is open source, that would help a lot. You could, for example, have the raw data downloadable as a CSV file.
> If you (as a submitter of data) can see the stats grow day by day, that would help too.
The stats are submitted weekly, not daily, but yes, you'll be able to see it grow over time.
> Are you logging which IP addresses are submitting data?
No.
> How do you plan to deal with deliberate stats poisoning?
> I don't. Since I'm not allowed to track anything at all about what site the data is coming from, or the user submitting it, there's really not much I can do to stop stats poisoning. Unless you have ideas.
I'm actually more concerned about this than about someone figuring out who is submitting what data. Such poisoning could significantly undermine the usefulness of this data, especially if it were done slowly over time.
One possible solution is to require that a registered user ID be submitted with the data (no password required). You could then at least match the stats to an account and easily remove that account's stats if it's discovered that this user is submitting tainted data. Of course, you wouldn't show or in any way provide the user ID when presenting data (formatted or raw).
As an extra bonus, this technique would allow a user to see just their stats. As an integrator, I'd like to see stats for just my clients. The trending of my stats would be interesting to compare to the community as a whole.
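William's suggestion can be sketched as a small server-side store. This is a minimal Python illustration under stated assumptions, not how webgui.org actually stores stats: the `Submission` fields, the `StatsStore` class and its methods are hypothetical, user and asset counts stand in for the real metrics, and the user ID is assumed to be authenticated somehow (otherwise anyone could submit under someone else's ID). Flagging an account drops all of its rows from the public totals, which never contain user IDs, while an account holder can still see their own weekly trend.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Submission:
    user_id: str      # hypothetical webgui.org account ID; stored but never published
    week: str         # reporting week, e.g. "2009-W30"
    user_count: int   # example generic metric: users on the site
    asset_count: int  # example generic metric: assets on the site


def _aggregate(subs: Iterable[Submission]) -> Dict[str, Dict[str, int]]:
    """Sum submissions per week. The result deliberately contains no user IDs."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"sites": 0, "users": 0, "assets": 0}
    )
    for sub in subs:
        week_totals = totals[sub.week]
        week_totals["sites"] += 1
        week_totals["users"] += sub.user_count
        week_totals["assets"] += sub.asset_count
    return dict(totals)


class StatsStore:
    """Hypothetical server-side store; not the actual webgui.org implementation."""

    def __init__(self) -> None:
        self._submissions: List[Submission] = []
        self._flagged: Set[str] = set()

    def add(self, sub: Submission) -> None:
        self._submissions.append(sub)

    def flag(self, user_id: str) -> None:
        # Mark an account as submitting tainted data; all of its rows are excluded.
        self._flagged.add(user_id)

    def community_totals(self) -> Dict[str, Dict[str, int]]:
        # Public view: every submission from non-flagged accounts, aggregated.
        return _aggregate(s for s in self._submissions if s.user_id not in self._flagged)

    def totals_for(self, user_id: str) -> Dict[str, Dict[str, int]]:
        # Private view for one account, e.g. an integrator looking at their clients.
        return _aggregate(s for s in self._submissions if s.user_id == user_id)
```

Because flagged rows are filtered at aggregation time rather than deleted, an account flagged by mistake can be restored simply by removing it from the flagged set.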
William
----
Knowmad Technologies
http://www.knowmad.com
The idea of tying the results to a username (authenticated using HTTP basic auth, since otherwise I could claim to be you, and who would argue with me?) would solve the problem, but it would also destroy the anonymity. Perhaps allow it but don't require it? That way it's useful to you if you're willing to give up that measure of anonymity.
I'm not a statistician, but I believe most statistical analyses are reported with a margin of error that depends on the number of results received. A larger number of results gives a smaller margin of error.
That, and what would one gain by skewing the stats? I'm not saying there aren't people who screw with polls for the sake of screwing with polls, but if we can remove or reduce the benefit and increase the cost of doing the screwing, we can keep the poisoning small enough to stay within a standard margin of error.
The only problem is how to increase the cost of skewing results, since the code must remain open source. Do it on the server side and only allow X results per site-hash and Y site-hashes per IP address per week? (So, say, 40 sites on one IP could each submit stats once per week, but not 50 sites on one IP.)
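For reference, a common textbook approximation (a standard formula, not something the thread specifies) for the margin of error of a proportion p observed in n responses is z·sqrt(p(1−p)/n), with z ≈ 1.96 for roughly 95% confidence. Since it shrinks with the square root of n, quadrupling the number of reporting sites roughly halves the margin. A minimal Python sketch:

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p observed in n responses.

    Normal approximation; z = 1.96 gives roughly 95% confidence. It assumes a
    simple random sample and covers random sampling error only, not bias.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Example: the share of reporting sites on one OS is observed at 50%.
for n in (100, 400, 1600):
    print(f"n={n}: +/- {margin_of_error(0.5, n):.1%}")
```

Opt-in submissions are not a random sample, so this number describes sampling noise only; it does not by itself bound deliberate skewing.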
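The server-side rate limit proposed above could look roughly like this. It is a hypothetical sketch: `WeeklySubmissionLimiter`, its parameters, and the assumption that each client sends an opaque `site_hash` are illustrations, and the default of 40 sites per IP just mirrors the example numbers. Counters live in memory for the current week only and are reset when the week changes, so nothing about past submitters is retained.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class WeeklySubmissionLimiter:
    """Hypothetical server-side rate limit for anonymous stats submissions.

    Allows at most `per_site` submissions per site hash per week and at most
    `sites_per_ip` distinct site hashes per IP address per week.
    """

    def __init__(self, per_site: int = 1, sites_per_ip: int = 40) -> None:
        self.per_site = per_site
        self.sites_per_ip = sites_per_ip
        self._week: Optional[str] = None
        self._site_counts: Dict[str, int] = defaultdict(int)
        self._sites_by_ip: Dict[str, Set[str]] = defaultdict(set)

    def _roll_week(self, week: str) -> None:
        # Reset counters whenever the week key changes
        # (assumes submissions arrive in week order).
        if week != self._week:
            self._week = week
            self._site_counts.clear()
            self._sites_by_ip.clear()

    def allow(self, week: str, ip: str, site_hash: str) -> bool:
        """Return True and record the submission if it is within both limits."""
        self._roll_week(week)
        sites = self._sites_by_ip[ip]
        if site_hash not in sites and len(sites) >= self.sites_per_ip:
            return False  # this IP already reported its quota of distinct sites
        if self._site_counts[site_hash] >= self.per_site:
            return False  # this site already submitted this week
        sites.add(site_hash)
        self._site_counts[site_hash] += 1
        return True
```

A week key can be derived with the standard library, for example `"%d-W%02d" % date.today().isocalendar()[:2]` after `from datetime import date`.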
Good points about the margin of error. Having bypassed stats in school, I'd forgotten about that.
Nonetheless, I like the idea of rate-limiting the stats to X submissions per IP per week. We're not hosting more than two dozen sites from a single IP, so 50 should be adequate for us. Perhaps Alpha-Mega or ProcoliX have some other ideas.
----
Knowmad Technologies
http://www.knowmad.com
I think this is a great feature. It's an opportunity for eye candy as well. I understand the privacy concerns that may play a role in some cases, but I would have no problem submitting my IP. I would love to see a Google map with lots of dots all over the world, each representing a WebGUI website. If it could be a widget, I would put it on my website. I imagine combining it with that screenshot site, so you could hover over a dot, see a small screenshot, and surf to that site. Surfin' the world of WebGUI.
Kind regards,
Arjan Widlak
United Knowledge
Internet for the public sector
Well, perhaps we should not call it a success right now, but at least it is being used and it generates data. 
Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization