Hi, I'm JT and these are my thoughts on community, content management, Plain Black, and WebGUI.

Next: WebGUI Community Stats

User: JT
Date: 5/15/2009 10:54 am
Views: 3144
Rating: -1

A new feature has been added in 7.7.6-beta that will allow you to participate in a worldwide survey of WebGUI sites. The data will be used to determine what features are most used in WebGUI, and how WebGUI sites grow. This will be useful for developers to make WebGUI faster, easier to use, and all around better. But it may be useful to you as well, whether you want to know how your site stacks up against other users, or you want to see how many other sites are deployed on the same operating system as you.


This video shows you how easy it is to enable this new feature:

A summary of all the data collected will be published to webgui.org/stats.

I hope all of you will choose to participate in this survey after you upgrade to WebGUI 7.7. For those of you concerned about privacy, don't worry, we've got you covered. None of the data sent to the central repository will contain anything identifiable about your site or you. It is all generic data like how many users you have, and how many assets you have, but no site name, company name, email address, or any other personal information is sent in the survey data.

Replies

Re: Next: WebGUI Community Stats
User: koen
Date: 5/26/2009 2:28 am
Rating: 3
Status: Approved

> For those of you concerned about privacy, don't worry, we've got you covered.

I'm sorry, but you will have to do better than that to convince me to turn this feature on. If I'm going to give you any potentially private data I will want to know in advance what exactly you are going to do with it.

Of course you write this:

> None of the data sent to the central repository will contain anything identifiable about your site or you. It is all generic data like how many users you have, and how many assets you have, but no site name, company name, email address, or any other personal information is sent in the survey data.

What I would like to know is: will there be a way to check if the data received is actually the data that will be presented?

What exactly are you trying to achieve and how?

When you change something in the stats gathering code, will you promise to let me know in advance and will you automatically (by way of updatescript) disable the sending of data so that I will have to re-enable it after review of the changes?

> The data will be used to determine what features are most used in WebGUI, and how WebGUI sites grow.

What do you expect to see and when and where are these figures going to be presented?

Will there be a real time access to this data somewhere?

Don't get me wrong, I do like the idea of having community stats, I just want to be absolutely sure that my privacy is assured.

Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization


Re: Next: WebGUI Community Stats
User: koen
Date: 5/26/2009 2:30 am
Rating: 2
Status: Approved

The stats page on webgui.org still says this:

> Nothing to see here just yet. Still collecting initial data.

Does that mean that no data has been received up till now?

How many sites are reporting stats now?

Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization


Re: Next: WebGUI Community Stats
User: JT
Date: 5/26/2009 8:51 am
Rating: 2
Status: Approved
> Does that mean that no data has been received up till now?
>
> How many sites are reporting stats now?
>

It means that I put a temporary page in place until we have some data  
worth displaying. I'd expect within the next couple of weeks there  
will be enough data worth displaying.


JT Smith
ph: 703-286-2525 x810
fx: 312-264-5382

Create like a god. Command like a king. Work like a slave.


Re: Next: WebGUI Community Stats
User: JT
Date: 5/26/2009 8:51 am
Rating: -13
Status: Approved

> What I would like to know is: will there be a way to check if the data received is actually the data that will be presented?

I'm not sure what you mean. I don't plan on displaying specific entries. This data is all about trends over time.

> What exactly are you trying to achieve and how?

I actually did say that later on in the post. Specifically, "The data will be used to determine what features are most used in WebGUI, and how WebGUI sites grow." However, more specifically:

If one asset is used more than another asset then that tells us that we should look at the less used asset to see how we can make it more useful, and the more used asset to see how we can make it perform better. Another thing to look at is how quickly sites gain users over time, and the relationship of user growth to group growth, and user growth to asset growth. This is all about trending how people use their WebGUI sites so we can figure out how to make WebGUI better.

> When you change something in the stats gathering code, will you promise to let me know in advance and will you automatically (by way of updatescript) disable the sending of data so that I will have to re-enable it after review of the changes?

I hadn't actually considered that, but I suppose we certainly could do that. At the present time I'm not sure what else we'd want to send back. But I suppose over time we might figure out something else that would be useful.

> What do you expect to see and when and where are these figures going to be presented?

I already said both of those things in the original post.

> Will there be a real time access to this data somewhere?

What do you mean by realtime access?

> Don't get me wrong, I do like the idea of having community stats, I just want to be absolutely sure that my privacy is assured.

I'm not sure that anything I can ever do will assure *you* of your privacy. =) How much more reassured can you be than that the code for both the collection of the data and the submission of the data is open source and reviewable by anyone at any time?


JT Smith
ph: 703-286-2525 x810
fx: 312-264-5382

Create like a god. Command like a king. Work like a slave.


Re: Next: WebGUI Community Stats
User: preaction
Date: 5/26/2009 9:19 am
Rating: 13
Status: Approved

I think the best way we can assure you of your privacy is to remind you that WebGUI is free, open-source software. You can open up the code and see exactly what is sent and how it is sent.

See lib/WebGUI/Workflow/Activity/SendWebguiStats.pm

If you don't trust JT, you can trust me, or any of the other people who will help WebGUI by enabling this and will be checking the code just like I did.

Just one more benefit of free software: The developers can't get away with violating users' privacy.


Re: Next: WebGUI Community Stats
User: koen
Date: 5/26/2009 4:27 pm
Rating: 24
Status: Approved

Doug, I trust you or any of the people who develop WebGUI.

Going by the source file you pointed me to, this is what is currently submitted (in WebGUI 7.7):

  • webguiversion
  • perlversion (major and minor)
  • apacheversion
  • ostype
  • sitename
  • number of users in db
  • number of groups in db
  • number of published assets in db
  • number of packages in db
  • list of asset types in db with the amount of them used in db

Now going into paranoia mode:

This is being sent over http (non encrypted) to www.webgui.org

In theory someone in the middle could find out your exact apache version and perlversion and os and use that data to use a known exploit to hack your system.

/paranoia

Actually I think that I would enable stats if this list would stay the same and it would be sent over https :)
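
To make the two conditions above concrete, here is a minimal sketch in Python (not the actual Perl in SendWebguiStats.pm). It assumes the report is a flat set of the fields listed above and that the target URL is checked before anything is sent. The function names, the shortened count field names, the sample values, and the URL are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlparse

# Field names follow the list above; the count names are shortened here
# and are an assumption, not the names used by WebGUI itself.
ALLOWED_FIELDS = [
    "webguiversion", "perlversion", "apacheversion", "ostype", "sitename",
    "users", "groups", "assets", "packages", "asset_types",
]


def build_stats_payload(site):
    """Serialize only the whitelisted fields; anything else is dropped."""
    return json.dumps({name: site[name] for name in ALLOWED_FIELDS}, sort_keys=True)


def check_encrypted(url):
    """Refuse to submit unless the target URL uses HTTPS."""
    if urlparse(url).scheme != "https":
        raise ValueError("refusing to send stats over an unencrypted connection")
    return url


# Made-up sample site; admin_email is present but never leaves the machine.
site = {
    "webguiversion": "7.7.8", "perlversion": "5.10", "apacheversion": "2.2",
    "ostype": "linux", "sitename": "example", "users": 120, "groups": 9,
    "assets": 340, "packages": 4, "asset_types": {"Article": 40, "Folder": 12},
    "admin_email": "admin@example.com",
}
payload = build_stats_payload(site)
target = check_encrypted("https://www.webgui.org/")
```

Because the payload is built from a fixed whitelist, reviewing what gets sent comes down to reviewing one short list, which is the kind of check the source file above makes possible.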

Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization


Re: Next: WebGUI Community Stats
User: JT
Date: 5/26/2009 4:40 pm
Rating: -8
Status: Approved
> Now going into paranoia mode:
>
> This is being sent over http (non encrypted) to www.webgui.org
>
> In theory someone in the middle could find out your exact apache  
> version and perlversion and os and use that data to use a known  
> exploit to hack your system.
>
> /paranoia
>
> Actually I think that I would enable stats if this list would stay  
> the same and it would be sent over https :)
>

I think you are being way too paranoid. But since there is no harm in  
encrypting the data as it's sent across, I've made the change. 7.7.8  
sends the data over HTTPS.


JT Smith
ph: 703-286-2525 x810
fx: 312-264-5382

Create like a god. Command like a king. Work like a slave.


Re: Next: WebGUI Community Stats
User: koen
Date: 5/26/2009 4:19 pm
Rating: 35
Status: Approved

First of all: thanks, that makes things a lot clearer for me.

By realtime access I mean: can the community have the raw data submitted? Or are you only keeping counters? Perhaps this will become clear when the stats page has something to view on it.

> How much more reassured can you be than that the code for both the collection of the data and the submission of the data is open source and reviewable by anyone at any time?

If the data itself is open source, that would help a lot. You could, for example have the raw data downloadable as a csv file.

If you (as a submitter of data) can see the stats grow day by day, that would help too.

Are you logging which IP addresses are submitting data?

How do you plan to deal with deliberate stats poisoning?

Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization


Re: Next: WebGUI Community Stats
User: JT
Date: 5/26/2009 4:45 pm
Rating: 1
Status: Approved

> By realtime access I mean: can the community have the raw data submitted? Or are you only keeping counters? Perhaps this will become clear when the stats page has something to view on it.

I suppose we could allow exporting the raw data, though I'm not sure if it's really useful to anybody for anything other than the statistics and graphs that we're going to publish.

> > How much more reassured can you be than that the code for both the collection of the data and the submission of the data is open source and reviewable by anyone at any time?
>
> If the data itself is open source, that would help a lot. You could, for example have the raw data downloadable as a csv file.

It would have to be either an XML or JSON file, or a series of CSV files, because it's not stored in a flat table. Also, over time this is going to be gigabytes worth of data. So I'm concerned about the transportability of it.
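
As a rough illustration of why one CSV file is not enough, here is a sketch in Python. It assumes each submission carries a nested per-asset-type count, which is my reading of "not stored in a flat table". The record layout, field names, and sample values are hypothetical and do not reflect the real storage schema.

```python
import csv
import io
import json

# Made-up sample submissions with a nested asset_types mapping.
submissions = [
    {"id": 1, "webguiversion": "7.7.8", "users": 120,
     "asset_types": {"Article": 40, "Folder": 12}},
    {"id": 2, "webguiversion": "7.7.6", "users": 8,
     "asset_types": {"Article": 3}},
]


def to_json(rows):
    """Nested data maps directly onto a single JSON document."""
    return json.dumps(rows, indent=2)


def to_csv_tables(rows):
    """Return two CSV strings: one row per submission, one row per asset type."""
    main, assets = io.StringIO(), io.StringIO()
    main_writer = csv.writer(main, lineterminator="\n")
    asset_writer = csv.writer(assets, lineterminator="\n")
    main_writer.writerow(["id", "webguiversion", "users"])
    asset_writer.writerow(["submission_id", "asset_type", "count"])
    for row in rows:
        main_writer.writerow([row["id"], row["webguiversion"], row["users"]])
        # The nested counts go to a second table keyed by submission id.
        for asset_type, count in sorted(row["asset_types"].items()):
            asset_writer.writerow([row["id"], asset_type, count])
    return main.getvalue(), assets.getvalue()


main_csv, asset_csv = to_csv_tables(submissions)
```

The JSON export keeps each submission in one piece, while the CSV export needs a second file joined on the submission id.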

> If you (as a submitter of data) can see the stats grow day by day, that would help too.

The stats are submitted weekly, not daily, but yes you'll be able to see it grow over time.

> Are you logging which IP addresses are submitting data?

No.

> How do you plan to deal with deliberate stats poisoning?

I don't. Since I'm not allowed to track anything at all about what site the data is coming from, or the user submitting it, there's really not much I can do to stop stats poisoning. Unless you have ideas.



Re: Next: WebGUI Community Stats
User: knowmad
Date: 5/26/2009 10:36 pm
Rating: 3
Status: Approved

> I don't. Since I'm not allowed to track anything at all about what site the data is coming from,
> or the user submitting it, there's really not much I can do to stop stats poisoning. Unless you
> have ideas.

I'm actually more concerned about this than someone figuring out who is submitting what data. Such poisoning could significantly impact the effectiveness of this data, especially if it were done slowly over time.

One possible solution is to require a registered user id be submitted with the data (no password required). You could then at least match the stats to an account and easily remove that account's stats if it's discovered that this user is submitting tainted data. Of course, you wouldn't show or in any way provide the userid when presenting data (formatted or raw).

As an extra bonus, this technique would allow a user to see just their stats. As an integrator, I'd like to see stats for just my clients. The trending of my stats would be interesting to compare to the community as a whole.

William

----
Knowmad Technologies
http://www.knowmad.com


Re: Next: WebGUI Community Stats
User: preaction
Date: 5/27/2009 1:15 am
Rating: -2
Status: Approved

The idea of tying the results to a username (authenticated using HTTP basic auth, since otherwise I could say I'm you and who would argue with me) would solve the problem, but it would also destroy the anonymity. Perhaps allow it but don't require it; that way it's useful to you if you want to lose that measure of anonymity?

I'm not a statistician, but I believe that most statistical analyses are reported with an error % proportional to the results received. A larger number of results would have a smaller margin of error.
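
For reference, a small Python sketch of the textbook margin of error for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); neither assumption comes from this thread, and the formula says nothing about deliberately poisoned submissions. The function name is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n reports."""
    return z * math.sqrt(p * (1 - p) / n)


# Example: half of the reporting sites use some feature.
small_sample = margin_of_error(0.5, 100)
large_sample = margin_of_error(0.5, 400)
```

Under these assumptions the margin shrinks with the square root of the number of reports, so four times as many reporting sites roughly halves it.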

That, and what would one gain by skewing the stats? I'm not saying there aren't people who just screw with polls for the sake of screwing with polls, but if we can remove or reduce the benefit and increase the cost to do the screwing, we can prevent enough of the poisoning to be within a standard margin of error.

The only problem there being how to increase the cost of skewing results since we must remain open source. Do it on the server-side and only allow X results per site-hash, Y site-hashes per IP address per week? (so, say 40 sites on one IP can submit stats one time per week, but not 50 sites on one IP)
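
Roughly what I have in mind, as a Python sketch rather than real WebGUI code. It assumes the server can see a site-hash and the sender's IP at submission time, even if the IP is never stored with the data afterwards. The class name, method name, and limits are hypothetical, and counters are kept in memory only.

```python
from collections import defaultdict


class WeeklyLimiter:
    """Allow X results per site-hash and Y site-hashes per IP, per week."""

    def __init__(self, max_per_site=1, max_sites_per_ip=40):
        self.max_per_site = max_per_site
        self.max_sites_per_ip = max_sites_per_ip
        self.counts = defaultdict(int)       # (week, site_hash) -> submissions
        self.sites_by_ip = defaultdict(set)  # (week, ip) -> site hashes seen

    def allow(self, week, ip, site_hash):
        sites = self.sites_by_ip[(week, ip)]
        # Reject a new site-hash once this IP has used up its quota of sites.
        if site_hash not in sites and len(sites) >= self.max_sites_per_ip:
            return False
        # Reject repeat submissions from the same site within the week.
        if self.counts[(week, site_hash)] >= self.max_per_site:
            return False
        sites.add(site_hash)
        self.counts[(week, site_hash)] += 1
        return True


# Tiny limits so the behaviour is easy to follow.
limiter = WeeklyLimiter(max_per_site=1, max_sites_per_ip=2)
results = [
    limiter.allow(week=1, ip="10.0.0.1", site_hash="a"),  # first report from site a
    limiter.allow(week=1, ip="10.0.0.1", site_hash="a"),  # repeat in the same week
    limiter.allow(week=1, ip="10.0.0.1", site_hash="b"),  # second site on this IP
    limiter.allow(week=1, ip="10.0.0.1", site_hash="c"),  # third site, over the IP quota
    limiter.allow(week=2, ip="10.0.0.1", site_hash="c"),  # a new week resets the quota
]
```

This only raises the cost of poisoning; anyone with many IP addresses can still get around it.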


Re: Next: WebGUI Community Stats
User: knowmad
Date: 5/27/2009 8:32 am
Rating: 10
Status: Approved

Good points about margin of error. Having bypassed stats in school, I'd forgotten about that.

Nonetheless, I like the idea of rate-limiting the stats to x number of submissions per IP per week. We're not hosting more than 2 dozen sites from a single IP so 50 should be adequate for us. Perhaps Alpha-Mega or Procolix have some other ideas.

----
Knowmad Technologies
http://www.knowmad.com


Re: Next: WebGUI Community Stats
User: arjan
Date: 5/27/2009 6:12 pm
Rating: -2
Status: Approved

I think this is a great feature. It's an opportunity for eye-candy as well. I understand the privacy issue that may play a role in some cases, but I would have no problem submitting the IP. I would love to see a Google map with a lot of dots all over the world, symbolizing WebGUI websites. If it could be a widget, I would put it on my website. I imagine a combination with that screenshot site, so you can hover over a dot and see a small screenshot and surf to that site. Surfin' the world of WebGUI.

Kind regards,

Arjan Widlak

United Knowledge
Internet for the public sector

www.unitedknowledge.nl


Re: Next: WebGUI Community Stats
User: koen
Date: 8/5/2009 4:09 pm
Rating: 20
Status: Approved

Well, perhaps we should not call it a success right now, but at least it is being used and it generates data. :P

Total sites reporting data: 3

Koen de Jonge - ProcoliX
http://www.procolix.com
Hosting - WebGUI - Virtualization

