Measuring Participation Inequality in Social Networks
This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.
This all started sometime around 2001 -- I first heard of the Gini coefficient freshman year of college, in one of those massive Economics 101 lecture courses. From Wikipedia:
The Gini coefficient is a measure of inequality of income distribution or inequality of wealth distribution. It is defined as a ratio with values between 0 and 1: 0 corresponds to perfect equality (e.g. everyone has the same income) and 1 corresponds to perfect inequality (e.g. one person has all the income, while everyone else has zero income).
It bounced back into my brain a while back as I overheard someone lamenting the 90-9-1 rule of online participation: 90% of your users will be "lurkers" who read but don't contribute, 9% will contribute only occasionally, and the remaining 1% of your user base will make up the bulk of the total participation in your community.
Some people like to use the 90-9-1 rule to boo-hoo any attempt at building an online community, some like to do a little math and say "hey, 1% of my total user base is still a big number if they really do become outspoken evangelists" -- but everyone is always looking for a way to break the rule and encourage widespread participation.
But how do we create a metric that allows us to track the ROI of our efforts to increase participation? We can build our own Gini-like metric...
WARNING: this is a long one but if you stick with me, I bet you're going to start thinking about measuring online communities in a different way.
In most communities, I encourage point systems driven by participation -- leave a comment, get a point; write a blog post, get a point. Sometimes certain activities are worth more points (be careful when doing this), and the community itself always has an effect on the total score: write a defamatory comment, for instance, and negative points from other users will drop your total. Another choice we often have to make is whether or not to make the score visible to the community -- visibility almost always encourages competition between users, which is perfect in some communities and can lead to negative behaviors in others. Digg, for instance, used a visible participation score, and it led to the top users wielding too much influence over the entire community -- which fostered a drop in the quality of the content.
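To make the idea concrete, here's a minimal sketch of such a scoring system. The activity names and point values are invented for illustration, not taken from any particular platform:

```python
# Hypothetical point values -- tune these for your own community,
# and be careful with heavily weighted activities.
POINT_VALUES = {
    "comment": 1,
    "blog_post": 5,
    "flagged_by_community": -2,  # other users can drive a score down
}

def score_user(activities):
    """Total participation points for one user's list of activities."""
    return sum(POINT_VALUES.get(activity, 0) for activity in activities)

print(score_user(["comment", "comment", "blog_post"]))  # 7
```

The per-user totals this produces are exactly the "scores" the rest of this post works with.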
Regardless of how visible we make the score, we, as the community organizers, can use it in all manner of ways. In this example, we can use the score to compare the participation of users across the entire community to determine the distribution of participation and build a dynamic metric we can track over time -- just like economists use the Gini coefficient to measure income distribution.
In statistics, what we're looking for is called statistical dispersion -- how far data elements fall from each other or from a mean value. In our case, in a perfectly distributed community every member would have the same number of participation points: the total community points divided by the number of members.
The perfectly distributed community would look like:
User 1: 500 points, User 2: 500 points, User 3: 500 points, and so on... Everyone is participating equally.
But we know that's not how it looks in real communities; we're much more likely to see:
User 1: 0 points, User 2: 0 points, User 3: 5 points, User 4: 500 points... Participation is very unequally dispersed.
And we also know that as participation grows less equal, new entrants to the community drop off more quickly and even older members fade away. As good community managers, we watch for this type of activity, but it would be extremely beneficial to have a dashboard of quantitative data to back up our qualitative assumptions.
To solve this, in short: I start by calculating how far each user's score falls from the community mean, then average those distances -- this is the average deviation, also known as the absolute deviation from the mean. Then I divide the average deviation by the mean and multiply by 100%, which gives us the deviation as a percentage of the mean. (Strictly speaking, the textbook coefficient of variation uses the standard deviation rather than the average deviation, but the idea is the same.) Understand? Good, 'cause I just confused myself.
Okay, I'll show my work!
Let's start with a community:
Community 1:
User 1: 50 points, User 2: 4 points, User 3: 6 points, User 4: 18 points
Total points: 78. Mean (or perfect score): 19.5 points.
The average deviation of the group is 15.25. On average, each score is 15.25 points away from the mean.
Taking the coefficient of variance, 15.25/19.5 x 100% = 78.21% -- which means, the average deviation is 78.21% of the mean -- or, the participation in this community is largely unequal.
Unequal compared to what? I'm glad you asked!
Let's look at another community:
Community 2:
User 1: 8 points, User 2: 7 points, User 3: 9 points, User 4: 10 points
Total points: 34. Mean (or perfect score): 8.5 points.
The average deviation of the group is 1. On average, each score is 1 point away from the mean.
Taking the coefficient of variance, 1/8.5 x 100% = 11.76% -- which means, the average deviation is 11.76% of the mean -- or the participation in this community is more equally distributed than community 1.
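The whole calculation is only a few lines of code. Here's a sketch in Python that reproduces both worked examples:

```python
def participation_coefficient(points):
    """Average absolute deviation from the mean, as a percentage of the mean."""
    mean = sum(points) / len(points)
    avg_deviation = sum(abs(p - mean) for p in points) / len(points)
    return avg_deviation / mean * 100

community_1 = [50, 4, 6, 18]  # mean 19.5, average deviation 15.25
community_2 = [8, 7, 9, 10]   # mean 8.5, average deviation 1

print(round(participation_coefficient(community_1), 2))  # 78.21
print(round(participation_coefficient(community_2), 2))  # 11.76
```

The lower the number, the more evenly participation is spread across the community.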
How can we use this? Each period, we can track the change in our coefficient to see whether participation in the community has grown more or less equally distributed, and on what scale the change has occurred. We shouldn't use this metric by itself, of course; we also need to watch the overall growth of participation -- the total number of points -- which we can segment by the user types or buying segments we've already constructed.

Imagine deploying an online community where you track the distribution of participation from the very start. As more users register on the site and you push more of them to contribute more often, you have a metric ready at your side to measure the effectiveness of each new campaign.

* I'm no statistician, and I built this model on my own. One reason I'm putting it out there is genuinely to share information, but I'd also love to kick-start a conversation about measuring the equality of participation. Give it a thought.
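A sketch of that period-over-period tracking might look like the following -- the period labels and the second month's numbers are made up for illustration:

```python
def participation_coefficient(points):
    """Average absolute deviation from the mean, as a percentage of the mean."""
    mean = sum(points) / len(points)
    return sum(abs(p - mean) for p in points) / len(points) / mean * 100

# Hypothetical monthly snapshots of per-user point totals.
snapshots = {
    "January": [50, 4, 6, 18],
    "February": [40, 10, 12, 20],
}

# Track total participation alongside its distribution.
for period, points in snapshots.items():
    print("%s: %d total points, coefficient %.2f%%"
          % (period, sum(points), participation_coefficient(points)))
```

A falling coefficient alongside rising total points is the signal we're after: more participation, spread more evenly.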