Consider looking for related projects for help or ask at the Teahouse. If you are not currently a project participant and wish to help you may still participate in the project. This status should be changed if collaborative activity resumes.
This WikiProject aims primarily to design, implement, and discuss the collection of statistics about Wikipedia content, metacontent, contributors, and visitors. We seek to better understand how people use Wikipedia and its community, and what is most useful to them. We also seek to explore new ways of streamlining the generation of timely statistics.
Who contributes to Wikipedia, when during the day/week, and how often?
What causes sudden spikes in readers, contributors, vandals?
Are there patterns in the contributions? E.g. age, gender, race and nationality versus categories?
What motivated the top contributors? E.g. repute, reciprocity, altruism, relationships, roles? Free content, neutrality, software design, democracy, community, others?
How are the quality, validity and reliability of content maintained? By whom, and to what extent?
How does server load affect user activity, for example in the hours or days after a slowdown?
Where (on Earth!) are the contributors? Are contributors to en.Wikipedia in English-speaking countries; Spanish- and Portuguese-language contributors in Iberia, Latin America, or elsewhere; German-language contributors in Germany, Austria, Switzerland, or elsewhere; etc.?
How have changes to the Recent Changes page and the Main Page historically affected user clickthroughs from those pages?
How often do anonymous visitors/readers (or visitors from Google/Yahoo) visit pages like RC, Random, the Community Portal?
What are the readers' ratings of the quality or usefulness of each page?
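As a rough illustration of the "when during the day/week" question above, edit timestamps can be bucketed by weekday and hour. This is only a sketch: the function name `edits_by_weekday_hour` and the sample timestamps are made up for the example, and it assumes timestamps arrive as UTC strings of the form `YYYY-MM-DDTHH:MM:SSZ`.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed input format (UTC)

def edits_by_weekday_hour(timestamps):
    """Count edits per (weekday, hour) bucket, where Monday is weekday 0."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, TIMESTAMP_FORMAT)
        counts[(dt.weekday(), dt.hour)] += 1
    return counts

# Hypothetical sample data, not real edit history.
sample = [
    "2005-03-07T14:05:00Z",  # a Monday
    "2005-03-07T14:45:10Z",  # same Monday, same hour
    "2005-03-08T02:30:00Z",  # the following Tuesday
]
hourly = edits_by_weekday_hour(sample)
for (weekday, hour), n in sorted(hourly.items()):
    print(weekday, hour, n)
```

Because everything is counted in UTC, the buckets would need shifting per contributor region before they say anything about local time of day.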
Curtailing Mischief
How can we quantify vandalism? Trolling?
How many admins are online at a given time?
How does the number of admins online relate to the amount of vandalism that takes place?
Are vandals deterred by quick response times?
How effective are bans and blocks? How often do vandals come back right away as anons or with another IP?
What is the average block length? How does block length differ between registered editors and IPs?
What is the median time-to-correction for acts of vandalism? (Recent study: Vandalism Survival.)
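One way the median time-to-correction might be computed, given pairs of (vandal edit time, revert time), is sketched below. The helper `median_time_to_correction`, the timestamp format, and the three incidents are assumptions for illustration only; identifying which edits count as vandalism and which as corrections is a separate problem that this does not address.

```python
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed input format (UTC)

def median_time_to_correction(incidents):
    """Median number of seconds between a vandal edit and its revert."""
    delays = []
    for vandalized, reverted in incidents:
        start = datetime.strptime(vandalized, TIMESTAMP_FORMAT)
        end = datetime.strptime(reverted, TIMESTAMP_FORMAT)
        delays.append((end - start).total_seconds())
    return median(delays)

# Hypothetical incidents, not real data.
incidents = [
    ("2005-03-07T14:00:00Z", "2005-03-07T14:02:00Z"),  # reverted after 2 minutes
    ("2005-03-07T15:00:00Z", "2005-03-07T15:10:00Z"),  # reverted after 10 minutes
    ("2005-03-07T16:00:00Z", "2005-03-08T16:00:00Z"),  # reverted after a full day
]
median_seconds = median_time_to_correction(incidents)
print(median_seconds)
```

The median is used rather than the mean because a single long-lived act of vandalism, like the third incident here, would otherwise dominate the figure.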
Processes
How do different people add content? <-- what does this mean (other than Edit This Page)? Elaboration needed.
Slow vs. fast contributors; people who write offline vs. online
How many use offline editors, and upload in blocks?
How many people migrate content from other free repositories to Wikimedia sites?
photos, text (to Commons, Wikisource)
Methodology
This section should cover how the research data will be collected and analysed, and not Wikipedia context or processes (moved to above section).
Data Collection
Webalizer statistics
Add optional fields to every member's profile form for age, gender, race, and nationality (perhaps with a privacy option, so the system can collect the data without making it visible to the general public)
Polls for all in Community Portal
Surveys/Interviews of top contributors
Constructs needed for different motivational factors
Define & select uniform data structures and software (SPSS, SAS)
Define variables
Outcome measures
Correlational designs
t-tests
ANOVA/MANOVA (for correlational data)
Post-hoc statistics (LSDs, Fisher's)
Factor analysis
Non-parametric measures (Chi-Square)
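To make one of the listed techniques concrete, here is a minimal sketch of a chi-square test of independence on a contingency table, written in plain Python. The function name and the 2x2 table (contributor group versus article category) are invented for illustration and do not come from any collected data. The sketch computes only the test statistic and degrees of freedom; turning these into a p-value requires the chi-square distribution, which a statistics package would supply.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts; all rows have equal length.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are two contributor groups,
# columns are edits to two article categories.
observed = [
    [30, 10],
    [20, 40],
]
stat, dof = chi_square_statistic(observed)
print(round(stat, 3), dof)
```

No continuity correction is applied here, so results for 2x2 tables may differ from packages that apply one by default.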
Caveats?
Privacy
Possible solution: Constrain to publicly available data; and, if private data must ever be used, absolutely no personally-identifiable info.
Consent to participate in certain surveys
Possible solution: Avoid experimental setups and self-response surveys, since the reliability of self-reported answers is often difficult to gauge. However, properly structured, anonymous polls that have pretty much no chance of "psychological trauma" or whatnot are probably safe :P
Feedback effects of certain metrics (edit #) via social loops (people editing for the sake of edit count)
Possible solution/offset: Examine interaction effects between edit count and other factors; analyze a random sample of failed vs. successful RfAs, with a method for analyzing the voters' primary rationale?