This user account is a bot operated by Dvandersluis (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. This bot does not yet have the approval of the community, or approval has been withdrawn or expired, and therefore it should not be making edits that appear to be unassisted, except in the operator's or its own user and user talk space. Administrators: if this bot is making edits that appear to be unassisted to pages outside the operator's or its own userspace, please block it.
The purpose of this bot is to automatically update various statistics-related pages on Wikipedia that would be tedious if done by a human. There are currently three tasks, defined in detail below, that this bot will do:
This task was requested by
User:Mike Christie, and is outlined in detail on
User:Mike Christie/GACbot. The purpose of this task is to compile a statistical report on
Wikipedia:Good article candidates in order to help the maintainers of that page identify trends. In addition, the bot will update the
GAC backlog template with the oldest five nominations. The bot will also generate a special page employing
ParserFunctions, so that other templates can transclude it to access specific statistics (rather than the bot editing complex templates directly).
This task puts as little strain as possible on the Wikipedia servers: although the bot performs a number of sub-tasks, only one page needs to be fetched to provide the necessary data, and the bot writes to only a minimal number of pages (currently three).
The bot starts at Wikipedia:Good article candidates and downloads that page's wikitext. Using special comments inserted into the page, the bot isolates the section of the page containing the nominations.
The bot will immediately abort if the page is not downloaded correctly, if the nomination section cannot be detected, or if the bot is unable to successfully login to Wikipedia. This would most likely be caused by a timeout on the bot's part, or a change in the format of the GAC page.
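The fetch-and-isolate step above can be sketched as follows. This is a minimal Python illustration; the comment marker names (`GACBOT-BEGIN`/`GACBOT-END`) are hypothetical, since the actual markers used on the GAC page are not given here.

```python
import re

def extract_nomination_section(wikitext, start="<!-- GACBOT-BEGIN -->",
                               end="<!-- GACBOT-END -->"):
    """Return the text between the bot's special comment markers,
    or None if either marker is missing (the bot would then abort)."""
    match = re.search(re.escape(start) + r"(.*?)" + re.escape(end),
                      wikitext, re.DOTALL)
    return match.group(1) if match else None
```

Returning `None` rather than raising lets the caller decide how to abort (for example, after logging the failure).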
Using a series of regular expressions, the bot parses the page into an object of nested nomination categories and nominations. All pertinent information to be used later is stored within the object:
Nominator and nomination date.
Length status, if available.
On hold status, if applicable, along with the user who placed the article on hold, and the timestamp of the status change.
Under review status, if applicable, along with the user who is reviewing the article, and the timestamp of the status change.
Any malformations to the nomination detected during the parse.
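As a rough illustration of the parse, the sketch below groups nominations under their section headings and extracts the nominator and date with a regular expression. The exact nomination format on the GAC page is an assumption here, as is the shape of the resulting object.

```python
import re

NOM_RE = re.compile(
    r"#\s*\[\[(?P<title>[^\]]+)\]\].*?"             # article link
    r"\[\[User:(?P<nominator>[^\]|]+)[^\]]*\]\]"    # nominator link
    r".*?(?P<date>\d{2}:\d{2}, \d{1,2} \w+ \d{4})"  # signature timestamp
)

def parse_nominations(section):
    """Build {category: [nomination dict, ...]}; unparseable lines are
    recorded separately as malformations for the exception report."""
    categories, malformed, current = {}, [], None
    for line in section.splitlines():
        heading = re.match(r"==+\s*(.+?)\s*==+\s*$", line)
        if heading:
            current = heading.group(1)
            categories.setdefault(current, [])
        elif current is not None and line.startswith("#"):
            m = NOM_RE.match(line)
            if m:
                categories[current].append(m.groupdict())
            else:
                malformed.append(line)
    return categories, malformed
```

Hold and review status would be captured by additional optional groups in the same pattern.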
Once the bot has the necessary data, it generates a report, which is written to Wikipedia:Good article candidates/Report. The report currently consists of four sections:
Old nominations report: a list of the oldest 10 unreviewed nominations, sorted by age.
Backlog count: a daily tally of how many articles are listed at GAC in total, and how many are on hold or under review.
Exception report: a list of unexpected or undesirable issues.
Summary: a list by category, showing nomination statistics for each category.
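For instance, the old nominations section can be produced by filtering out anything on hold or under review and sorting by date. This Python sketch assumes each parsed nomination is a dict with hypothetical `date`, `on_hold`, and `under_review` keys:

```python
from datetime import datetime

def oldest_unreviewed(nominations, limit=10):
    """Return the oldest `limit` nominations that are neither
    on hold nor under review, oldest first."""
    pending = [n for n in nominations
               if not n.get("on_hold") and not n.get("under_review")]
    return sorted(pending, key=lambda n: n["date"])[:limit]
```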
The bot will update Good article candidates/backlog/items with the oldest five nominations, for use in the backlog template.
The bot will finally update Template:GACstats. This page will allow other templates and pages to quickly acquire information from the GAC report without needing to be updated by the bot directly; instead, they would transclude the page with a parameter specifying the desired statistic.
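One way to build such a page is a ParserFunctions `#switch` on the first template parameter, so that a transclusion like `{{GACstats|total}}` expands to a single number; the statistic names below are hypothetical. A Python sketch of generating that wikitext:

```python
def build_gacstats(stats):
    """Render a {{#switch:}} template body mapping a statistic name
    (passed as the first template parameter) to its value."""
    lines = ["{{#switch: {{{1|}}}"]
    lines += [f"| {name} = {value}" for name, value in stats.items()]
    lines.append("}}")
    return "\n".join(lines)
```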
The final version of this bot was 3.0.1, updated April 13, 2009.
Detailed description
The bot starts at Category:Cleanup by month and collects the categories (listed under the Subcategories section on that page), named "Cleanup from {MONTH} {YEAR}", that contain pages needing cleanup.
Each category page is inspected, and the number of pages in that category is calculated:
The bot looks for the string "There are ## pages in this section of this category." at the top of the "Pages in category..." section on each category page, and keeps track of that number.
The bot will follow "(next 200)" links on category pages in order to get the complete count for the category.
Pages in subcategories are not counted twice.
Pages of the form Wikipedia:Cleanup/<MONTH> are ignored for counting purposes, as they are not truly in need of cleanup, but rather information pages about what needs cleanup.
The bot repeats the previous process using the subcategories of Category:Music cleanup by month. This step is currently skipped, as no such categories exist; if they are ever recreated, the bot will resume counting them.
The bot will immediately abort if a count of 0 is returned for any category (as this is an impossibility and means that the bot had trouble parsing a page, or, more likely,
timed out while trying to do so).
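The counting loop, including the follow-the-"(next 200)"-link behaviour and the abort-on-zero check, might look like the following Python sketch; the HTML snippets it matches are simplified stand-ins for the real category-page markup.

```python
import re

COUNT_RE = re.compile(r"There are (\d+) pages in this section of this category\.")
NEXT_RE = re.compile(r'<a href="([^"]+)"[^>]*>next 200</a>')

def count_category(fetch, url):
    """Sum a cleanup category's page count across '(next 200)' pages,
    skipping Wikipedia:Cleanup/<MONTH> info pages; abort on a zero count."""
    total = 0
    while url:
        html = fetch(url)
        m = COUNT_RE.search(html)
        if m is None or int(m.group(1)) == 0:
            raise RuntimeError("count of 0 or parse failure -- aborting run")
        # info pages are listed in the category but do not need cleanup
        total += int(m.group(1)) - html.count("Wikipedia:Cleanup/")
        nxt = NEXT_RE.search(html)
        url = nxt.group(1) if nxt else None
    return total
```

Here `fetch` is any callable returning a page's HTML, which also makes the loop easy to test without network access.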
If the bot successfully retrieved information from each category, it will pull the total number of articles from
Special:Statistics.
The bot keeps track of the elapsed time and the number of pages processed. On average, a successful run takes about three minutes and processes fewer than one hundred pages.
Proposed future tasks
League of Copyeditors progress template
This task has not yet been started, but is an indication of future ideas for the bot.
WikiProject League of Copyeditors maintains a template,
Template:Copyedit progress, that tracks the project's progress in copyediting tagged articles. At present it is updated manually, which is a slow process. This task, as done by the bot, would parse the
proofreading page, count the completed proofreads, and update the template.
The League of Copyeditors has
changed its name, and its needs are changing a bit too. We desperately need a process that does almost exactly what this bot does for GAN. I have written specifications based on the original specs here:
User:Noraft/GOCEbot. Maybe it wouldn't be too much tweaking to get this bot going on that project. We're doing a backlog elimination drive May 1, and it would be awesome if it was running by then (don't know if your schedule permits for that, though). Anyway, thanks for the bot at GAN. Works awesome!
ɳorɑfʈ Talk! 14:31, 14 April 2010 (UTC)