Adapting catwatch.js

Dear Ais523,

I'm The Transhumanist, one of the acting curators over at the Portals WikiProject.

We've run into a brick wall, and we need your help.

After years of being dormant, the Portals WikiProject was jump started this past April, has grown to almost 100 members, and is in the process of revamping the entire portal system. We are going all out, and have a very active and productive development team. You can see the progress we've made so far in the project's newsletter archive. The excitement level is high.

While many portals will continue to be maintained by hand, we are attempting to build a portal model that is entirely automated, as most of the portals do not have active maintainers. There are currently about 1500 portals, of which about 10% are actively editor-maintained.

So far, we've managed to automate 3 standard portal sections, and have gotten 4 others to semi-automated status. In other words, 3 sections are now virtually maintenance-free. It would be nice if the others could be too.

Which brings us to the brick wall...

We have been trying to use Lua (modules) to automate certain portal sections with a method known as selective transclusion. Here is an example of a "Selected item" section using the {{Transclude random excerpt}} template:

Selected amphibian type

Various types of frog

A frog is any member of a diverse and largely carnivorous group of short-bodied, tailless amphibians composing the order Anura (ἀνούρα, literally without tail in Ancient Greek). The oldest fossil "proto-frog" Triadobatrachus is known from the Early Triassic of Madagascar, but molecular clock dating suggests their split from other amphibians may extend further back to the Permian, 265 million years ago. Frogs are widely distributed, ranging from the tropics to subarctic regions, but the greatest concentration of species diversity is in tropical rainforest. Frogs account for around 88% of extant amphibian species. They are also one of the five most diverse vertebrate orders. Warty frog species tend to be called toads, but the distinction between frogs and toads is informal, not from taxonomy or evolutionary history.

An adult frog has a stout body, protruding eyes, anteriorly-attached tongue, limbs folded underneath, and no tail (the tail of tailed frogs is an extension of the male cloaca). Frogs have glandular skin, with secretions ranging from distasteful to toxic. Their skin varies in colour from well-camouflaged dappled brown, grey and green to vivid patterns of bright red or yellow and black to show toxicity and ward off predators. Adult frogs live in fresh water and on dry land; some species are adapted for living underground or in trees.

Frogs typically lay their eggs in water. The eggs hatch into aquatic larvae called tadpoles that have tails and internal gills. They have highly specialized rasping mouth parts suitable for herbivorous, omnivorous or planktivorous diets. The life cycle is completed when they metamorphose into adults. A few species deposit eggs on land or bypass the tadpole stage. Adult frogs generally have a carnivorous diet consisting of small invertebrates, but omnivorous species exist and a few feed on plant matter. Frog skin has a rich microbiome which is important to their health. Frogs are extremely efficient at converting what they eat into body mass. They are an important food source for predators and part of the food web dynamics of many of the world's ecosystems. The skin is semi-permeable, making them susceptible to dehydration, so they either live in moist places or have special adaptations to deal with dry habitats. Frogs produce a wide range of vocalizations, particularly in their breeding season, and exhibit many different kinds of complex behaviors to attract mates, to fend off predators and to generally survive. (Full article...)

It displays an excerpt randomly, taken from one of the pages from its internal list, each time the portal page is purged.

The above template requires that the user supply the names of the pages to be transcluded, which makes it only a semi-automated solution. Sections that use it rotate their material, and therefore don't go stale as fast as a section with a single excerpt, but they are updated by adding new article titles by hand, and unfortunately that method is not scalable.

We've been banging our heads against the brick wall to find a way to pull the names from a category with Lua, so that portals using these templates will be auto-updating (as the category grows, so would the selection in the portal accessing that category), but the generated nature of categories and the limitations of Lua prevent that.

Which means we must find a way other than Lua to generate a list of the pages from a category. Since JavaScript can make a wide range of API calls, that looks like the way we will have to go. (See https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1&modules=main#query+categorymembers). (That's how I found catwatch and you: via the search string "categorymembers").
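For illustration, here is a minimal sketch of what such an API call might look like from JavaScript. It is not taken from catwatch: the function names `buildCategoryMembersParams` and `fetchCategoryMembers` are made up for this example, and it assumes an environment where `fetch` and `URLSearchParams` are available (a modern browser or a recent Node.js). The parameters `list=categorymembers`, `cmtitle`, `cmnamespace`, `cmlimit` and `cmcontinue` are those documented on the API help page linked above.

```javascript
// Illustrative sketch only; the function names are hypothetical.
const API_URL = "https://en.wikipedia.org/w/api.php";

// Build the query parameters for one page of category members.
function buildCategoryMembersParams(category, continueToken) {
  const params = new URLSearchParams({
    action: "query",
    list: "categorymembers",
    cmtitle: "Category:" + category,
    cmnamespace: "0", // articles only
    cmlimit: "500",   // the usual per-query ceiling for non-bot accounts
    format: "json",
    origin: "*",      // permits anonymous cross-origin requests
  });
  if (continueToken) {
    params.set("cmcontinue", continueToken);
  }
  return params;
}

// Follow the API's continuation tokens until the whole category has been read.
async function fetchCategoryMembers(category) {
  const titles = [];
  let token = null;
  do {
    const url = API_URL + "?" + buildCategoryMembersParams(category, token);
    const data = await (await fetch(url)).json();
    for (const member of data.query.categorymembers) {
      titles.push(member.title);
    }
    token = data.continue ? data.continue.cmcontinue : null;
  } while (token);
  return titles;
}
```

Calling `fetchCategoryMembers("Frogs")` would resolve to an array of article titles, at the cost of one request per 500 members.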

It seems very likely that Catwatch.js could be adapted to do this for Selected article sections.

One thing catwatch does is store the category names in a js page, formatted in JavaScript to be run as part of the program. Instead, I'd like to store the category names in the Selected article sections themselves, such as in a hidden comment.

And, rather than add new category members to one's watchlist, I want to add them to the Selected article section as parameters to the transclusion template there, just like

| Frog
| Toad
| Salamander
| Caecilian 

are included in the example section provided above.
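As a sketch of the "hidden comment" idea, the category names could be read back out of the section's wikitext along these lines. The comment format `<!-- categories: Frogs; Toads -->` and the function name `readCategoriesFromComment` are assumptions made up for this example; nothing in the existing templates defines them.

```javascript
// Hypothetical marker format: <!-- categories: Frogs; Toads -->
// Returns the category names found in the first such comment, or [] if none.
function readCategoriesFromComment(wikitext) {
  const match = wikitext.match(/<!--\s*categories:\s*([\s\S]*?)-->/i);
  if (!match) {
    return [];
  }
  return match[1]
    .split(";")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}
```

A semicolon is used as the separator here because category names can contain commas.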

Any assistance or guidance you could provide would be most helpful.

I look forward to your reply. Sincerely,    — The Transhumanist   21:51, 5 July 2018 (UTC)

P.S.: I've pinged @TheDJ and Evad37: TheDJ, because the script mentioned he worked on it, and Evad, as he has been working on this problem and also has familiarity with JS. -TT

Replied on User talk:The Transhumanist. -- ais523 05:38, 6 July 2018 (UTC)
One of the issues with user scripts is that, by definition, they only apply to a single user. So you can't use a user script to do something like extract category entries to be displayed on a page for anonymous users; you could do it so that you saw excerpts yourself, but that would defeat the likely point of the project (to update the portal pages so that everyone sees them).
There's another issue with using JavaScript for this, too; being a client-side technology (i.e. it runs in the end user's browser), it runs entirely after all the server-side technology (i.e. things that run on Wikimedia's servers), meaning that you couldn't use JavaScript "live" to produce information that was passed to a Lua module. This applies to all uses of JavaScript (there are two main uses of it in Wikipedia, user scripts and MediaWiki:common.js, although I suspect that you'd have problems convincing people to change the latter for this purpose as not all users will run it; many users have JavaScript disabled by default, and there are possible performance issues too).
It seems like the correct solution to the problem, therefore, is not to try to work out the list of category entries when a user views a page (which is what the attempted Lua solution would do, and what you seem to have been assuming the JavaScript solution would do). Rather, what you want to do is automatically update lists of category entries somewhere that a Lua module can see them. That sounds like a job for a bot, rather than a script. I don't have much experience with fully automated bots from the point of view of writing them (I ran a bot in the past but it was basically an account with a very specific set of user scripts); you'd probably want to use an existing bot framework for something like this, and it would probably be worth contacting someone with more experience in those.
One other possibility would be to ask the developers to make the task you're aiming for easier. The category membership table is fairly quick to query behind the scenes (and in fact, Special:RandomInCategory exists but I haven't found any way to make its random selection feature accessible from Lua or wikitext). The best/simplest solution to this problem would be for MediaWiki to implement a magic word or ParserFunction which returns a randomly selected page from a category, and given that it would obviously improve the wiki (and use far fewer resources than a bot or script would!), it seems reasonable that such a request might be accepted. WP:VPT might be a good place to talk about this. -- ais523 05:37, 6 July 2018 (UTC)
Thank you for your input. Some very good ideas there.
My understanding of going through Phabricator is that it is hit and miss and a lot of waiting and wondering. So, while I won't rule out that as an option, I'd like to pursue this avenue of development as well.
I wasn't thinking about a viewing script doing its thing (pulling category members) in real time for everyone. What I would like to do is write a program to edit the portal pages, updating them for everyone's benefit. I'm sorry my explanation wasn't clearer.
You are right that I want to update lists of category entries somewhere that a Lua module can see them. That is exactly what I was talking about. The place I want those lists is as template parameters right in the portal's wikicode. The templates use Lua, which in turn use the parameters as input. Here is an example with a list already inserted — read the embedded comment for clarification:
{{Box-header|title=Selected frog article|EDIT=yes|noedit=yes}}
{{Transclude random excerpt| paragraphs=1-3 | files=1 | more=
 | <!-- The script I want would list the category members below like this: -->
 | Archaeobatrachia
 | Mesobatrachia
 | Neobatrachia
 | Alsodidae
 | Alytidae
 | Amietia
 }}
{{Box-footer}}
The actual section that this example was taken from is at Portal:Amphibians, and includes over fifty frog articles. There is also a toad section, a salamander section, and a caecilian section, each with their own list. Now you can see why we don't want to have to insert the article names manually.
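To make the target output concrete, here is a small sketch of how a script could generate the template call shown above from a list of titles. The function name `renderExcerptTemplate` is hypothetical, and the fixed parameters are simply copied from the example; titles containing `|` or `=` are skipped on the assumption that they would need special handling as unnamed template parameters.

```javascript
// Render a {{Transclude random excerpt}} call with one title per line.
// The fixed parameters are copied from the example section; adjust as needed.
function renderExcerptTemplate(titles) {
  const lines = ["{{Transclude random excerpt| paragraphs=1-3 | files=1 | more="];
  for (const title of titles) {
    // "|" would be read as a parameter separator and "=" as a named parameter,
    // so this sketch leaves such titles out rather than emit broken wikitext.
    if (title.includes("|") || title.includes("=")) {
      continue;
    }
    lines.push(" | " + title);
  }
  lines.push(" }}");
  return lines.join("\n");
}
```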
I don't mind if the program is a bot, written in JavaScript, or a userscript providing a menu item. It would be easier to develop it as the latter first, to allow for more interactive development and testing (with multiple users) before making a fully automated (bot) version.
You mentioned "work out the list of category entries". That's what I need to know how to do. I can handle menu items (see SearchSuite.js), and editing wikicode (see RedlinksRemover.js). It's what goes in between (fetching category members and putting them into a variable) that I'm having trouble with.
Therefore, I have some questions for you...
How does catwatch fetch the members of a category?
How does catwatch process those?
How does catwatch insert the result into a page? (I understand that this is not an edit page, but I don't recognize the methods being used, and I would like to grok this program.)
Any enlightenment you could provide concerning these mysteries would be very helpful.
I look forward to your replies.    — The Transhumanist   08:49, 6 July 2018 (UTC)
Oh, in that case, I have some bad news for you: catwatch doesn't fetch the members of a category. Doing so would be much too inefficient to do on every page load if the category were large. (There are some people who catwatch Category:Living people!) It's fetching only the most recently recategorised member of the category, via an API search of the category links table. (Incidentally, now that MediaWiki has its own category-watchlisting functionality, it would be possible, and probably more efficient, to use the watchlist table instead. catwatch is mostly redundant at this point.) As such, no processing is actually needed; we ask MediaWiki for the most recently recategorised page in the category, then simply display that to the user.
For your purposes, you'd probably want to record the time at which the script was last run. You could then query the categorylinks table for pages in the category that were recategorised since your last run of the script. Unfortunately, that would only spot insertions, not removals (as a page that was removed from the category will no longer be in the category); note that catwatch currently can't spot removal of a page from a category either. (Perhaps your script would also better work off a watchlist than a category table! Especially if you use a bot account to update the portals, even if the bot's implemented using user scripts, you could put all the categories in question – and nothing else – onto the bot account's watchlist, meaning that a watchlist query would be a very efficient way to get a list of exactly what you needed to update and on which pages.)
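A rough sketch of the "recategorised since the last run" query, using the `categorymembers` module sorted by timestamp. The function name `buildRecentAdditionsParams` is made up for this example, and the sketch only builds the request parameters; it assumes the caller stores the time of the previous run as an ISO 8601 string. As noted above, a query like this can only reveal additions, not removals.

```javascript
// Build parameters asking for pages added to a category since `lastRunIso`.
// With cmsort=timestamp the API orders members by when they joined the category.
function buildRecentAdditionsParams(category, lastRunIso) {
  return new URLSearchParams({
    action: "query",
    list: "categorymembers",
    cmtitle: "Category:" + category,
    cmsort: "timestamp",
    cmdir: "asc",        // oldest first, beginning at cmstart
    cmstart: lastRunIso, // only valid together with cmsort=timestamp
    cmprop: "title|timestamp",
    cmlimit: "500",
    format: "json",
  });
}
```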
Inserting the result into a page is done by editing the page's HTML once it loads. You can use a very similar technique to edit the edit box of a page; simply find the edit box via its ID, then edit its value. Presumably you'd want some comments in the page specifying which bit gets updated by the bot; then it could split the string into three pieces using the comments in question and update just the second piece. -- ais523 09:05, 6 July 2018 (UTC)
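The "split into three pieces" approach could be sketched as follows. The marker comments and the function names are assumptions invented for this example; `wpTextbox1` is the ID of the standard wikitext edit box, but the sketch accepts any object with a `value` property so that it does not depend on a particular editor.

```javascript
// Hypothetical marker comments delimiting the bot-maintained part of a section.
const START_MARK = "<!-- bot list start -->";
const END_MARK = "<!-- bot list end -->";

// Split the text into head / middle / tail at the markers and swap the middle.
// Returns the text unchanged if either marker is missing.
function replaceMarkedRegion(wikitext, newMiddle) {
  const start = wikitext.indexOf(START_MARK);
  if (start === -1) {
    return wikitext;
  }
  const end = wikitext.indexOf(END_MARK, start + START_MARK.length);
  if (end === -1) {
    return wikitext;
  }
  const head = wikitext.slice(0, start + START_MARK.length);
  const tail = wikitext.slice(end);
  return head + "\n" + newMiddle + "\n" + tail;
}

// Apply the replacement to an edit box (anything with a string `value`).
function updateEditBox(editBox, newMiddle) {
  editBox.value = replaceMarkedRegion(editBox.value, newMiddle);
}
```

On an edit page this might be called as `updateEditBox(document.getElementById("wpTextbox1"), " | Frog\n | Toad")`, leaving everything outside the markers untouched.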
You've given me a very good idea. Some portal sections have hundreds of articles in their transclusion templates. If a bot were to service the portals daily, it could select a random 10 or so articles and put those in as parameters (instead of hundreds). That way Lua wouldn't be processing so much on each view, and readers would still have the benefit of fresh material day after day.
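Picking "a random 10 or so" from a longer list could be done with a partial Fisher-Yates shuffle, sketched here. The function name `pickRandomSample` is made up for this example; it copies the input so the full list is left intact.

```javascript
// Return `count` distinct titles chosen uniformly at random from `titles`.
// If the list is shorter than `count`, every title is returned (in random order).
function pickRandomSample(titles, count) {
  const pool = titles.slice();
  const wanted = Math.min(count, pool.length);
  // Partial Fisher-Yates shuffle: only the first `wanted` slots are randomised.
  for (let i = 0; i < wanted; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, wanted);
}
```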
Fetching the most recent categorymembers sounds interesting, but updating goes both ways. We would definitely want to remove articles from the portals that got removed from the categories. That would require reading the entire category.
I didn't understand your explanation about a bot's watchlist being "a very efficient way to get a list of exactly what you needed to update and on which pages." There are 1500 portals (likely to grow to 7500), and what exactly would we be watchlisting for those?
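Once the whole category has been read, working out what to add and what to remove is a pair of set differences. A minimal sketch, with the hypothetical function name `diffMembers`:

```javascript
// Compare the titles currently listed in a portal section with the titles
// currently in the category, and report what should be added or removed.
function diffMembers(listedTitles, categoryTitles) {
  const listed = new Set(listedTitles);
  const current = new Set(categoryTitles);
  return {
    toAdd: categoryTitles.filter((title) => !listed.has(title)),
    toRemove: listedTitles.filter((title) => !current.has(title)),
  };
}
```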
Where in the catwatch program do you edit the page's HTML? I'd like to see how that is done.
It looks like I'm going to have to read the MediaWiki manual, for general programming support. What parts should I start with? Over time, what parts will I likely find most useful?
I look forward to your replies.    — The Transhumanist   21:41, 6 July 2018 (UTC)
@The Transhumanist: The idea is that if the only thing on your watchlist is categories (not hard if you're using a bot account – bots don't normally use their watchlist for anything), and you turn off the "Hide: page categorization" option (which is actually off by default if you're using the API rather than the MediaWiki interface), then the watchlist will entirely be a list of additions to and removals from the categories you're looking at. So it'd be a perfect "todo list" of exactly what changes were needed.
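A sketch of the corresponding watchlist query through the API. The function name `buildCategorizeWatchlistParams` is made up for this example, and it only builds the request parameters: an actual request would have to be made from the logged-in bot account (or with a watchlist token), which is not shown here.

```javascript
// Build parameters for listing category membership changes on the
// logged-in account's watchlist since `lastRunIso` (an ISO 8601 timestamp).
function buildCategorizeWatchlistParams(lastRunIso) {
  return new URLSearchParams({
    action: "query",
    list: "watchlist",
    wltype: "categorize", // only additions to and removals from categories
    wlallrev: "1",        // every change, not just the latest per page
    wlprop: "title|comment|timestamp",
    wldir: "newer",       // oldest first, beginning at wlstart
    wlstart: lastRunIso,
    wllimit: "500",
    format: "json",
  });
}
```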
I can see why you might want to scan the whole category instead, though. But scanning through the entirety of 7500 categories seems quite painful on the MediaWiki servers unless they're all pretty small. (This is one of the reasons I suggested a change to the software: "random member of category" should be more efficient than "entire contents of category" followed by randomizing, as less information will have to be transmitted.) Now that I write that, I wonder if Special:RecentChangesLinked might not be the best option – getting the most recently changed pages in a category would be an interesting substitute for random that might well have a bias we're interested in – but I can't find a query module corresponding to it in the API documentation.
For what it's worth, the MediaWiki servers allow most users to view 500 results per query, and running faster than one query per second used to be looked down on (you can possibly go faster nowadays if you're setting maxlag to a very small value, i.e. telling the server not to answer if it's busy; I'm not up to date on the current policy for keeping the servers happy). You can go up to 5000 with a bot flag, but part of the purpose behind the bot flag is so that 'crats can tell the software "we checked that this bot isn't going to burn out the servers, feel free to disable some of the safety checks", and I'm not sure if you could honestly set that if you're viewing large numbers of pages within 7500 different categories on a regular basis. (Probably many of the categories will be small enough that 500 is reasonable, but I'd imagine that some of them would be much larger.)
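To put rough numbers on this, the request count for a full scan can be estimated from the category sizes. The function name `estimateRequests` and the sample sizes below are purely illustrative.

```javascript
// Estimate how many API requests a full scan needs, given each category's
// size and the per-query result limit (500 normally, 5000 with a bot flag).
function estimateRequests(categorySizes, perQueryLimit) {
  let requests = 0;
  for (const size of categorySizes) {
    // Even an empty category costs one request to find out that it is empty.
    requests += Math.max(1, Math.ceil(size / perQueryLimit));
  }
  return requests;
}

// Three made-up categories with 50, 500 and 1200 members:
// 1 + 1 + 3 = 5 requests at a limit of 500.
const exampleRequests = estimateRequests([50, 500, 1200], 500);
```

By the same arithmetic, 7500 categories of at most 500 members each would need 7500 requests, which at one query per second is 125 minutes per full pass.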
If you're interested in where catwatch itself edits the HTML, you can look at uses of "innerHTML" in the JavaScript file. You wouldn't be doing that to edit an edit box, though, so you might be able to learn only a limited amount from it. Editing the content of edit boxes in Wikipedia is no different from editing them anywhere else than Wikipedia, so I'd recommend a general-purpose HTML/JavaScript reference if you're interested in this; for example, Mozilla's documentation on HTMLTextAreaElement describes the things you can do with a <textarea> element (such as the edit box). In this case, you'd be changing its value to change the text inside it.
Even so, I think I'd recommend against the "bot written in JavaScript" method of keeping the portals updated. Something like 7500 edits per day would be a very high editing rate, one that would likely need a lot of special precautions to make work (and if you're taking a random sample of pages and editing it into every portal every day, that's the number of pages that will end up changing). That'd cause a huge amount of bloat in database tables and be a lot of extra effort for the servers on both the reads and the writes. (By comparison, User:ClueBot NG had fewer than 1000 edits yesterday.) When you're working at this sort of scale, a solution that runs on the servers themselves is likely to be the only reasonable option. -- ais523 05:01, 7 July 2018 (UTC)
Concerning the category watchlist thing, the part I don't get is, how is it good for servicing more than one section of one portal? We'll be servicing thousands. Each portal could have two or more sections populated by category members, and some of those sections would be filled with the members from more than one category.
(I'm out of time, I'll have to answer the rest of your post later. In the meantime, I look forward to your reply).    — The Transhumanist   21:29, 7 July 2018 (UTC)
In terms of putting categories on watchlists, you'd put every category that you wanted to update a portal for on the bot's watchlist. Its watchlist (set to show multiple edits to a page) would then be a list of all the portal edits that were needed since last time you looked at it. However, something running server-side has surely got to be better. -- ais523 22:03, 7 July 2018 (UTC)
You've lost me. What purpose would those serve all mixed together? How would you make use of (or even differentiate) the ones for a particular portal?    — The Transhumanist   08:10, 8 July 2018 (UTC)
The watchlist shows both which page was recategorised, and which category it was added to or removed from. -- ais523 10:18, 8 July 2018 (UTC)
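Since each entry names its category, a mixed watchlist can be routed to the right portal sections with a lookup table. The sketch below is hypothetical throughout: the entry shape `{ category, page, added }`, the function name `routeChanges` and the section names are invented for illustration, and turning raw watchlist entries into that shape is not shown.

```javascript
// Group category changes by the portal sections that draw on each category.
// `sectionsByCategory` maps a category title to an array of section names.
function routeChanges(changes, sectionsByCategory) {
  const work = new Map();
  for (const change of changes) {
    const sections = sectionsByCategory.get(change.category) || [];
    for (const section of sections) {
      if (!work.has(section)) {
        work.set(section, []);
      }
      work.get(section).push({ page: change.page, added: change.added });
    }
  }
  return work;
}

// Made-up routing table: one category may feed several sections.
const exampleRouting = new Map([
  ["Category:Frogs", ["Portal:Amphibians (frog section)"]],
  ["Category:Toads", ["Portal:Amphibians (toad section)"]],
]);
```

A section fed by more than one category would simply appear under each of those categories in the table.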
Who can run things server-side? What are our server-side options?    — The Transhumanist   08:10, 8 July 2018 (UTC)
The server-side options are mostly a) changes to the software to add features (i.e. via Phabricator), at which point anyone can use them; b) the toolserver, which gets a little out of date sometimes but has a read-only copy of Wikipedia that it can do fairly complex queries on. (I guess a third option would involve database dumps but there's no dump for the category members table specifically, so it's unlikely to work out well.) -- ais523 10:18, 8 July 2018 (UTC)

ArbCom 2018 election voter message

Hello, Ais523. Voting in the 2018 Arbitration Committee elections is now open until 23.59 on Sunday, 2 December. All users who registered an account before Sunday, 28 October 2018, made at least 150 mainspace edits before Thursday, 1 November 2018 and are not currently blocked are eligible to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2018 election, please review the candidates and submit your choices on the voting page. MediaWiki message delivery (talk) 18:42, 19 November 2018 (UTC)

Notice

The file File:Wikipedia user ais523's edit times.png has been proposed for deletion because of the following concern:

Orphaned userspace image

While all constructive contributions to Wikipedia are appreciated, pages may be deleted for any of several reasons.

You may prevent the proposed deletion by removing the {{proposed deletion/dated files}} notice, but please explain why in your edit summary or on the file's talk page.

Please consider addressing the issues raised. Removing {{proposed deletion/dated files}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and files for discussion allows discussion to reach consensus for deletion. -- TheImaCow (talk) 20:09, 22 January 2021 (UTC)

Speedied. -- ais523 22:15, 22 January 2021 (UTC)
Actually, unspeedied; this is being used in someone else's userspace (User:Penubag/optimum_toolsets), so it isn't a valid speedy. -- ais523 22:17, 22 January 2021 (UTC)

Notice

The file File:Horizontal list in IE6.png has been proposed for deletion because of the following concern:

Orphaned image, no context to determine possible future use.

While all constructive contributions to Wikipedia are appreciated, pages may be deleted for any of several reasons.

You may prevent the proposed deletion by removing the {{proposed deletion/dated files}} notice, but please explain why in your edit summary or on the file's talk page.

Please consider addressing the issues raised. Removing {{proposed deletion/dated files}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and files for discussion allows discussion to reach consensus for deletion. -- TheImaCow (talk) 20:15, 22 January 2021 (UTC)

Replied at User talk:TheImaCow/Archive/2021/February#Prodding_images_that_were_wikilinked_from_archived_discussions. -- ais523 22:15, 22 January 2021 (UTC)

Category:Fair use tag needs updating has been nominated for deletion

Category:Fair use tag needs updating has been nominated for deletion. A discussion is taking place to decide whether this proposal complies with the categorization guidelines. If you would like to participate in the discussion, you are invited to add your comments at the category's entry on the categories for discussion page. Thank you. Dylsss (talk contribs) 01:30, 14 April 2021 (UTC)

Category:Pages where template include size is exceeded has been nominated for renaming. A discussion is taking place to decide whether this proposal complies with the categorization guidelines. If you would like to participate in the discussion, you are invited to add your comments at the category's entry on the categories for discussion page. Thank you. * Pppery * it has begun... 17:10, 7 May 2021 (UTC)

ArbCom 2021 Elections voter message

Hello! Voting in the 2021 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 6 December 2021. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2021 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:15, 23 November 2021 (UTC)

Your signature

Hi,

I got a bit confused when WMF's reply tool did not show up next to your signature, and then I realized you customized the timestamp with clever links to your talk and contribution pages. As clever as I find it, unfortunately it messes with tools like WMF's reply tool and may mess with bots that use timestamps to archive comments.

If you could find some other customization that would be just as clever but not as disruptive to these tools, that would be wonderful. Cheers! Aasim - Herrscher of Wikis 02:08, 7 July 2022 (UTC)

@Awesome Aasim: It's been like this since 2006, and it took many years for it to start breaking things. Even just a couple of years ago it generally only broke at Miscellany for deletion, but I guess I have to give up entirely at this point now that the WMF have broken it on basically every page (and I'm probably going to forget to type the fourth tilde quite frequently in the future!). -- ais523 02:15, 7 July 2022 (UTC)
@Ais523 Sorry I had to tell you this! I really thought your signature was some of the most clever I have ever seen in the history of Wikipedia signatures. Maybe it lives on in the form of some other funny tweak (before the timestamp of course) :D Aasim - Herrscher of Wikis 02:20, 7 July 2022 (UTC)

A blast from the past

... reading your username on the technical village pump again! Also, while I'm here, if you remember the incident in question, you might find my most recent comment in this talk page section interesting. Graham 87 02:50, 7 July 2022 (UTC)

Thanks for your response!

Thank you for your marvelous response to my question regarding how to request that a page be edited! I completely understand now. You also gave me the little bit of extra confidence I needed to try actually improving the page - and maybe others too - myself. Thanks for that! I suppose that, secretly, hoping to edit pages is why I registered an account in the first place. Cheers! Ubadubba (talk) 19:18, 8 July 2022 (UTC)