From Wikipedia, the free encyclopedia

IndiaGlitz

Many pre-2010 links no longer seem to redirect to their new ones, like this does not take us to this. The links may therefore be tagged as dead or archived. Kailash29792 (talk) 05:20, 17 March 2023 (UTC)

This is done. It took a while to figure out, as there were a lot of soft-404s. The IABot database is also updated, so those URLs will be repaired on other wikis as well. -- Green C 15:43, 20 March 2023 (UTC)

Fix links to Aus honours site

I started making these changes, but there are too many to do without a bot. In addition, in the citations, it might be worth removing the link to the Internet Archive and removing the parameter that marks the link as dead.

Starting point:

  • basic search with category (there are more without the category filter).
  • Search pattern: ([^/])http:\/\/(?:www\.)?itsanhonour\.gov\.au\/honours\/\w+\/.+?aus_award_id=([0-9]+)[&\w=]+
  • Replacement code: $1https://honours.pmc.gov.au/honours/awards/$2

Nux ( talk) 20:21, 12 March 2023 (UTC)
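The search pattern and replacement above can be tried out before a bot run with a short sketch like the following. It is only an illustration, not the code the bot used: the function name `fix_honours_links` and the sample URLs are made up, and `$1`/`$2` are written as `\1`/`\2` because that is how Python's `re` module spells group references.

```python
import re

# Pattern as posted above; \/ is simply an escaped slash, which Python accepts.
OLD_HONOURS = re.compile(
    r"([^/])http:\/\/(?:www\.)?itsanhonour\.gov\.au\/honours\/\w+\/"
    r".+?aus_award_id=([0-9]+)[&\w=]+"
)
# Replacement as posted above, with $1/$2 rewritten as \1/\2.
NEW_HONOURS = r"\1https://honours.pmc.gov.au/honours/awards/\2"


def fix_honours_links(wikitext: str) -> str:
    """Rewrite old itsanhonour.gov.au award links (hypothetical helper)."""
    return OLD_HONOURS.sub(NEW_HONOURS, wikitext)


# Made-up sample link, for illustration only.
sample = (
    "[http://www.itsanhonour.gov.au/honours/honour_roll/search.cfm"
    "?aus_award_id=1234567&search_type=quick&showInd=true It's an Honour]"
)
print(fix_honours_links(sample))
```

The leading `([^/])` keeps the pattern from firing on a URL that is embedded in an archive URL (where it is preceded by a slash). One thing that may be worth checking: because the trailing `[&\w=]+` needs at least one character, a URL that ends right after the ID would give up its last digit to backtracking; using `[&\w=]*` instead would likely avoid that.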

Nux, the bot successfully converted 5,777 links to honours.pmc.gov.au. Two articles alone contained about 2,000 of the links: Australian knights and dames and List of recipients of the Bravery Medal (Australia). It edited 2,390 articles overall. Thanks for the post; this would have been impossible manually. If you see any problems, let me know. -- Green C 14:18, 22 March 2023 (UTC)

Library and Archives Canada

Hello. There's currently a notice at https://www.bac-lac.gc.ca/eng/discover/films-videos-sound-recordings/Pages/films-videos-sound-recordings.aspx saying they're moving the content to their new website at https://library-archives.canada.ca/eng. I would like to request an archive of any bac-lac.gc.ca links just in case. Here are some examples. Thanks! MrLinkinPark333 ( talk) 23:08, 26 March 2023 (UTC)

This is actually a web archive provider: Wikipedia:List_of_web_archives_on_Wikipedia#Canada_(bac-lac.gc.ca). I think we'll have to wait for the move to see what the new URL structure is, then modify existing links to the new site. Hopefully a simple change of https://www.bac-lac.gc.ca/eng to https://library-archives.canada.ca/eng -- Green C 23:26, 26 March 2023 (UTC)
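If the move does turn out to be a plain prefix change, the rewrite could look roughly like the sketch below. This is an assumption, not a confirmed mapping: nobody yet knows whether paths survive the move unchanged, and the helper name `migrate_lac_url` is made up for illustration.

```python
OLD_PREFIX = "https://www.bac-lac.gc.ca/eng"
NEW_PREFIX = "https://library-archives.canada.ca/eng"


def migrate_lac_url(url: str) -> str:
    """Swap the old site prefix for the new one (hypothetical helper).

    Assumes the rest of the path is unchanged on the new site, which is
    not known until the move has actually happened.
    """
    if url.startswith(OLD_PREFIX):
        rest = url[len(OLD_PREFIX):]
        # Only rewrite when the prefix ends at a path boundary.
        if rest == "" or rest.startswith("/"):
            return NEW_PREFIX + rest
    return url


print(migrate_lac_url(
    "https://www.bac-lac.gc.ca/eng/discover/films-videos-sound-recordings"
    "/Pages/films-videos-sound-recordings.aspx"
))
```

Any rewritten URL would still need to be checked against the live site, since a moved page can return a soft-404 instead of a real error.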

us.archive.org links

We have hundreds of articles with links of the sort https://ia800102.us.archive.org/0/items/gandhiwieldsweap00shar/gandhiwieldsweap00shar.pdf , which break over time; it should be https://archive.org/download/gandhiwieldsweap00shar/gandhiwieldsweap00shar.pdf . The replacement is from https?://ia[0-9]+.us.archive.org/[0-9]+/items/(.+) to https://archive.org/download/\1. Nemo 21:42, 31 January 2023 (UTC)
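As a rough sketch of that replacement applied to one URL at a time (not the code any bot or JWB run actually used): the pattern below is the one given above, except that the dots are escaped and the match is anchored to the whole string. Both changes, and the name `normalize_ia_download`, are assumptions made for this example.

```python
import re

# Pattern from the post above, with literal dots escaped and ^...$ anchors added.
IA_NODE_URL = re.compile(
    r"^https?://ia[0-9]+\.us\.archive\.org/[0-9]+/items/(.+)$"
)


def normalize_ia_download(url: str) -> str:
    """Turn a node-specific IA file URL into the stable /download/ form.

    URLs that do not match (for example /stream/ views) are returned
    unchanged. Hypothetical helper, for illustration only.
    """
    return IA_NODE_URL.sub(r"https://archive.org/download/\1", url)


print(normalize_ia_download(
    "https://ia800102.us.archive.org/0/items/gandhiwieldsweap00shar/"
    "gandhiwieldsweap00shar.pdf"
))
```

When the pattern is run over whole wikitext rather than a single URL, the greedy `(.+)` would run to the end of the line, so the tail would need to stop at whitespace or at the wiki markup that closes the link.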

@ Nemo bis,  Done via JWB. I accidentally had global regex matching disabled and made several edits to some articles, oops :) Best, EpicPupper ( talk) 02:11, 17 February 2023 (UTC)
Thanks! Search results have not updated yet, so there are still almost 1200 matches. Some of these seem real, for example [1] on Zürich. I've not used JWB, but if it finds articles by regex search you might need to switch to a broader search, because I'm not sure regex search is working right now.
Also, [2] was not updated on Troy. If you're using a space delimiter after the URL match, you may need to use \b instead. Nemo 06:15, 17 February 2023 (UTC)
This kind of work is more complicated than it might seem: there are different file types (pdf, txt, wav etc.) that should be handled differently, and different types of file structures at IA to deal with. It took me a while to find all the issues and develop code that gives reasonable results. It's not a simple search-replace, though sometimes it is. I have developed code for it and can process them, but some people get upset about it, so I have been hesitant to try to fix them all in one go, and instead just catch them incidentally when the bot happens to process an article that has one. Possibly I could modify it to only process a link when it is dead, so there is no question that something should be fixed. -- Green C 14:30, 15 March 2023 (UTC)
Handled differently how? Do you have an example? The links in the form https?://ia[0-9]+.us.archive.org/[0-9]+/items/(.+) always go to the mere file download just like the /download/ URL. The trouble begins if you start operating on the various other kinds of URLs which might include things like us.archive.org/view_archive.php or various views like /stream/ . I'm not proposing a complete normalisation of all IA URLs, only the /download/ ones which are definitely going to break. Nemo 07:14, 23 March 2023 (UTC)
It's in about 1,300 pages. I'm running it now, for links that are already dead. The others will rerun periodically after they die. Some users don't get it, I don't want to deal with them unless the link is already dead in which case they have no basis. -- Green C 16:13, 23 March 2023 (UTC)
This is done, about 7% of the links were dead and converted. -- Green C 01:56, 24 March 2023 (UTC)
Thanks! Nemo 06:20, 27 March 2023 (UTC)
Periodic required.