From Wikipedia, the free encyclopedia

whitehouse.gov

A lot of whitehouse.gov links have died after the domain recently "changed owner". This is a rare occasion where many Wikipedians may be glad to see sources die. There is an archive at https://trumpwhitehouse.archives.gov. Example of old broken and new working URL:

There is a slim chance/risk that some of the broken links will work again in about four years. Some whitehouse.gov links are working and should not be changed. Can a bot sort it out? PrimeHunter ( talk) 13:09, 25 February 2021 (UTC)

Some older source links are archived at https://obamawhitehouse.archives.gov or https://georgewbush-whitehouse.archives.gov.
Obama example of broken and working link:
Bush example of broken and working link:
Some links work via redirects:
redirects to
https://www.archives.gov/presidential-libraries/archived-websites also mentions Clinton archives. The newest is https://clintonwhitehouse5.archives.gov/ from January 2001. I don't know whether we have broken links it could fix.
A bot could test every whitehouse.gov link to see whether it works now or at any of the archives. PrimeHunter ( talk) 14:02, 25 February 2021 (UTC)
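Here is a minimal Python sketch of that test. It assumes each archive keeps the original path unchanged. The example URLs above did not survive, so this is not confirmed. The helper names `candidate_urls`, `http_status` and `first_working` are hypothetical. The status check is passed in as a function, so the logic can be exercised without network access.

```python
from typing import Callable, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Archive hosts named in the discussion, newest administration first.
ARCHIVE_HOSTS = [
    "trumpwhitehouse.archives.gov",
    "obamawhitehouse.archives.gov",
    "georgewbush-whitehouse.archives.gov",
    "clintonwhitehouse5.archives.gov",
]


def candidate_urls(url: str) -> list[str]:
    """The original URL first, then the same path on each archive host.

    ASSUMPTION: the archives mirror the original path and query unchanged.
    """
    parts = urlsplit(url)
    if (parts.hostname or "") not in ("whitehouse.gov", "www.whitehouse.gov"):
        return [url]
    return [url] + [
        urlunsplit(("https", host, parts.path, parts.query, parts.fragment))
        for host in ARCHIVE_HOSTS
    ]


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Final HTTP status after urlopen's automatic redirects; None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code
    except URLError:  # DNS failure, refused connection and similar network errors
        return None


def first_working(
    url: str, status: Callable[[str], Optional[int]] = http_status
) -> Optional[str]:
    """Return the first candidate answering 200, or None if every one fails."""
    for candidate in candidate_urls(url):
        if status(candidate) == 200:
            return candidate
    return None
```

The original URL is tried first. This way, whitehouse.gov links that still work are left alone, as requested.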
OK, based on your research, I agree it's worth exploring to see how well it works. Will take a look. -- Green C 14:25, 25 February 2021 (UTC)
  • Results: modified 8,263 URLs in 5,060 articles. Changed metadata such as |work=whitehouse.gov, plus other general fixes by WaybackMedic. As a matter of curiosity, 67% were found by the scanning method described above, and the rest had working redirects in the header. Most of the working redirects were Obama-era links. Trump-era links had a high proportion of 404s and no redirects, perhaps because they were poorly maintained and/or it was too soon after leaving office. Also, some pages (10%?) can't be archived by any web archive service; they just don't work there. Something in the page prevents web archiving by third parties, but the pages still work at the National Archives. @ PrimeHunter: -- Green C 16:46, 3 March 2021 (UTC)
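The "working redirects in the header" case can be checked by following `Location` headers one hop at a time, instead of letting the HTTP library follow them silently. This also exposes redirects whose target is dead, the situation reported in the next comment. The following is a sketch under that assumption, not WaybackMedic's code. The names `head_no_follow`, `follow_redirects` and `classify` are hypothetical.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}
Head = Callable[[str], Tuple[int, Optional[str]]]


def head_no_follow(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """One HEAD request that does NOT follow redirects: (status, Location header)."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url: str, head: Head, max_hops: int = 10) -> Tuple[str, int]:
    """Follow Location headers hop by hop and return (final_url, final_status)."""
    status = 0
    for _ in range(max_hops + 1):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        url = urljoin(url, location)  # Location may be relative
    return url, status  # still redirecting after max_hops (e.g. a loop)


def classify(url: str, head: Head = head_no_follow) -> str:
    final_url, status = follow_redirects(url, head)
    if status != 200:
        return "dead"  # includes redirects whose target is broken
    return "live" if final_url == url else "redirect-ok"
```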
@ GreenC: Great! Thanks a lot. Do you have a list of broken links which couldn't be fixed? I noticed one in [1]: https://www.whitehouse.gov/the-press-office/2013/05/20/president-obama-announces-sally-ride-recipient-presidential-medal-freedom. It redirects, but the target doesn't work. Thanks for checking; the redirect didn't help. It turned out to be our own fault. The real link [2] didn't have the final "m", which was added by a careless editor in [3], so there is no general fix we can learn from that. PrimeHunter ( talk) 22:30, 3 March 2021 (UTC)
There were 30: Wikipedia:Link rot/cases/whitehouse.gov -- Green C 22:55, 3 March 2021 (UTC)
@ GreenC: Thanks. That's a nice low number. I have fixed many of them by guessing or Googling, without finding a system. Some were clearly our own fault, with URLs that never would have worked. Should I remove the fixed ones from Wikipedia:Link rot/cases/whitehouse.gov? PrimeHunter ( talk) 02:21, 4 March 2021 (UTC)
Yes. About 0.5% of the whitehouse.gov URLs are explained by local data-entry or remote-site errors, which is probably better than one might expect. It's a good idea to check for these, and great that you were able to fix some. Use the page any way you like: mark up or delete entries. -- Green C 03:12, 4 March 2021 (UTC)

Replace atimes.com links

Please replace all instances of atimes.com and its subdomains with asiatimes.com. The old website has been replaced by an advertising site. ~ Ase1este charge-parity time 10:11, 28 February 2021 (UTC)
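One pitfall in "atimes.com and its subdomains" is that the string "asiatimes.com" itself ends in "atimes.com". A naive suffix match would therefore rewrite links that are already correct. Below is a minimal sketch of a safe host check, assuming the article path carries over unchanged to the new domain. The discussion does not confirm this, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_DOMAIN = "atimes.com"
NEW_DOMAIN = "asiatimes.com"


def is_old_domain(host: str) -> bool:
    """True for atimes.com and its subdomains.

    A plain host.endswith("atimes.com") would also match asiatimes.com,
    so the check requires an exact match or a dot before the old domain.
    """
    host = host.lower().rstrip(".")
    return host == OLD_DOMAIN or host.endswith("." + OLD_DOMAIN)


def move_domain(url: str) -> str:
    """Swap atimes.com for asiatimes.com, keeping subdomain, port, path and query."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")
    if not is_old_domain(host):
        return url
    new_host = host[: -len(OLD_DOMAIN)] + NEW_DOMAIN  # "www." prefix survives
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```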

Also, if the corresponding page with the new domain is not found, not archived, and there is an archive with the old domain, then do not replace the URL, but add the archive link and mark the URL status as unfit. Thanks. ~ Ase1este charge-parity time 10:26, 28 February 2021 (UTC)
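The two requests combine into a small decision per citation. The sketch below encodes that order of checks. The case where no archive exists at all is not covered by the request; tagging {{dead link}} there is an assumption, though it matches what was eventually done. `Action`, `decide` and `unfit_params` are hypothetical names. The CS1 parameters `|archive-url=`, `|archive-date=` and `|url-status=` are the real ones.

```python
from enum import Enum


class Action(Enum):
    MOVE_DOMAIN = "replace with the asiatimes.com URL"
    ARCHIVE_UNFIT = "keep the atimes.com URL, add archive, |url-status=unfit"
    DEAD_LINK = "tag {{dead link}} for manual attention"


def decide(new_live: bool, new_archived: bool, old_archived: bool) -> Action:
    """Order of checks taken from the request above.

    The final branch (no archive anywhere) is not covered by the request;
    tagging {{dead link}} is an assumption.
    """
    if new_live or new_archived:
        return Action.MOVE_DOMAIN
    if old_archived:
        return Action.ARCHIVE_UNFIT
    return Action.DEAD_LINK


def unfit_params(archive_url: str, archive_date: str) -> str:
    """CS1 parameters appended to a citation in the ARCHIVE_UNFIT case."""
    return f" |archive-url={archive_url} |archive-date={archive_date} |url-status=unfit"
```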
Ok. It might take a couple of passes: first to move the domain where possible, and second to add archives plus |url-status=unfit for the remainder. I'm still working on the whitehouse.gov request above, so it could be a few days at least. -- Green C 15:46, 28 February 2021 (UTC)
Ok, thanks, I can wait. ~ Ase1este charge-parity time 17:42, 28 February 2021 (UTC)

Results:

  • 287 URLs changed from atimes.com to asiatimes.com
  • 1,995 URLs converted to archives including |url-status=unfit. Includes CS1|2, square and bare links
  • 3 URLs had no archives (in Peter Heehs, Thaksin Shinawatra, Iran–Saudi Arabia relations). Added {{dead link}}; these need manual attention.
  • 11 citations converted from [square link]{{webarchive}} to {{cite web}} with |url-status=unfit.
  • 1 URL in File: space
  • Domain status set to 'Blacklisted' in the IABot database.
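The [square link]{{webarchive}} to {{cite web}} conversion can be sketched with one regular expression. This assumes the simple single-line form, where {{webarchive}} carries the archive URL in `|url=` and the date in `|date=`. Real wikitext has more variants, so this is only an illustration, not the bot's code. `convert`, `to_cite_web` and `parse_params` are hypothetical names.

```python
import re

# [URL title]{{webarchive |url=ARCHIVE |date=DATE}} in its simplest form.
SQUARE_PLUS_WEBARCHIVE = re.compile(
    r"\[(?P<url>https?://[^\s\]]+)\s+(?P<title>[^\]]+)\]\s*"
    r"\{\{\s*webarchive\s*(?P<params>(?:\|[^{}]*)?)\}\}",
    re.IGNORECASE,
)


def parse_params(text: str) -> dict[str, str]:
    """Split '|key=value |key=value' into a dict (keys lower-cased)."""
    params = {}
    for chunk in text.split("|"):
        if "=" in chunk:
            key, value = chunk.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params


def to_cite_web(match: re.Match) -> str:
    p = parse_params(match.group("params"))
    fields = [
        ("url", match.group("url")),
        ("title", match.group("title").strip()),
        ("archive-url", p.get("url", "")),
        ("archive-date", p.get("date", "")),
        ("url-status", "unfit"),
    ]
    # Empty values (e.g. a missing |date=) are simply omitted.
    return "{{cite web " + " ".join(f"|{k}={v}" for k, v in fields if v) + "}}"


def convert(wikitext: str) -> str:
    return SQUARE_PLUS_WEBARCHIVE.sub(to_cite_web, wikitext)
```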

@ Aseleste: I think that is all; if you see anything else, let me know. -- Green C 04:23, 6 March 2021 (UTC)

Looks good, thanks! ~ Ase1este charge-parity time 04:28, 6 March 2021 (UTC)

observer.com

I found many broken links to www.observer.com: some (but not all) of these links no longer lead to the articles that were originally cited. Jarble ( talk) 21:04, 13 February 2021 (UTC)

Since this is a mix of live and dead links, it's probably better to leave it for IABot, which should be able to detect the dead ones. -- Green C 03:19, 14 February 2021 (UTC)
@ GreenC: IABot won't detect them. I tried running IABot on this page, but the link is still incorrect. Jarble ( talk) 21:35, 11 March 2021 (UTC)

IABot won't work. It's pretty complex. My first impression is that anything using "https" is OK, and anything using "http" without a hostname is also OK. That narrows it down to about a thousand possible trouble URLs. Of these, some work and some don't, and some redirect to spam links and need |url-status=unfit. There are patterns, but also exceptions. I might need to make a dry run, log what it does, build rules that account for its mistakes, and then make a live run. It's hard to say up front what the rules should be; it will take some time to figure out, as there are a lot of variables. -- Green C 01:45, 12 March 2021 (UTC)
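The dry-run-then-live-run workflow can be sketched as a planner that only logs decisions, plus a flag that actually applies them. Only the "https is OK" rule is taken from the comment above. The "http without a hostname" rule is left out because its exact meaning is unclear. The probe verdicts (`live`, `dead`, `spam`), the action labels and the `apply` callback are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical mapping from a probe verdict to the edit a live run would make.
ACTIONS = {
    "live": "leave",
    "dead": "archive-or-dead-link",
    "spam": "archive+unfit",  # redirects to spam need |url-status=unfit
}


def triage(url: str) -> Optional[str]:
    """First-impression rule from the discussion: https links are OK as-is.

    The "http without a hostname" rule is not modelled; all other links
    go on to the probe.
    """
    return "leave" if url.lower().startswith("https://") else None


def plan(urls: List[str], probe: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Decide an action per URL without editing anything."""
    return [(url, triage(url) or ACTIONS[probe(url)]) for url in urls]


def run(
    urls: List[str],
    probe: Callable[[str], str],
    apply: Callable[[str, str], None],
    dry_run: bool = True,
) -> List[Tuple[str, str]]:
    """Dry run: return the log for review. Live run: also apply each change."""
    log = plan(urls, probe)
    if not dry_run:
        for url, action in log:
            if action != "leave":
                apply(url, action)
    return log
```

Reviewing the dry-run log is where the exceptions would show up. Each exception would become a new rule before the live run.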

Results

The rest were already archived, still working, or now tagged with {{dead link}}. Once the soft-404 redirects were identified, it was not too difficult. If you see any problems, let me know. @ Jarble: -- Green C 21:39, 13 March 2021 (UTC)
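A soft 404 is a dead page that answers "200 OK", typically after redirecting to a home page or landing page. A common heuristic compares the suspect URL with a made-up URL on the same host: if both land in the same place, the original page is probably gone. This is a generic technique, not necessarily the rule used here. `resolve` is a hypothetical helper that follows redirects and returns the final URL and status.

```python
import uuid
from typing import Callable, Tuple
from urllib.parse import urlsplit, urlunsplit

# resolve(url) -> (final_url, status) after following redirects (hypothetical helper)
Resolver = Callable[[str], Tuple[str, int]]


def is_soft404(url: str, resolve: Resolver) -> bool:
    """Heuristic: a dead page that "succeeds" lands where a made-up page lands."""
    final_url, status = resolve(url)
    if status != 200:
        return False  # a hard error, which ordinary dead-link checks already catch
    parts = urlsplit(url)
    # A random path that almost certainly never existed on this host.
    bogus = urlunsplit((parts.scheme, parts.netloc, "/" + uuid.uuid4().hex, "", ""))
    bogus_final, bogus_status = resolve(bogus)
    # A real page ends up somewhere other than the error landing page.
    return bogus_status == 200 and bogus_final == final_url
```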