User:Certes/Gene_links References

This page lists gene articles which are not linked from the base name. For example, there is no obvious route from ACR to ACR (gene). FOO is used as a placeholder to denote the base name such as ACR.

Dab missing entry

FOO is (or redirects to) a dab which does not list the gene.

Done Section completed: add an entry for FOO (gene) to existing dab FOO.

ACR
AIC also add Akaike information criterion
BTC
CAMP: CAMP (gene) is a list of enzymes
Rewrote cathelicidin and the enzyme article it linked to, as neither of its two corresponding genes is now called "camP", then redirected CAMP (gene). Seppi333 ( Insert 2¢) 02:50, 30 November 2019 (UTC)
CDNF
CFI
CGB: existing entry displays the gene name but is piped elsewhere
CNO
CROP
CS
CTRC
FAT
GART
GDA
GLA
HAL
MFF
MIA
NPPA
NTM
PGC
PIGS
PLI
Pol
POR
PTLD
REN
SAC
SLN
TAT
UMPS
Y14

Unrelated article with dab

FOO is (or redirects to) an article about an unrelated primary topic. FOO (disambiguation) is (or redirects to) a dab which does not list the gene.

Done: Section completed: add an entry for FOO (gene) to existing dab FOO (disambiguation).

FEMA
PPL

Unrelated article without dab

FOO is (or redirects to) an article about an unrelated topic. FOO (disambiguation) does not exist.

Fix: If the incumbent article is not primary, move it to FOO (topic) and list it along with the gene on a new dab FOO. Check for incoming links to FOO and update these. If the topic is primary but the initials also denote other topics, create FOO (disambiguation). Otherwise, the primary topic article needs a hatnote to the gene.

Done Section complete except for CTU2, which is the actual name of the C16orf84 gene: requesting a second opinion from PamD or Seppi333.

AK1 Done
APOD Done
ASUN Done
BATF Done
BRF1 Done
BRF2 Done
BX3 Done
CCT2 Done
CCT5 Done
CES3 Done
CGB2 Done
CHGA Done
CKLF Done
CLK3 Done
CNN3 Done
CPA4 Done
CRCP Done
CROT Done, though the gene has a claim to be PT
CSF3 Done: retargeted to gene
CSH2 Done
CSN3 Done
CTSH Done
CTU2
DMWD Done
DNA2 Done
Doubletime Done dab page tweaked, gene now linked directly by hatnote
EN1 Done two genes linked in one complex hatnote
ESAM Done new dab page
ESPN Done (mega hatnote now even bigger)
FMOD Done expanded hatnote
GATM Done new dab page
GBAS Done new dab page
GMDS Done new dab page
GPS2 Done new dab page (but querying whether existing redirect was justified)
GPX2 Done new dab page
HPCA Done new dab
HULC Done
IRGC Done
Isomorph Done: added Isomorph (gene) (a classification of mutations) to Isomorphism (disambiguation)
Kaiso Done linked PT to new dab
KMO Done new dab
KYNU Done
MAL2 Done
MLIP Done
MPNS Done
MSLN Done
NAAB Done new dab page
NAGA Done: added NAGA (gene) to dab Naga
NEBL Done
NEMF Done
NFIC Done
NKRF Done
ODAM Done
Paralytic Done P to S
PCTP
PIGN: medical, but probably unrelated to PIGN (gene)
POLA1
POP1
POP4
PPCS
PPIE
PPIG
PREP
PSG1
RARS
RAX
SCEL
SNCB
Spätzle Done P to S
VISA Done
WARS Done
WTAP Done

Enzyme or protein article

FOO describes an enzyme or protein related to FOO (gene) but does not link to the gene.

Fix: Expert advice is needed.

AlkB; AlkB (gene) redirected to a section of AlkB Done Seppi333 ( Insert 2¢) 23:19, 29 November 2019 (UTC)
ANK2; ANK2 (gene) – the latter is a duplicate article of the former created by User:ProteinBoxBot. It should be merged into the former. The sitelink for ANK2 (gene) needs to be moved to ANK2 on wikidata in order to move the {{ Infobox gene}} template when this happens (i.e., the duplicate article has a gene infobox but the primary article does not). Done Seppi333 ( Insert 2¢) 23:23, 29 November 2019 (UTC)
CACNA1B; CACNA1B (gene) Done merged. Seppi333 ( Insert 2¢) 23:44, 29 November 2019 (UTC)
CASP12; CASP12 (gene) Done merged. Seppi333 ( Insert 2¢) 23:44, 29 November 2019 (UTC)
NRXN1; NRXN1 (gene) Done merged. Seppi333 ( Insert 2¢) 23:44, 29 November 2019 (UTC)
SCN1A; SCN1A (gene) Done merged. Seppi333 ( Insert 2¢) 23:54, 29 November 2019 (UTC)
SKP1; SKP1 (gene) Done moved Skp1 to the official UniProt name since that was an incorrectly capitalized gene name, SKP1 and SKP1 (gene) (will) redirect there after 2x redirects are corrected by a bot. Sitelink on wikidata moved to the correct item. Seppi333 ( Insert 2¢) 00:04, 30 November 2019 (UTC)
SPI1; SPI1 (gene) Done - fixed this one earlier. Seppi333 ( Insert 2¢) 23:54, 29 November 2019 (UTC)
TMEM243; TMEM243 (gene) Done moved sitelink and redirected page. Seppi333 ( Insert 2¢) 00:04, 30 November 2019 (UTC)

Miscellaneous

See individual entries for a description of each anomaly.

Fix: Expert advice is needed.

ALG2 is a gene; ALG2 (gene) is a list of enzymes. Fixed
~~See WT:MCB#ALG2 and GDP-Man:Man2GlcNAc2-PP-dolichol alpha-1,6-mannosyltransferase.~~ Deleted that section. The ALG2 gene encodes a protein that belongs to 2 classes of enzymes, so it makes sense to redirect both pages to the gene and list the corresponding enzymes there. Seppi333 ( Insert 2¢) 01:23, 30 November 2019 (UTC)
CFTR: redirects are anomalously titled CFTR(gene) (no space) and Cftr (gene) (lower case). Pending deletion.
This is a fairly widely studied gene due to its central pathophysiological role in cystic fibrosis; CFTR gets a lot of search traffic. I'd suggest deleting CFTR(gene) since it's an erroneous page title that I tried to move to CFTR (gene) with redirect suppression before realizing it already existed. Keeping cftr (gene) seems fine since, while technically incorrect capitalization, it's at least the correct spelling. Addendum: I've PRODed CFTR(gene). Seppi333 ( Insert 2¢) 01:23, 30 November 2019 (UTC)
EYCL1 is a gene; EYCL1 (gene) redirects to a related protein. Pending deletion.
See WT:MCB#Deletion of EYCL1, EYCL1 (gene) and Eye color 1 (green/blue). Seppi333 ( Insert 2¢) 01:46, 30 November 2019 (UTC)
KCTD9 and KCTD9 (gene) may be duplicate articles. Done
LRIF1 and LRIF1 (gene) redirect to articles about different proteins. Fixed
They now go to the same place; one of them went to the wrong page. Seppi333 ( Insert 2¢) 02:00, 30 November 2019 (UTC)
NAGK redirects to one enzyme; NAGK (gene) is a list of enzymes. Fixed
The bacterial enzyme listed on NAGK (gene) had a 2:1 correspondence between gene and enzyme and the corresponding gene also had a different capitalization (NagK), so I redirected it to where NAGK went. Seppi333 ( Insert 2¢) 02:00, 30 November 2019 (UTC)
NFAM1 and NFAM1 (gene) may be duplicate articles. Done
NFATC2IP and NFATC2IP (gene) may be duplicate articles. Done
NMT1 dab and NMT1 (gene) list share the same entries. Fixed
This one was confusing; don't think I've ever seen 2 enzymes associated with a single gene, but since both enzymes are associated with multiple genes, I redirect both pages to the pagename of the protein that the gene encodes and listed both enzymes there. Seppi333 ( Insert 2¢) 03:21, 30 November 2019 (UTC)
TITF1: TITF1 (gene) is a different topic, but is it just a typo for TTF1? Fixed
Hmm. [1] - this is a query in the HGNC database for TITF1. It's an old gene symbol for NKX2-1 (current gene symbol), which is currently known as the "NK2 homeobox 1" gene. Both should redirect to the current gene symbol unless disambiguation at TITF1 is necessary. Changed the TITF1 target. Seppi333 ( Insert 2¢) 00:57, 30 November 2019 (UTC)
WAS: WAS (gene) redirects to Wiskott–Aldrich syndrome protein but that article calls the gene WASp. Fixed Clarified by stating the encoding gene's gene symbol. Seppi333 ( Insert 2¢) 00:57, 30 November 2019 (UTC)

Merged the wikidata sitelinks for NFATC2IP, KCTD9, and NFAM1 and the corresponding (gene) pages. Will deal with the rest a bit later. Seppi333 ( Insert 2¢) 00:10, 30 November 2019 (UTC)

Re- ALG2 (gene): I think it may be worth recoding and rerunning my User:Seppi333/GeneListNLP script to detect/write a list of target pages that are wikilinked from the gene lists and that contain all 5 of the words "Set", "index", "page", "lists", and "articles" on them in order to identify links to set index articles, unless you can locate those with an SQL query. The last time I ran that script, it took 1:33:45 (1.5 hrs) to download and process all the pages, so if it's possible to locate them using another method, it'd probably be best to do that instead. Seppi333 ( Insert 2¢) 01:23, 30 November 2019 (UTC)
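As a rough illustration of the five-word check proposed above (not the actual User:Seppi333/GeneListNLP code), a page can be flagged as a likely set index article when all five words occur somewhere in its text. The function name `looks_like_set_index`, the case-insensitive matching and the whole-word tokenisation are assumptions made for this sketch.

```python
import re

# The five words named in the comment above, lower-cased for comparison.
SIA_WORDS = ("set", "index", "page", "lists", "articles")


def looks_like_set_index(page_text: str) -> bool:
    """Return True if every one of the five words occurs as a whole word.

    Hypothetical helper: matching is case-insensitive and ignores word order.
    """
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return all(word in words for word in SIA_WORDS)
```

A check this loose can also match ordinary articles that happen to use all five words, so its output would still need a manual review.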

This PetScan query identifies SIAs linked from gene lists. Certes ( talk) 10:25, 30 November 2019 (UTC)

False positives

FOO links to FOO (gene) (or the target of that redirect) in a complex way not spotted by the Quarry queries.

Fix: probably no action but we may consider a more direct link.

BBC3 Done (improved hatnote to offer direct link to gene)
CAD
Cfr
Dlx
ELO Done (clarified link to gene on dab page)
FARSA Done (retargeted to dab page, no clear PT: shortens route to gene)
Hairy Done (improved hatnote to offer direct link to gene)
KIZ
LAT
LOX Done (clarified link to gene on dab page)
MAFA (possibly a related protein)
MFSD2A and MFSD2A (gene) redirect to the same article.
MIB2
MINA Done (clarified link to gene on dab page)
NES
OSCAR
Pokemon
REST
RHO
Sphinx
Tinman
THEMIS
TOR

Other links

Here are some other link issues raised by the gene lists. They need an expert to fix them because the suggested fix may be wrong, they may indicate wider problems, or the initialism redirect might merit conversion into a dab.

Direct links

The gene lists link directly to a page which is not in gene categories. These fall into two sections.

1. The target page appears not to be a gene. The link needs to be corrected. In each case, incoming links suggest that the non-gene article is the primary topic, but we could consider moving that article and creating a dab.

CHML: List of human protein-coding genes 1 should link to CHML (gene)
DR1: List of human protein-coding genes 1 should link to DR1 (gene)
HPX: List of human protein-coding genes 2 should link to HPX (gene)
PIM2: List of human protein-coding genes 2 ~~and Protein kinase domain~~ should link to PIM2 (gene)

2. The target page appears to be a gene or closely related topic. Links may be correct but the gene page could be added to appropriate gene categories.

Redirects

The gene lists link to a redirect to a page which is not in gene categories.

List of human protein-coding genes 1 links to AAMP, which redirects to unrelated article African American Museum in Philadelphia. They should probably link to AAMP (gene).
List of human protein-coding genes 1 links to CCNC, which redirects to unrelated article Chinese Canadian National Council. They should probably link to CCNC (gene).
List of human protein-coding genes 1 ~~and Cathepsin Z~~ link to CTSW, which redirects to unrelated article Flight Design CT. They should probably link to CTSW (gene).
List of human protein-coding genes 1, ~~Helicase and ZGRF1~~ link to DNA2, which redirects to unrelated article DNA². They should probably link to DNA2 (gene).
List of human protein-coding genes 1 links to EN1, which redirects to unrelated article EN postcode area. They should probably link to EN1 (gene).
List of human protein-coding genes 1 links to EN2, which redirects to unrelated article EN postcode area. They should probably link to EN2 (gene).
List of human protein-coding genes 2 links to ETDA, which redirects to unrelated article Ethylenediaminetetraacetic acid. They should probably link to a new redirect ETDA (gene). Which article should it redirect to?
Going to leave that as a redlink until it's better characterized. Seppi333 ( Insert 2¢) 10:07, 2 December 2019 (UTC)
List of human protein-coding genes 2, ~~Epstein–Barr virus-associated lymphoproliferative diseases, List of OMIM disorder codes and PD-1 and PD-L1 inhibitors~~ link to ICOS, which redirects to article Icos about a genetics company. They should probably link to ICOS (gene).
List of human protein-coding genes 2 ~~and Brpf1~~ link to KAT7, which redirects to unrelated article KAT-7. They should probably link to KAT7 (gene).
List of human protein-coding genes 2, ~~Alpha/beta hydrolase superfamily and Ichthyosis~~ link to LIPN, which redirects to an article Lamellar ichthyosis about a related disease. We may want to link via a new redirect LIPN (gene).
Created Lipase member N and LIPN (gene)
List of human protein-coding genes 2 ~~and CARD domain~~ link to MAVS, which redirects to unrelated article Dallas Mavericks. They should probably link to Mitochondrial antiviral-signaling protein, perhaps via a new redirect MAVS (gene).
List of human protein-coding genes 3 ~~and AAA proteins~~ link to NVL, which redirects to unrelated article Null (SQL). They should probably link to NVL (gene).
List of human protein-coding genes 3 links to OSR2, which redirects to unrelated article Windows 95. It should probably link to OSR2 (gene).
List of human protein-coding genes 3 ~~and several articles~~ link to PIGN, which redirects to unrelated article Acute proliferative glomerulonephritis. They should probably link to PIGN (gene).
List of human protein-coding genes 3 ~~and WD40 repeat~~ link to PLAA, which redirects to unrelated article Poor Law Amendment Act 1834. They should probably link to PLAA (gene).
List of human protein-coding genes 3 ~~and List of OMIM disorder codes~~ link to RHO, which redirects to unrelated article Rho. They should probably link to RHO (gene).
List of human protein-coding genes 3, ~~Cancer syndrome and Housekeeping gene~~ link to SDHC, which redirects to unrelated article SD card. They should probably link to SDHC (gene).
List of human protein-coding genes 4 ~~and several articles~~ link to SYK, which redirects to unrelated article Helsingin Suomalainen Yhteiskoulu. They should probably link either to Syk or to its redirect SYK (gene).

Ahh. I was wondering why my NLP script didn’t locate those... it’s the hatnotes. I should probably reprogram it to fix that bug. Will ~~fix these pages later tonight and~~ (nothing to fix, except maybe conversion to DABs; I think you guys are better judges of when/how to disambiguate than I am, though, so I'll leave it to you) revise the wikitables once we locate all these pages. Seppi333 ( Insert 2¢) 02:02, 1 December 2019 (UTC)

Looks like you're right; all of them should link to the SYMBOL (gene) page since those are all the correct articles. I moved the Syk page to the official UniProt name for the protein ( Tyrosine-protein kinase SYK) since the only synonym/alias with a lowercase spelling was "p72-Syk". I'll retarget the links in the gene lists/tables once we find the rest of these since it's much less work for me to add them all at once than piecewise. I can rewrite my script to detect the multi-word expressions used on the hatnote pages and just parse the leads to identify ones like Rho tomorrow since it's fairly easy to code that; but, I get the impression that you're able to identify all of the remaining mistargeted links by simpler means than downloading and parsing 11500 pages.

Makes me want to learn SQL. What other methods do you use to locate pages like this? I'm really curious now. Seppi333 ( Insert 2¢) 04:51, 1 December 2019 (UTC)

@ Seppi333: In theory I could have located these with SQL. In practice, it might have been too complex to complete within Quarry's 30 minute limit, so I used PetScan instead with a Wikipedia search for incoming links. You mention checking 11,500 pages manually. In a way I've done that check myself, but only on the 30 or so suspicious pages that remained after filtering out cases that the queries suggest to be correct. Certes ( talk) 12:57, 1 December 2019 (UTC)

Oh. Wow, that's a surprisingly useful tool then. The algorithm is actually fully-automated; it basically just iteratively goes through all ~11500 of the blue wikilinks on the four list pages one at a time, loads the page (it takes 1.5 hours to run almost entirely because it has to load 11500 pages; I can't run it on a database dump), and determines whether or not the words "gene", "genes", "protein" or "proteins" are present on the page. It missed most of the links above because those words are in the DAB hatnotes. I hadn't considered that being a possibility when I wrote it. I should have some time to revise both the wikitable script to fix the lists and mistargeted link detection script to do a second check within the next 12-24 hours; shouldn't take that long to do. Seppi333 ( Insert 2¢) 22:00, 1 December 2019 (UTC)
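The check described above can be sketched as follows. This is a minimal illustration rather than the script itself: page text is passed in as a plain dictionary instead of being downloaded, and the names `mentions_gene_terms` and `find_suspect_links` are hypothetical.

```python
import re

# The four words the original script looked for anywhere on the page.
GENE_TERMS = {"gene", "genes", "protein", "proteins"}


def mentions_gene_terms(page_text: str) -> bool:
    """True if any of the four words appears as a whole word (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", page_text.lower()))
    return bool(tokens & GENE_TERMS)


def find_suspect_links(pages: dict) -> list:
    """Return titles whose text contains none of the four words.

    `pages` maps a linked title to text that has already been fetched;
    fetching is left out so that the sketch stays self-contained.
    """
    return [title for title, text in pages.items()
            if not mentions_gene_terms(text)]
```

With this logic, a non-gene article whose hatnote reads "For the gene, see FOO (gene)" contains the word "gene" and is therefore not flagged, which is the false negative described above.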

Finding the bad direct links is as simple as this, which takes 4 seconds. There are a few false positives such as Locus (genetics) from wikilinks not in the table, but they're obvious. The links via redirects took a little more fiddling. Certes ( talk) 22:52, 1 December 2019 (UTC)

I'll have to make use of that tool; seems very handy. Going to work on the gene lists now and update it once I'm done. Seppi333 ( Insert 2¢) 10:07, 2 December 2019 (UTC)

Following up, I retargeted the links in the gene lists yesterday. Haven't quite finished reprogramming the other one yet, but will probably be done tomorrow. I'll retarget the non-list gene articles with mistargeted links sometime within the next couple of hours.

Assuming neither of us finds any additional pages, ~~I suppose we're done.~~ Thanks again for your help.

Edit: I didn't notice the sections above; will get to them after I retarget the links. Seppi333 ( Insert 2¢) 10:04, 3 December 2019 (UTC)

Further progress

@ Seppi333: I've fixed incoming links apart from the gene lists which should link to CHML (gene) rather than CHML, AAMP (gene) rather than AAMP, etc. I see that some of these have been done manually in the lists (though a piped link might be better) but not in the Python. Also, do you have any thoughts about AKNA, CD96 and WRAP53? Certes ( talk) 00:25, 16 December 2019 (UTC)

@ Certes: Hey there! I'm really sorry for falling off the grid after my last reply here; it seems rather rude of me. I've been really busy off-wiki lately and forgot to work on this. My bad about that. I'll go ahead and finish addressing the links above within the next day or so since I now have some time to work on WP. I'll fix AKNA, CD96, and WRAP53 right now though. I only need to adjust their wikidata sitelinks and add {{ infobox gene}} to the article source.

Done

BTW, I finished recoding an updated version of my mistargeted link detection algorithm last week. The updated algorithm is designed to detect the type of mistargeted links you uncovered since I used all of the links that you listed in this section as a sample of test cases; I continually revised the algorithm until it had a 100% detection rate on that sample. This time around, it took 3.5 hours (originally, 1.5 hours) for the algorithm to finish processing all ~12,500 blue wikilinks in the gene lists (LOL). The likely mistargeted links it found are included in the collapse tab below. It found a few more articles with similar issues to the ones that you listed above; these articles would be included in the 2nd list in the tab below. Sometime within the next 24-48 hours, I'll manually go through all the links in the tab below and highlight the mistargeted ones I find. This is probably the last set of links in the gene lists that need to be fixed/retargeted since I think I've accounted for all possible ways that a false negative might occur. Seppi333 ( Insert 2¢) 00:39, 18 December 2019 (UTC)

Output of the updated algorithm – will follow up after I've gone through it and marked the ones that need to be addressed.
I ran the updated algorithm early last week, so you might've already found/fixed some of these. Seppi333 ( Insert 2¢)

Note: immediately after each bulleted entry below, there are two index values listed: i=# and j=#. Index i is the number of distinct gene-related terms that are present in the lead's source code and index j is the number of distinct gene-related terms that are present in the input parameters of the lead's hatnote templates, provided that any were found (NB: there are no entries in either list where one index equals 0 and the other is non-zero).

My original script detected links to articles where none of 4 gene-related terms (i.e., "gene", "genes", "protein", "proteins") were found anywhere in the article's source code (NB: these links would be marked with i=0; j=0 in the 1st list below); the updated version of my algorithm checked the source code of only the lead for 5 word tokens (i.e., the original 4 and "infobox_gene") instead of searching the full article's source code, so there are additional entries in the 1st list below that weren't detected by the original algorithm.

The updated algorithm also listed all articles that included specific gene-related multi-word expressions (i.e., the following phrases: "the gene", "the genes", "the protein", "the proteins", "the enzyme", "the enzymes", "(gene)", "(enzyme)", and "(protein)") in the parameters of certain lead hatnotes if any were present – specifically, the {{ about}} hatnote, {{ for}} hatnote, and the family of redirect hatnotes like {{ redirect}}/{{ RDR}}, {{ redirect2}}, etc. These new entries are included in the 2nd list below and have corresponding index values of i>0; j>0. If an entry in that list is marked with index values of 0<i<j, it's extremely likely that the link is mistargeted.
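The two index values can be computed along these lines. This is a sketch under stated assumptions, not the updated script: the lead's wikitext is passed in as a string, "Infobox gene" and "Infobox_gene" are both normalised to the token infobox_gene, index i is counted over the whole lead including any hatnote, and the regular expression only recognises simple, un-nested {{about}}, {{for}}, {{redirect}}, {{redirect2}}-style and {{RDR}} templates. All function and constant names are hypothetical.

```python
import re

LEAD_TOKENS = ("gene", "genes", "infobox_gene", "protein", "proteins")
HATNOTE_PHRASES = ("the gene", "the genes", "the protein", "the proteins",
                   "the enzyme", "the enzymes", "(gene)", "(enzyme)", "(protein)")

# Assumed pattern: template name, then everything up to the closing braces.
HATNOTE_RE = re.compile(
    r"\{\{\s*(about|for|redirect\d*|rdr)\s*\|([^{}]*)\}\}", re.IGNORECASE)


def lead_index(lead: str) -> int:
    """Index i: number of distinct single-word tokens found in the lead."""
    text = re.sub(r"infobox[ _]gene", "infobox_gene", lead.lower())
    found = set(re.findall(r"[a-z_]+", text))
    return sum(1 for token in LEAD_TOKENS if token in found)


def phrase_present(phrase: str, text: str) -> bool:
    """Whole-phrase match, so that "the genes" does not also count as "the gene"."""
    if phrase.startswith("("):
        return phrase in text
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def hatnote_index(lead: str) -> int:
    """Index j: number of distinct phrases found in hatnote parameters."""
    params = " ".join(m.group(2).lower() for m in HATNOTE_RE.finditer(lead))
    return sum(1 for phrase in HATNOTE_PHRASES if phrase_present(phrase, params))


def indices(lead: str) -> tuple:
    """Return (i, j) for one lead section."""
    return lead_index(lead), hatnote_index(lead)


def likely_mistargeted(lead: str) -> bool:
    """Apply the 0 < i < j rule of thumb stated above."""
    i, j = indices(lead)
    return 0 < i < j
```

For a hypothetical lead such as `{{about|the village|the gene|FOO (gene)}}` followed by a sentence about a village, the only single-word token present is "gene" (from the hatnote), while the hatnote parameters contain both "the gene" and "(gene)", giving i=1 and j=2, the same pattern as the CHML, DR1 and PIM2 entries.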

Entries in this list are articles where none of these 5 single-word tokens – gene, genes, infobox_gene, protein, proteins – are present in the source code of the article's lead.

en:ABO → en:ABO blood group system; i=0, j=0
en:ALKBH8 → en:TRNA (carboxymethyluridine34-5-O)-methyltransferase; i=0, j=0
en:AMACR → en:Alpha-methylacyl-CoA racemase; i=0, j=0
en:AKR1D1 → en:5β-Reductase; i=0, j=0
en:AGMAT → en:Agmatinase; i=0, j=0
en:ASPA (gene) → en:Aspartoacylase; i=0, j=0
en:BHMT → en:Betaine—homocysteine S-methyltransferase; i=0, j=0
en:BHMT2 → en:Betaine—homocysteine S-methyltransferase; i=0, j=0
en:CERK → en:Ceramide kinase; i=0, j=0
en:COLEC12 → en:Collectin; i=0, j=0
en:CYP19A1 → en:Aromatase; i=0, j=0
en:DDT; i=0, j=0
en:GYG1 → en:Glycogenin; i=0, j=0
en:GYS2 → en:Glycogen synthase; i=0, j=0
en:HIBCH → en:3-hydroxyisobutyryl-CoA hydrolase; i=0, j=0
en:INMT → en:Amine N-methyltransferase; i=0, j=0
en:IPMK → en:Inositol-polyphosphate multikinase; i=0, j=0
en:IVD (gene) → en:Isovaleryl-CoA dehydrogenase; i=0, j=0
en:LZTR1; i=0, j=0
en:MGME1; i=0, j=0
en:MTHFS → en:5-formyltetrahydrofolate cyclo-ligase; i=0, j=0
en:MYO1G; i=0, j=0
en:NFIB (gene); i=0, j=0
en:OXT (gene) → en:Oxytocin#Synthesis, storage, and release; i=0, j=0
en:PCCB → en:Propionyl-CoA carboxylase; i=0, j=0
en:POR (gene) → en:Cytochrome P450 reductase; i=0, j=0
en:PPCS (gene) → en:Phosphopantothenate—cysteine ligase; i=0, j=0
en:PRSS56; i=0, j=0
en:PSTK → en:O-phosphoseryl-tRNASec kinase; i=0, j=0
en:PSTPIP2; i=0, j=0
en:SHMT1 → en:Serine hydroxymethyltransferase; i=0, j=0
en:SHMT2 → en:Serine hydroxymethyltransferase; i=0, j=0

Entries in this list are articles where one or more of these 5 single-word tokens – gene, genes, infobox_gene, protein, proteins – are present in the source code of the article's lead (index i is the count of how many distinct tokens were found, so if the word gene is repeated 2+ times in the lead and none of the other word tokens were found, the linked entry would have an index value of i=1) AND one or more of the following tokenized multi-word expressions – the gene, the genes, (gene), the protein, the proteins, (protein), the enzyme, the enzymes, (enzyme) – are present in the parameter inputs of an {{ about}}, {{ for}}, or redirect-type hatnote template that the algorithm found in the lead (index j is the count of the number of distinct aforementioned expressions that were detected in the hatnote's parameter inputs):

en:ADAM22; i=5, j=1
en:ADAM7; i=4, j=1
en:AHR → en:Aryl hydrocarbon receptor; i=3, j=2
en:AIP (gene) → en:AH receptor-interacting protein; i=3, j=2
en:ALB (gene) → en:Serum albumin; i=4, j=1
en:ATM (gene) → en:ATM serine/threonine kinase; i=4, j=1
en:BAK1 → en:Bcl-2 homologous antagonist killer; i=3, j=1
en:BAMBI; i=3, j=1
en:BCAM → en:Basal cell adhesion molecule; i=2, j=1
en:BMI1; i=3, j=1
en:BTD → en:Biotinidase; i=2, j=1
en:CA12; i=2, j=1
en:CDKN1A → en:P21; i=3, j=2
en:CHL1; i=3, j=1
en:CHML; i=1, j=2
en:CISH; i=3, j=2
en:CLN3; i=3, j=1
en:CTBS; i=4, j=1
en:CWC15; i=4, j=1
en:CYBB → en:NOX2; i=3, j=1
en:DBR1; i=3, j=1
en:DBX2; i=3, j=1
en:DCP2; i=4, j=2
en:DHPS; i=4, j=2
en:DOCK10 → en:Dock10; i=3, j=1
en:DR1; i=1, j=2
en:DTNB; i=3, j=2
en:F5 (gene) → en:Factor V; i=2, j=1
en:F8 (gene) → en:Factor VIII; i=3, j=1
en:FANCI; i=5, j=1
en:GNMT; i=1, j=1
en:GRID2; i=3, j=1
en:HBA2 → en:Hemoglobin, alpha 2; i=2, j=1
en:HIRA; i=3, j=1
en:HK2; i=2, j=1
en:HSPA14; i=3, j=1
en:ID1; i=5, j=1
en:IL31 → en:Interleukin 31; i=3, j=1
en:IL32 → en:Interleukin 32; i=3, j=1
en:IL33 → en:Interleukin 33; i=3, j=1
en:IL34 → en:Interleukin 34; i=2, j=1
en:INS (gene) → en:Insulin; i=3, j=1
en:KAT6B → en:MYST4; i=5, j=1
en:LAD1 → en:Ladinin 1; i=3, j=1
en:LCTL; i=2, j=1
en:MAGEC2; i=4, j=1
en:MAP6; i=4, j=1
en:MIS12; i=5, j=1
en:MKX; i=4, j=1
en:MTA1; i=4, j=1
en:MTA2; i=5, j=1
en:MTA3; i=5, j=1
en:NONO → en:NONO (protein); i=4, j=2
en:NPW; i=3, j=1
en:OSR1; i=5, j=1
en:PEPD; i=2, j=1
en:PIM2; i=1, j=2
en:PKIA; i=3, j=2
en:PNN (gene) → en:Pinin; i=3, j=2
en:POLA1 → en:DNA polymerase alpha catalytic subunit; i=2, j=2
en:PRCP; i=3, j=1
en:PRND; i=3, j=1
en:PTK2; i=3, j=1
en:PTX3; i=4, j=2
en:RAB3A; i=3, j=1
en:RAI1; i=1, j=1
en:RHOB; i=3, j=1
en:RHO (gene) → en:Rhodopsin; i=2, j=1
en:RP1; i=3, j=1
en:RRAS; i=4, j=1
en:RTL1; i=2, j=1
en:SAT2; i=3, j=1
en:SCN1A → en:Nav1.1; i=3, j=2
en:SERPINA7 → en:Thyroxine-binding globulin; i=2, j=1
en:SMCP; i=3, j=1
en:ST13; i=4, j=2
en:ST14; i=3, j=2
en:ST7; i=3, j=2
en:TCF4; i=3, j=1
en:TFPT; i=3, j=1
en:TOX; i=5, j=1
en:TRA (gene); i=4, j=3
en:ZFX; i=3, j=1
en:ZP2; i=4, j=1

Also, thank you so much for helping me find and address the problematic links in the gene lists! I can't adequately express just how much I appreciate your assistance thus far.

If it weren't for you, several dozen links in the gene lists probably would've continued to point to the wrong articles since I don't think I would've realized the issues with the original algorithm that were producing false negatives. Seppi333 ( Insert 2¢) 00:46, 18 December 2019 (UTC)

No problem: there is no deadline and we all have things to do offline, especially in December. I'm happy to have helped but have probably done all I can for now. I think the only outstanding issue not mentioned above is cases like CTU2, where the base name leads to a rather flimsy non-gene primary topic and we need either a {{ redirect}} hatnote or a two-entry dab. (I'm not sure which is better.) However, I think all the wikilinks now lead to the right destination even in those cases. We've made a lot of improvements and it looks as if the job's almost complete. Certes ( talk) 01:26, 18 December 2019 (UTC)

I went through all the links and fixed problems that I found. In addition to the 4 you identified ( CHML, DR1, HPX, and PIM2), it looks like only DDT is new. I'll fix these links in the lists shortly. Seppi333 ( Insert 2¢) 15:58, 24 December 2019 (UTC)

@ Seppi333: I missed DDT because it's in Category:Nonsteroidal antiandrogens, a subcategory of Hormones, which I viewed as legitimate link targets. When I stopped excluding Hormones from my PetScan query, DDT appeared and nothing else did, so I don't see any similar cases. Most links to the pesticide seem correct but please can you fix the Python for List of human protein-coding genes 1 and check Protein design, which should perhaps link to DDT (gene) instead? Certes ( talk) 16:23, 24 December 2019 (UTC)

Looks like the DDT link in protein design is correctly targeted; had to read the paper to verify which page to link to (quote: Then they synthesized the 24-mer (MIF1RPNVGAMSNFYHYPNIIIII:) designed to form a four-stranded β-sheet and to bind the insecticide DDT. It did indeed...). Working on recoding the Python script for the list pages right now. Seppi333 ( Insert 2¢) 17:23, 24 December 2019 (UTC)

Done The lists have been updated with piped links for these genes. Seppi333 ( Insert 2¢) 18:48, 24 December 2019 (UTC)