This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the
current talk page.
With apologies for the delay, I've now finished wrangling with various new credential protocols and have pulled the latest version of the bot -- with many long-anticipated bug fixes -- onto
the production site. Hopefully this will work for all with no glitches, but being realistic, please do raise any issues either here as usual, or (if the issue relates to the implementation, i.e. the service being unavailable) try raising a
GitHub issue, which may catch my attention more punctually. Please do let me know how yous all get on! In particular, if a reported bug is now fixed, please do mark it as such by setting its status to {{fixed}}.
Martin(
Smith609 –
Talk) 07:09, 23 July 2018 (UTC)
The bot replaced |translator-first= and |translator-last= with |inventor-first= and |inventor-last=, which aren't recognized by {{
cite book}} and aren't correct in this situation.
What should happen
The bot should not replace human-added |translator-first= and |translator-last= with other parameters.
The long supported |vauthors= produces clean metadata while the deprecated |authors= does not.
Boghog (
talk) 06:05, 24 July 2018 (UTC)
I note a recent
discussion where this behavior was mentioned with a question about whether this is the desired behavior.
Boghog (
talk) 06:29, 24 July 2018 (UTC)
The bot replaces |access-date= and |dead-url= with |accessdate= and |deadurl=. Both are accepted; however, access-date and dead-url are preferred per the template documentation.
What should happen
The bot should not replace parameters with other parameters with/without a hyphen.
User:AManWithNoPlan has kindly added new parameters to the bot's dictionary. I've pulled through this update now, so hopefully replacement of unrecognized parameters will no longer be an issue.
Martin(
Smith609 –
Talk) 16:29, 24 July 2018 (UTC)
Europe PubMed Central is a mirror of
PubMed Central. |pmc= links the title of the article to the relevant page on PubMed Central. Adding the redundant |url= replaces the already linked title with a link to a mirror site.
Boghog (
talk) 06:21, 24 July 2018 (UTC)
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
Some academic journals are also simultaneously book series. When a citation is made to a book in such a series using the citation template with the contribution/title/series parameters (for the title of the paper, title of the book, and title of the series) it is incorrect to add a duplicate journal parameter with the same value as the series. This creates a faulty citation, because the citation template does not allow both contribution and title in citations with nonempty journal parameters, and also because the series parameter means something different in citations with a journal. In the linked case, the citation was already correct as it stands. It would also work to use title/department/journal instead of contribution/title/series, but the bot's choice of contribution/title/journal is just broken.
it is better to omit |url= so that user expectation (that the citation title links to a source that can be read) is not confounded; users can get to Googlebooks through |isbn=978-3-527-30673-2 and its link through
Special:BookSources.
Is there a cross-Wikipedia consensus on this? I can see editors becoming upset if links that they have added are removed by an automatic process.
Martin(
Smith609 –
Talk) 16:35, 24 July 2018 (UTC)
There is no consensus to remove urls to google books information page. However, the bot should not add the links to all cite books without a url either. (
t)
Josve05a (
c) 16:40, 24 July 2018 (UTC)
Why does the bot suddenly add links to google books out of nowhere? That should not be done. Headbomb {
t ·
c ·
p ·
b} 11:42, 24 July 2018 (UTC)
I'd consider this one an urgent, fix-now thing, so if a fix could be deployed that would be great. Everything else is relatively minor, but
Quark, for instance required massive cleanup because of this
[2]. Headbomb {
t ·
c ·
p ·
b} 18:59, 24 July 2018 (UTC)
Adding |url= when the cs1|2 template has |title-link= will produce the same undesirable results. I have not seen this, but when fixing this bug, you might check to make sure that the bot does not add |url= when |title-link= is set.
Either the deploy failed or the issue is not resolved correctly. This bot edit, three hours after the above deployment notice, adds superfluous google books links; one of which broke an existing
citation template.
@
AManWithNoPlan: not sure what that does exactly, but the net should be cast as wide as possible for anything that triggers an upgrade from cite arxiv to cite journal/cite conference/cite book (ISBN, Bibcodes, PMID, PMC, etc... if those apply) Headbomb {
t ·
c ·
p ·
b} 21:25, 24 July 2018 (UTC)
It catches all cite webs and cite arxiv that do not already have a doi.
AManWithNoPlan (
talk) 23:28, 24 July 2018 (UTC)
@
AManWithNoPlan: what happens if the preprint is published, but without a doi but other identifiers, like bibcodes? Headbomb {
t ·
c ·
p ·
b} 12:56, 25 July 2018 (UTC)
Not sure. Do you have an example to test? I think that you have to go through the DOI database first.
AManWithNoPlan (
talk) 13:25, 25 July 2018 (UTC)
arXiv:
1010.0278 says it's published in "Notices Amer. Math. Soc., 58(3):434-437, 2011" The metadata is poor, and the upgrade from arxiv to journal is messy
[3], but it's an example of where it could be done in theory. There are better examples out there, with better metadata, so I'll keep looking for those. Headbomb {
t ·
c ·
p ·
b} 13:43, 25 July 2018 (UTC)
Neither one of those cases has a DOI to be found using the ARXIV database
AManWithNoPlan (
talk) 15:26, 25 July 2018 (UTC)
Deleterious: Human-input data is deleted or articles are otherwise significantly affected.
What happens
BOT assisted edit at
M32p deleted the journal article name and replaced it with a nonsense journal article name, deleted the authors, deleted the journal volume, issue, publication date
I suggest that the bot crosscheck PMID, arXiv and bibcode against the DOI to see if the DOI is faulty. If all the other identifiers match each other and the DOI doesn't, then the DOI is in error.
I think this is very local to OUP manuscripts, and it's probably just simpler to check that the DOI info does not resolve to a pre-production placeholder thing. Headbomb {
t ·
c ·
p ·
b} 13:00, 25 July 2018 (UTC)
Just to clarify. I deleted the title and the authors, everything in fact, since it was poorly-formatted and generating CS1 errors. Then used the bot to recreate the citation. So the bot didn't do anything too radical like overwriting good info with bad, but it did pick up the wrong title as described.
Lithopsian (
talk) 20:03, 25 July 2018 (UTC)
{{fixed}} we will add more checking as more oddities are found
AManWithNoPlan (
talk) 12:49, 26 July 2018 (UTC)
This is because of the new code that allows DOI information to override arXiv information. I know how to fix this. The citation forgets and then remembers the year. I need to change it to a placeholder and then change it back or delete it.
AManWithNoPlan (
talk) 19:48, 26 July 2018 (UTC)
Personally I love the new functionality. I'll be very sad to see it go. Headbomb {
t ·
c ·
p ·
b} 14:27, 28 July 2018 (UTC)
@
Headbomb: You want the bot to remove publisher fields from the citation if manually provided? Why? (
t)
Josve05a (
c) 21:02, 28 July 2018 (UTC)
This is NOT a new feature, it has been highly regarded for a long time. People seem to think that providing a publisher is too much information. Also, that changes over time and is generally not useful. I have written the code, but it is not in because of lack of agreement.
Well, it is a manually entered field, and the cite template has been changed to allow for both journal and publisher now, so consensus over at the template's talk page seems to be to allow both fields. (
t)
Josve05a (
c) 09:27, 29 July 2018 (UTC)
This should also be reported to the CS1 people too so they can have the templates do this just like they convert dashes?
AManWithNoPlan (
talk) 13:56, 27 July 2018 (UTC)
Headbomb {
t ·
c ·
p ·
b} 02:20, 28 July 2018 (UTC)
Type of bug
Inconvenience/Cosmetic
What happens
converts |journal=Historical Biology: An International Journal of Paleobiology to |journal=Historical Biology: an International Journal of Paleobiology
The bot is making the page better, but you are right that it could do more, especially if the ASIN is an ISBN.
AManWithNoPlan (
talk) 02:32, 28 July 2018 (UTC)
It's better yes, but then another edit needs to be made (
User:CitationCleanerBot will clean up what it can every now and then). The bot should also remove asin when isbn is present in general; the link-->asin is just an intermediate step. Headbomb {
t ·
c ·
p ·
b} 02:53, 28 July 2018 (UTC)
It seems to me that perhaps only if the asin is the same as the isbn.
AManWithNoPlan (
talk) 02:34, 29 July 2018 (UTC)
It should straight up be removed. ASIN / amazon links should only be used when there's nothing else. See
Help:CS1#Identifiers, ASIN section, or
CitationCleanerBot 3. Headbomb {
t ·
c ·
p ·
b} 04:27, 30 July 2018 (UTC)
A few subtleties here. Links with ASINs starting with letters / ASINs starting with letters should also be removed when ISBNs exist, or converted to |ASIN= when no ISBNs are set. If there is no ISBN, ASINs starting with numbers should be converted to ISBNs when possible (however those starting with |asin=630... aren't ISBNs). Headbomb {
t ·
c ·
p ·
b} 15:44, 30 July 2018 (UTC)
I updated the code. If there is an ISBN, then ignore ASIN. If the ASIN is an ISBN then add as ISBN, if not then add as ASIN.
AManWithNoPlan (
talk) 17:32, 30 July 2018 (UTC)
That doesn't sound right. I think it should be: if there is an ISBN or OCLC, remove the ASIN. If there is no ISBN and the ASIN starts with a letter or 630, leave the ASIN alone. If there is no ISBN and the ASIN is a valid ISBN, move the ASIN to |ISBN=. –
Jonesey95 (
talk) 17:40, 30 July 2018 (UTC)
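The rule set proposed just above can be sketched in a few lines. This is only an illustration in Python, not the bot's actual code: the function names `is_valid_isbn10` and `resolve_asin` and the plain dict of template parameters are assumptions made for the example. The checksum test is the standard ISBN-10 weighted sum modulo 11.

```python
def is_valid_isbn10(code: str) -> bool:
    """Return True if code passes the ISBN-10 checksum (weights 10..1, mod 11)."""
    if len(code) != 10:
        return False
    total = 0
    for position, char in enumerate(code):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # 'X' is only legal as the final check character
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def resolve_asin(params: dict) -> dict:
    """Hypothetical helper: apply the ISBN/OCLC/ASIN rules to a copy of params."""
    result = dict(params)
    asin = result.get("asin", "").strip()
    if not asin:
        return result
    # Rule 1: an ISBN or OCLC makes the ASIN redundant.
    if result.get("isbn") or result.get("oclc"):
        del result["asin"]
        return result
    # Rule 2: letter-initial and 630-series ASINs are left alone.
    if asin[0].isalpha() or asin.startswith("630"):
        return result
    # Rule 3: an ASIN that is a valid ISBN moves to |isbn=.
    if is_valid_isbn10(asin):
        result["isbn"] = asin
        del result["asin"]
    return result
```

For instance, `resolve_asin({"asin": "0306406152"})` moves the value to `isbn`, while an ASIN such as `B00005N5PF` is returned untouched.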
It looks like it is all good now.
AManWithNoPlan (
talk) 19:32, 30 July 2018 (UTC)
Do we know for certain that 630-series numbers are not isbns? Have the isbn people given that series over to amazon? If there is some sort of official acknowledgement that 630-series numbers are not isbns (even though they validate as isbn numbers) then perhaps cs1|2 should stop adding articles to
Category:CS1 maint: ASIN uses ISBN when |asin= holds a 630-series number. Similarly, the documentation for |asin= should be updated to recognize the 630 series.
Not that I'm aware of. Doesn't mean that such a thing doesn't exist though, just that I never found it. There is
List of ISBN identifier groups, however. Headbomb {
t ·
c ·
p ·
b} 11:25, 31 July 2018 (UTC)
Lowercase (but first-letter capital allowed after a . or :)
a
an
el
de
la
le
für
of
on
the
van
von
Some of the lowercase ones can be confused with abbreviations/other words. Headbomb {
t ·
c ·
p ·
b} 05:08, 4 August 2018 (UTC)
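As a rough illustration of the lowercase list above, the following Python sketch lowercases the listed words except at the start of the name, and keeps whatever capitalization the source had after a "." or ":" (since a capital is merely allowed there). The function name `title_case_journal` is hypothetical, and the sketch does nothing about the abbreviation ambiguity mentioned above.

```python
LOWERCASE_WORDS = {
    "a", "an", "el", "de", "la", "le", "für", "of", "on", "the", "van", "von",
}


def title_case_journal(name: str) -> str:
    """Capitalize each word, keeping the listed small words lowercase."""
    words = name.split()
    out = []
    for index, word in enumerate(words):
        is_small = word.lower() in LOWERCASE_WORDS
        after_break = index > 0 and words[index - 1].endswith((".", ":"))
        if index == 0 or not is_small:
            # First word and ordinary words get an initial capital.
            out.append(word[:1].upper() + word[1:])
        elif after_break:
            # Capital is allowed after "." or ":", so keep the source's choice.
            out.append(word)
        else:
            out.append(word.lower())
    return " ".join(out)
```

With this rule, "Historical Biology: An International Journal of Paleobiology" is left unchanged rather than having "An" lowercased.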
Upon further review, I think one of the main issues is when the journal is wikilinked, the bot goes cray with capitalization. Headbomb {
t ·
c ·
p ·
b} 06:01, 4 August 2018 (UTC)
Do you have an example of Wikilinks? We do not touch those. I really wish the databases we query actually formatted the titles right.
AManWithNoPlan (
talk) 13:15, 4 August 2018 (UTC)
Examples of wikilinks:
[9] (at the very bottom) and
[10] (look for Agricultural and Forest Meteorology and Proceedings of the National Academy of Sciences of the USA). Headbomb {
t ·
c ·
p ·
b} 15:56, 4 August 2018 (UTC)
With this edit, citation bot converted this somewhat correct template:
{{Citation|title=Reauthorizing the Elementary and Secondary Education Act|url=https://dx.doi.org/10.1057/9781137030931.0011|work=President Obama and Education Reform|publisher=Palgrave Macmillan|isbn=9781137030931|access-date=2018-07-09}}
{{Citation|work=President Obama and Education Reform|publisher=Palgrave Macmillan|isbn=9781137030931|doi=10.1057/9781137030931.0011|chapter=Reauthorizing the Elementary and Secondary Education Act|title = President Obama and Education Reform|year = 2012}}
The bot should have removed |work= when it added |chapter= because |work= (and its alias) is the mechanism that switches {{
citation}} from 'book style' to 'periodical style'.
We can't proceed until
Agreement on the best solution
Perhaps just delete |work= when it is empty, or when the template has |chapter= and |work= is equal to |series=, |journal=, |title=, |chapter=, or |publisher=.
AManWithNoPlan (
talk) 16:39, 2 August 2018 (UTC)
Changing to {{
cite book}} wouldn't fix the problem for two reasons:
the bot created a new |title= by copying content from |work= and retained |work= so now we have redundant information in the rendered citation:
{{Cite book|work=President Obama and Education Reform|publisher=Palgrave Macmillan|isbn=9781137030931|doi=10.1057/9781137030931.0011|chapter=Reauthorizing the Elementary and Secondary Education Act|title = President Obama and Education Reform|year = 2012}}
style change from cs2 to cs1; and if there were short-form references depending on the automatic CITEREF links created by {{citation}}, those links are now broken
Good points. The real problem is that citation templates have so many parameters that are almost the same but not the same. We cannot fix that. It seems that we could implement code that checks for |work= and if the new title/chapter/publisher/journal matches it then drop it.
AManWithNoPlan (
talk) 17:09, 2 August 2018 (UTC)
In cs1|2 the internal parameter is Periodical. Any of |journal=, |newspaper=, |magazine=, |work=, |website=, |periodical=, |encyclopedia=, |encyclopaedia=, |dictionary=, |mailinglist= are aliases that feed into that internal parameter so all of them generally act the same.
Module:Citation/CS1 does look at the names that were used in the template source because for {{citation}} the name of the parameter gives a clue to how the citation should be rendered. For example, when the source for Periodical is |journal=, Module:Citation/CS1 knows to render |volume=, |issue=, and |page(s)= using academic journal style and to emit the journal style COinS metadata. {{citation}} balks at the combination of any Periodical parameter in the presence of any Chapter alias. In the example template, copying the content of a Periodical alias to |title= should blank the Periodical alias so that {{citation}} isn't confused.
Just for the record, the bot is copying nothing: it just finds the same string again in its database search.
AManWithNoPlan (
talk) 00:27, 3 August 2018 (UTC)
Just need some code that notices if work===title and such and then deletes work. Case insensitive of course.
AManWithNoPlan (
talk) 00:30, 3 August 2018 (UTC)
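A minimal sketch of the check described above, written in Python purely for illustration (the bot's real code is not shown in this discussion). The name `drop_redundant_work`, the dict representation of a template, and the whitespace-collapsing normalization are all assumptions. Note that this is an exact match after case folding; it does not handle near-matches such as a one-character typo.

```python
def _normalise(value: str) -> str:
    """Case-fold and collapse whitespace so the comparison ignores case."""
    return " ".join(value.split()).casefold()


def drop_redundant_work(params: dict) -> dict:
    """Hypothetical helper: remove |work= if empty or duplicating another field."""
    result = dict(params)
    if "work" not in result:
        return result
    work = _normalise(result["work"])
    if not work:
        del result["work"]  # an empty |work= carries no information
        return result
    for name in ("title", "chapter", "publisher", "journal", "series"):
        if _normalise(result.get(name, "")) == work:
            del result["work"]
            break
    return result
```

Applied to the {{Citation}} example earlier in this thread, |work=President Obama and Education Reform would be dropped because it equals |title=.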
Really? What if work and title are off by one character because of a typo or whatever? If the bot is correcting a malformed citation, as it attempted to do in this example, and ends up with a configuration that is not supported then perhaps the correct response is to do nothing.
I noticed this because the referenced edit caused a url–wikilink conflict error. The original template has an inappropriate wikilink in |title=:
{{cite journal | doi = 10.1671/0272-4634(2002)022[0058:ADATDF]2.0.CO;2 | last1 = Lamanna | first1 = M.C. | last2 = Martinez | first2 = R.D. | last3 = Smith | first3 = J.B. | year = 2002 | title = A definitive abelisaurid theropod dinosaur from the early Late Cretaceous of [[Patagonia]]". | url = | journal = Journal of Vertebrate Paleontology | volume = 22 | issue = 1| pages = 58–69 }}
Lamanna, M.C.; Martinez, R.D.; Smith, J.B. (2002). "A definitive abelisaurid theropod dinosaur from the early Late Cretaceous of
Patagonia"". Journal of Vertebrate Paleontology. 22 (1): 58–69.
doi:
10.1671/0272-4634(2002)022[0058:ADATDF]2.0.CO;2.
From that, the bot made this:
{{cite journal | doi = 10.1671/0272-4634(2002)022[0058:ADATDF]2.0.CO;2 | last1 = Lamanna | first1 = M.C. | last2 = Martinez | first2 = R.D. | last3 = Smith | first3 = J.B. | year = 2002 | title = A definitive abelisaurid theropod dinosaur from the early Late Cretaceous of [[Patagonia]]" | url = http://www.bioone.org/doi/pdf/10.4202/app.00132.2014| journal = Journal of Vertebrate Paleontology | volume = 22 | issue = 1| pages = 58–69 | format = Full text }}
If you follow the doi you get to the article that matches the bibliographic data. If you follow the title-link you end up at a vaguely related article (they are both about abelisaurids) that does not match the bibliographic data.
The value in the original |title= is malformed: it has a wikilink (it shouldn't) and it has extraneous punctuation (the single unmatched double quote mark and a period – neither of which belong there). Still, the bot should not be adding a url when |title= is wikilinked either explicitly (has wikilink markup) or indirectly by |title-link=, or has wikilinks (which are almost always inappropriate). It could be argued that, for |title= parameters with single-word wikilink markup, the markup should be removed. More difficult to know what to do with wikilinks in the form [[target|label]] because this form of wikilink is commonly used when linking to sources at, for example, wikisource.
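The guard argued for in the paragraph above might look something like the following Python sketch. The function name `may_add_url` and the dict of parameters are hypothetical, and the extra condition that an existing non-empty |url= also blocks the addition is an assumption, not something stated in the discussion.

```python
import re

# Matches any [[...]] wikilink markup, with or without a |label part.
WIKILINK = re.compile(r"\[\[.+?\]\]")


def may_add_url(params: dict) -> bool:
    """Return True only if adding |url= cannot conflict with a linked title."""
    if params.get("url", "").strip():
        return False  # assumption: never overwrite an existing URL
    if params.get("title-link", "").strip():
        return False  # title is already linked via |title-link=
    if WIKILINK.search(params.get("title", "")):
        return False  # title contains explicit wikilink markup
    return True
```

For the Lamanna et al. template quoted above, the [[Patagonia]] markup in |title= would make `may_add_url` return False, so no |url= would be added.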
The problem is that it tries to add as a |hdl= and fails since it is already set. The solution is to view that as a success. This bug means that if you run the bot once you will get hdl set and then a second time it will add as a url.
https://github.com/ms609/citation-bot/pull/517 AManWithNoPlan (
talk) 21:43, 7 August 2018 (UTC)
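The fix described above (treat an already-set |hdl= as success rather than falling back to |url=) can be illustrated with a short Python sketch. This is not the code from the pull request; `record_source` and its arguments are hypothetical names chosen for the example.

```python
def record_source(params: dict, hdl: str, url: str) -> dict:
    """Hypothetical helper: prefer |hdl=; only fall back to |url= without one."""
    result = dict(params)
    if hdl:
        # If |hdl= is already present this is a no-op, and that still counts
        # as success, so the URL fallback below is never reached.
        result.setdefault("hdl", hdl)
        return result
    if url and not result.get("url"):
        result["url"] = url
    return result
```

Running the helper twice on the same citation gives the same result as running it once, which is the property the bug violated.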
You are working my butt off by the way. Which is good.
AManWithNoPlan (
talk) 21:43, 7 August 2018 (UTC)
Well it's the first time in a long looooooong while that anyone's been working on CitationBot so I'm making sure to take advantage of the opportunity. Headbomb {
t ·
c ·
p ·
b} 22:25, 7 August 2018 (UTC)
Running CitationBot on
doi:
10.1073/pnas.171325998 finds
PMC58796, but not
PMID11573006. Basically, the bot should query both PubMed and PubMed Central in every possible way until each of doi/pmid/pmc is found. And iterate when new identifiers are found.
of citation templates in the NLM/NIH databases, and cross-reference things with each other.
The bot should also not assume the queries return 'complete' results. Very often, a PMID entry won't list the PMC, even if a PMC exists and could be discoverable by a DOI query (and vice-versa for PMCs listing a DOI, but not a PMID, or a PMID, but not doi, or every other such combination). Headbomb {
t ·
c ·
p ·
b} 04:42, 9 August 2018 (UTC)
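The "query everything, then iterate when new identifiers turn up" idea can be sketched as a fixed-point loop. The Python below is only an illustration: `expand_identifiers`, `make_lookup`, and the two in-memory record lists are hypothetical stand-ins for real database queries, deliberately made incomplete to mimic the partial results described above. The identifiers themselves are the ones quoted in this thread.

```python
from typing import Callable, Dict, List

Identifiers = Dict[str, str]
Lookup = Callable[[str, str], Identifiers]


def make_lookup(records: List[Identifiers]) -> Lookup:
    """Build a toy lookup over in-memory records (stand-in for a real API)."""
    def lookup(kind: str, value: str) -> Identifiers:
        merged: Identifiers = {}
        for record in records:
            if record.get(kind) == value:
                merged.update(record)
        return merged
    return lookup


def expand_identifiers(known: Identifiers, lookups: List[Lookup]) -> Identifiers:
    """Query every source with every known identifier until nothing new appears."""
    found = dict(known)
    queried = set()
    changed = True
    while changed:
        changed = False
        for kind, value in list(found.items()):
            for index, lookup in enumerate(lookups):
                if (index, kind, value) in queried:
                    continue  # never repeat the same query
                queried.add((index, kind, value))
                for new_kind, new_value in lookup(kind, value).items():
                    if new_kind not in found:
                        found[new_kind] = new_value
                        changed = True
    return found


# Toy data: one source links doi to pmc, the other links pmc to pmid.
pmc_lookup = make_lookup([{"doi": "10.1073/pnas.171325998", "pmc": "58796"}])
pubmed_lookup = make_lookup([{"pmc": "58796", "pmid": "11573006"}])
```

Starting from the DOI alone, the first pass discovers the PMC and a second pass uses that PMC to discover the PMID, which a single round of queries would miss.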
I noticed that years ago. But, there were so many other issues to deal with that I forgot about it.
AManWithNoPlan (
talk) 14:11, 9 August 2018 (UTC)
completely unrelated to hdl issue.
AManWithNoPlan (
talk) 13:54, 9 August 2018 (UTC)
Seems exactly the same type of issue to me: failing to use |citeseerx=, just like it failed to use |hdl=, but you're the coder here. Headbomb {
t ·
c ·
p ·
b} 14:05, 9 August 2018 (UTC)
The difference is that in the case of hdl, it already had the hdl set, so it failed to add it and then fell back on adding it as a url. In the case of citeseerx, the bot has no code to even add one.
AManWithNoPlan (
talk) 14:11, 9 August 2018 (UTC)
Work is such a poorly used parameter that removing publisher based upon it is dubious. I have added this code
https://github.com/ms609/citation-bot/pull/545 so that if the |work= is set and the journal title happens to be the same, then the |work= is changed to |journal=.
AManWithNoPlan (
talk) 17:59, 11 August 2018 (UTC)
Hence why this should be restricted to the cite journal template since in that template work and journal are aliases. Headbomb {
t ·
c ·
p ·
b} 18:48, 11 August 2018 (UTC)
@
AManWithNoPlan: not sure I know what's being done in that exactly, but will this strip |journal=
Journal of Foobar to |journal=Journal of Foobar? Because if so, it shouldn't. Headbomb {
t ·
c ·
p ·
b} 21:42, 9 August 2018 (UTC)
It removes all wikilinks from |title=. It removes all wikilinks from |journal= UNLESS the link is the entire name of the journal.
AManWithNoPlan (
talk) 21:49, 9 August 2018 (UTC)
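The behavior described in the reply above can be approximated as follows. This is an illustrative Python sketch and not the code in the pull request; the names `strip_wikilinks` and `clean_journal` and the regular expression are assumptions. For piped links of the form [[target|label]], the sketch keeps the label.

```python
import re

# [[target]] or [[target|label]]; group 1 captures the visible text.
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def strip_wikilinks(text: str) -> str:
    """Replace every wikilink with its visible text (used for |title=)."""
    return WIKILINK.sub(r"\1", text)


def clean_journal(value: str) -> str:
    """Strip wikilinks from |journal= unless one link is the whole value."""
    if WIKILINK.fullmatch(value.strip()):
        return value
    return strip_wikilinks(value)
```

So a value that is one complete link, such as `[[Journal of Foobar]]`, is preserved, while a partially linked name loses its markup.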
If you look at the changed files, one of them is a test suite and you can see the changes.
AManWithNoPlan (
talk) 22:59, 11 August 2018 (UTC)
This should apply to every single character at the end of a string, or before a ':'. E.g. Journal of Physics E: Blah BLah BLuh or Chemical Physics A. Headbomb {
t ·
c ·
p ·
b} 18:07, 14 August 2018 (UTC)
This happened when we added support for the Spanish word "e" ("and"). That fixed a lot of Spanish things, but we forgot about "j chem phys e" type stuff. But come on, who splits their journals five ways? Obviously physics people do.
AManWithNoPlan (
talk) 19:49, 14 August 2018 (UTC)
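One way to express the rule requested above (a single letter at the end of the name, or directly before a ":", is a series letter and should be capitalized) is a small regular-expression pass. The Python below is a sketch under that assumption; `capitalise_series_letter` is a hypothetical name, and the pattern only considers letters preceded by whitespace.

```python
import re

# A lone letter preceded by whitespace and followed by ":" or end of string.
SERIES_LETTER = re.compile(r"(?<=\s)([^\W\d_])(?=:|$)")


def capitalise_series_letter(name: str) -> str:
    """Uppercase a single-letter series designator; leave mid-title words alone."""
    return SERIES_LETTER.sub(lambda match: match.group(1).upper(), name)
```

A lone "e" in the middle of a title (followed by a space) is not matched, so Spanish-style titles keep their lowercase conjunction.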