This page is within the scope of WikiProject Disambiguation, an attempt to structure and organize all
disambiguation pages on Wikipedia. If you wish to help, you can edit the page attached to this talk page, or visit the
project page, where you can join the project or contribute to the
discussion.
on what statistics should look like for hatnotes, primary redirects, primary topics
Here's some more bits of info I've gathered after someone asked at
Talk:Tupelo:
after the move, the previously presumed primary topic barely registers, though the common section does consistently get a measurably noticeable chunk of traffic
after the move, the previously presumed primary topic got ~7%, ~7%, and then with the fall of overall traffic it was below the anonymization threshold (< 10/273 = < 3.66% with every source-destination combination - which doesn't mean it wasn't actually something different, though still a minority).
hatnote got a very small amount of traffic (max 220/13742 = ~1.6%), was only #5 in the top list
after the move, the previously presumed primary topic was sorted down in the list, got ~7% clicks the next month; sorted back up the following month it got ~8%, ~7.5%, ~5% clicks, though ~53% / ~55% / ~60% / ~47% of identifiable outgoing
disambiguated after 18 years of being presumed primary topic, 15 years with a hatnote (March '22 1199/9356 = ~13%)
after the move, we continue seeing seasonal spikes for the previously presumed primary topic, and regardless of spikes it gets ~23%, ~15%, ~11% interest
primary redirect, hatnote got < 0.5% interest compared to total incoming traffic at destination, nobody checked a comparison of redirect and disambiguation traffic
after the move, the previously presumed primary topic got ~16%, ~23%, ~17% interest
the hatnote got ~0.7% before the move, yet there were hints that it could do better if reorganized because of ~20% measurements related to the primary redirect (which we sadly can't have in a lot of cases)
This isn't to say that all of these moves were truly warranted or that there aren't a plethora of individual factors at play. But even with this spread of outcomes, there's something distinctly off with our current near-consensus interpretation of what stats should look like for primary topics by usage. This also means little for considerations of long-term significance. --
Joy (
talk) 15:30, 1 March 2024 (UTC)
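For anyone who wants to recheck the percentages quoted in the list above, here is a minimal Python sketch. The function name `share_pct` is made up for illustration, and the only inputs are the click and view counts already quoted in this thread; nothing here queries the clickstream or pageviews tools.

```python
def share_pct(clicks: int, total_views: int) -> float:
    """Return clicks as a percentage of total incoming views."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    return 100.0 * clicks / total_views


# Figures quoted in the list above, as (clicks, total incoming views).
# The 10/273 entry is the anonymization cut-off, so it is an upper bound
# on any single hidden source-destination pair, not a measured value.
quoted = {
    "hatnote, maximum observed": (220, 13742),
    "anonymization cut-off": (10, 273),
    "hatnote, March '22": (1199, 9356),
}

for label, (clicks, total) in quoted.items():
    print(f"{label}: {share_pct(clicks, total):.2f}%")
```

The same helper can be pointed at any other clicks/views pair from the linked talk pages, as long as both numbers come from the same month.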
Your numbers for % of "interest" are rather misleading, as they don't mention that in many cases, the largest percentage by far was "no traffic at all", i.e. no clickthrough, for whatever reason (visits not by humans? wanted target not available? info on disambig page sufficient?). E.g. for Hamme, you note "after the move, the previously presumed primary topic got ~16% interest", which was more than 4 times as much as the other topics "combined".
Fram (
talk) 14:51, 11 April 2024 (UTC)
Yes, I've mentioned this several times. I agree it is misleading to simply assume that every incoming view counts for something meaningful. Truth is that there is no way for us to know what if anything those 'dead-end' incoming page views signify.
older ≠
wiser 15:34, 11 April 2024 (UTC)
@
Bkonrad exactly, but it isn't quite the case that we can't tell whether they signify anything. There are multiple hints that they do:
First of all, there is a spread of cases here, ranging from where we see a lot of the incoming views translate into clicks, to where we see few of the incoming views translate into clicks. I didn't summarize all of that information on this talk page, you have to go click through the links to examine that. (I may find some time later to extract that dimension of data and extend the list above.)
That means that we're not just consistently seeing some ghost traffic always, rather, we've got to be observing actual reader behavior at least to an extent. So we can't just see e.g. 60% of traffic translate into clicks in a fresh case and then jump to the conclusion that most of the remaining 40% is ignorable.
Secondly, there are cases where we see almost all of the incoming views translate into clicks. The most recent such example I found is described at
Talk:Forced march#post move to disambiguation, where our identification rate went from 34/55 (~62%) in the first month observed, to 96/96, to 95/95, to 135/135 in the last three months, amazingly enough, even at such a small amount of traffic.
This negates even the idea that there's always got to be at least some of this ghost traffic, because apparently we have a falsifying scenario that seems quite consistent. So we can't just see e.g. 75% of traffic translate into clicks and then jump to the conclusion that any part of the remaining 25% is ignorable.
On the individual points raised by @
Fram earlier:
meta:Research:Wikipedia clickstream says it tries to exclude visits not by humans: "We attempt to exclude spider traffic by classifying user agents with the ua-parser library and a few additional Wikipedia specific filters." It's certainly possible that it misses some, but then the page views "User" category is likely missing them, too, so I don't know that we should rely on that being a major effect.
Wanted target not available - how would this improve the odds for the claim that there would have to be a primary topic, when there'd be topics that detract from there being a primary topic yet they're not even available? That would seem to just raise the risk of astonishing more contingents of readers. "These people don't even know about meaning X, and they proclaimed meaning Y as the main one - pfft!"
Info on disambig page sufficient - this use case is indeed not studied at all, and I agree that it seems possible for at least some cases. Ultimately, why would we consider all navigations that do not result in another click bad? IOW surely this also detracts from the idea of there being a primary topic, if there is also the contingent of readers who we cannot convince to click on the link to read about the proposed primary topic (which is also usually the very first link in the list).
More than 4 times as the other topics combined - I've explained already at
Talk:Hamme (disambiguation) how you are making weirdly incorrect statements. Even if we compare 28 identified clickstreams and the 10 identified clickstreams, the ratio between those two numbers is 2.8, it is simply not 4. Likewise, both 28 and 10 are so close to the anonymization threshold that it's not at all clear that this ratio has to be precise. In other words, this could have been 4 or it could have been 2 with just a few more src-dest pairs of views identified as opposed to anonymized out. And none of these ratios are impressive when we also see a lot more traffic interested in neither of these.
In any event, thanks for the interest. --
Joy (
talk) 12:33, 14 April 2024 (UTC)
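The point about the 28-versus-10 ratio being fragile can be sketched numerically. This is only an illustration under a loudly stated assumption: that each group of topics might have some additional clicks hidden in anonymized source-destination pairs, with `hidden_max` as a hypothetical upper bound on those hidden clicks per group. The real amount of anonymized traffic is not known from the figures quoted here, and `ratio_bounds` is a made-up helper name.

```python
from typing import Tuple


def ratio_bounds(a: int, b: int, hidden_max: int) -> Tuple[float, float]:
    """Range of the a:b ratio if up to hidden_max clicks per group were anonymized out.

    hidden_max is a hypothetical parameter; the clickstream does not report it.
    """
    if b <= 0 or hidden_max < 0:
        raise ValueError("b must be positive and hidden_max non-negative")
    lowest = a / (b + hidden_max)   # every hidden click belonged to the second group
    highest = (a + hidden_max) / b  # every hidden click belonged to the first group
    return lowest, highest


# 28 and 10 are the identified click counts discussed above.
# 9 is the most a single anonymized pair could hide, given a cut-off of 10.
for hidden in (0, 9, 18):
    low, high = ratio_bounds(28, 10, hidden)
    print(f"hidden_max={hidden}: ratio between {low:.2f} and {high:.2f}")
```

With nothing hidden the ratio is 2.8. Allowing a single anonymized pair of up to 9 clicks on either side already stretches the range from roughly 1.5 to 3.7, which is the sense in which neither 2 nor 4 can be ruled out.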
Yes, I'm not saying the count of incoming views should be completely ignored, only that trying to read into what it signifies is highly speculative and we should be very cautious about what significance we attribute to such dead-end views. If there is a sudden change in the number of such incoming views, that likely merits some further consideration. Similarly, if there is a consistent, very large gap between incoming and outgoing views, that also may merit some consideration. But even in such cases, deciding what readers reaching such dead-end views were looking for when arriving at a particular disambiguation page is still highly speculative. It could play a factor in arguing that there is no primary topic where there is none at present (i.e., where there is a request to replace a disambiguation page with a primary topic). I'm not sure what significance we could read into such dead-end views of a disambiguation page where there is an existing primary topic. It could be readers are just curious about what else might have the same name, without intending to look at any of them in more detail. We just don't know why such readers behave that way.
older ≠
wiser 13:46, 14 April 2024 (UTC)
Agreed, the change in pattern would be a significant indicator. But on that front, I point again to the evidence above - we often observe a clearly consistent pattern, and then we do a switch for whatever reason, and then the data switches to a clearly different consistent pattern. Well, it often takes a few months for things to settle, and in the interim period there's a swing or two, but still.
In cases where there is already a primary topic selected, it's very hard to read into the no-clickthrough traffic. Because the content is larger and varied, it could be any number of possibilities. Just like it could be readers who are navigated wrongly and just immediately click away, it could also be misnavigated readers who stayed and learned something and then clicked away, or it could be a bunch of completely content readers who were absolutely happy to read what was in front of them and had no need to immediately learn more about another related topic. We don't have the tools to discern these.
With simpler pages like the disambiguation lists, however, it's less hard to understand the general reader behavior because we don't present people with huge amounts of possibilities, we reduce that number and streamline their options, and make it more likely we can understand the measurements of our existing tools.
What I think we should learn from all this is that we should not be too cautious and instead we should not be afraid to experiment as much as we have been so far.
In all this data I've tracked, we've yet to observe a case where there was a fresh reader complaining about disambiguation lists being the wrong choice. As long as we apply
MOS:DABCOMMON, and we do, we have no indication that we're confusing or troubling any appreciable amounts of readers even in contentious cases. --
Joy (
talk) 14:35, 14 April 2024 (UTC)
"In all this data I've tracked, we've yet to observe a case where there was a fresh reader complaining about disambiguation lists being the wrong choice." This seems a peculiar criterion. Quite aside from reactions to the lists you have been compiling, I can't recall the last time I came across a "fresh" reader ever complaining about an incorrectly placed disambiguation page where the complainant was not a myopic partisan seeking to promote their preferred topic.
Regarding "What I think we should learn from all this is that we should not be too cautious and instead we should not be afraid to experiment as much as we have been so far.": I'm glad you are taking a deeper dive into the data, but I hope no one is being misled into thinking that the reams of data of uncertain quality based on poorly documented functions represent an agreed-upon approach to making decisions.
older ≠
wiser 16:02, 14 April 2024 (UTC)
We actually have some interesting data points about that, too, cf.
Talk:Tito (disambiguation), where nobody really paid attention for over a decade as disambiguation was in place, and then a consensus of editors practically instantly chose to apply a primary topic redirect mainly for long-term significance. (On the plus side, that flip allowed us to measure something else afterwards,
Wikipedia talk:Disambiguation/Archive 56#on the quality of clickstream and pageviews usage data explains more.)
I'm pretty sure if we go through other cases we can also find similar timeframes, where some arbitrary navigation choice has been in place for years and decades, and then we arbitrarily decide to congregate, make fun new decisions and pat ourselves on our collective backs :)
IOW our decision-making process seems perfectly sound (mostly to me too, I'm not excluding myself here), but so much falls through the cracks that it's doubtful that much of it really matters as much as we think it does. --
Joy (
talk) 16:49, 14 April 2024 (UTC)
In the discussion linked above, we have something of a weak consensus to stop strictly sequestering disambiguation from set indices in all cases. Does anyone see any reason not to draft changes to this guideline to incorporate some of these possibilities? --
Joy (
talk) 15:44, 6 March 2024 (UTC)
As there's no objection, I think we need these kinds of changes:
Make it clear in the text of
WP:PTM that the exception applied to toponymy can apply to anthroponymy as well, and we shouldn't shy away from listing some of the latter on disambiguation pages
Make it clear in the text of
WP:NAMELIST that the strict separation of disambiguation and set index content should be weighed against the possibility that we introduce extra hurdles to readers just because some ambiguous topics also qualify for a set index
Possibly a section here about the practical interaction of disambiguation and set index articles, as opposed to just a See also link
a change in page views between primary topic and primary redirect
A few years ago, I had noticed
Sumber, Cirebon was at "Sumber" and moved it away, but kept a primary redirect because it wasn't clear if there was not a primary topic. Now that I had a look at the
graph of monthly page views it looks like that change significantly altered the overall traffic at "Sumber" - it went from an average of >200 a month to <20 a month.
This difference of an order of magnitude seems to be another confirmation for how our choices about navigation influence traffic in unforeseen ways - it looks like if we put something in a presumed primary topic position, the search engines drive traffic there way more than they do otherwise, even with a primary redirect in place. This sort of puts a significant dent in our logic of figuring out primary topic by usage - once we have a presumed primary topic, we can't really trust our statistics about that to tell a straightforward story.
Now, this property of the system could have both positive and negative effects - maybe we should think of this in terms of: by choosing to put a topic in the primary topic position, we can intentionally drive traffic to it and more effectively contribute to the spreading of knowledge. At the same time, this indicates a need to have more substantive long-term significance discussions, because if we have a chance to influence reader traffic like this, we want to make sure we do it for the right topics, as opposed to doing it arbitrarily and/or effectively hiding ambiguity. --
Joy (
talk) 10:16, 9 March 2024 (UTC)
At
Talk:Bold#post-move, we see a change from primary redirect to disambiguation page leading to significantly more traffic at the latter. --
Joy (
talk) 21:20, 14 March 2024 (UTC)
What would be the best way to name this article, given that we have
Fifth Avenue Hotel for a different (former) hotel on the same avenue? I don't think using the word "The" is the correct way — Martin (
MSGJ ·
talk) 11:31, 5 April 2024 (UTC)
As long as the correct title is
The Fifth Avenue Hotel in accordance with
WP:THE, then
WP:SMALLDIFFS kicks in, and we needn't add anything further to the title. All we need is hatnotes on each article pointing to the other.
Station1 (
talk) 03:55, 6 April 2024 (UTC)
Proposed DAB category - Greek Letter Organizations
examples include:
Sigma Phi Beta and
Phi Kappa. Would contain about 20-30 dab pages, willing to generate a list. Can't find the guideline here, though it seems smaller than most of the existing categories.
Naraht (
talk) 15:25, 29 April 2024 (UTC)
How should redirects to list entries be disambiguated?
Following
this RfD, Quantum realm was retargeted to
Features of the Marvel Cinematic Universe#Quantum Realm. There were mentions in the RfD discussion about putting a hatnote to
Quantum mechanics at the target; however, as the redirect points to an entry in a list rather than to a section, there doesn't seem to be a good way to include a hatnote template. I was therefore wondering if there was any guidance or ideas about how these sorts of redirects can be disambiguated.
I don't know a standard way to do this but we might replace
* The '''{{visible anchor|Quantum Realm}}''' ...
by something like
{{anchor|Quantum Realm}}{{redirect|...}}
* The '''Quantum Realm''' ...
However, that's a bit intrusive for readers simply browsing the list.
Certes (
talk) 15:59, 6 May 2024 (UTC)
That's the option that occurred to me, as well. It is, as you say, a bit intrusive, but something needs to be there so that users looking for
quantum mechanics aren't left hanging. Another possibility is an inline pointer:
* The '''{{visible anchor|Quantum Realm}}''' (not to be confused with [[quantum mechanics]]) ...
But that seems kind of awkward, especially because there's already a parenthetical in that sentence. —
ShelfSkewed Talk 17:06, 6 May 2024 (UTC)
That's an innovative approach. It feels wrong, but perhaps only because we don't do it elsewhere. It's analogous to {{
confuse}}, which is mainly for typographically similar words such as
Astroid vs
Asteroid, but "quantum realm" is the correct spelling of a colloquial term for quantum mechanics as in
[1]. I think {{
redirect}} is the tool for the job, as per your first instinct.
Certes (
talk) 18:31, 6 May 2024 (UTC)
Well, let's give it a try. I added a hatnote (with a leading colon to distinguish it from the section above) with an edit summary linking to this discussion. Maybe someone will have another idea. —
ShelfSkewed Talk 19:08, 6 May 2024 (UTC)
One thing I'd note here is that there was still some amount of acrimony about whether the term was truly ambiguous. In this sort of a case, the primary redirect method should not generally be preferred because it doesn't allow us precise measurements. We should instead default to erring on the side of caution and first disambiguating these with normal disambiguation lists, and then be able to look at the page views and clickstream statistics specifically for that. I know it seems like seven editors agreeing at RfD is a true sign of consensus, but when it's not backed by a volume of empirical data, but is rather largely assertions, it could well be wrong. I came to this stance after seeing how the stats changed after RfDs for
forced march and
Celebi. --
Joy (
talk) 19:22, 6 May 2024 (UTC)
how many months it takes for disambiguation usage statistics to change
I'd like to point out an aspect of
#on what statistics should look like for hatnotes, primary redirects, primary topics that is becoming increasingly clear - we don't need a lot of time to detect changes in readership navigation patterns. Usually whatever happened within one month after a change was quite indicative of the pattern of traffic going forward; there's never a serious fluctuation.
We should use this to our advantage - to be more willing to experiment for e.g. two months, because that will usually suffice to get measurements and decide if the change was good or not. --
Joy (
talk) 18:16, 10 May 2024 (UTC)