-
This may all be down to clumsy language on my part, though. What I was trying to express was the difference between two things. One is the accuracy of the test as a pharma company states it on the tin (presumably something akin to an F1 score). The other (which I think greenbank was expressing) is the accuracy achieved in the population, which is influenced by other factors (prevalence being one of them). The latter does not impact the former.
No, I don't think you're understanding what I've said. It doesn't have to involve the general population at all; the same problems occur when you apply the test to the same test group you used to measure the sensitivity/specificity.
Imagine you have a test group of 10,000 people. You know everyone's exact status through other testing. 9,500 (95%) are negative. 500 (5%) are positive.
Imagine the test has a sensitivity and a specificity of 95% (i.e. 5% of the truly positive people get a false negative and 5% of the truly negative people get a false positive).
Now apply your test to all 10,000 people in your test group.
Consider just the positive results. Where could these have come from?
Firstly, there are the true positives: 95% of the 500 people who are truly positive. So that's 475 people. (The other 5% get a false negative result.)
The other positive results will be the false positives from the people who are truly negative. How many of them will there be?
That'll be 5% of the 9,500 people who are truly negative. That's another 475 people.
How accurate is a positive result if it's only correct for 475 out of 950 people who test positive? 50%.
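The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same counting; the function name `positive_result_breakdown` and its parameter names are made up for illustration, and the share of positive results that are correct is what is usually called the positive predictive value (PPV).

```python
def positive_result_breakdown(group_size, prevalence, sensitivity, specificity):
    """Count where the positive results come from in a group of known status.

    Hypothetical helper for illustration only; it mirrors the steps in the post.
    """
    truly_positive = round(group_size * prevalence)
    truly_negative = group_size - truly_positive

    # Truly positive people who correctly test positive.
    true_positives = round(truly_positive * sensitivity)
    # Truly negative people who wrongly test positive.
    false_positives = round(truly_negative * (1 - specificity))

    # Share of all positive results that are actually correct.
    share_correct = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_correct


tp, fp, share = positive_result_breakdown(10_000, 0.05, 0.95, 0.95)
print(f"true positives: {tp}, false positives: {fp}, share correct: {share:.0%}")
```

With the numbers from the example (10,000 people, 5% prevalence, 95% sensitivity and specificity) this gives the same 475 and 475, so half of the positive results are correct.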
So even if you apply your own test to the same test group you used to measure the sensitivity and specificity of the test you find that a positive result is not as accurate as you expected.
It has nothing to do with the general public; the above numbers come from the same people who were used to calibrate the test. The general population can skew the numbers even more, though, if its prevalence differs from that of the test group used to measure sensitivity and specificity.
-
We now have two 10k groups. The first is the test group from which the sensitivity and specificity are learned ("the same test group you used to measure the sensitivity/specificity"). The second is the 10k that the test is applied to (it may or may not be the same as the first, but in your example it is). The important thing is that these are two different things, which I think we both agree on.
The first group is used to judge and measure the test. Sensitivity and specificity are learned from this group. Whatever you do with the test after this, sensitivity and specificity do not change (unless they were wrong to start with).
The second group is the one the test is being used on. In this case, we can say that the second 10,000 people in your example are being used as an analogue for the general population, no? The results will depend on the makeup of these 10,000 people. They will not perfectly mirror reality (that is, the tests are not perfect).
So even if you apply your own test to the same test group you used to measure the sensitivity and specificity of the test you find that a positive result is not as accurate as you expected.
The tests are not 100% accurate; this was never a point of confusion.
So again, the sensitivity and specificity, which are learned from the first group, are what I thought at the very beginning you were claiming shift depending on prevalence in the population. (This is what I quoted in my reply to Chalfie, and if you remember, the first thing I said to you after you replied was: "Ah, sorry. So the raw number of false positives/false negatives will shift depending on how many true negatives/true positives there are. Okay - I mistook your "-ve" and "+ve" to be analogues for specificity/sensitivity.")
The results in the second group (be it a sample or the general population) are dependent on the makeup of that population and the sensitivity/specificity of the test. I agree with this and always have.
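To make that concrete, here is a rough sketch that holds sensitivity and specificity fixed at the 95% from the example and varies only the prevalence of the group being tested. The function name `share_of_positives_correct` and the prevalence values in the loop are assumptions chosen for illustration, not anything from the thread.

```python
def share_of_positives_correct(prevalence, sensitivity=0.95, specificity=0.95):
    """Fraction of positive results that are true positives.

    Hypothetical helper; sensitivity and specificity are inputs that stay fixed,
    and only the prevalence of the tested group is varied.
    """
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Illustrative prevalence values only; the test itself never changes.
for prevalence in (0.01, 0.05, 0.20, 0.50):
    share = share_of_positives_correct(prevalence)
    print(f"prevalence {prevalence:.0%}: {share:.1%} of positive results are correct")
```

The test's own figures are the same on every line of output; what moves is how trustworthy a positive result is, which is about 50% at 5% prevalence and climbs as prevalence rises.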
If I'm still misunderstanding you we can either take this off public chat or you can rest assured that you tried and I'm a moron.
Happy to have you chime in.
For the record, and I've said this to him, I've never doubted him being right about something. I just didn't understand what he was trying to express here:
I've come to the conclusion that this is a claim about the accuracy of the test's results in the general public when deployed, and as a count (albeit expressed as a percentage). I noted this a few times yesterday, but it was never acknowledged that this was in fact the source of the misunderstanding (or I missed it as I slowly had more beers/was making dinner). I.e.:
"Just to reiterate, I do understand that the overall number of accurate results will depend on the prevalence of the disease in the population. But the accuracy of the test, in my understanding, should be independent of this."
This may all be down to clumsy language on my part, though. What I was trying to express was the difference between two things. One is the accuracy of the test as a pharma company states it on the tin (presumably something akin to an F1 score). The other (which I think greenbank was expressing) is the accuracy achieved in the population, which is influenced by other factors (prevalence being one of them). The latter does not impact the former.
Assuming I understand everyone now.