-
We now have two 10k groups. The test group from which the sensitivity and specificity are learned ("the same test group you used to measure the sensitivity/specificity"). And a second 10k (which may or may not be the same as the first, but in your example they are). This second group is having the test applied to them. The important thing is that these are two different things which I think we both agree on.
The first group is used to judge and measure the test. Sensitivity and specificity are learned from this group. Whatever you do with the test after this, sensitivity and specificity do not change (unless they were wrong to start).
The second group is the one the test is being used on. In this case, we can say that the second 10,000 people in your example are being used as an analogue for the general population, no? The results will depend on the makeup of this 10,000 people. They will not mimic reality perfectly (that is, the tests are not perfect).
So even if you apply your own test to the same test group you used to measure the sensitivity and specificity of the test you find that a positive result is not as accurate as you expected.
The tests are not 100% accurate, this was never a point of confusion.
So again, the sensitivity and specificity which are learned from the first group is what I thought at the very beginning you were claiming shifts, depending on prevalence in the population (this is what I quoted in my reply to Chalfie, and if you remember, the first thing I said to you after you replied was: "Ah, sorry. So the raw number of false positives/false negatives will shift depending on how many true negatives/true positives there are. Okay - I mistook your "-ve" and "+ve" to be analogues for specificity/sensitivity.")
The results in the second group (be it a sample or the general population) are dependent on the makeup of that population and the sensitivity/specificity of the test. I agree with this and always have.
If I'm still misunderstanding you we can either take this off public chat or you can rest assured that you tried and I'm a moron.
-
I'm not sure talking about groups/populations is helpful. You just test an individual and there are 2 possibilities: a positive result or a negative. Of the positives, approx. 50% are false results, which is a failure of the test. Of the negatives, over 99% are accurate and under 1% are false negatives. It is much more likely that the testee is truly negative, so false positives are fairly common, whereas a false negative requires an (unlikely) test failure on top of an (unlikely) true-positive status, meaning a very small proportion. Hence, negative results are more likely to be true than positive results.
You are correct that the sensitivity and specificity never change. Just the likelihood of a given result being true or false changes, which is what @Greenbank meant by "66% accurate" or whatever (i.e. 66% of those results are true, the rest are test failures).
I don't think you actually disagree with each other
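For anyone following along, the proportions above can be sketched in a few lines of Python. The figures (10,000 people, 5% prevalence, 95% sensitivity and specificity) are the ones used in the worked example later in the thread:

```python
# Sketch of the arithmetic behind "about 50% of positives are false
# but over 99% of negatives are true", using the thread's numbers:
# 10,000 people, 5% prevalence, 95% sensitivity and specificity.
population = 10_000
prevalence = 0.05
sensitivity = 0.95   # P(test positive | truly positive)
specificity = 0.95   # P(test negative | truly negative)

truly_pos = population * prevalence        # 500 truly positive
truly_neg = population - truly_pos         # 9,500 truly negative

true_pos = truly_pos * sensitivity         # 475 correct positives
false_neg = truly_pos - true_pos           # 25 missed cases
true_neg = truly_neg * specificity         # 9,025 correct negatives
false_pos = truly_neg - true_neg           # 475 false alarms

ppv = true_pos / (true_pos + false_pos)    # accuracy of a positive result
npv = true_neg / (true_neg + false_neg)    # accuracy of a negative result
print(f"positive result correct: {ppv:.1%}")   # 50.0%
print(f"negative result correct: {npv:.1%}")   # 99.7%
```

So the test's overall accuracy (95%) and the accuracy of a specific result (50% for positives, ~99.7% for negatives) really are different quantities, which is why neither side of the thread is wrong.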
-
We now have two 10k groups. The test group from which the sensitivity and specificity are learned ("the same test group you used to measure the sensitivity/specificity"). And a second 10k (which may or may not be the same as the first, but in your example they are). This second group is having the test applied to them. The important thing is that these are two different things which I think we both agree on.
Yep.
The first group is used to judge and measure the test. Sensitivity and specificity are learned from this group. Whatever you do with the test after this, sensitivity and specificity do not change (unless they were wrong to start).
Yep.
The second group is the one the test is being used on. In this case, we can say that the second 10,000 people in your example are being used as an analogue for the general population, no? The results will depend on the makeup of this 10,000 people. They will not mimic reality perfectly (that is, the tests are not perfect).
I used the same group of people again to highlight the fact that even if you use the same group of people you can get weird looking results for the accuracy of a specific outcome (not the test in general).
In my example above the overall accuracy was 95%. I don't doubt that. I've been talking about the accuracy of a positive result of the test. With the numbers above if you get a positive result then it's only a 50:50 chance of being accurate, despite the test having an overall accuracy of 95%.
So even if you apply your own test to the same test group you used to measure the sensitivity and specificity of the test you find that a positive result is not as accurate as you expected.
The tests are not 100% accurate, this was never a point of confusion.
Yes, but they're not 95% accurate for people who get a positive result.
So again, the sensitivity and specificity which are learned from the first group is what I thought at the very beginning you were claiming shifts,
No, I was calculating the accuracy of the test per outcome and showing how it differs massively for a negative and a positive result.
depending on prevalence in the population (this is what I quoted in my reply to Chalfie, and if you remember, the first thing I said to you after you replied was: "Ah, sorry. So the raw number of false positives/false negatives will shift depending on how many true negatives/true positives there are. Okay - I mistook your "-ve" and "+ve" to be analogues for specificity/sensitivity.")
The point is that an individual test may be 95% accurate but only if you know your true status (in which case the test is pointless).
When you get your test result the only thing you know is your result, so you have to look at the estimated/calculated accuracy for the individual result, and that is where they can be skewed well away from the expected 95%.
(I know you're not a moron. I'm probably using the wrong words/terms all over the place, apologies if I am.)
No, I don't think you're understanding what I've said. It doesn't have to involve the general population at all, the same problems occur when you use the same test group you used to measure the sensitivity/specificity.
Imagine you have a test group of 10,000 people. You know everyone's exact status through other testing. 9,500 (95%) are negative. 500 (5%) are positive.
Imagine the test has a specificity and a sensitivity of 95% (i.e. 5% of the truly negative get a false positive result and 5% of the truly positive get a false negative result).
Now apply your test to all 10,000 people in your test group.
Consider just the positive results, where could these have come from?
Firstly, there are the 95% of the 500 truly positive people who correctly test positive. That's 475 people. (The other 5% get a false negative result.)
The other positive results will be the false positives from the people who are truly negative. How many of them will there be?
That'll be 5% of the 9,500 people who are truly negative. That's another 475 people.
How accurate is a positive result if it's only correct for 475 out of 950 people who test positive? 50%.
So even if you apply your own test to the same test group you used to measure the sensitivity and specificity of the test you find that a positive result is not as accurate as you expected.
It has nothing to do with the general public; the above numbers come from the same people who were used to calibrate the test. The general population can skew the numbers even more, though, if the prevalence in the population is different to that of the test group that was used to measure sensitivity and specificity.
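That last point can be sketched by holding sensitivity and specificity fixed at the thread's 95% and varying the prevalence. (The prevalence values below are illustrative, not from the thread.)

```python
# How prevalence skews the accuracy of a positive result, with
# sensitivity and specificity both fixed at 95% as in the thread.
sensitivity = 0.95
specificity = 0.95

def positive_predictive_value(prevalence: float) -> float:
    """Chance that a positive result is true, at a given prevalence."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

for prev in (0.001, 0.01, 0.05, 0.20):
    print(f"prevalence {prev:>5.1%}: "
          f"positive result correct {positive_predictive_value(prev):.1%}")
```

At the test group's own 5% prevalence this gives the 50% from the worked example; at lower prevalences a positive result becomes far less trustworthy, even though the test itself hasn't changed at all.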