Week In Review: Friday Apr 8th
Gee, I wonder what the big news in XC skiing this week could be? From the translated reports out of Norway and Estonia that I’ve read, it sounds like Andrus Veerpalu tested positive (both A and B samples) for human growth hormone, but is denying any wrongdoing. Naturally, this means I started the week off with:
- A new post revisiting my older one on Veerpalu. A common accusation against Veerpalu (and other suspected dopers) is that they have an unusual ability to show up at major events and ski much faster than they “normally” do. What I hope people take away from these two posts is that (a) our intuitive sense for “unusual” results does not always match the data, and (b) the answer you get will depend strongly on how you measure performance. The resulting situation is pretty ambiguous, which is why I would never recommend this line of reasoning as a serious accusation against someone.
- A look at what we can learn about pacing from split times.
- The first of several posts looking back at the careers of skiers who have decided to retire. This week was Pirjo Muranen’s turn.
Finally, I’m going to take this opportunity to expand briefly on something I tweeted about. I read that one of the statements made at Veerpalu’s press conference in his defense was that he had passed more than 100 drug tests in the past. Although sports fans are becoming more educated about the statistics of drug testing, some confusion still remains. Without getting into the technical details, here’s the basic story.
Drug tests can make two types of mistakes: false positives, where we incorrectly label a clean athlete as a doper, and false negatives, where we incorrectly label a doped athlete as clean. In general, any testing scheme will involve a trade-off between these two types of errors. If you tweak your methodology in order to reduce the number of false positives, you will inevitably increase the number of false negatives. This trade-off cannot be outwitted! I often hear people suggest that by combining two or three tests, or engaging in some other complicated scheme, we could reduce both types of error at the same time. Some testing procedures will be better than others in terms of both types of error, but whatever complicated combination of procedures and tests you invent, the end result will always amount to a single, big test that is itself subject to this very trade-off.
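To see what I mean, here’s a toy calculation (the error rates are made up purely for illustration, not real anti-doping figures). Suppose we combine two independent tests into one procedure. Whether we flag an athlete only when both tests are positive, or when either one is, the combined procedure is just a new, single test sitting at a different point on the same trade-off:

```python
# Assumed, made-up error rates for a single test (illustration only):
fp, fn = 0.01, 0.30  # 1% false-positive rate, 30% false-negative rate

# Rule A: flag an athlete only if BOTH independent tests come back positive.
fp_both = fp * fp            # a clean athlete must fail twice
fn_both = 1 - (1 - fn) ** 2  # a doped athlete escapes if either test misses

# Rule B: flag an athlete if EITHER test comes back positive.
fp_either = 1 - (1 - fp) ** 2  # a clean athlete fails if either test errs
fn_either = fn ** 2            # a doped athlete escapes only by missing twice

print(f"require both:  FP = {fp_both:.4f}, FN = {fn_both:.2f}")
print(f"require either: FP = {fp_either:.4f}, FN = {fn_either:.2f}")
```

Rule A cuts the false-positive rate (from 1% to 0.01%) but balloons the false-negative rate (from 30% to 51%); Rule B does the reverse. Neither rule improves both numbers at once.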
Drug testing in sports, for obvious reasons, is often calibrated in such a way that false positives are considered far worse than false negatives. Specifically, tests are often constructed to be very careful to avoid falsely accusing an athlete of doping. Sadly, this means that negative results are simply less informative, in that they are much less likely to actually mean the person is clean.
My general conclusion as a stat guy is that I’m much more likely to believe a single positive result than a single negative result, or even a long string of them. A string of negative results doesn’t receive zero weight in my book, but it doesn’t receive much. (This is ignoring extra-statistical issues like mishandled samples, corrupt labs, or other similar human factors.)
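To put some rough numbers on that claim, here’s a quick Bayesian sketch. All of the rates are assumptions I’ve invented for illustration: suppose a doped athlete has only a 2% chance of being caught by any one test (detection windows, test timing), while a clean athlete has a 0.1% chance of a false positive. The likelihood ratio tells you how much each result shifts the odds that the athlete is doping:

```python
# Assumed, made-up rates for a single test (illustration only):
p_pos_doped = 0.02   # chance a doped athlete tests positive on one test
p_pos_clean = 0.001  # chance a clean athlete tests positive (false positive)

# Likelihood ratios: the factor by which one result multiplies the
# odds that the athlete is doping.
lr_positive = p_pos_doped / p_pos_clean                  # evidence for doping
lr_one_negative = (1 - p_pos_doped) / (1 - p_pos_clean)  # evidence of innocence
lr_100_negatives = lr_one_negative ** 100                # 100 clean tests

print(f"one positive multiplies the doping odds by {lr_positive:.0f}")
print(f"100 negatives multiply the doping odds by {lr_100_negatives:.2f}")
```

Under these (hypothetical) numbers, a single positive multiplies the odds of doping by 20, while a hundred clean tests together only divide them by about 7. One positive literally outweighs a hundred negatives, which is the asymmetry I’m gesturing at above.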