Effect of TdS on Olympic Results

Multiple people emailed me asking about whether there was any potential link between whether an athlete participated in the Tour de Ski and their performance at the Sochi Olympics.

Personally, I like dealing with questions like this because they are a great example of how something can be poor statistical reasoning and still be true at the same time. What could I mean by that?

Well, all of the emails I received made some attempt to list people on both sides: those who skipped the TdS (entirely, or in part) and then seemed to ski well at the Olympics, and those who skied the whole Tour but then seemed not to ski so well at Sochi.

The problem with this sort of thinking is that you have to stop and ask “skied well (or badly) relative to what?”. And that is devilishly hard to establish. For instance, Marit Bjoergen skipped much of the Tour, and then she won 2 gold medals.

But of course, we don’t really know how Bjoergen might have skied in Sochi had she finished the Tour. That would require time travel, or Dr. Who style alternate universes or something. Imagine an alternate universe in which she finished the Tour, and then got the exact same results, under the exact same scenarios, at Sochi. Ask yourself if you’d react to that universe with shock and surprise that Bjoergen skied well enough to win two gold medals (and two bad results from a crash and bad wax) at Sochi after doing the Tour in January. No? I thought not.

So with all my statistical hand waving out of the way up front, here’s a crude way to look at this, as best we can. I took all the top 30 finishers in the individual races at Sochi and collected all their results from 2012-2013 forward. I calculated the difference in the average performance of each skier at Sochi and prior to Sochi (measured by looking at the difference in rank or FIS points compared to each of the skiers in the cohort), and then plotted that relative to the number of stages they started at the Tour de Ski:



Due to the somewhat unfortunate repeated collapsing and then subtracting, negative values here represent a skier doing better at Sochi against this specific cohort of skiers than they had in the previous year and a half or so. So if more starts at the Tour had an overall negative effect on performance, you’d see things trending generally upward. But mostly people are just all over the map.
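For the curious, here is a rough sketch of the kind of calculation involved. It is not the actual code behind the plot: the skier names, FIS points and start counts are made up, and it simplifies things by comparing each skier's own average FIS points at Sochi to their earlier average, rather than doing the head-to-head comparison against everyone in the cohort. The sign convention is the same, so negative means better at Sochi.

```python
from statistics import mean

# Hypothetical data: (skier, period, FIS points). Lower FIS points = better race.
results = [
    ("Skier A", "pre", 10.0), ("Skier A", "pre", 14.0), ("Skier A", "sochi", 8.0),
    ("Skier B", "pre", 20.0), ("Skier B", "pre", 22.0), ("Skier B", "sochi", 30.0),
]

# Hypothetical number of Tour de Ski stages each skier started.
tds_starts = {"Skier A": 3, "Skier B": 7}


def sochi_minus_prior(rows):
    """Average FIS points at Sochi minus the average before Sochi, per skier."""
    by_skier = {}
    for skier, period, points in rows:
        by_skier.setdefault(skier, {"pre": [], "sochi": []})[period].append(points)
    return {
        skier: mean(p["sochi"]) - mean(p["pre"])
        for skier, p in by_skier.items()
        if p["pre"] and p["sochi"]  # need races in both periods to compare
    }


diffs = sochi_minus_prior(results)

# (TdS starts, difference) pairs: these are what would go on the x and y axes.
plot_points = sorted((tds_starts[skier], diff) for skier, diff in diffs.items())
```

With real data you would get one point per skier (or per skier and race type), and an upward drift in those points would be the sign of a general Tour penalty.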

But just because we have no evidence of an overall effect, for the whole population on average, that in no way means that specific people weren’t adversely affected by racing deep into the Tour. That’s the key: it’s almost certainly true that some people benefited from skipping the Tour, and some people suffered by doing the Tour. But that’s not the same thing as some sort of general, overarching effect across all people.

I skipped adding any trend lines, because really, the story here is the variation. Sure, you might be able to convince yourself that there’s an effect present for the male sprinters.

Race Snapshot: TdS Final Climb





Race Snapshot: TdS 5/10k Classic





Race Snapshot: TdS 15/35k Pursuit





Race Snapshot: TdS Classic 10/15k Mass Start





Race Snapshot: TdS Freestyle Sprint





Tour de Ski Standings Without Bonus Seconds

I’ve gotten requests to do this in the past, and did again this year. Nothing super complicated: we just add up the bonus seconds earned by each Tour de Ski finisher and add them back onto that person’s time. Obviously, some bonus seconds were earned by folks who ended up not finishing the Tour. Those seconds are lost forever; they aren’t reallocated to other skiers, so we’re ignoring them as well.
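In code, the adjustment amounts to something like the following sketch. The names and times are invented for illustration, and it assumes the official total time already has the bonus seconds subtracted, so adding them back recovers the time actually skied.

```python
# Hypothetical finishers: name -> (official total time in seconds, bonus seconds earned).
# Assumption: the official total already has the bonus seconds deducted.
finishers = {
    "Skier A": (16000.0, 60.0),
    "Skier B": (16020.0, 0.0),
    "Skier C": (16050.0, 15.0),
}


def standings_without_bonus(times):
    """Add each finisher's bonus seconds back on and re-sort by adjusted time."""
    adjusted = {name: total + bonus for name, (total, bonus) in times.items()}
    return sorted(adjusted.items(), key=lambda item: item[1])


ranking = standings_without_bonus(finishers)
for place, (name, seconds) in enumerate(ranking, start=1):
    print(place, name, seconds)
```

In this made-up example Skier A leads the official standings but drops behind Skier B once the 60 bonus seconds are added back. Skiers who did not finish the Tour simply never appear in the input, which matches ignoring their bonus seconds.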

To get all nerdy on you, playing “what ifs” like this reminds me of a common danger when interpreting the results of a multiple regression analysis. (I know, right? What a totally obvious metaphor…)

It’s tempting when looking at regression coefficients to say things like, “If we change this variable by X units, then the response variable will change by Y units”. This is intuitive, but often misleading. The reason is that it’s rare that you can really alter one variable without others changing as well. (One exception, of course, is when you can design a controlled experiment.)
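Here is a toy illustration of that danger, using entirely made-up numbers. The response is built as y = 2*x1 + 3*x2, so the coefficient on x1 is 2 by construction. But x2 is set up to move almost in lockstep with x1, so if you look at how y actually changes as x1 changes in this data, you see something quite different.

```python
def slope(x, y):
    """Least-squares slope from regressing y on x alone."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up data: x2 tracks x1 closely, plus a small alternating wobble.
x1 = [float(i) for i in range(10)]
x2 = [v + (0.5 if i % 2 == 0 else -0.5) for i, v in enumerate(x1)]

# The response is constructed so the "true" coefficient on x1 is 2 and on x2 is 3.
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

# How y actually moves with x1 in this data, where x2 comes along for the ride.
observed_change_per_unit_x1 = slope(x1, y)
print(observed_change_per_unit_x1)  # roughly 5, not 2
```

The coefficient of 2 only describes what happens if x1 changes while x2 stays put, which never happens in this data set.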

What does this have to do with Tour de Ski rankings with the bonus seconds removed (or added back on, if you prefer)? Well, the athletes likely would have raced differently if the gaps between them, or their current standings, had been different. So interpret the following with caution.

First of all, who earned bonus seconds at all?

Bonus seconds

That’s who.

If we add those seconds back onto each person’s total Tour time, we get this:
