Sprint Qualification Pacing Analysis

I’ll be the first to admit that sprinting doesn’t get as much love here at Statistical Skier.  To be honest, that’s probably my subconscious at work, as I have a stronger connection to distance events I suppose.  But it’s not terribly fair, so here is some World Cup sprint qualification analysis for all you sprint lovers out there.

The first step to success in sprinting is qualifying for the elimination rounds.[1]  So I thought it might be fun to look at what sorts of efforts it takes to qualify in a World Cup sprint race (and any trends or patterns that might arise).

We’re going to do this using two measures: percent back and pace (seconds/km).  Using pace means we’re implicitly assuming that the courses are measured accurately, which probably has not always been the case.  Just for starters, the skiers’ times are measured to the tenth of a second, but the length of the courses is typically only reported to the nearest 100m.  And it takes a bit more than a few tenths of a second to ski 100m.  Also, we should keep in mind that there can be pretty extreme variations in course design and weather/snow conditions.  So keeping all that in mind, let’s dive in…
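For anyone who wants to see the two measures spelled out, here is a minimal Python sketch.  The times and the 1.4km course length are made up for illustration, and the function names are mine, not anything from the actual analysis.  It also shows how much the pace can move if the true course length is anywhere within 50m of the reported value.

```python
def pace_sec_per_km(time_s, course_km):
    """Pace in seconds per kilometer."""
    return time_s / course_km


def percent_back(time_s, fastest_time_s):
    """Percent behind the fastest qualification time."""
    return 100.0 * (time_s - fastest_time_s) / fastest_time_s


# Hypothetical qualification times (seconds) on a course reported as 1.4 km.
fastest_s = 180.0
thirtieth_s = 189.0
course_km = 1.4

pb_30th = percent_back(thirtieth_s, fastest_s)
pace_30th = pace_sec_per_km(thirtieth_s, course_km)

# A length reported to the nearest 100 m could really be 1.35 to 1.45 km.
pace_if_longer = pace_sec_per_km(thirtieth_s, course_km + 0.05)
pace_if_shorter = pace_sec_per_km(thirtieth_s, course_km - 0.05)

print(f"percent back: {pb_30th:.2f}%")
print(f"pace: {pace_30th:.1f} s/km "
      f"(between {pace_if_longer:.1f} and {pace_if_shorter:.1f})")
```

Percent back never touches the course length, so it is immune to this particular measurement problem; pace is not.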

  1. Everyone skis the course one at a time, i.e. the qualification round, and then the top 30 move on to elimination heats of 6 at a time or so.

Animated WJC Results History

I had fun making those animated charts for World Cup points the other day, so I thought I’d try using the same charts to look at World Juniors. All I’ve done is tally up “World Cup” points for each nation and each year, scoring each (individual) race using the traditional WC point scale. Then I divided each tally by the total number of such points awarded that year (since the number and type of races has changed over the years). So each point represents a single nation, with the x and y coordinates being the proportion of total “points” earned by that nation, that year.
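The tallying described above can be sketched roughly as follows.  This is an illustration under assumptions, not the code behind the charts: the input format and the sample results are made up, and the point scale is the usual top-30 World Cup scale as I understand it.

```python
from collections import defaultdict

# Top-30 World Cup point scale: 100 for 1st, 80 for 2nd, ..., 1 for 30th.
WC_POINTS = [100, 80, 60, 50, 45, 40, 36, 32, 29, 26,
             24, 22, 20, 18, 16, 15, 14, 13, 12, 11,
             10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


def nation_shares(results):
    """results: iterable of (year, nation, place) for individual races.

    Returns {year: {nation: share of all points awarded that year}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for year, nation, place in results:
        if 1 <= place <= len(WC_POINTS):
            totals[year][nation] += WC_POINTS[place - 1]

    shares = {}
    for year, by_nation in totals.items():
        awarded = sum(by_nation.values())
        shares[year] = {n: pts / awarded for n, pts in by_nation.items()}
    return shares


# Hypothetical results: (year, nation, finishing place).
sample = [(2010, "NOR", 1), (2010, "SWE", 2), (2010, "NOR", 3), (2010, "RUS", 31)]
shares = nation_shares(sample)
print(shares[2010])
```

Dividing by the points actually awarded each year is what makes seasons with different numbers of races comparable.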

As before, there’s really only a single data point for each nation for each year, but the Google chart API moves the points smoothly between them. The charts also require Flash. Men first and the women below. Important: I’ve been noticing that the “Trails” option grinds the whole animation to a halt for me, so I’d recommend unchecking that option before you start playing around with these.

Do The Japanese Prefer Classic Skiing?

In my race recap for the Davos distance race I noted the strong performance of Masako Ishida, and pointed out what an extreme classic specialist she is.  A certain world famous XC skiing journalist wanted to know if the conventional wisdom he’d heard was correct, that Japan’s skiers typically do better in classic skiing overall.

This was particularly fun to tackle, since it turned out to be a situation where simply graphing the data in a clever way wasn’t enough.  We actually have to do statistics!  Woo hoo!  Don’t worry, though, while the techniques I ended up using for this analysis are fairly sophisticated, the results are pretty easy to understand and explain.

I’ll start with my first pass:

My first thought when approaching this kind of question is always to simply throw the data up on a graph and see what I can see.  So this is all of Japan’s WC, WSC and OWG results back to 1992.  Note that I’ve plotted rank, not FIS points, to keep the distance and sprint panels on the same scale.

The classic preference is clear in recent years on the women’s side, but since we already know that Masako Ishida has a monster proclivity for classic skiing, this could just be due to her results.  And other than the recent results for the women, it’s tough to make out any obvious patterns.  That’s because this graph treats all of Japan’s results from every athlete as a single group.  But clearly different skiers will have different abilities in skating vs. classic.  So we need a way to look at each individual skier’s races.  The problem is that there are more than 20 Japanese skiers (in my database at least) with a fair number of WC starts.  Can you imagine looking at a similar graph but with more than 40 panels (distance and sprint for each athlete)?  That wouldn’t be very illuminating, I think.

One option would be to artificially limit myself to a small number of skiers, but then our answer would apply only to those skiers we picked, not all of Japan’s skiers.  If we want a good answer to this question, we really need to include as much data as possible.

The solution is to use a model (gasp!).  In particular, a hierarchical linear model.  I’m not going to bore people with a detailed description of how this works; if you’re really curious ask questions in the comments.  The bottom line is that this tool allows me to estimate the difference in performance both overall and for each skier individually at the same time (by doing both at the same time, it often does a better job at each).

I probably could have squeezed this all into a single model, but I decided it would be easier to explain to folks if I modelled sprint and distance races separately.  That also allows me to use FIS points as a measure for distance races and rank for sprinting, which makes somewhat more sense anyway.

In distance races Japanese skiers (men and women) tend to ski about 3.49 FIS points slower in freestyle races (95% CI -1.75,8.73).  That little parenthetical just now meant that the 95% confidence interval for this effect ranges from -1.75 FIS points to 8.73 FIS points.  Since this interval includes zero, we would typically say that this does not meet the threshold for “statistical significance”, meaning that we can’t say with much confidence that the real difference isn’t actually zero.  Also, 3.49 FIS points is not a very large difference in practical terms.
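If you want to check the arithmetic on that interval, here is a small sketch.  It assumes the interval is a symmetric, normal-based one (estimate plus or minus 1.96 standard errors), which is my assumption for illustration; the model may well have computed its interval differently.

```python
# Reported effect and 95% interval, in FIS points.
estimate = 3.49
ci_low, ci_high = -1.75, 8.73

# Under the symmetric-interval assumption the estimate is the midpoint.
midpoint = (ci_low + ci_high) / 2

# Back out the implied standard error and z-score (assumes 1.96 multiplier).
std_err = (ci_high - ci_low) / (2 * 1.96)
z_score = estimate / std_err

includes_zero = ci_low <= 0 <= ci_high

print(f"midpoint: {midpoint:.2f}")
print(f"implied standard error: {std_err:.2f}")
print(f"z-score: {z_score:.2f}")
print(f"interval includes zero: {includes_zero}")
```

The midpoint matches the reported 3.49, and the implied z-score falls short of 1.96, which is just another way of saying the interval includes zero.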

But remember that this fancy-shmancy model I’m using doesn’t just estimate the overall effect, it also estimates this difference for each individual skier.  The following graph displays the results, along with their associated 95% confidence intervals:

World Cup Survival Analysis

When most people say that World Cup skiers are animals, they probably mean they are fierce, strong competitors.  I got my PhD in statistics in a department that found itself working quite often with very strong wildlife biology and ecology departments, so that reference leads me to think, “Well, what if they really were animals?  What sorts of statistics might I end up doing on these data?”

A common statistical analysis when your subjects actually are wild animals is called survival analysis.  Very generally, the aim is to determine what factors influence the survival of, say, bears[1].  The poor biologist would spend countless hours over multiple summers capturing, tagging and then tracking and recapturing the shrews or slugs or whatever[2].

The end result would be a bunch of lifetime data (along with other variables) on individual organisms.  Then the question is, which variables seem to influence survival rates?  There are all sorts of technical details with this kind of data (censoring, mainly) on how to model it that I’m not going to get into here.  If some nerdy biologist is reading this and wants more details, let me know, and I’ll put them in the comments.
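To give a flavor of how censoring gets handled, here is a minimal sketch of the Kaplan-Meier estimator, a standard textbook tool for lifetime data.  It is not necessarily the method used in this post, and the lifetimes below are invented.  A censored subject (one still alive when tracking stopped) counts as "at risk" up to its last observed time but never counts as a death.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: observed lifetime for each subject.
    observed:  1 if the death was actually seen, 0 if censored.
    Returns a list of (time, estimated survival probability) at each
    time where at least one death occurred.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose recorded time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical lifetimes; observed=0 means censored.
durations = [1, 2, 2, 3, 5]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 3))
```

Simply throwing out the censored subjects would bias the estimate downward, since the ones still alive at the end tend to be the long-lived ones.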

  1. Usually, the organism isn’t nearly this exciting.  Typically I’d see data on something like the western spotted shrew, or the golden mantled ground squirrel.  One of those animals I made up, the other I did not.
  2. Stats grad students would frequently talk about how grateful we were that we didn’t have to do field work.

In Which I Connect Triathlon Data To XC Pursuit Races

The connection is pretty obvious, actually.  Both are mass start races that involve switching activities at least once during the race.  The change in activities is certainly more extreme in triathlon, but you get my drift.

Some people (like, say, me) complain on occasion that pursuit races in cross-country skiing place too great an emphasis on the skating portion.  The order of techniques has settled into always doing classic first and then skating for practical reasons (having the classic skis waxed properly and delivered at the right time would be hectic, to say the least).  But the result has been races that plod[1] along during the classic half, with people only starting to accelerate during the later stages of the skating portion.

The triathlon data I ran into recently happened to give a very stark picture of what happens to the relative importance of each activity in these types of races.  While noodling around with the data, I plotted scatterplot matrices of the ranks for each stage of the triathlon for men and women (click through for larger versions):

  1. No offense intended, obviously.  I probably couldn’t keep up with the “leisurely” pace of the classic portion.

Triathlon Racing Strategy

An old Dartmouth teammate of mine contacted me recently and asked if I’d be interested in looking at some triathlon data.  He has some ulterior motives here, as his sister is a very good triathlete.  Since he volunteered to gather the data himself (I believe from triathlon.org mostly) and send it to me, I just couldn’t say no.

My friend, Adam, had a very specific question, which I’ll get to in a second, but it turned out that there’s a bunch of interesting stuff to look at in these data, most of which I can’t get to in one post.  So I’ll be violating my nordic skiing theme some more with triathlon data.

First some background on triathlons in case you’re unfamiliar with the sport.  We’re discussing the Olympic distance triathlon (1.5km swim, 40km bike, 10km run).  Other than the distances, a major difference between these triathlons and the iconic Ironman variety (e.g. the one in Hawaii) is that drafting is legal during the bike.  This means that you are allowed to ride right behind people, which conserves a ton of energy.

An athlete’s time can be broken down into the five parts of the race: Swim, Transition 1 (T1), Bike, Transition 2 (T2) and Run.  The transitions are exactly what they sound like: you have to switch gear in a sort of pit stop area.[1]

As I mentioned, Adam’s question was very specific: Suppose you finish the biking portion of the race just behind several other competitors.  Is it better to rush through T2 in order to start the run ahead of some of them, or should you “chill” during T2?

If you’ve never done triathlons, this might seem like a strange question.  Shouldn’t you always go as fast as you can?  I mean, it is a race after all.  I’ve never done triathlons, but based on what I know, switching activities can be pretty jarring both physically and mentally[2].  So it seems reasonable that there might be a school of thought within the sport that it’s worth being 5-10 seconds slower through a transition if you feel like the added time helps you adjust to the new activity more quickly.

  1. Sadly, it appears the times in the data have been rounded to the nearest second.  This means that when I add the five stage times I’m off by +/- 3 seconds from the recorded total time.  I doubt this will influence what I’m doing here drastically, but it obviously isn’t ideal.
  2. Seriously, go try it sometime.  Bike 40km as hard as you possibly can and then immediately switch to running 10km.  Trust me, it’ll feel pretty awkward.
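On the rounding issue in the first footnote: if each of the five stage times and the total are independently rounded to the nearest second, each rounded value is off by at most half a second, so the sum of the rounded stages can differ from the rounded total by at most 5 × 0.5 + 0.5 = 3 seconds.  Here is a small sketch that checks this bound by simulation.  The random stage times are made up, and it assumes round-to-nearest rather than truncation, which I can’t confirm for the real data.

```python
import random

# Worst case: five stage roundings plus one total rounding, 0.5 s each.
WORST_CASE_BOUND = 5 * 0.5 + 0.5


def max_abs_discrepancy(n_trials=20_000, seed=0):
    """Largest |sum of rounded stage times - rounded total| seen in simulation."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(n_trials):
        # Five hypothetical "true" stage times in seconds.
        stages = [rng.uniform(10, 4000) for _ in range(5)]
        true_total = sum(stages)
        diff = sum(round(s) for s in stages) - round(true_total)
        worst = max(worst, abs(diff))
    return worst


simulated_worst = max_abs_discrepancy()
print("theoretical bound:", WORST_CASE_BOUND)
print("largest simulated discrepancy:", simulated_worst)
```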