
Career Retrospective: Rene Sommerfeldt

Longtime German skier Rene Sommerfeldt is retiring this year.  His international career spans roughly 15 years, which is a long time to compete at that level.

I thought it would be fun to honor retiring skiers with a fairly comprehensive look back at their careers.  Since this is largely a data visualization site, it will all be in graphs.  But first, some highlights: Sommerfeldt’s only Olympic medals have come in relays (Silver 2006, Bronze 2002).  He does have one individual podium at World Championships (Silver 2001) in addition to two more relay medals (Silver 2003, Bronze 2001).


Victims & Nemeses: Marit and Justyna Edition

Earlier this week I described the notion of victims and nemeses, and showed you who Petter Northug’s were for the 2009-2010 season.  You can check back at that link for a description of victims/nemeses.  The basic idea is to count up the number of times you beat someone by a narrow margin (they become a victim) or are beaten by someone by a narrow margin (they become a nemesis).
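For anyone who wants to try the counting themselves, here is a minimal sketch of the idea.  The data layout (one dictionary of finishing times per race) and the one-second `margin` cutoff are assumptions made for illustration; they are not necessarily the definition of “narrow” used for the tables here.

```python
from collections import Counter


def victims_and_nemeses(races, skier, margin=1.0):
    """Count narrow wins (victims) and narrow losses (nemeses) for one skier.

    races:  list of dicts mapping skier name -> finishing time in seconds
            (a hypothetical layout chosen for this example).
    margin: assumed cutoff, in seconds, for what counts as a narrow gap.
    """
    victims, nemeses = Counter(), Counter()
    for result in races:
        if skier not in result:
            continue  # the skier did not start this race
        own_time = result[skier]
        for other, other_time in result.items():
            if other == skier:
                continue
            gap = other_time - own_time  # positive: our skier finished ahead
            if 0 < gap <= margin:
                victims[other] += 1
            elif -margin <= gap < 0:
                nemeses[other] += 1
    return victims, nemeses


# Made-up times for three races, purely for illustration.
races = [
    {"A": 100.0, "B": 100.4, "C": 103.0},
    {"A": 200.0, "B": 199.5, "C": 200.2},
    {"A": 150.0, "B": 150.9},
]
victims, nemeses = victims_and_nemeses(races, "A")
print(victims.most_common())
print(nemeses.most_common())
```

Sorting the two counters with `most_common()` gives the kind of ordered list shown in the tables.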

This time let’s look at two of the fastest women around today: Norwegian Marit Bjoergen and Poland’s Justyna Kowalczyk.  Only this time, instead of limiting ourselves to just a single season, let’s look over their entire careers.

First, Justyna’s victims and nemeses:

[table id = 40 /]

[table id = 41 /]

Marit Bjoergen actually isn’t in the lead here in either case.  Instead it’s Kristin Stoermer Steira who’s been edged out by Kowalczyk the most.  Interestingly, it’s Finnish skier Aino-Kaisa Saarinen who ends up being Justyna’s biggest nemesis, not Bjoergen, although just by a hair.

Next up, Marit Bjoergen’s victims and nemeses:

[table id = 42 /]

[table id = 43 /]

Once again Kristin Stoermer Steira gets the win in the victims category.  For some reason I wasn’t expecting Petra Majdic to pop up as Bjoergen’s top nemesis, but there she is.


A Few More Seasons…

I’ve added a few more seasons of athlete rankings, with variability estimates.  Use the drop-down menus in the Athlete Rankings tab to find them, and check here for an explanation of how I made them.


How Well Prepared Are World Cup Rookies? (Part 1b: Distance)

In a previous post, I talked about rookie skier performance on the World Cup.  You can check back there for an explanation of what I mean by a rookie skier and the specifics of how I selected which data to use.  I left off in that post with the following graph, showing rookie FIS point results over time:

[Figure: rookie FIS point results over time (dist_trend_glob.png)]



Athlete Rankings: What’s In Your Bucket?

You might have noticed the Athlete Rankings tab at the top of the page.  If you click on that tab rather than navigating through the mouse-over menus, you’ll see a brief description of what the rankings are, but I thought I’d put a quick note on them here.

First I need to explain why the heck I thought the world needed my own special athlete ranking.  The answer is actually not in the ranking itself, it’s in the error estimate.  If you go and look at the graphs, you’ll see that I’ve included an error bar for each athlete.

If you check out parts of the FIS website you’ll find that they compile athlete ranking lists several times per year.  The basic methodology is to average a skier’s five best races (lower FIS points are better) over the previous year.[1. There are caveats for athletes with fewer than 5 races.  In that case the average is inflated by the rather arbitrary factors 1.1, 1.2, etc. for skiers with 4, 3 races and so on.]  I’ve always been bothered by the fact that these averages are reported to two decimal places but come with no error estimate.

It leads a person like me to wonder whether a skier with an average of 4.58 FIS points is really “faster” than a skier with 5.28 FIS points.  So this is why I was interested in creating my own FIS point-like rankings.  I simplified things somewhat, looking at races within a season, rather than over the previous year, and I simply omit skiers with fewer than 5 races in a season.
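Here is a minimal sketch of that simplified score, assuming each skier’s season is just a list of FIS point results.  The function name `season_score` and the sample numbers are made up for illustration; the results for “Skier A” and “Skier C” were chosen so that their averages land on the 4.58 and 5.28 from the example above.

```python
def season_score(fis_points, n_best=5):
    """Average of the n_best lowest FIS point results in one season.

    Returns None for skiers with fewer than n_best races, who are
    simply left out of the ranking.
    """
    if len(fis_points) < n_best:
        return None
    best = sorted(fis_points)[:n_best]  # lower FIS points are better
    return sum(best) / n_best


# Made-up seasons, purely for illustration.
season = {
    "Skier A": [4.1, 12.0, 3.5, 6.2, 5.0, 30.4, 4.1],
    "Skier B": [5.5, 4.9, 6.0, 5.1],  # only four races: omitted
    "Skier C": [5.0, 5.3, 5.2, 5.6, 5.3],
}
scores = {name: season_score(results) for name, results in season.items()}
ranking = sorted(
    (name for name, score in scores.items() if score is not None),
    key=scores.get,
)
print(ranking)
```

Note that the score alone says nothing about how much to trust the gap between two skiers, which is the whole reason for the error bars.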

The point of these rankings is not that they do a better job of ordering skiers, but that they give you a sense of what magnitudes of differences in FIS points are meaningful.  My method for calculating the error bars is somewhat involved, but I had a rationale for making it that way.

The notion of variability in a skier’s races over a season is a little subtle.  Typically, statistics focuses on what’s called sampling variability[2. There are many other sources of variability we might be concerned with, but I’m trying to keep things simple.], which arises when we collect a sample from a population.  If we could record data on every individual in a population, that would be a census, and there would be no sampling variability to measure and hence no need for statistics.  But if we only have access to a sample (hopefully random), then we need to measure the variation that arises from the sampling process.  Namely, a different sample would have given us slightly different data.

When it comes to the races a skier does in a season, one could argue that we’re actually doing a census.  A skier did 10 races and we have access to them all.  This doesn’t seem like collecting a sample at all.  So where does the variability come from?

I prefer to take the following view.  It requires a rather silly extended metaphor to explain, so bear with me.  Imagine each athlete has with them at all times an enormous bucket of chips (e.g. poker chips, or something) that represent “race efforts”.  These race efforts vary quite a bit in quality.  Skiers can influence the general content of their bucket via training and other forms of preparation, but there will always be some level of variability in the quality of chips available in the bucket.

When they actually do a race, they reach in and grab a chip at random.  If they’ve done their preparation well, their bucket will be filled with tons and tons of superb race efforts, and they are likely (but not guaranteed!) to do well.  If they haven’t prepared well, their bucket is likely to contain far too many poor race effort chips, and things might go badly.

I consider the collection of races a skier actually does to be a sample (again, hopefully random) from their bucket of race effort chips.  So that’s where the variability comes from, in my view, and we need to measure it somehow.

There’s a slick technique in statistics for handling “non-standard” situations like these, called bootstrapping.  The goal is to somehow estimate how variable the contents of a skier’s race effort bucket were, using only the race efforts we saw during the season.  Bootstrapping tells us that we can get a rough sense of this by drawing many, many samples, with replacement, from our actual data.

That probably got too technical for some of you, so here’s what’s going on.  Let’s say we have someone who did 5 races, scoring 1, 2, 3, 4 and 5 FIS points.  When I say “draw a sample with replacement from this collection of races”, what I mean is that we generate another slightly different collection of five scores, using only these particular 5 scores.  Some examples might be (2,2,4,5,5) or (1,1,1,1,1) or (5,3,2,2,1).  Some of our original races might not appear at all, and some might appear multiple times.  These are called “bootstrap samples”, in contrast with our original sample (1,2,3,4,5).

For each of these new sets of scores, we calculate whatever measure we’re using (average, average of the best five, etc.), and then measure how much these values vary.
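If you’d rather see that as code, here is a minimal sketch using only the Python standard library.  The helper name `bootstrap_values`, the 2000 resamples and the fixed seed are arbitrary choices for the example, and I’m using the plain average as the measure and the standard deviation as the summary of “how much these values vary”.

```python
import random
import statistics


def bootstrap_values(scores, statistic, n_boot=2000, seed=0):
    """Apply `statistic` to n_boot bootstrap samples drawn from `scores`."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    values = []
    for _ in range(n_boot):
        # Draw len(scores) results with replacement: some races may be
        # repeated and others may not appear at all.
        resample = rng.choices(scores, k=len(scores))
        values.append(statistic(resample))
    return values


scores = [1, 2, 3, 4, 5]  # the five-race example from the text
values = bootstrap_values(scores, statistics.mean)
spread = statistics.stdev(values)  # how much the bootstrap averages vary
print(round(spread, 2))
```

Any other measure can be swapped in for `statistics.mean`, as long as it takes a list of scores and returns a single number.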

Phew.

So that’s what those blue lines are in my ranking graphs.  I generate a bunch of bootstrap samples and calculate the skier’s average of their best five races within each bootstrap sample.  The blue lines represent how much variability we saw.  This gives a sense of the contents of each skier’s race effort bucket, at least during that season.
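To make the blue lines a bit more concrete, here is a rough sketch of one way to turn bootstrap samples into an error bar.  Treat the details as assumptions: the percentile interval, the 95% level and the number of resamples are choices made for this example, not necessarily what the ranking graphs use, and the season below is invented.

```python
import random


def best_five_average(scores):
    """Average of the five lowest (best) FIS point results."""
    if len(scores) < 5:
        raise ValueError("need at least five races")
    return sum(sorted(scores)[:5]) / 5


def error_bar(scores, n_boot=2000, seed=0, level=0.95):
    """Percentile interval for the best-five average (an assumed method)."""
    rng = random.Random(seed)
    values = sorted(
        best_five_average(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)      # cut off the lower tail
    hi_idx = int((1 + level) / 2 * n_boot) - 1  # cut off the upper tail
    return values[lo_idx], values[hi_idx]


# An invented ten-race season, purely for illustration.
season = [12.4, 8.9, 25.1, 10.3, 15.7, 9.8, 31.0, 11.2, 18.6, 14.0]
low, high = error_bar(season)
print(best_five_average(season), low, high)
```

If the intervals for two skiers overlap heavily, a difference of a fraction of a FIS point between their averages probably isn’t telling you much.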

What’s in your bucket?


How Well Prepared Are World Cup Rookies? (Part 1a: Distance)

Every season a new crop of young skiers cuts its teeth on the World Cup circuit.  These might be new national team members racing on the World Cup full time, or they might be Nation’s Group athletes who receive a start as part of a host nation’s extra start allotments.  Each nation likely has a somewhat different strategy for choosing which up-and-coming athletes are awarded this opportunity.  One interesting question we can ask is whether the level of preparation these athletes have seen differs between nations, or groups of nations.

The first two installments in this series will look at distance events, and then we’ll turn to sprints.



Welcome!

Welcome to Statistical Skier!

I’ve somehow managed to amass a fairly impressive database of international skiing results (what can I say, I really like data) and cross-country skiing has been a huge part of my life.  I wrote a few articles[1. I’ve reposted them in the preceding posts.] for FasterSkier.com this spring exploring some of this data and I was excited that at least a few people seemed to enjoy them.  More importantly, though, I had a ton of fun writing them.

I had so much fun, in fact, that I’m going to keep exploring the data here.

Things are just getting off the ground, website-wise, so I apologize if things look a little rough around the edges.  Like everything else on the web, it is a work in progress.

As for content, we’ll just have to see what happens.  A lot of the stuff I plan on writing about isn’t nearly as “serious” as the articles I wrote for FasterSkier.com, but of course some things will be more along those lines.  But trust me, I have plenty of ideas for topics!  My goal is to post something new at least several times a week, even during the summer, a decidedly slow time for skiing news.

So have a look around as we get up and running.  Please be sure to check back in for updates and/or subscribe to my RSS feed.

Finally, a special request: if you happen to have results for World Cup, Olympic or World Championship races from the 1990-1991 season or earlier, I’d love to hear from you.  What I’m looking for are complete results including times.  If you’re willing to share them, I’d be grateful.  Paper or electronic media of (nearly) any format is fine.  You can email me at statisticalskier at gmail dot com.