Tuesday, April 17, 2007

Hold or fold?

Introduction
"Keeper leagues" are becoming increasingly popular among fantasy sports enthusiasts. In a keeper league, instead of starting over and drafting an entirely new team each year, a manager has the option of retaining some of his players. This poses a question at the end of every season: should you hold on to a player, or take your chances in the draft? To help answer it, here is my attempt at a Bill James-esque formula to determine whether you should hold or fold. Please note that I've never actually taken a class in statistics, so forgive me if my format or method is not kosher...


Variables
A: Rank at beginning of the year
B: Rank at the end of the year
C: Player's age
D: Games played
E: Total games in a season
F: Player's value rating
G: Player's rank among other players on the fantasy team
H: Teams in the league
I: Keeper value rating
J: #1 overall draft pick's F value

Formula
If: D/E < .75, then:
F = A + |C - 26|^1.25 + (2xE/D)^3 - J

If: D/E > .75, then:
F = (4xA + 3xB )/7 + |C - 26|^1.25 + (2xE/D)^2 + [(B - A) x (10/A)] - J

Once F is determined:
(GxH - H/2) - F = I
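
For anyone who would rather not do this by hand, here is a minimal Python sketch of the two formulas above. The function and argument names are hypothetical (they just mirror variables A through J), J defaults to 0 until the #1 pick's value is known, and because the formulas do not say what happens at exactly D/E = .75, this sketch assumes that case uses the longer formula.

```python
def player_value(a, b, c, d, e, j=0.0):
    """Player's value rating F (lower is better).

    a, b: rank at start / end of year; c: age;
    d: games played; e: games in a season; j: the #1 pick's F value.
    """
    if d <= 0:
        # The injury term divides by D, so zero games played is undefined.
        raise ValueError("games played (D) must be positive")
    age_penalty = abs(c - 26) ** 1.25
    if d / e < 0.75:
        # Short formula: start-of-year rank only, cubed injury term.
        return a + age_penalty + (2 * e / d) ** 3 - j
    # Long formula. Assumption: D/E == .75 exactly also lands here.
    blended_rank = (4 * a + 3 * b) / 7
    injury = (2 * e / d) ** 2
    trend = (b - a) * (10 / a)
    return blended_rank + age_penalty + injury + trend - j


def keeper_value(g, h, f):
    """Keeper value rating I = (G x H - H/2) - F."""
    return (g * h - h / 2) - f


# A 26 year old who played every game and slipped from 67th to 81st.
f = player_value(67, 81, 26, 162, 162)
print(round(f, 1), round(keeper_value(8, 10, f), 1))
```

In this made-up example the player would be the 8th best on a roster in a 10 team league, so a positive second number would favor keeping him.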

Make a list of your players' keeper value ratings. Add them up, starting with the first player on your roster. Find the point at which the combined I value is the highest, and keep all the players up until that point.
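
The running-total step can be sketched in a few lines of Python. The function name and the sample I values are made up for illustration. Two choices here are assumptions the rule above does not spell out: if no running total is positive, the sketch keeps nobody, and if two points tie for the highest total, it stops at the earlier one.

```python
from itertools import accumulate


def players_to_keep(keeper_values):
    """Return how many players to keep, counting from the top of the roster.

    keeper_values: I values in roster order (best player first).
    """
    running = list(accumulate(keeper_values))
    if not running or max(running) <= 0:
        # Assumption: with no positive running total, keep nobody.
        return 0
    # index() finds the first peak, so ties resolve to fewer keepers.
    return running.index(max(running)) + 1


# Hypothetical I values: the 3rd player is a poor keeper, but the 4th
# and 5th more than make up for him.
print(players_to_keep([12, 5, -8, 6, 7, -20]))
```

The running totals in this example are 12, 17, 9, 15, 22 and 2, so the peak comes after the fifth player.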


Explanation
I'm sure that just looks like gibberish, but if you're interested in the formula's derivation, read on.

The goal of Bill James's approach to statistics is to identify and properly weight different statistical categories to come up with a numerical representation of value. In my attempt to do this, I first identified several statistical categories of particular interest in a keeper league. They were:
1. The player's rank the year before.
2. A player's age.
3. Whether that player tended to miss a lot of games from injury.
4. Whether that player tended to exceed expectations or not.

My goal was to make a ranking system that would rank the players in order of their keeper value (Variable F). To find that number, I had to decide how to weight the four categories I mentioned above. Here's what I decided, in terms of each category. (Note: the player's value rating, or F, is designed to be a list from 1 up of the best to worst keeper players, so the lower the number the better).

The categories
1. Last year's ranking: (4xA + 3xB )/7
The player's rank from the year before is the major determining factor. If a player played less than 3/4 of the season (D/E < .75), his beginning-of-year ranking is used as-is. A player who played most of the year, however, has his rankings from the beginning and the end of the year averaged. I weighted the beginning-of-year ranking a third more heavily than the end-of-year ranking, which is a little more fickle. For example, a player who enters the year ranked 67th overall and ends the year ranked 81st has a rating of 73 for this term: (4x67 + 3x81)/7 = 73.

2. Age: |C - 26|^1.25
The player's age was the next statistic considered. In most cases, a player's prime fantasy years occur between his 26th and 30th birthday. I made the optimal age 26 because this puts a player at the beginning of his prime, with good years in front of him and enough experience to be productive. I took the absolute value of the age difference, and raised that number to the 1.25th power. This means that a player has a progressively larger penalty added to his value rating the further he is from 26. A player who is 31 adds 7.5 onto his value rating, reflecting that he will be of progressively less value in future years. A 22 year old adds 5.7, a penalty for the years that the owner will have to wait before the player hits his prime.
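
To see how the age penalty grows, here is a small Python sketch that evaluates the term for a few ages. The function name and the choice of ages are only for illustration; 26 and 1.25 are the constants from the formula.

```python
def age_penalty(age, prime_age=26, exponent=1.25):
    """Age term |C - 26|^1.25, added to the player's value rating."""
    return abs(age - prime_age) ** exponent


# Penalty for a prospect, a player at the optimal age, and two veterans.
for age in (22, 26, 31, 35):
    print(age, round(age_penalty(age), 1))
```

The 22 and 31 year olds come out at 5.7 and 7.5, matching the figures above, and a 35 year old is charged roughly twice what a 31 year old is.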

3. Susceptibility to injury: (2xE/D)^2
To take into account how injury-prone a player is, the total games during a season are divided by the number of games that player has played. This number is doubled and then squared to weight longer absences more than short ones. For example: a baseball player who misses 20 games out of the 162 games in a season adds 5.2 to his value rating. Furthermore, a second equation is used to ensure that a player who misses most of a season due to injury still retains some of his value.
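
The injury term is easy to check numerically. This is a sketch with hypothetical names; the exponent is 2 in the longer formula and 3 in the shorter one used when D/E < .75.

```python
def injury_penalty(games_played, season_games, exponent=2):
    """Injury term (2 x E / D)^exponent, added to the value rating."""
    return (2 * season_games / games_played) ** exponent


# Baseball examples over a 162 game season.
print(round(injury_penalty(162, 162), 1))              # no games missed
print(round(injury_penalty(142, 162), 1))              # 20 games missed
print(round(injury_penalty(81, 162, exponent=3), 1))   # half a season, short formula
```

Note that a player who never misses a game still adds 4 from this term, so what matters is how far above 4 a player lands.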

4. Tendency to exceed expectations: (B - A)x(10/A)
I used the change in rankings over a season to measure whether a player tends to play beyond or below expectations. The change is measured in the (B - A) part of this term. Because an improvement is harder to come by for a player who is already ranked highly than for a lower ranked player, the second part of the term weights the player's improvement/regression by where he stood initially. Example: Players 1, 2 and 3 have initial rankings of 5, 20, and 100 respectively, and each improves 3 positions by the end of the year. Player 1 reduces his F value by 6 points, Player 2 by 1.5 points, and Player 3 by just 3/10ths of a point.
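
The three-player example can be reproduced with a short Python sketch (the function name is hypothetical). A negative result lowers F, which is an improvement.

```python
def expectation_adjustment(start_rank, end_rank):
    """Expectations term (B - A) x (10 / A); negative values improve F."""
    return (end_rank - start_rank) * (10 / start_rank)


# Players starting at ranks 5, 20 and 100 who each move up 3 spots.
for start in (5, 20, 100):
    print(start, round(expectation_adjustment(start, start - 3), 1))
```

The same function handles regression: a player who drops from 20th to 30th would add 5 points to his F value.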


Other parts of the formula explained
1. The two different formulas resulting from: If: D/E > .75, then:
A player who misses a significant portion of the previous season will throw off the methods of the longer equation, so a smaller equation was used which still penalizes him somewhat for missing games, but does not also dock him for not improving during the season.

2. - J
Once each of these statistical categories is accounted for, the #1 overall player's value rating is subtracted from every other player's total value. After all the other categories are added onto a player's F value, the ranking begins at a number higher than 1. Subtracting the #1 player's value rating (J) restores a numerical ranking.

3. (GxH -H/2)
Once the players are ranked in this system, they must be listed in terms of the rounds of the draft. Multiplying the player's ranking on your own team by the number of teams in the league gives you the number of the last pick of that respective round of the draft. Subtracting half the teams in the league then gives you the median pick of that round. For example, the 5th best player on your team in a league of 10 teams would be kept in the 5th round, whose median pick is the 45th player chosen in the draft (10 teams x 5 rounds = 50 picks, 50 picks - half the picks in a round = 45).
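
As a quick check of that arithmetic, here is the median-pick term as a one-line Python function (the name is hypothetical).

```python
def median_pick(team_rank, league_teams):
    """Median pick of the round a player would be kept in: G x H - H/2."""
    return team_rank * league_teams - league_teams / 2


print(median_pick(5, 10))   # 5th best player, 10 team league
print(median_pick(1, 12))   # best player, 12 team league
```

With an odd number of teams the result is not a whole number, which is harmless here because it is only compared against F.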

4. (GxH -H/2) - F = I
Once you have determined the pick you would have to surrender to keep a player, you can compare it to your player's ranking according to the player value rating (F). If that player is ranked higher (thus having a correspondingly smaller F value), their keeper value (I) will be positive. We now have a number that represents whether it would be worth it or not to keep a player in the round that you would take him.

5. Final analysis: Player 1's I + Player 2's I + Player 3's I.....
You may find that your 7th and 8th round picks have a good keeper value rating, while your 6th round pick does not. To help make the decision on when to stop retaining players, I chose to add the players' I values together and keep the players whose combined I total is greatest.


Notes:
1. This formula assumes that the draft order is determined after each manager submits his list of players-to-be-kept.
2. Formula also assumes that the league has a Yahoo! style ranking system.
3. The constants in the equation are somewhat arbitrary, picked by plugging in values to see what might work. A sophisticated version of this system would determine those numbers by looking at a whole field of players together to figure out what constants best described the results.

Additional note: despite the unreasonable amount of time I was willing to put into this, and my equally unreasonable love for keeper leagues, there is almost no chance I will actually ever bring myself to apply this method to one of my own teams.
