Wednesday, February 10, 2016

統計 (Statistics)

I love numbers. I like poring over baseball stats, sales numbers, Guinness Book of Records numbers, stock market prices, and election poll numbers. There have been plenty of characters in movies and TV who show an appreciation or affinity for numbers - Erin Brockovich, Yabushita Yoriko of a Japanese TV show I watch called "Date," Rain Man, and many more. Count me in as one of those who love numbers.

To me, numbers (and statistics, despite the old adage that "there are lies, damn lies, and statistics") tell a better story about an event than mere anecdote. The poll numbers tell a better story about the 2016 election than simply "Trump is winning" (which, surprisingly, he still is in Republican voting). Tonight his win in the New Hampshire primary was convincing, but there were so many other Republican candidates in the race that his margin may be misleading, and the numbers for Kasich, Jeb Bush, Cruz, and Rubio all spell out a very interesting story and sum up their campaign efforts in the state. I can use the numbers to see how candidates did before and how they're doing now.


Statistics factor heavily into fantasy sports, but especially fantasy baseball. Not only are there the season totals that go hand in hand with a player (I don't even think of a player anymore without associating him with a certain set of numbers, like a .300-30-100 player - i.e., .300 batting average, 30 home runs, and 100 RBI), but nowadays there are also many other statistics, like strand rate, home run rate, and BABIP (batting average on balls in play), that delve much deeper into how a player is actually performing. The amount of data that can be gathered from baseball is tremendous, both because of how many repetitions there are (so many pitches in a given game) and because of the different aspects of a single play (on a given play, one can track the pitch location, the speed of the pitch, the movement of the pitch, the alignment of the defense, the swing rate of the batter, the area where the ball gets put into play, etc.). This is where the world of big data comes in, reported nicely by a book called "Big Data Baseball," a story of the 2013 Pittsburgh Pirates. It's a great read about how the Pirates simply used numbers that were better indicators of results to game the system and improve their team without spending. Absolutely my dream scenario for my fantasy baseball teams as well as building my

I've always been fascinated by the relationship between fantasy baseball and the numbers. One of the areas I wanted to explore is how the pure numbers get put together to assess a player's value. In standard fantasy baseball leagues, the statistics tracked are Runs, Home Runs, RBI, Stolen Bases, and Batting Average for hitters (5), and Wins, ERA, WHIP, Saves, and Strikeouts for pitchers (5). However, one unit in one category is not worth the same as one unit in another: 1 win is not worth the same as 1 strikeout, because there are so many strikeouts that a single K can get lost in the shuffle, whereas wins are much scarcer. How do we rank a player's value (much less his future value based on the assessed information) if all these categories are jumbled together and weighted differently? I guess you have to give some sort of weight to each category and come up with something like 11.5 strikeouts = 1 win. Even more confusing, though, are the ratio stats: ERA, WHIP, and BA. How do we measure a player's impact on those when players pitch different numbers of innings and therefore carry different weights in those ratio-based categories? A reliever's 100 IP of 1.00 ERA can be very valuable, but is it more valuable than a starter's 200 IP of 2.00 ERA? I'm not sure, because a 2.00 ERA is still helpful and you get more of it, whereas a 1.00 ERA is more helpful per inning but less impactful due to the lower volume. I think one also has to figure out the median or replacement-level value of each ratio stat, so that there's a baseline against which to measure how much a player's production helps. It's all very confusing but fascinating to me at the same time, and a big reason I play fantasy baseball and basketball. Sure, it's exciting to watch one of my players have a dominating start or hit a home run, but even more thrilling for me is to measure what impact that has on my chances of winning and how valuable that great start or that home run was.
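To make the reliever-versus-starter question concrete, here is a rough sketch in Python. It assumes one simple way of measuring impact, which is only my guess at a reasonable method: count the earned runs a pitcher prevents compared with a baseline pitcher throwing the same innings. The function name `runs_saved` and the baseline ERAs of 3.00 and 4.00 are made up for illustration, not real league figures.

```python
def runs_saved(innings, era, baseline_era):
    """Earned runs prevented versus a baseline pitcher over the same innings.

    ERA is earned runs per 9 innings, so the gap in ERA is scaled by
    innings / 9 to turn it into a run total.
    """
    return (baseline_era - era) * innings / 9.0


# Hypothetical baselines (assumed values, not actual replacement levels).
for baseline in (3.00, 4.00):
    reliever = runs_saved(100, 1.00, baseline)  # 100 IP of 1.00 ERA
    starter = runs_saved(200, 2.00, baseline)   # 200 IP of 2.00 ERA
    print(f"baseline {baseline:.2f}: reliever {reliever:.1f}, starter {starter:.1f}")
```

Under this measure the answer depends entirely on the baseline: against a 3.00 baseline the two pitchers prevent the same number of runs, against anything higher the starter's extra innings win out, and against anything lower the reliever comes out ahead. That is why pinning down the replacement value matters so much.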

At heart, I think I'm just a big numbers nerd who likes to crunch numbers and act smart.

Fantasize on,

Robert Yan
