tag:blogger.com,1999:blog-38600807.post6783956920623033679..comments2018-06-02T14:19:34.554-04:00Comments on Advanced Football Analytics (formerly Advanced NFL Stats): Verducci Follow-UpUnknownnoreply@blogger.comBlogger29125tag:blogger.com,1999:blog-38600807.post-67968867883600654072009-08-04T17:20:48.210-04:002009-08-04T17:20:48.210-04:00I don't know the Verducci effect research very...I don't know the Verducci effect research very well, but I would think that they decide whether a pitcher is at risk for the effect in a given year based solely on previous years, which would mean that your model doesn't fit. That's what's implied by the definition that you linked to in your previous post (pitchers tend to underperform the year after they've had a substantial increase in innings pitched). And every year there are warnings about which pitchers could be in trouble this year because of the Verducci effect, and those warnings have to be based only on previous seasons.Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-59110705661838772912009-08-03T08:30:10.210-04:002009-08-03T08:30:10.210-04:00There's also a slight discrepancy in your meth...There's also a slight discrepancy in your method. It's not perfectly 25%, it's 24.967..% or 24.992% depending on how you handle the case of five injury years in a five year career.<br /><br />There are 3,125 different career possibilities. In 1024 of them, no injury. In 1280 of them one bad year, which follows the best year in 320 of those cases. In 640 of them, two bad years, and 320 of those cases have a bad year after the best year. Another 160 cases have 3 bad years, with 120 of those showing the effect. 20 cases have 4 bad years and all 20 of those have a bad year after the best year. Finally, 1 case has five bad years. I'm not sure what to select for the best year in that case. 
We can either throw the case out (24.967..%) or count it as a bad year after the best year (24.992%).<br /><br />If the five bad year case is ignored, there are 3124 different cases, with 780 injury years after the best year. This gives the 24.967..% rate. Counting the five bad year case as a bad year after the best year gives 781 positives out of 3125 cases, or exactly 24.992%. Experimentally both are hard to distinguish from 25%.<br /><br />Also, Vince has it right on the independence. If you're taking the best year considering all years, you've thrown away independence after that selected year by your selection method and so the impact you're describing does happen. If you select the critical year looking only at that year and the years before, then the following years are independent.<br /><br />The only way to tell which Verducci used is to read the research if it's been published in sufficient detail. Anyone have links to the original articles as I'm somewhat curious as it's a story about having to be careful about your sampling, and I'd like to get the punchline correct when I tell the story.Dan Rnoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-17711848232859162732009-08-02T16:26:06.632-04:002009-08-02T16:26:06.632-04:00I think you might be right. That's the differe...I think you might be right. That's the difference. If you pick out the first 3 or higher, I think you would see 20%. But if you pick out the last 3 or higher, or the highest number, you'd get 25%. 
I coded a couple simulations, and that's what I'm seeing.<br /><br />The question is, which is closer to reality when analyzing pitcher injuries.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-29275992531199127592009-08-02T15:28:13.616-04:002009-08-02T15:28:13.616-04:00I'm also enjoying the argument - that's a ...I'm also enjoying the argument - that's a main reason why I'm continuing it (the other main reason is that I'm pretty sure that you're wrong about this). The problem with your latest simulation is that you're picking out the highest number in the whole row, which uses information about what comes afterwards. As I've mentioned in earlier comments, you have to pick out a cell (or a year or a card) based solely on what's come before it - then independence will guarantee that whatever comes after it will match the base rate. That's how the Verducci analysis works - it picks out the first year in which your workload is substantially higher than it's been before, without taking into account what you do in the following years, and then looks at your next season.<br /><br />What if you call the first cell that is at least a 3 the "break-out year" and look at the cell to its right? Then you'll see a one there 20% of the time.Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-55281264828134451112009-08-02T15:08:58.038-04:002009-08-02T15:08:58.038-04:00I don't mean to be argumentative. I am enjoyin...I don't mean to be argumentative. I am enjoying our discussion and I respect your thoughts, or I wouldn't be responding. Too bad we aren't sitting in front of a white board.<br /><br />But seriously try it! I did it again, this time with an Excel spreadsheet (so no shuffling or replacement issues, and pure independence).<br /><br />I filled in each cell with a random number 1 through 5. 
I called the highest number in the row the break-out year, and looked at the cell immediately to the right. According to your reasoning I should see a one there 20% of the time. According to my reasoning I should see a one there 25% of the time.<br /><br />For a sample of 200 rows, I found 55 ones to the right of the 'breakout' year. That's 27.5%, significantly different than the 20% you would expect (p=0.008 for n=200).Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-53908037606949390112009-08-02T14:51:50.155-04:002009-08-02T14:51:50.155-04:00Vince-I disagree. Try it yourself. The cards to th...<i>Vince-I disagree. Try it yourself. The cards to the right will be 1/3 diamonds, and the cards to the left will be 1/3 diamonds.</i><br /><br />That's impossible (unless you're using some other procedure or you're getting small sample size randomness). If you pick out the leftmost spade from the row, then the cards to its right will contain identical proportions of diamonds, hearts, and clubs (since nothing distinguishes those 3 suits), and some of the cards will be spades, so there must be less than 1/3 diamonds.Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-37390547370662143412009-08-02T14:49:59.116-04:002009-08-02T14:49:59.116-04:00I'll also disagree about what the Verducci met...I'll also disagree about what the Verducci methodology is. They must have had a finite data set. Your model waits for the future to happen, in which case it's a 1 in 4 chance. Their model excludes 1 card from the denominator in every case, after seeing all the cards, which is my model.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-6210032068385916772009-08-02T14:28:00.020-04:002009-08-02T14:28:00.020-04:00Vince-I disagree. Try it yourself. The cards to th...Vince-I disagree. Try it yourself. 
The cards to the right will be 1/3 diamonds, and the cards to the left will be 1/3 diamonds.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-32468118989753029292009-08-02T14:10:41.949-04:002009-08-02T14:10:41.949-04:00But if you deal all 4 cards, and then look them ov...<i>But if you deal all 4 cards, and then look them over removing 1 non-diamond, the remaining cards will be a diamond 1 out of 3 times.</i><br />It depends on which cards you look over, and on how you decided which card to remove. If you remove a non-diamond and then look over every remaining card in the row, then 1/3 will be diamonds. But if you remove the leftmost non-diamond in the row (or the leftmost spade), and then you only look over the cards to the right of the card that you removed, then 1/4 will be diamonds. In general, if you decide which card to remove based only on that card and the cards to its left, and then you only look over cards to its right, then 1/4 of the cards that you look over will be diamonds. That has to happen because of independence.<br /><br />And that's the way the Verducci analysis works. They pick out the "high workload year" based solely on that year and the previous seasons, and then they look at what happens in the following year. Their analysis matches my model, not yours. And they still find an effect, so it must have some explanation besides the one that you're giving.Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-42105232931397164872009-08-02T13:52:16.724-04:002009-08-02T13:52:16.724-04:00Vince-Your recent example above is good. I think w...Vince-Your recent example above is good. I think what you're saying is "here's the right way to model pitching injuries." And I agree. Your model predicts no Verducci Effect.<br /><br />What I'm saying is that Verducci/Carroll are not modeling it that way. 
They're modeling it the way I am with my card game and finding an effect.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-76281467018936874402009-08-02T13:47:56.171-04:002009-08-02T13:47:56.171-04:00Just from a brief examination of the past two arti...Just from a brief examination of the past two articles, what you are explaining sounds very similar to the Hurst exponent. Hurst actually discovered this phenomenon in the same manner that you are experimenting with the card deck. Not sure exactly how it relates but will look into it more and update further.John Candidonoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-59933372843436941322009-08-02T13:41:46.194-04:002009-08-02T13:41:46.194-04:00Vince-That's true. Every single card has a 1 i...Vince-That's true. Every single card has a 1 in 4 chance of being a certain suit. As you deal cards forwards in time (with replacement), that will always be the case.<br /><br />But if you deal all 4 cards, <b>and then look them over</b> removing 1 non-diamond, the remaining cards will be a diamond 1 out of 3 times.<br /><br />That's the paradox. It all depends on <i>which direction in time</i> we are looking.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-86047288805632136232009-08-02T13:34:22.097-04:002009-08-02T13:34:22.097-04:00Vince and Anon above- I shouldn't have said &q...Vince and Anon above- I shouldn't have said "let's be real..." That sounds flippant. You are correct, period. 
I should have said: the non-replacement effect is extremely small at the beginning of the deck.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-57630516455760544862009-08-02T13:32:58.858-04:002009-08-02T13:32:58.858-04:00The chance of randomly finding an injury year is n...<i>The chance of randomly finding an injury year is now 200 out of 800, or 1 in 4.</i><br />But you're not randomly selecting a year - you're looking at the year immediately after the year that you removed (and you didn't remove a year at random - you removed the first breakout year). As I said in my last post, in your model the years prior to a breakout year are more likely than average to be injury years (since we know they weren't good years - they must be either injury years or lousy years), but the years after a breakout year only have injuries at the average rate (1/5, in this case).<br /><br />Here's a simpler way to do it with cards, instead of dealing with face cards. Diamonds are injury years, spades are good years, and clubs & hearts are lousy (injury-free) years. In each row of cards, find the first spade. 1/3 of the cards prior to the first spade will be diamonds (1/3 clubs, 1/3 hearts, 0 spades). But the first card after the first spade (or any of the other cards after the first spade) will be equally likely to be a diamond, spade, club, or heart.Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-49544415927621279082009-08-02T13:29:46.655-04:002009-08-02T13:29:46.655-04:00If you have a 1/5 chance of an injury season, does...If you have a 1/5 chance of an injury season, doesn't that number remain constant regardless of whether you have an injury year or a career year? 
Your risk for injury is still 1/5 the next season no matter what you do.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-3069663254060978072009-08-02T13:24:01.068-04:002009-08-02T13:24:01.068-04:00Brian - It seems like you're explaining why th...Brian - It seems like you're explaining why the effect exists in the sample of 1999-2005 pitcher seasons (or whatever).<br /><br />I think the people disagreeing with you are thinking that you are explaining why some pitcher who has a Verducci Season this year will be extra likely to get hurt next year, which you don't seem to be arguing.JB Hnoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-4176957249296275262009-08-02T13:13:42.707-04:002009-08-02T13:13:42.707-04:00Anon-That is how it works. In the example, any sea...Anon-That is how it works. In the example, any season has a 1 in 5 chance of being an injury year, and a 4 in 5 chance of being a non-injury year.<br /><br />You could have more than one "career"-type year, in which case removing those additional known non-injury years would enhance the illusion. The way I conceive of it is that when Verducci looks at a year-pair, at least one must be a non-injury year, removing it from the equation.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-53027674332942941322009-08-02T13:06:39.516-04:002009-08-02T13:06:39.516-04:00Alternatively, lay out the entire deck of cards on...Alternatively, lay out the entire deck of cards on the table. <br /><br />--13 rows of 4 cards<br />--52 cards<br />--13 cards are diamonds (1 in 4)<br />--Remove 1 non-diamond from each row<br />--39 total cards remain<br />--13 diamonds remain<br />--randomly selecting any card gives you a diamond 13 out of 39 times = 1/3<br /><br />In any statistical analysis there is a finite sample. There is a finite number of pitchers, seasons, and injuries. 
When Verducci/Carroll are looking through their data sample, they are essentially dealing with a large but finite deck of cards.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-58776640449658260492009-08-02T13:05:18.876-04:002009-08-02T13:05:18.876-04:00Why are you removing the diamond cards? Why can...Why are you removing the diamond cards? Why can't the injury rate be 1/5 the next season with a 1/5 chance of a career year again?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-33524655863478399652009-08-02T12:55:24.972-04:002009-08-02T12:55:24.972-04:00PS The card game just illustrates the concept. Loo...PS The card game just illustrates the concept. Look at the math in the pitcher-injury example: <br /><br />--200 pitchers, 5 seasons each<br />--1000 "pitcher-seasons"<br />--1 out of 5 is randomly an injury year (<i>independently, with replacement</i>)<br />--200 injury years in the sample<br />--Remove a known non-injury year from each pitcher, 800 pitcher-seasons remain<br />--The chance of randomly finding an injury year is now 200 out of 800, or 1 in 4.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-22474446480507338172009-08-02T12:50:32.398-04:002009-08-02T12:50:32.398-04:00Anon-I disagree. The Verducci analysis is looking ...Anon-I disagree. The Verducci analysis is looking back in the past to find a relationship, which naturally occurs due to post hoc selection bias. The analysis infers causation where none exists. <br /><br />Vince/Anon-I see your point about shuffling the deck for every card, but let's be real. 
That's not going to make a practical difference.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-2745346094623015182009-08-02T12:38:37.331-04:002009-08-02T12:38:37.331-04:00Verducci is saying that pitchers with increased wo...Verducci is saying that pitchers with increased workloads often get hurt the next season.<br /><br />You seem to be saying that in a sample of X seasons, if you remove the season with the most IP, the injury rate will be higher than normal in the remaining seasons.<br /><br />The two statements hardly seem related. You require knowledge of the entire sample before manipulating it. Verducci is making predictions about the future.<br /><br />You would need to shuffle the deck every time you look at a card for your game to accurately model the Verducci effect.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-42236107210626628642009-08-02T12:18:37.608-04:002009-08-02T12:18:37.608-04:00The cards prior to the first non-diamond face card...The cards <i>prior to</i> the first non-diamond face card will be disproportionately diamonds (around a third of them will be diamonds - I think the proportion should be 13/40 with this setup, since they can't include any of the 12 non-diamond face cards), but the card <i>following</i> the face card will not be any more likely to be a diamond - it will be a diamond about 1/4 of the time (technically 13/51 since the cards in the simulation aren't entirely independent).<br /><br />If you want to make the cards completely independent, you could take each of the 4 cards from a different deck. Take one card from each deck until you get to a non-diamond face card, then flip a card from the next deck and see if it is a diamond. It will be a diamond one fourth of the time. 
(If none of the first 3 cards is a non-diamond face card, then you have to throw out that iteration and move on to the next pitcher's first four years.)Vincenoreply@blogger.comtag:blogger.com,1999:blog-38600807.post-11219864418185397822009-08-02T11:30:31.186-04:002009-08-02T11:30:31.186-04:00Vince-Removing any non-diamond card would make it ...Vince-Removing any non-diamond card would make it 1 in 3. It doesn't have to be the highest, first, last, or whatever--just not a diamond.<br /><br />But I think your way of looking at it is similar to the way the Verducci analysis works in real life. Maybe, to be as realistic as possible, you'd remove the first non-diamond over 10 or something. But still, it doesn't matter as long as you take one non-diamond out of the mix. It's going to be 1 in 3.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-38705238612545987052009-08-02T11:18:07.032-04:002009-08-02T11:18:07.032-04:00You're choosing which card to pick out based i...You're choosing which card to pick out based in part on what comes after it, which does not match how people look at the Verducci effect. What if, instead of picking out the highest non-diamond, you picked out the first non-diamond face card? Then I think that the probability of having a diamond on the next card would still be 25% (or 13/51, if you're dealing the 4 cards without replacement from a single deck, making them non-independent).Vincenoreply@blogger.com
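
The two selection rules argued over in this thread can be compared by exact enumeration instead of dealing cards or filling spreadsheet cells. The sketch below is not anyone's actual spreadsheet or simulation. It assumes rows of five independent cells, each a number 1 through 5, with a 1 standing for an injury year, and it assumes that ties for the highest cell go to the first occurrence. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

INJURY = 1      # assumed coding: a 1 marks an injury year
ROW_LENGTH = 5  # assumed career length in seasons


def first_at_least_three(row):
    """Index of the first cell that is 3 or higher, or None if there is none."""
    for i, value in enumerate(row):
        if value >= 3:
            return i
    return None


def highest_in_row(row):
    """Index of the highest cell in the whole row (first one on ties)."""
    return row.index(max(row))


def injury_rate_after(pick):
    """Exact share of rows with an injury right after the picked cell."""
    hits = total = 0
    # Enumerate every possible row once; all rows are equally likely.
    for row in product(range(1, 6), repeat=ROW_LENGTH):
        i = pick(row)
        if i is None or i == ROW_LENGTH - 1:
            continue  # no break-out year, or no following season to inspect
        total += 1
        hits += row[i + 1] == INJURY
    return Fraction(hits, total)


rate_first = injury_rate_after(first_at_least_three)
rate_highest = injury_rate_after(highest_in_row)
print("first cell >= 3:", float(rate_first))
print("highest cell:   ", float(rate_highest))
```

Picking the first cell that is at least a 3 uses only that cell and the ones to its left, so the cell to its right is a 1 in exactly 1/5 of the counted rows. Picking the highest cell in the whole row uses information about the later cells, and its rate comes out above 1/5. The exact figure for that second rule depends on the assumed row length and tie-breaking, which the thread does not pin down.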