Reader Q&A--Statistical Inquiries on Nicking

Alan Porter's recent post on the Galileo/Danehill nick inspired some lively discussion in the comments section. One of our most outspoken and analytical community members, known by the handle sceptre, posed a very good set of questions that we thought we'd address here in a new blog.

1. Consider two stallions, one has but 90+ named foals of which 16% are SWs. The second stallion has sired 14+ times as many foals as the other and 8% are SWs. All else equal (including age not a factor), how certain are you that you would prefer to breed to the first referenced stallion rather than the second? (speaks to the broader question of statistical significance).

Thanks for taking the time to post. Not to be smart, but your first question, regarding percentage of stakes winners to foals, is a multifactorial one and requires consideration of context to answer properly. For example, if Stallion A (90+ foals, 16% SW) stands in a regional market in the U.S. and achieves that statistic against only state-bred competition, while the other stallion (1,200+ foals, 8% SW) stands in Ireland and is duking it out with the best in Europe, then the stallion in Ireland may well be the better sire. Equally, if Stallion A began to attract better mares and proved that his progeny could make it outside of restricted company (for example, Unusual Heat in California), we might elect to go with Stallion A, all other things being equal.

Bearing down on your question, though: are 90 named foals a large enough sample? Again, this depends on a number of factors. If 50 of those foals have started, they have made an average of eight starts each (the average number of starts required to win a race), and there are 14 stakes winners, then within the context above (i.e., that they were not all restricted state-bred stakes winners), we would say there is sufficient evidence to establish that this is a very good stallion. Incidentally, one of the sub-rules of calculation within TrueNicks is that the starters within the cross must collectively have enough starts to win a race before the cross will make the calculation, which is something we think is really important when it comes to statistical significance.
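As a rough illustration of the sample-size side of the question only (it says nothing about context or quality of competition), the sketch below puts a 95% Wilson score interval around each stallion's stakes-winner rate. The 14 stakes winners from 90 foals come from the example above; the second stallion's 1,260 foals (14 x 90) and 101 stakes winners (about 8%) are assumed round figures chosen to match the question, not real data.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Return the Wilson score interval (low, high) for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5 / denom
    return centre - half_width, centre + half_width


# Stallion A: 14 SW from 90 named foals (figures from the post).
stallion_a = wilson_interval(14, 90)
# Stallion B: ASSUMED 1,260 foals and 101 SW (~8%), for illustration only.
stallion_b = wilson_interval(101, 1260)

for name, (low, high) in (("A", stallion_a), ("B", stallion_b)):
    print(f"Stallion {name}: {low:.1%} to {high:.1%}")
```

The interval for the smaller sample is several times wider than the one for the larger sample, which is the statistical core of the question: the 16% figure is a much less precise estimate than the 8% figure.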

Of course with regard to the blog item that you were responding to, we were considering not the overall record of a stallion, but the record of a very specific cross (Galileo with mares by Danehill). That cross has 10 stakes winners from 62 starters (16%), and our conclusion would be that 62 starters bred on a cross provides sufficient data for statistically significant results, especially when the return can be contrasted with Galileo’s record with all other mares, and that of the Danehill mares with all other stallions.
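One hedged way to frame "sufficient data" for a cross like this is an exact binomial tail probability: if the cross were really no better than some baseline rate, how often would 10 or more stakes winners turn up from 62 starters by chance? The 10-from-62 figure is from the post; the 8% baseline below is a placeholder assumption and is not Galileo's actual record with other mares. This is also not how TrueNicks computes its ratings.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# ASSUMPTION: placeholder baseline stakes-winner rate, not a real figure.
ASSUMED_BASELINE = 0.08

# Galileo with Danehill mares: 10 SW from 62 starters (from the post).
p_value = binomial_tail(10, 62, ASSUMED_BASELINE)
print(f"P(10+ SW from 62 starters at an 8% baseline) = {p_value:.4f}")
```

A small probability here would suggest the cross is outperforming the assumed baseline by more than chance alone would explain; the conclusion is only as good as the baseline that is plugged in.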

We are not sure if you meant to, but you do raise the interesting concept of 'quality' within a nick. A limitation of TrueNicks – as with all such programs – is that we do not alter the rating based on the quality of the horses produced on the nick, rather we give the major winners (and a lot of other data) so that users can make some intelligent interpretation. Generally this poses little problem in terms of practical use, as breeders and buyers are likely to be comparing like with like, for example choosing between two sires standing in the same price range, rather than say, comparing a B nick with A.P. Indy as the sire to an A+ nick by a $5,000 regionally-bred sire.

We have a current project in development where we are looking into ways of signifying the quality of not only the ancestors used in the mating (i.e. the racing/production ability of the sire, dam, and broodmare sire), but also the racing ability of all the horses bred on the nick (another advantage of having access to all the foals/starters bred on the cross!). This wouldn't alter the nick rating itself (which is based on the percentage of stakes winners to starters), but it could lead to describing cases where the nick may be a good one, say an A+, but the quality of the stakes winners bred on the cross, or of the parents, might be relatively modest. We are kicking around ideas here on how best to do this, including looking at some multivariate regression analysis of a number of figures outside of TrueNicks. We'll keep our readers posted on the progress of that study.

2. How often, if at all, have you done retrospective analysis of situations such as that offered in your Galileo/Danehill vs Galileo/all others? I realize that by being "publicized"/"accepted" a nick may later somewhat "dilute" (potential overload of lesser quality mates by broodmare sire, etc.), but careful analysis can eliminate this variable.

We have looked at this quite a lot, and for a long time. If you go back and look at Alan's second post on the TrueNicks blog back in December of 2007, it is about Kingmambo and Sadler's Wells mares. This has been one that we have really copped some grief for because it is rated well by eNicks, but not by us. On TrueNicks, it has varied between a C and a very weak B for four years now (we have the original data that TrueNicks was developed off back in early 2007 and it was a C then). It is however, one of those matings that people love to do (the inbreeding to Special seems to hold some mysticism!) and when it gets a good one it is a really good one, evidenced by the six group/grade I winners bred on the cross. But it has been tried an awful lot, and the slow ones on the cross are really, really slow, i.e. some have become slow hurdlers which is a task in itself to breed. Breeders like to forget the slow ones, but the TrueNicks algorithm doesn't! It is a question of the percentage of stakes winners bred on the cross being only a little above average for the sire and broodmare sire (Kingmambo and Sadler’s Wells, respectively), but the quality of the material used ensuring that when it does work, it often works very well.

A good example of a rating that went the other way under the pressure of significant numbers is that of Unbridled's Song with Storm Cat mares. This started quite well, and it was a solid B+ when Unbridled's Song sired three stakes winners on the cross in his 2003 crop (Magnificent Song, Half Ours, and Noonmark), which followed on from the success of Buddha, a grade I winner bred on the cross. But it has only one stakes winner in his 2006 crop (current 5yo's) and it hasn't produced one since. If anything, the 'quality' of the Storm Cat mares bred to Unbridled's Song in the years 2005, 2006, and 2007 (producing the current 5yo's, 4yo's, and 3yo's) was significantly better than that of the mares he received in his earlier years at stud, and they may be in even better hands/stables/care, if that is possible, so one would think that there would be more stakes winners bred on the cross, but at this stage there are not. It is a little perplexing at some level, but it is a good reflection on TrueNicks that it is making allowances for this rather than continuing to represent the cross as something that it is not. The rating now is a solid C, and there are over 120 foals of racing age bred on the cross.

These are just two examples, and there are others, such as More Than Ready with Danehill mares in Australia, that we continue to monitor. What is notable however is that in the case of Kingmambo/Sadler's Wells it went from a C to a weak B and Unbridled's Song/Storm Cat went from a B+ to a solid C. They both moved under the pressure of more foals/starters being bred on the cross at hand, but neither of them jumped to becoming an A+ rating or decayed significantly to becoming a D rating. They stayed pretty close to the statistical parameter that might have been expected from the initial rating.

What makes Galileo/Danehill unique is that it is an A+ from 98 foals (62 starters). Kingmambo/Sadler's Wells is a B from 106 foals (82 starters) and Unbridled's Song/Storm Cat is a C from 126 foals (77 starters). Obviously we have answered the question of whether we feel that 98 foals and 62 starters is a statistically significant number (the answer is yes), but the more important question that you were alluding to is "under the weight of numbers to come, will Galileo/Danehill keep its A+ rating, or will the performance of Galileo with all other broodmare sires, and/or the performance of Danehill mares with other stallions, outperform those bred on the cross?" (phew!) That is a good question, and one that needs a little more solid research to answer definitively, but on what we have seen with Kingmambo/Sadler's Wells and Unbridled's Song/Storm Cat, the rating could improve (although this looks a less likely scenario), stay around the same, or decay a little, but no further than a solid B+...how is that for a prognostication!

Of course, to a degree, this does depend on how carefully the Danehill mares bred to Galileo are selected. If we take the Kingmambo/Sadler's Wells or Unbridled's Song/Storm Cat scenarios as examples, we suspect that one of the reasons for the deterioration in the strike rate of some of these very popular crosses is that they become a "default option" and are employed with insufficient consideration for the individual mares utilized. Another reason, and we have seen this a little with Speirbhean (dam of Teofilo), is that producing a second stakes winner from a repeat mating is often not as easy as it seems. There are some mares that do very well with repeat matings, while others, either through environment (age of mare, etc.) or the variance of genetic inheritance, fail to repeat a successful mating with another stakes winner.

3. I would also like to see your "numbers" for only those who competed in Ireland, England, and France.

So would we! We can parse out the Galileo with Danehill mares easily enough, but separating out the "all others" would be a massive task that we would have to enlist The Jockey Club to help us with. Right now we have other projects in development (one is a research tool that should ultimately help us improve TrueNicks significantly, and another three are new products that we are in the process of programming), so a tool to do this type of analysis will have to wait...for now.

Thanks again for your comment, Sceptre. If you or anyone else has any follow-up questions, please feel free to post them below.


9 Comments:

Ian,

On an earlier thread you mentioned that the cross of all Northern Dancer sires over Northern Dancer mares produces a below-average strike rate of stakes winners. Can you provide the strike rate (and frequency in the population, if you can) of the most popular sire-of-sires crosses: AP, MrP, ND, Roberto, and Blushing? This should produce the following permutations: APxMrP, APxND, APxRob, APxBlush, MrPxAP, NDxAP, RobxAP, BlushxAP, MrPxND, MrPxRob, MrPxBlush, NDxMrP, RobxMrP, BlushxMrP, NDxRob, NDxBlush, RobxND, BlushxND, RobxBlush, BlushxRob, APxAP, MrPxMrP, NDxND (already completed), RobxRob, BlushxBlush.

This would identify the top sire-of-sires TrueNick. I'm sure most people would have thought that ND over ND would have been the top sire-of-sires TrueNick, but I was surprised to read that this is not the case.

I do understand that this can be done with some creative matings using your online resource but I thought it would be nice if you would provide for readers.  

Europeans:  This is a cry to please stop complaining about distance.  Fans have been doing this for three centuries and it is old.  Horses in America are primarily bred to win the Classics on dirt from 8 to 10 furlongs.  To do this horses need to be extremely fast (fastest in the world) and carry their distance.  Biomechanics, DNA, Nicks, cardio etc... prevent them from getting the Classic distances so we have a number of six furlong races.  Get over it!    

Ryan 27 May 2011 11:07 AM

Hi Ryan,

To clarify the comment about Northern Dancer: in general, crossing a sire and broodmare from the same immediate sire line under-performs opportunity.

A good current example is Smart Strike/Mr. Prospector. This is a Mr. P/Mr. P cross; there are 64 starters by Smart Strike and sons out of Mr. P line mares, but zero SW.
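To give a feel for how much a zero-from-64 record says, the sketch below computes an exact one-sided 95% upper bound on the underlying stakes-winner rate when no successes are observed, alongside the common "rule of three" shortcut. It assumes the 64 starters can be treated as independent trials with a single underlying rate, which is a simplification.

```python
def upper_bound_zero_successes(n, confidence=0.95):
    """Exact one-sided upper bound on the true rate when 0 of n trials succeed."""
    # Solve (1 - p) ** n = 1 - confidence for p.
    return 1 - (1 - confidence) ** (1 / n)


starters = 64  # starters on the cross with zero SW (from the comment above)

exact_bound = upper_bound_zero_successes(starters)
rule_of_three = 3 / starters  # quick approximation of the same 95% bound

print(f"Exact 95% upper bound: {exact_bound:.2%}")
print(f"Rule-of-three bound:   {rule_of_three:.2%}")
```

Under those assumptions, any underlying stakes-winner rate above the printed bound would be hard to reconcile with 64 starters and no stakes winners.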

With further generations certain crosses branch off, especially between contrasting examples of that sire line--for example, we wouldn't call Galileo/Danehill a "Northern Dancer/Northern Dancer" cross since it has distinguished itself specifically.

Furthermore, calling it a "Nearco/Nearco" cross would be completely irrelevant due to genetic divergence. Nicking is only useful when comparing groups of horses with some degree of genetic similarity. This is why TrueNicks only looks back as far as [grandsire of the sire] and [grandsire of the broodmare sire]. Encompassing more distant relations would make nicking comparisons impossible, as genetic divergence brings the group closer to breed average.

Regarding your comment on "top sires of sires", I'm not one to put stock in the "sire of sires" argument; I think that's a myth. A sire is either good or not. While sons of a particular stallion usually maintain similar genetic affinities to their sire, a stallion's individual success or failure at stud is not due simply to his sire or cross, but rather to his total genetic profile and a multitude of environmental factors (including human decision-making).

Ian Tapp 27 May 2011 2:48 PM

Dear Byron,

I'm appreciative and honored that in this blog you chose to focus on my recent questions. Please know that they were indeed honest queries, rather than veiled statements of disagreement. While your Galileo/Danehill vs Galileo/all others "numbers" are what initially prompted my reflection and then questions, their comparative analysis is but a simple example of the larger issue which continues to perplex. That is, what numbers are sufficient for samples such as these to render an accurate view of reality? I am not a statistician, and I assume that those skilled in this field could shed some light, but my guess is that they would also raise the spectre of variables (with which they may be far less acquainted with respect to the topic at hand) when attempting to answer this. Such a question (above) also applies to your hierarchical nick ratings, among other like things... I did note your attempt to answer my questions, but for me the answers fell short of what I was hoping for. Incidentally, I state in my first question "...all else equal...", so "...consideration of context (etc.)..." is off the table. Also, Alan specifically mentioned that the "quality" of Danehill mares vs all others was essentially equal. As to my second question, that concerning retrospective analysis, some of your examples, e.g. Unbridled's Song with Storm Cat mares, did little more than add fuel to the fire... I fear that what I'm asking ("How does one determine what number is sufficient?") is not easily answered, and that its difficulty is compounded by the degree and complexity of the variables involved (genetic mechanisms themselves, many of which remain unknown, are among such variables). It's apparent to me that TrueNicks continues to fine-tune this variable variable (meant to be written twice), but there is also intertwined within this the question of sufficient number.
As said, I'm not a statistician, so perhaps/or perhaps not my questions are valid ones, and a true statistician might have more questions to pose. I did raise the retrospective issue, because I felt that a proper retrospective analysis of large numbers might evoke more confidence in the quantity of numbers you deem as sufficient presently.        

sceptre 27 May 2011 9:59 PM

Sceptre,

We took your comment as read. After all, you have been a long time contributor to this blog and posts like yours do make us really think and evaluate the TrueNicks rating and algorithm like we have here. So in that respect we are always grateful that people like you take the time to make comment. I also apologize for the length of this reply in advance.

We did actually get a PhD statistician to look at the initial numbers to determine, firstly, whether the rules that we created would generate a statistically significant sample size and, secondly, whether the ratings that were generated had statistical significance themselves. We were quite guided by what he said, in that we didn't go back any further than the third generation on the sire line and the fourth generation on the broodmare side because of his advice that the statistics generated beyond that point were pretty meaningless.

In regards to the first part, what it came down to was a statistical power analysis whereby we had to determine (a) how large a sample is needed to enable statistical judgments that are accurate and reliable and (b) how likely our statistical test would be to detect effects of a given size in a particular situation. If the sample size is too small, the test will lack the precision to provide reliable answers; if the sample size is too large, time and resources will be wasted, often for minimal gain.
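For readers unfamiliar with the idea, here is a minimal sketch of part (a) for the simplest case: comparing two proportions with a two-sided test. It uses the standard normal-approximation formula, with the 16% and 8% rates from Sceptre's first question as the effect to detect. The 5% significance level and 80% power are conventional assumptions, and this is not the analysis that was actually run for TrueNicks.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Effect sizes: 16% vs 8% is from the question; 10% vs 8% is a made-up
# smaller gap to show how quickly the requirement grows.
print(sample_size_two_proportions(0.16, 0.08))
print(sample_size_two_proportions(0.10, 0.08))
```

The smaller the difference you want to detect, the more horses each group needs, which is why modest edges between crosses are so hard to confirm.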

As you may recall, we studied some 100,000 horses to create TrueNicks, and it turns out that we were guilty of the latter: 100,000 was way too many horses for very little gain. The a priori power assumption was that it would take somewhere around 30,000 horses to make valid assumptions about the total population, but we overpowered the analysis on the basis that it was 'better safe than sorry'. Subsequent post-hoc analysis showed that we could have done the analysis on just 3,000 random horses and gotten similar results, with similar sampling error and enough power in relation to the population.
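The diminishing return from ever-larger samples can be seen in a simple margin-of-error calculation for an estimated proportion. The sketch below uses the three sample sizes mentioned above and assumes simple random sampling and the worst-case proportion of 50%; it is illustrative only and does not reproduce the post-hoc analysis described.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


# Sample sizes from the comment; p = 0.5 is the conservative worst case.
for n in (3_000, 30_000, 100_000):
    print(f"n = {n:>7,}: +/- {margin_of_error(n):.4f}")
```

Because the margin shrinks with the square root of the sample size, a tenfold increase in horses narrows it only by a factor of about three.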

Moving on to the calculation itself, and more specifically your question of "how does one determine what numbers are sufficient", it is a difficult question to answer indeed because among other factors it not only boils down to how many starters you need, but how many starts these starters have had to give them the opportunity to perform on the racetrack.

TrueNicks relies on two rules that only calculate on the basis of a number of starters making a significant number of starts. Again, when we were designing the rating we looked at different scenarios of starters (10, 15, 20, etc.) and starts (100, 200, 300, etc.) before settling on the parameters that we use today. This wasn't as easy as it may sound, as we had to consider young stallions with their first runners, stallions whose progeny take time to mature, and scenarios where old stallions were being bred to mares by very young broodmare sires as well.
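A minimum-opportunity check of this kind might be sketched as follows. The thresholds (10 starters, 100 total starts) are simply the smallest values from the scenarios listed above and are placeholders, since the actual TrueNicks parameters are not given here; the `CrossRecord` type, the function name, and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrossRecord:
    """Racing record of all horses bred on one sire/broodmare-sire cross."""
    starters: int
    total_starts: int
    stakes_winners: int


# PLACEHOLDER thresholds: not the real TrueNicks parameters.
MIN_STARTERS = 10
MIN_TOTAL_STARTS = 100


def strike_rate(record: CrossRecord) -> Optional[float]:
    """Return SW-to-starters rate, or None if the cross lacks enough opportunity."""
    if record.starters < MIN_STARTERS or record.total_starts < MIN_TOTAL_STARTS:
        return None
    return record.stakes_winners / record.starters


# Made-up records: a young sire with few runs, and a well-tested cross.
young_sire = CrossRecord(starters=12, total_starts=30, stakes_winners=2)
tested_cross = CrossRecord(starters=62, total_starts=400, stakes_winners=10)

print(strike_rate(young_sire))    # None: enough starters, too few starts
print(strike_rate(tested_cross))
```

Requiring both conditions keeps a handful of lightly raced starters from producing a rating before they have had a fair chance to perform.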

We also do a 'faceoff' within the rules. In a hypothetical mating there is just as much relevance, generationally speaking, in a calculation done on the sire of the sire (2 gen) and the broodmare sire (2 gen) as in one done on the sire (1 gen) and the sire of the broodmare sire (3 gen). Again, we have a logarithmic calculation to return the most relevant rating.

There was also one deviation. You will notice that from time to time there are calculations made on a small number of runners (e.g., Distorted Humor with El Prado mares) where there are two or more stakes winners out of unique mares. These are some of the highest variant scores that you will see. We wrestled a bit with this one, as it is surprisingly a rather rare occurrence, and we decided that it was better to calculate and show the rating based on two unique mares producing stakes winners on the cross than not. That is probably the only part that fails statistical significance, but it is still actually quite predictive (i.e., subsequent foals bred on the cross still become stakes winners at a higher rate than the alternatives, even if it is much less than the initial success).

Overall, we are quite satisfied with the sample size we are using to make calculations and with the predictive robustness of TrueNicks, but we never assume that TrueNicks can't be improved into an even better product. In line with your question on retrospective ratings, we are looking to develop a historical lookup tool that will allow us to look at larger populations of horses in various countries and have the data collected at various time points.

What does interest us now is to look at the variants, rule #, starters and starts considered when a rating is generated at a particular time or times (say conception, date of birth, 6 months after birth, 12 months after birth) and then see if there are more discrete differences within the rating based on the rule that it has been calculated on and the number of starters and starts, along with some quality indices of the horse, to see if there is an even more predictive refinement that we can make. On the stats we have collected, it is possible, and probably not surprising to you and others, that not all "A's" are created equal and that there are certain "A's" bred on certain rules with certain starter/starts parameters that may be significantly more predictive of subsequent stakes success than others.

Byron  

Byron Rogers 28 May 2011 2:51 PM

Hi Byron,

I wrote a lengthy reply to your most recent post, but the system apparently failed and it wasn't transmitted (I retained no copy). I won't attempt to re-create it, and to be honest, I have a whole new set of thoughts that for now I'll refrain from offering; I'm not sure that any will come as news to you. Perhaps one day we can communicate by phone or in person.

Thanks very much for airing this, and for your detailed and thoughtful replies.

sceptre 31 May 2011 10:00 PM

Sceptre

No problem. Feel free to email byron@pedigreeconsultants.com

Byron

Byron Rogers 01 Jun 2011 3:23 PM

The proof is in the pudding!!! Three Group 1 European Guineas winners in the UK, France & Ireland in a 2 week period recently for the Galileo x Danehill cross:

Frankel, Golden Lilac & Roderic O'Connor. It was truly an amazing performance, and they join a growing list bred on this cross.

MurrayK 04 Jun 2011 6:39 AM

I have a mare Knatmeg Scenic/Victorian Opal and am looking for a Darshaan/Shirley Heights type cross in Australia. Does anyone know of a stallion, or can anyone suggest a mating?

Robert Cooke 26 Aug 2011 6:52 AM

Robert,

Unfortunately there are not a lot of options in Australia for stallions with Darshaan/Shirley Heights. Maybe a horse like Spinning World who is by Nureyev out of a Riverman mare might fit? You might want to try running a TrueNicks Broodmare Analysis report with stallions that fit into your price range and see what that returns for you.

Best of luck,

Byron Rogers

Byron Rogers 26 Aug 2011 2:57 PM
