ISDS Border Collie Database
Last updated: 4 Apr 2010, Teun v/d Dool, info@bcdb.info, © 2002-2022
STATISTICS FROM THE STUD BOOKS
Also published in Working Sheepdog News May/June, July/August, and September/October 2002
Copyright © Teun C. van den Dool, Jan 2002-2021
Internet: http://www.bcdb.info
1. INTRODUCTION
Do you know the three most popular dog names? They are Meg, Nell and Ben, used 11079, 9274 and 7840 times respectively in the ISDS registered population of more than 250000 dogs. The top ten of dog names is: Meg, Nell, Ben, Glen, Fly, Moss, Roy, Spot, Jess and Cap. Even Cap, in tenth place, is used 3954 times, and 27% of all dogs have a top-ten name. The change in popularity of several names is displayed in figure 1.
And what's more:
You might think 'scanning all those pages, what a job'. But scanning proved to be less than 10% of the effort involved. It must be said, though, that the quality and consistency of scanning have a major impact on the time needed for the subsequent processing steps, so all scanning was carried out on a single system by one person. Most of the work went into correcting the errors of the OCR processing and the errors in the StudBooks themselves.
The OCR software is approximately 99% accurate. That seems pretty good, but a StudBook contains 1 million characters on average, so about 10000 errors are introduced in every processed book. Luckily most data in every StudBook occurs twice and OCR errors have typical behaviour. Approximately 90% of the OCR errors could be corrected automatically with suitable software. That left about 1000 OCR errors plus the original errors in every StudBook. These remaining errors were corrected by hand after they were found by cross-checking with software specially developed for this job.
Of course I do not know the number of errors left in the final database, otherwise I would have corrected them. But I estimate that some hundreds may be left in each StudBook. Most of these errors occur in the names and addresses of breeders and owners, which I consider the least interesting information anyway. The most important information is the ISDS numbers, and these, I believe, are more accurate in the database than in the Stud Books themselves. This is because ISDS numbers form an almost gap-free series and every ISDS number should occur twice in each StudBook, with the accompanying name of dog and owner. This enables an almost perfect cross-check. Furthermore, the ISDS numbers of parents can be cross-checked with data (name of dog and owner) from earlier years already available in the database. Something like 10 parents per StudBook might still have wrong ISDS numbers: the printed number is clearly erroneous, and either no dog or several dogs were found with the same name and a similar ISDS number.
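The cross-check described above (every ISDS number should appear once in each part of a StudBook with the same dog name, and the numbers should form an almost gap-free series) can be illustrated with a short Python sketch. This is not the software actually used for the database. The record layout (lists of `(isds_number, dog_name)` tuples), the function names `cross_check` and `gaps`, and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict

def cross_check(part1, part2):
    """Flag ISDS numbers that fail the 'once in each part, same name' check.

    part1 and part2 are lists of (isds_number, dog_name) tuples, one list per
    part of a StudBook. This record layout is hypothetical.
    """
    seen = defaultdict(list)
    for source, records in (("part1", part1), ("part2", part2)):
        for number, name in records:
            seen[number].append((source, name.strip().upper()))

    suspects = {}
    for number, hits in seen.items():
        sources = sorted(source for source, _ in hits)
        names = {name for _, name in hits}
        if sources != ["part1", "part2"]:
            suspects[number] = "not exactly once in each part"
        elif len(names) > 1:
            suspects[number] = "name mismatch"
    return suspects

def gaps(numbers):
    """Return the numbers missing from an (almost) gap-free series."""
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]

# Made-up records: 'Ne11' mimics a typical OCR confusion of 'l' and '1',
# and 108 mimics a misread digit in number 103.
part1 = [(100, "Meg"), (101, "Nell"), (103, "Ben")]
part2 = [(100, "Meg"), (101, "Ne11"), (108, "Ben")]

suspects = cross_check(part1, part2)
missing = gaps([number for number, _ in part1])
print(suspects)
print(missing)
```

Records flagged this way would still need a human decision; the sketch only narrows down where to look.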
As an example of error correction, consider Joy 5290, registered in the StudBooks as:
A similar example is Phil 13063:
As a final example consider Nell 3514, one of the most important bitches in the ISDS population as we will show below. She is registered as:
These examples are particularly interesting because they all appear in the pedigree of the famous Wiston Cap 31154:
Especially the first two StudBooks contain many such obscurities. Details of dogs are also often missing in the second part of the StudBooks. The seventies were worst in this respect, culminating in details of 85 dogs missing in 1976 and more than 400 in total. Their parents could sometimes be found by looking for likely littermates in the first part of the StudBooks. For approximately 70 pups the parents could not be found, and 24 of them produced offspring later on.
Note: up to and including 1960 the Stud Books had a separate chapter 'Litter Records' that listed non-registered pups from (often additional) litters. I did not include these in the database. However, some extra information on early dogs was added from Sheila Grew's 'Key Dogs' and from Barbara Carpenter's 'The Blue Riband'.
2. KEY DOGS
Now such a database is handy for quickly browsing through pedigrees. It is impressive to generate 25-generation pedigrees, all the way down to unregistered dogs, within seconds. But that does not necessarily give insight. You need to study those pedigrees and recognise links between them, and that again takes much time if you do it by wandering around. So I became interested in statistics. Let's see if we can retrieve some key dogs that way.
A warning is in order here. The following will involve a considerable amount of genetics and mathematics now and then, with some references to scientific literature. Don't let it distract you too much. Just read on to get a feeling for how I arrived at the conclusions and enjoy the figures.
Let us calculate the influence of a dog on its descendants by supposing that every child inherits half of each parent. So if father 23 has six children, then each child 'consists of' 0.5 times 23, and in total father 23 is carried on as an equivalent of 6 x 0.5 = 3 times 23. If these children bring forth children again, then each of those will inherit 0.25 times grandfather 23. And so on. We repeat this for every dog and bitch that has ever been bred from, nearly 50000. This gives us the influence of those 50000 dogs on the total population of 250000 dogs. The result is sorted according to year of birth of the 250000 dogs and divided by the number of dogs born in each year. The most influential dogs are plotted in figure 2. The plotted dogs have been selected according to the following criteria:
A comment is in order here. In the early years it was easier for a dog to become a key dog. Far fewer dogs were registered each year (<200 before 1940 and <800 before 1950) than after 1960 (5700 per year on average). In view of this, Wiston Cap 31154 surely is unique. For bitches it is more difficult to become a 'key dog' because of the limited direct offspring they can produce themselves. Only one bitch meets the key criteria. John Kirk's Nell 3514 even takes a prominent third place, despite the fact that she never ran in an International. Only 5 other key dogs did not compete in an International, see table 1.
Table 1 shows more statistics on the influence of these key dogs on the pups born during the last five years (1996-2000). A five-year period is chosen because that is the average age at which parents produce pups. In other words, a new generation is born every five years on average. The older key dogs are present in such a diluted way that hardly any dog, or none at all, can be found without their blood. A dog cannot be bred back more than max%. And if no dogs exist without its blood (%zero=0) then we can never eliminate its influence.
Of the 41 key dogs recognised by Sheila Grew, 22 fail the criteria stated above. Nine dogs in figure 2 are not recognised by her as key dogs but are discussed in her text as an important ancestor or descendant of one of the key dogs. Six more became famous after Mrs. Grew finished her books. Roy Goutte has discussed five of those dogs in his 'Principal Sheepdog Lines'. Adam Telfer's Old-Hemp (9), born in 1894, is put forward by Mrs. Grew as probably the most influential early dog. Unfortunately the StudBooks list none of his children (over 200 according to Mrs. Grew). I added three of his children mentioned in 'Key Dogs' and another one from 'The Blue Riband'. That alone nearly promoted Old-Hemp to key-dog status.
Some of the key bitches mentioned by Sheila Grew have a really small influence. Apparently counting the number of descendants alone will not reveal all established dogs.
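The 'half of each parent' bookkeeping described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration, not the software used for figure 2. The pedigree layout (a dict mapping each dog to its sire and dam, with `None` for an unregistered parent) and the name `make_influence` are assumptions, and the toy pedigree simply reuses the hypothetical 'father 23' with six children from the text.

```python
from functools import lru_cache

def make_influence(pedigree):
    """Build a function giving the share of `ancestor` carried by `dog`.

    pedigree maps dog -> (sire, dam); None marks an unregistered parent.
    This layout is an assumption made for the sketch.
    """
    @lru_cache(maxsize=None)
    def influence(ancestor, dog):
        if dog is None:
            return 0.0                      # unknown parent carries nothing
        if dog == ancestor:
            return 1.0                      # a dog is 100% itself
        sire, dam = pedigree.get(dog, (None, None))
        # every child inherits half of each parent
        return 0.5 * influence(ancestor, sire) + 0.5 * influence(ancestor, dam)
    return influence

# Toy pedigree: father "23" has six children, one of which has a child.
pedigree = {f"child{i}": ("23", None) for i in range(6)}
pedigree["grandchild"] = ("child0", None)

influence = make_influence(pedigree)
children_total = sum(influence("23", f"child{i}") for i in range(6))
grandchild_share = influence("23", "grandchild")
print(children_total)     # equivalent number of times "23" is carried on
print(grandchild_share)   # share of "23" in a grandchild
```

Averaging `influence(ancestor, dog)` over all dogs born in a given year would give one point of a curve like those in figure 2, under the same assumptions.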
3. FOUNDERS
The total influence of the 'key dogs' on the current population of pups born during the last five years is 72.4 percent. Note that this figure is not found by simply adding the mean% figures from table 1 (that would give 150%). The mutual influences were first subtracted (31154 consists of 14% 3036 etc.).
It seems that the current Border Collie population is dominated by a small number of dogs. Another way of looking at the gene base of a population is to define the foundation dogs (founders). These are the registered dogs with one or two unregistered parents. The unregistered parents could also be considered the founders, but because they have no ISDS numbers it is not easy to find out which unregistered parents are shared by different registered pups. Counting them would therefore give a higher number of founders than in reality. There are 3143 registered founders, of which 1481 produced registered children. Only 643 of them have had influence on the pups born in the last five years. An even smaller number of founders already has a major influence, see table 2.
So at most 643 dogs influence the current gene pool. 'At most' is added because these dogs might originate from an even smaller number of dogs before the ISDS started the StudBooks. But it might also be that at some later moment in time a small number of key dogs became dominant. Boichard (1997) compares three methods to measure the effective number of dogs that have (had) influence on a particular population. These methods appear to be used for estimating genetic variability in rare (wild) animal species.
The first method gives the effective number of founders. The contributions of the 643 founders given above to the current gene pool differ widely. The two most influential ones contribute 9% together. Others have an almost negligible influence. To account for this difference in contribution, the effective number of founders is defined as:
$$EF = \frac{1}{\sum_{k} p_k^2}$$
where $p_k$ is the fraction of the current gene pool contributed by founder $k$. The effective number of founders appears to be EF=71.3 in this case. This is the number of founders that would give the same gene diversity if they all contributed equally to the current population. So from the original number of founders, effectively only 71.3 have an influence on the current ISDS population. Note that effective dogs have little to do with real dogs. One effective dog is a measure for the number of genes in a single dog, but these genes might be (and in practice always are) spread over many real dogs.
Boichard gives a second (approximate) algorithm that takes into account 'bottlenecks', the reduction in gene diversity caused by often-used stud dogs. The resulting number is called the effective number of ancestors (EA). He gives the example in figure 3 to explain the difference with founders. The current population in this example has 6 founders (1,2,3,4,15,16) but only 4 ancestors (5,6,17,18). The effective number of founders is 5.6 because the second family has a smaller number of representatives in the current population, so the genes of 15 and 16 effectively have less influence. In this case the effective number of ancestors appears to be EA=2.94. The EA algorithm finds the ancestors by recursively looking for the dogs with the largest genetic influence on the current population. The effective number of ancestors is calculated from the real number of ancestors with the same formula as used for the effective number of founders. EA is always lower than EF because bottlenecks will reduce gene diversity.
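As a small illustration of the 'effective number' idea, the sketch below computes the inverse of the sum of squared contribution shares, which is the definition Boichard uses for the effective number of founders. The function name `effective_number` and the example contributions are made up for illustration; they are not ISDS data.

```python
def effective_number(contributions):
    """Effective number for a list of non-negative contributions.

    The contributions are first normalised to shares that sum to 1; the
    result is 1 / sum(share**2). Equal shares give the real count back.
    """
    total = sum(contributions)
    shares = [c / total for c in contributions]
    return 1.0 / sum(p * p for p in shares)

equal = effective_number([1, 1, 1, 1])            # four equal founders
unequal = effective_number([0.5, 0.25, 0.25])     # one founder dominates
print(equal)
print(unequal)
```

With equal contributions the effective number equals the real number of founders; as soon as one founder dominates, the effective number drops below the head count, which is the effect described above for the 643 founders.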
The third algorithm, the most accurate but also the most time-consuming, accounts for all causes of loss in genetic diversity. In a population of unlimited size with random mating the genetic diversity will remain constant. However, in populations of limited size with selective mating like ours, the diversity will decrease due to a process called 'random drift'. Random drift can be explained with the following example. Suppose a dog and a bitch mate and both have genotype B/b at some locus of one of their chromosomes. Their children will inherit one chromosome from the father and one from the mother to form a new genotype at that locus. If by chance they produce two children that both have genotype BB, then the properties that go with allele b will be lost forever if these parents were the only living dogs carrying b. Note that this mating does not necessarily reduce the effective number of founders or ancestors.
This third algorithm gives EG, the effective number of different genomes from founders that are still present in the current population. It does so by simulating the random selection of a particular gene from each parent at the conception of every dog ever registered. After this simulation the genes in the current generation of Border Collies are counted. This simulation of the total Border Collie population is repeated many times (1000 in this case) and the average occurrences of the genes are calculated from the results. For the current ISDS population the effective number of founder genomes is EG=8.3. So effectively the genes of only 8.3 founder dogs are present in the current generation of approximately 25000 ISDS registered Border Collies. Amazing, isn't it?
Figure 4 shows how the different (effective) numbers of dogs have evolved over generations since the fifties. Before 1950 the pedigrees are incomplete and the numbers behave wildly, starting at zero around 1900. Since 1980 the effective numbers have been almost constant, sometimes even increasing a bit due to ROM (Registered On Merit) dogs or dogs with otherwise unknown parents. So the main selection took place before 1980. Perhaps not by accident, the last big reduction in genetic diversity (1965-1975) coincided with the rising star of Wiston Cap 31154.
A little extra explanation for those puzzled about which dogs those 8 'effective' dogs are. They do not exist in reality; we cannot point at 8 individual dogs. You should think of it as an amount of genes spread over all dogs. Many dogs have the same genes and only very few different genes are present in our dogs, only an equivalent of 8 dogs. That is what 'effectively' 8 dogs means. Another way of looking at it: suppose that we had 8 founder dogs with differences in their chromosomes. We let them mate with each other randomly to produce a new generation of 25000 pups. This new generation will have the same genetic diversity as the current ISDS population. In this example we can point at the 8 effective dogs, but in reality we cannot: the chromosomes have been selected over many generations from many dogs. You might think 'but Wiston Cap must be one of them'. Well, he surely caused a considerable reduction in the effective number of genomes, but he did not add anything. He just caused a particular selection of chromosomes from his ancestors to become more dominant.
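The simulation behind EG, described earlier in this section, is often called 'gene dropping'. The sketch below shows the idea on a toy pedigree. It is only an illustration under stated assumptions: the pedigree layout (a dict listed parents-before-children), the rule that any dog with an unregistered parent is treated as a founder with two unique genes, the function name `founder_genome_equivalents`, and the way the runs are averaged (mean of the summed squared gene frequencies, then 1 / (2 x mean)) are choices made for this example and may differ from the software behind figure 4.

```python
import random
from collections import Counter

def founder_genome_equivalents(pedigree, current, runs=1000, seed=0):
    """Estimate the effective number of founder genomes by gene dropping.

    pedigree: dict dog -> (sire, dam), listed parents before children;
              a dog with an unknown (None) parent is treated as a founder.
    current:  list of dogs forming the current population.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        genes = {}
        for dog, (sire, dam) in pedigree.items():
            if sire is None or dam is None:
                genes[dog] = ((dog, 0), (dog, 1))       # two unique founder genes
            else:
                # each parent passes on one of its two genes at random
                genes[dog] = (rng.choice(genes[sire]), rng.choice(genes[dam]))
        counts = Counter(g for dog in current for g in genes[dog])
        n = 2 * len(current)
        total += sum((c / n) ** 2 for c in counts.values())
    return 1.0 / (2.0 * total / runs)

# Toy pedigree: two founders, one pup, then a pup bred back to founder A.
pedigree = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "C"),
}
founders_only = founder_genome_equivalents(pedigree, ["A", "B"], runs=10)
one_pup = founder_genome_equivalents(pedigree, ["C"], runs=10)
backcross = founder_genome_equivalents(pedigree, ["D"], runs=1000)
print(founders_only, one_pup, backcross)
```

A population consisting of the two founders themselves gives exactly 2, and their single pup gives exactly 1, because one dog can carry only one genome's worth of genes. The backcross pup gives a value below 1, since in some runs it receives the same founder gene twice.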
4. INBREEDING
A small (effective) number of ancestors combined with line breeding or inbreeding can result in a highly inbred gene pool. A figure commonly used to indicate the amount of inbreeding is the Coefficient of Inbreeding (CI). The CI of a dog is equal to the relationship or kinship coefficient of its parents: the probability that two genes at the same locus, one chosen at random from each parent, are identical by descent. An individual with two identical genes at a locus is called homozygous (for this part of its chromosomes).
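Because the CI of a dog equals the kinship coefficient of its parents, it can be computed recursively from a pedigree. The sketch below is one common way to do that and is not necessarily how the database software works. The pedigree layout (a dict listed parents-before-children, with `(None, None)` for dogs of unknown parentage) and the names `make_kinship`, `kinship` and `inbreeding` are assumptions for illustration.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """Return (kinship, inbreeding) functions for a pedigree.

    pedigree: dict dog -> (sire, dam), listed parents before children;
              None marks an unknown parent.
    """
    rank = {dog: i for i, dog in enumerate(pedigree)}   # birth order

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0                  # unknown dogs are taken as unrelated
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if rank[a] < rank[b]:
            a, b = b, a                 # expand the younger dog: it cannot
                                        # be an ancestor of the older one
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    def inbreeding(dog):
        sire, dam = pedigree[dog]
        return kinship(sire, dam)

    return kinship, inbreeding

# Toy pedigree: X and Y are full siblings; Z comes from mating them,
# W from mating sire S to his daughter X.
pedigree = {
    "S": (None, None),
    "D": (None, None),
    "X": ("S", "D"),
    "Y": ("S", "D"),
    "Z": ("X", "Y"),
    "W": ("S", "X"),
}
kinship, inbreeding = make_kinship(pedigree)
ci_sibling_mating = inbreeding("Z")
ci_parent_child = inbreeding("W")
print(ci_sibling_mating, ci_parent_child)
```

Both toy matings come out at 25%, the value expected for sibling and parent-child matings when the founders themselves are not inbred.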
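The following formula, commonly attributed to Sewall Wright (1922), is used to calculate the CI of dog X in a simple way:
$$CI_X = \sum_{\text{paths}} \left(\tfrac{1}{2}\right)^{n_1 + n_2 + 1} \left(1 + CI_A\right)$$
where the sum runs over every path that connects the sire and the dam of X through a common ancestor A, $n_1$ and $n_2$ are the numbers of generations from the sire and from the dam to A, and $CI_A$ is the CI of that common ancestor.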
As an example take the (partial) pedigree depicted in figure 5. There are 6 paths via 3 common ancestors (31154, 82022 and 88472), of which the CI's are indicated. The formula is applied as follows:
Summing gives a total CI=66.7%. Five dogs have such a high CI. It actually is 66.9% due to some more common ancestors further back in time. These dogs were an attempt by Edward Smith to breed back to Wiston Cap 31154. None of these five dogs produced sustained progeny. David Rees and Roy Goutte wrote articles on these dogs in WSN (see the references). These dogs had 87.7% of Wiston Cap's blood in the definition of figure 2. Even if this figure were 100% it would not guarantee a copy of Wiston Cap, as it would give a random mixture of his chromosomes. Often such breeding will result in illnesses, as it did in this case, although apparently it also produced excellent Wiston Cap-like copies.
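Wright's path method sums, over every path joining sire and dam through a common ancestor, the term (1/2)^(n1 + n2 + 1) x (1 + CI of that ancestor), where n1 and n2 are the generations from sire and dam to the ancestor. The sketch below applies that sum to two textbook matings. It does not reproduce the figure 5 pedigree, whose paths are not listed in the text; the function name `wright_ci` and the tuple layout are assumptions for illustration.

```python
def wright_ci(paths):
    """Sum Wright's path terms.

    paths: iterable of (n_sire, n_dam, ci_ancestor) tuples, one per path
           through a common ancestor; n_sire and n_dam count generations
           from the sire and from the dam back to that ancestor.
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1.0 + ci_a) for n1, n2, ci_a in paths)

# Full-sibling mating: two common ancestors (the siblings' sire and dam),
# each one generation behind both parents, neither of them inbred.
ci_full_sibs = wright_ci([(1, 1, 0.0), (1, 1, 0.0)])

# Sire mated to his own daughter: the sire is the common ancestor,
# zero generations from himself and one generation behind the dam.
ci_sire_daughter = wright_ci([(0, 1, 0.0)])

print(ci_full_sibs, ci_sire_daughter)
```

Both cases give 25%. An inbred common ancestor raises its path term through the (1 + CI) factor, which is why the CI's of the common ancestors matter in the worked example.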
Figure 6 shows the resulting CI's as a function of the year in which the pups were born. Some CI's higher than 40% stand out:
Before 1950 a line can be seen at CI=25% (and less clearly at 12.5% and 6.25%). A CI of 25% is obtained by matings between parents and children or between siblings. After 1960 this line becomes vague and rises due to the influence of older generations in the pedigrees. This is also the reason why in recent years hardly any dog can be found with a CI lower than 4%. And even then most of them come from ROM (Registered On Merit) dogs.
Normally, people would look at pedigrees with 6 generations at most. Figure 4 shows the CI's when one would only use that information. This line shows a decreasing trend. The difference with a full pedigree is 7% at most. So in the short term we may be outcrossing more than in the past, but in reality inbreeding is still increasing. This is characteristic of closed populations with selective mating.
Over the last 10 years the CI was 7% on average. Is that high or low? I have the feeling that it is on the low side compared to many other breeds. More than 40% would be considered high by almost everybody, although it is not uncommon in many other breeds. Near 0% in 10 generations would be considered very low, and it is 4-5% on that time-scale. Probably the average CI is a little higher because the known founders were inbred themselves. A CI of 7% is certainly low considering the small genetic diversity in the current population (effectively the genomes of 8 dogs). More inbreeding would:
REFERENCES