INTRODUCTION
India is endowed with diverse cattle breeds, the majority of which exhibit low milk yields, prolonged calving intervals, and delayed age at first calving, typical of tropical conditions [1,2]. Planners considered that the most expedient way to enhance the productivity of these low-producing indigenous cattle was to breed them with exotic cattle well known for their high milk production, early maturity, and reproductive efficiency. One such crossbreeding project was initiated at ICAR-NDRI, Karnal in 1971, crossing indicine Tharparkar (T) cows with exotic Holstein Friesian (HF), Brown Swiss (BS), and Jersey (J) bulls [3].
The erstwhile Institute breed committee assessed the different levels of exotic inheritance in various cattle groups and suggested that stabilizing the population at the 62.5% exotic level would help maintain higher productivity. This led to the development, from 1982 onward, of a composite breed known as Karan Fries (KF), which has since been maintained at this level of exotic inheritance through selective breeding. The sixth generation of KF cattle is presently maintained at the Livestock Research Complex of ICAR-National Dairy Research Institute, Karnal, India [4].
Various cattle populations, including crossbreds kept in organized herds, have experienced a decline in genetic diversity due to increasing selection intensity over time [5]. This loss of genetic variation poses a significant threat to the sustainability of these breeds [6]. An earlier genealogical study using pedigree data of KF cattle revealed some alarming results, with an individual cow in the herd having an inbreeding coefficient as high as 31.25% [4]. This situation called for a robust analysis of population diversity in KF cattle using high-throughput genomic data. One of the critical population parameters that reveals evolutionary history and genetic diversity is the effective population size, Ne [7], defined as the size of an idealized population that undergoes the same rate of genetic drift as the population under study [8]. Ne plays a dual role in understanding the genetic dynamics of populations. First, viewed retrospectively, Ne provides insight into the observed genetic variation and its distribution within a population, explaining how patterns of genetic diversity developed over time. Second, Ne offers a predictive perspective, which is particularly valuable for small breeding populations: it can be used to estimate the likely future loss of genetic variation and to assess the survival prospects of these populations. In essence, Ne is a valuable tool for examining both the historical patterns of genetic diversity and the possible genetic trajectories of populations [9].
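The predictive role of Ne can be made concrete with the standard idealized-population result that expected heterozygosity shrinks by a factor of (1 − 1/(2Ne)) each generation, so that H_t = H_0(1 − 1/(2Ne))^t. The sketch below assumes an idealized Wright-Fisher population (no mutation, migration, or selection), and the Ne values in the loop are illustrative rather than estimates from this study.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after `generations` of pure drift in an idealized population of size ne."""
    # Each generation retains a fraction (1 - 1/(2Ne)) of the previous heterozygosity
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


# Illustrative Ne values only: fraction of initial heterozygosity kept after 10 generations
for ne in (25, 50, 500):
    retained = expected_heterozygosity(1.0, ne, 10)
    print(f"Ne = {ne:>3}: {retained:.3f} of initial heterozygosity retained after 10 generations")
```

Smaller populations lose diversity markedly faster, which is why a low Ne signals the need for management intervention.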
Effective population size also provides valuable insights into a population's potential for adaptation, its exposure to genetic drift, and its risk of inbreeding. The availability of genomic data has transformed the estimation of Ne, as it overcomes the limitations of traditional pedigree-based methods, which can be biased, particularly in populations with complex breeding histories [10]. Furthermore, Ne can help identify bottlenecks and genetic drift in composite cattle breeds, which can lead to the loss of beneficial alleles and an increased risk of inbreeding. By detecting these events, breeders can take measures to increase genetic diversity and minimize the risk of inbreeding depression [11]. The information generated can also guide the adoption of suitable breeding strategies to ensure the long-term sustainability of composite cattle breeds. Cumulative selection pressure over generations reduces Ne and thereby intensifies genetic drift [12]. A low estimated Ne suggests that the population may benefit from the introduction of new genetic material to increase genetic diversity [11].
Methods for Ne estimation can be broadly classified as pedigree-based, demographic, and marker-based approaches [13]. Pedigree-based and demographic approaches require extensive record keeping and do not allow inference about historical Ne; the focus has therefore shifted to marker-based approaches, particularly linkage disequilibrium (LD)-based techniques, owing to the abundance of genotype data and the falling cost of genotyping [14].
Linkage disequilibrium is another population parameter; it refers to the non-random association of alleles at different loci in a population, which arises because alleles at linked loci do not assort independently during meiosis [15]. LD plays a significant role in population genetics: it helps uncover the evolutionary past of populations and the extent to which natural selection has shaped them, and it reveals aspects of population history such as changes in size, migration, and admixture between groups [15]. LD can also be used to detect functional variants, such as those associated with complex diseases [16]. The decay of LD over time is particularly relevant to genome-wide association studies (GWAS), which aim to identify genetic variants associated with complex traits or diseases: as LD between a genotyped marker and a causal variant breaks down, the association signal at the marker can be lost, limiting the power of GWAS [17]. The extent of LD between markers is therefore a valuable criterion for determining the minimum number of markers (marker density) required for various genomic studies [18].
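For reference, the usual pairwise LD measures for two biallelic loci are D = p_AB − p_A·p_B, r² = D²/[p_A(1 − p_A)·p_B(1 − p_B)], and D′ = D/D_max. The sketch below computes them from phased haplotypes; the 0/1 allele coding and the tuple input format are assumptions for illustration, and genome-wide analyses of unphased SNP genotypes normally rely on dedicated software.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    Each haplotype is a tuple (allele_at_locus1, allele_at_locus2) coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n   # allele-1 frequency, locus 1
    p_b = sum(1 for _, b in haplotypes if b == 1) / n   # allele-1 frequency, locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, r2


# Two loci in complete association versus two independent loci
print(ld_stats([(1, 1), (0, 0)] * 5))
print(ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

r² is the measure used for Ne estimation and marker-density decisions in the rest of this paper, because it reflects how well one marker predicts another.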
Against this background, with the KF breed development programme ongoing for the last four decades, the present investigation was undertaken to estimate Ne using genealogical and genomic tools and to study the pattern of LD in KF cattle to gain insight into its population dynamics. For comparison, genotype data of three other well-established composite breeds (Santa Gertrudis [SG], Brangus [BR], and Beefmaster [BM]) and one of KF's parental breeds (HF) were analysed along with KF.
DISCUSSION
Here we present the first study of the effective population size (Ne), demographic trajectory, and LD of KF cattle using high-density genotype data. The study also included three other composite breeds (BM, SG, and BR) and purebred HF cattle, one of KF's parental breeds. The investigation revealed varying effective population sizes, possibly shaped by different demographic events, and, most importantly, provided information on the breed formation events underlying the development of this composite cattle.
The non-linear pattern observed in the Ne curves of these cattle breeds could be attributed to a variety of processes that reduced population size over time and thereby intensified genetic drift. Ne quantifies the per-generation decline in heterozygosity and the corresponding rise in inbreeding [36]. The sharp drops observed in the demographic patterns (generation-wise declines in Ne) could be traced to breed development scenarios. LD between pairs of SNPs at different genetic distances provides information on Ne at different historical periods: loosely linked loci reflect population sizes in the recent past, whereas closely linked loci provide estimates of historical population sizes [37]. According to Hayes et al [34], LD between loci with recombination rate c roughly corresponds to the effective population size of the ancestors 1/(2c) generations ago. This approach, on which SNeP relies to estimate historical Ne, assumes that population size is stable or changes linearly. In this context, the recently developed GONE software has been shown to detect abrupt shifts in historical Ne [38–42]. A simulation study concluded that Ne derived using GONE reflects genuine demographic changes across generations and is not significantly affected by selection or by heterogeneity in recombination rates across the genome [39].
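The Hayes et al relation can be illustrated by converting inter-marker distances into recombination rates and approximate generation times. This sketch assumes Haldane's mapping function and a uniform ratio of 1 cM per Mb, a common approximation for cattle rather than a value estimated in this study; it shows why loosely linked SNP pairs inform recent Ne and tightly linked pairs inform ancient Ne.

```python
import math


def haldane_c(distance_mb: float, cm_per_mb: float = 1.0) -> float:
    """Recombination fraction from physical distance via Haldane's mapping function.

    Assumes a uniform recombination rate of `cm_per_mb` centimorgans per megabase.
    """
    d_morgans = distance_mb * cm_per_mb / 100.0  # Mb -> cM -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def generations_ago(c: float) -> float:
    """Approximate generation whose Ne is reflected by LD at recombination rate c, i.e. 1/(2c)."""
    return 1.0 / (2.0 * c)


# Longer distances (larger c) point to more recent generations
for dist_mb in (0.05, 0.5, 5.0, 10.0):
    c = haldane_c(dist_mb)
    print(f"{dist_mb:>6} Mb  c = {c:.5f}  ~{generations_ago(c):.1f} generations ago")
```

Because 1/(2c) never falls below one generation and changes only slowly at large c, estimates for the very latest generations rest on a small set of distant marker pairs.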
Because KF is the most recently developed breed in the timeline, we compared the LD-based estimates (GONE and SNeP) of all crossbreds at 5 generations ago; the average estimates were 171.38 by GONE and 45.8 by SNeP. GONE estimated the KF population at the time of its development at 678.5, the highest of all breeds at that time. In conservation biology, the "50/500" rule of thumb suggests that an Ne of approximately 500 is necessary to prevent the loss of genetic diversity through genetic drift and to maintain an adaptable population [43], whereas populations with an Ne below 50 are at risk of extinction without proper management intervention. Accordingly, the Food and Agriculture Organization (FAO) recommends a minimum Ne of 50 animals to prevent inbreeding depression. Using three different tools, our study showed that the Ne of KF cattle exceeded the FAO-recommended level of 50: the pedigree-based estimate was Ne = 78, and the LD-based estimates from SNeP and GONE were 52 and 219, respectively.
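The thresholds above translate directly into expected rates of inbreeding through the idealized-population relation ΔF = 1/(2Ne); the FAO minimum of Ne = 50 corresponds to ΔF = 1% per generation. The sketch below applies this relation, under the usual idealized-population assumptions, to the three KF estimates reported in this study; the category labels are our own shorthand for the 50/500 rule.

```python
def inbreeding_rate(ne: float) -> float:
    """Expected per-generation rate of inbreeding, dF = 1/(2Ne), in an idealized population."""
    return 1.0 / (2.0 * ne)


def rule_50_500(ne: float) -> str:
    """Classify an Ne estimate against the '50/500' rule of thumb."""
    if ne < 50:
        return "below 50"
    if ne < 500:
        return "50 to 499"
    return "500 or more"


# KF estimates reported in this study
kf_ne = {"pedigree": 78, "SNeP": 52, "GONE": 219}
for method, ne in kf_ne.items():
    print(f"{method:>8}: Ne = {ne:>3}, dF = {inbreeding_rate(ne):.3%} per generation, {rule_50_500(ne)}")
```

Under this approximation the SNeP estimate of 52 implies just under 1% inbreeding per generation, whereas the GONE estimate of 219 implies about 0.23%.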
SNeP and GONE are both LD-based methods, but SNeP uses r² as its LD estimator, whereas GONE uses δ². The SNeP algorithm is based on the relationship between r², Ne, and c, and assumes a linear relationship between Ne and the number of generations. GONE returns the geometric mean of Ne over its estimation replicates. When applied to simulated data, GONE was reasonably robust to factors such as temporal heterogeneity in population sampling, admixture, subpopulation splits, and genotyping errors [14].
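The r²–Ne–c relationship behind SNeP-style estimates is the classical approximation E[r²] ≈ 1/(α + 4Nec), with α = 1 when mutation is ignored, which can be inverted to Ne = (1/r² − α)/(4c). The sketch below is not SNeP's code: the optional sample-size adjustment r² − 1/(βn), with β = 2 for phased and 1 for unphased data, reflects our understanding of the usual convention and should be checked against the SNeP manual; the input values are purely illustrative.

```python
from typing import Optional


def sved_ne(mean_r2: float, c: float, n: Optional[int] = None,
            phased: bool = False, alpha: float = 1.0) -> float:
    """Ne implied by mean r^2 at recombination rate c, from E[r^2] ~ 1 / (alpha + 4 Ne c).

    If n (number of sampled individuals) is given, the expected sampling
    contribution 1/(beta*n) is subtracted first, with beta = 2 for phased
    data and 1 otherwise (assumed convention).
    """
    r2 = mean_r2
    if n is not None:
        beta = 2 if phased else 1
        r2 -= 1.0 / (beta * n)
    if r2 <= 0:
        raise ValueError("adjusted r^2 must be positive")
    return (1.0 / r2 - alpha) / (4.0 * c)


# Illustrative values only: mean r^2 = 0.21 among pairs with c = 0.025, n = 100 unphased animals
print(sved_ne(0.21, c=0.025, n=100))
```

Higher LD at a given recombination rate implies a smaller Ne; applying the function to distance bins, each paired with its generation 1/(2c), yields a trajectory of historical Ne.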
The Ne of the BM and SG breeds showed an upward trend in recent generations (Table 3), which may be attributed to the introduction of foreign lineages of the parental breeds or the crossing of diverse populations. SG also showed the highest recent Ne by both SNeP (Ne = 61) and GONE (Ne = 367). According to previous reports, GONE is better at inferring recent Ne than other coalescent-based and mutation-recombination-based methods. GONE measures LD with δ² instead of r² because there are no analytical corrections for the sampling error of r², which makes r² difficult to use for inferring temporal variation in Ne; in particular, the cumulative effect of drift on LD over generations is hard to predict accurately when the recombination rate is low [14]. GONE has also been reported to work well with small sample sizes [38]. A study that used GONE to estimate the effective population sizes of turbot, seabream, European seabass, and common carp [40] found a similar pattern of drastic reduction in Ne between five and nine generations ago, which might be related to admixture, founder effects, and artificial selection.
SNeP can effectively reconstruct ancient demographic history beyond 100 generations ago [41]. By default, SNeP stops its estimation at the 13th generation, which is not very recent given the generation interval of cattle. However, we were able to extend the estimates to 5 generations ago by decreasing the minimum distance between the SNPs analysed. The rationale for stopping at the 13th generation is that recombination proceeds slowly and therefore has little or no impact on LD in the most recent generations [41].
Similarly, two sharp declines in Ne were observed in the trajectory of Hanwoo, an ancient admixed cattle breed: the first (53 to 27 generations ago) was attributed to the advent of selection in cattle breeding and the second (27 to 11 generations ago) to the introduction of artificial insemination [44]. Similar events could explain the decline in Ne estimated by GONE for the HF population. The GONE trajectories showed a drastic decline in Ne for all the crossbred cattle (Figure 6; Supplementary Figure S1), which may correspond to the actual origin of these composite breeds, as GONE can also detect breed formation events [14]. This was evident in our study: BM, BR, and SG originated in 1930, 1932, and 1940, respectively [45], which, assuming a generation interval of 5 years, corresponds to 16 to 18 generations ago. A similar drop in Ne for KF was found around 5 to 7 generations ago, consistent with earlier genealogical analysis placing the development of KF cattle 6 generations ago [4]. Martinez et al [42] were likewise able to determine when different salmon strains were established, which corresponded to the most likely generation in which the breeding programme started. This is consistent with the GONE trajectories becoming parallel for these breeds.
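The conversion from founding year to generations is simple arithmetic: elapsed years divided by the generation interval. The sketch below uses the founding years and the 5-year interval given above; the reference year of 2020 is a hypothetical stand-in, since the sampling year is not stated in this section, and changing it shifts all values equally.

```python
def generations_since(founding_year: int, reference_year: int,
                      generation_interval: float = 5.0) -> float:
    """Number of generations elapsed between breed founding and a reference year."""
    return (reference_year - founding_year) / generation_interval


FOUNDING_YEAR = {"BM": 1930, "BR": 1932, "SG": 1940}
REFERENCE_YEAR = 2020  # hypothetical: the actual sampling year is an assumption here

for breed, year in FOUNDING_YEAR.items():
    print(breed, generations_since(year, REFERENCE_YEAR))
```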
Another study concluded that the introduction of new Brahman germplasm from a foreign lineage into a crossbred Braford herd led to a sudden increase in the previously declining Ne within one generation [46]. In another Indian crossbred, Vrindavani, SNeP estimates of Ne were 53 and 46 at 7 and 5 generations ago, respectively [47]. Chhotaray et al [48] reported chromosome-wise effective population sizes for Vrindavani cattle, with a minimum of Ne = 22 on Chr 2 and a maximum of Ne = 38 on Chr 26 and 27 for the recent generations. When estimating Ne with GONE, it was important to check for clustering within the population; clustering was observed in the HF breed, and accordingly the Haldane correction value was set to 0.01 for the program. Similarly, Swedish Fjällnära cattle, which were found to be of admixed origin and showed clustering within the population, were analysed with an accordingly adjusted recombination rate to obtain better estimates from GONE [41].
LD between the most distant SNPs determines the Ne of the most recent generations [46]. An r² value of 0.2 between markers allows genomic studies to be conducted with at least 80% accuracy [26]; in our population this level was reached at an inter-marker distance of 40 kb. Placing markers equidistantly at 40 kb intervals across the 2,510.61 Mb autosomal genome implies that a custom SNP array of 62,765 markers could confidently be used for genomic studies in KF cattle. This is comparable with the Indian crossbred Vrindavani, in which an r² value of 0.2 was reported at 25 to 50 kb inter-marker distance, suggesting that a similar SNP panel could serve both of these Indian crossbreds. Similarly, r² = 0.2 was observed at 40 kb inter-marker distance in admixed Hanwoo cattle, for which a comparatively denser panel of 75k SNPs was suggested [44].
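The panel size quoted above follows from dividing the autosomal genome length by the target inter-marker spacing. This sketch reproduces that arithmetic in integer kilobases; it treats the autosomes as one contiguous length and so ignores the small effect of chromosome ends, an assumption that a per-chromosome calculation would refine.

```python
AUTOSOMAL_LENGTH_KB = 2_510_610  # 2,510.61 Mb autosomal genome length, as reported for KF


def markers_needed(genome_kb: int, spacing_kb: int) -> int:
    """Equidistant markers that fit along the genome at a fixed spacing.

    Treats the autosomes as one contiguous length (ignores chromosome ends).
    """
    return genome_kb // spacing_kb


print(markers_needed(AUTOSOMAL_LENGTH_KB, 40))
```

Halving the spacing roughly doubles the panel, so the distance at which r² reaches 0.2 largely determines the cost of a custom array.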
LD was highest at short inter-marker distances (1 to 10 kb) in all the crossbreds, as also observed in previous studies [46]. In Braford crossbreds, r² = 0.2 was reached at 1 to 5 kb inter-marker distance, whereas in the parental Hereford breed it was reached at 40 to 60 kb [46]. The average r² in KF was 0.41 at 0 to 10 kb inter-marker distance, similar to that of Vrindavani cattle (r² = 0.46; Singh et al [47]). These LD estimates fell between those reported for taurine cattle (Angus, r² = 0.46; Hereford, r² = 0.49) [49] and indicine cattle (Brahman, r² = 0.25; Nellore, r² = 0.27) [50]. In our study, the highest r² up to 10 kb (0.59) was found in purebred HF cattle, one of the parental breeds of KF.
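LD decay is commonly summarized by averaging r² over SNP pairs grouped into inter-marker distance bins and noting where the mean falls below a threshold such as 0.2. This sketch assumes that pairwise r² values and distances have already been computed with dedicated LD software; the bin edges and the threshold are parameters to choose, and the example data are synthetic, not from this study.

```python
import numpy as np


def ld_decay(distances_kb, r2, bin_edges_kb):
    """Mean r^2 per inter-marker distance bin [lo, hi); NaN for empty bins."""
    distances_kb = np.asarray(distances_kb, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    idx = np.digitize(distances_kb, bin_edges_kb) - 1  # bin index of each SNP pair
    means = np.full(len(bin_edges_kb) - 1, np.nan)
    for b in range(len(bin_edges_kb) - 1):
        in_bin = idx == b
        if in_bin.any():
            means[b] = r2[in_bin].mean()
    return means


def first_bin_below(means, bin_edges_kb, threshold=0.2):
    """Lower edge of the first bin whose mean r^2 is below threshold (None if never)."""
    for b, m in enumerate(means):
        if not np.isnan(m) and m < threshold:
            return bin_edges_kb[b]
    return None


# Synthetic SNP pairs: distance (kb) and r^2
dist = [5, 8, 15, 25, 45, 55]
r2 = [0.5, 0.4, 0.3, 0.25, 0.15, 0.1]
edges = [0, 10, 20, 40, 60]
means = ld_decay(dist, r2, edges)
print(means, first_bin_below(means, edges))
```

Narrower bins locate the crossing point more precisely but need more SNP pairs per bin for a stable mean.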
In conclusion, our comparative assessment of the effective population size of KF cattle generated valuable information and provided insight into the population dynamics of this composite cattle. That the Ne estimates for KF exceeded the FAO-recommended minimum of 50 is a desirable characteristic. Nevertheless, it may be necessary to increase the effective population size further in the future so that genetic diversity is not lost through random genetic drift. The outcome of the present study can assist in developing a viable mating plan that uses diverse lines within the population or introduces distinct bloodlines of the parental breeds. Such measures could help maintain a healthy, resilient population with a diverse gene pool. The decay of LD to r² = 0.2 at 40 kb inter-marker distance indicated that a customized medium-density panel of about 63k SNPs would be sufficient for genomic selection in the KF population. Our study also suggests measures for maintaining appropriate diversity in KF cattle to support breed improvement and sustainable utilization programmes.