Population data and genetic diversity analysis of 17 Y-STR loci in Saudi population

The Y chromosome polymorphism has been widely studied for human migrations, population genetics, forensic applications, and paternity analysis. However, studies regarding genetic lineage and population genetic structure of the Y chromosome in different regions of Saudi Arabia are limited. This study aimed to analyze the distribution of Y chromosome haplotypes in a sample of 125 native Saudi males from different geographic regions of Saudi Arabia and to compare them with previously published Y chromosome haplotype data from Saudi Arabia and some neighboring Arab populations. Buccal swabs were collected from 125 healthy unrelated native Saudi males from different geographic regions of Saudi Arabia. Genomic DNA was extracted with Chelex® 100; 17 Y-STR loci were amplified using the AmpFℓSTR Yfiler PCR amplification kit and detected on the 3130 Genetic Analyzer™. Allele frequency and gene diversity were calculated with the online tool STRAF. The Saudi population data were compared with the neighboring populations using pairwise genetic distances, and associated probability values were calculated using the Y Chromosome Haplotype Reference Database (YHRD) website software. One hundred six YSTR haplotypes and 102 YSTR alleles (excluding 4 null alleles) were identified, with a discrimination capacity (DC) of 85.8%. The highest haplotype diversity (HD) and gene diversity (GD) were observed at the loci DYS 458 (0.817) and DYS385b (0.807), respectively. According to our results, the Iraqi and Qena (Egypt) populations appeared to be more closely related to the Saudi population than the Yemeni population. The UAE and Kuwait populations showed the same degree of relatedness to the Saudi population, followed by Bahrain. On the contrary, the Adnanit and Qahtanit populations of Jordan demonstrated low genetic distance from the Saudi population.
In short, studying a population sample of pure Saudi ethnicity enabled us to identify a unique set of haplotypes which may help in establishing genetic relatedness between Saudi and the neighboring Arab populations. The present paper, therefore, highlights the importance of ensuring ethnic originality of the study sample while conducting population genetics studies.


Background
The Kingdom of Saudi Arabia (KSA) is the largest country on the Arabian Peninsula, occupying about 80% of its area. Before the foundation of modern Saudi Arabia, it consisted of four distinct regions: Hejaz, Najd, Al-Ahsa, and Asir (Al-Rasheed 2013). Tribes in the KSA are the descendants of the peninsula's original ethnic stock; therefore, a certain degree of ethnic heterogeneity is evident among both the sedentary and the nomadic populations of modern KSA.
Genetic variation in the KSA was contributed early on by nomadic or Bedouin tribes and clans (Gordon 2005) and by small groups of Persians, Turks, black Africans, and other ethnicities originating from sub-Saharan Africa living along the Red Sea coast (Bowen 2014). In addition, the annual pilgrimage (Hajj) to Mecca has long brought to the KSA hundreds of thousands of migrants representing various ethnic groups from Arab (Jordan, Iraq, Yemen), Asian, and Far Eastern countries, who overstayed and settled in and around Makkah, Jeddah, and Medina (Fig. 1) (Ochsenwald and Philby 2016). Nevertheless, the majority of the native Saudi population subgroups in the northern, central (excluding Riyadh), western (excluding Jeddah and Makkah), southern, and eastern regions remained genetically distinct because of their adherence to the practice of consanguineous marriage (El-Hazmi et al. 1995).
The Y chromosome polymorphism has been studied widely for human migrations, forensic applications, and paternity analysis (Jobling and Tyler-Smith 2000; Quintana-Murci et al. 2001). The Y-STR markers are inherited without recombination down the paternal line, with little mutation and gene conversion (Rozen et al. 2003; Trombetta et al. 2010). These markers not only provide information on male lineage relationships (Lowery et al. 2013) but also help in studying the local population structure and its demographic history (Roewer et al. 2005). Y-STR typing has become an important tool in forensic investigations because of its discrimination power and the marked genetic variation that produces highly informative Y chromosome STR haplotypes. Because non-recombining Y chromosomal markers are more sensitive to founder effects and genetic drift, Y-STRs are very powerful in detecting genetic differences between populations (Heraclides et al. 2017; Iacovacci et al. 2017; Li et al. 2016). Studies regarding Y chromosome genetic lineage and population genetic structure in Saudi Arabia are limited (Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019). In the present paper, we analyze Y chromosome haplotypes in 125 native Saudi males from different geographic regions of Saudi Arabia, using the AmpFℓSTR® YFiler® Amplification kit (Life Technologies, USA). We also compared our Y chromosome STR haplotypes with previously published Y chromosome haplotype data from Saudi Arabia and seven neighboring Arab populations (Fig. 1). It is hoped that the findings of this study will add to the existing state of knowledge about the population genetics and distribution of Y-STR haplotypes in Saudi Arabia.

Sample collection
Approval of the Institutional Ethical Committee to conduct this study was obtained in advance. Buccal swabs were collected from 125 healthy, parentally unrelated, native (up to three generations) Saudi males, all fully informed and consenting in accordance with the Declaration of Helsinki, from all regions of Saudi Arabia (Fig. 2): Riyadh and Al Qassim in the central region; Tabuk, Al Jawf, Al-Hudud Al Shimaliyah, and Hail in the northern region; Madinah and Makkah in the western region; Asir, Jizan, and Najran in the southern region; and Dammam, Al-Khobar, and Jubail in the eastern region. Their three-generation ethnicity was established from their national identification (ID) cards. Information regarding their birthplaces was provided by the donors. All buccal swab donors were adults from different walks of life, including teachers, businessmen, policemen, and university students. They were recruited from universities, schools, police stations, and shopping centers. None of the donors had undergone bone marrow transplantation, radiotherapy, frequent blood transfusion, or chemotherapy in the recent past. Most of them were married, and none of the participants had any known Y chromosome abnormality.

DNA extraction
Genomic DNA was extracted from buccal swabs using Chelex® 100 as described by Walsh et al. (1991) and quantified in the 7500 Real-Time PCR System using Quantifiler® Duo DNA Quantification Kit (Applied

Results and discussion
YSTR profiling is considered a vital tool for forensic investigation of cases such as sexual assault (Maiquilla et al. 2011), missing persons (Coble et al. 2009), and kinship (Barra et al. 2015). Other applications include population genetics, anthropology, and epidemiology studies investigating the risk of prostate cancer (Paracchini et al. 2003; Hameed et al. 2015). Because of Saudi Arabia's crucial geographical location in the Arabian Peninsula and in the Gulf of Oman, several authors have studied Y chromosome diversity in the native Saudi population using Y-STR technology (Cadenas et al. 2008; Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019). The current report presents the population data for 17 Y-STR loci among 125 adult, native, Saudi male volunteers recruited from different geographic regions of Saudi Arabia (Fig. 2). The quality of the study sample greatly affects the outcome of population genetics studies. For example, Shringarpure and Xing (2014) reported that the accuracy of population stratification and recovery of individual ancestry are greatly affected by sampling bias in the data collection process. Other studies have shown that sample selection bias can affect population structure analysis of genotype data, the genetic ancestry of individuals, and the inferred evolutionary history of a population (Rosenberg et al. 2002; Patterson et al. 2006). Most of the studies carried out in the Saudi population (Cadenas et al. 2008; Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019) are based on samples collected from Saudi blood banks, hospitals, or forensic casework, or from native Saudis living abroad. These donors are mostly self-declared and are not subjected to any further verification; the samples therefore lack reliable ethnic or demographic originality, which may affect, to some extent, the outcome of population genetic parameters.
The present study is the first from Saudi Arabia in which samples were collected through a well-designed questionnaire administered by a trained field worker, ensuring the acquisition of accurate ethnic data up to three generations to confirm the actual geographic descent. Moreover, the geographic location of each participant was recorded not on the basis of his current place of residence (as done in most of the previous studies) but on the basis of the birthplace of the volunteer's great grandfather. Therefore, slight differences in certain population genetic parameters are expected in the present study. The distribution of Y-STR haplotypes in a sample of 125 native, unrelated Saudi individuals was analyzed, and 102 different Y-STR alleles and 106 Y-haplotypes were observed. Ninety-one (85.8%) of the 106 haplotypes were unique, while the remaining 15 (14.2%) were shared; 12/125 (9.6%) haplotypes were repeated twice and 3/125 (2.4%) haplotypes were shared by two individuals. The most frequent haplotype was H23 (14, 10, 30, 23, 13, 11, 12, 13/18, 10, 11, 14, 20, 14, 19, 21, 11), which was shared by four (3.2%) individuals (Table 2). Although the Arabian Peninsula is a region where numerous migrations between Africa and Asia have taken place since ancient times, our results showed an average degree of haplotype diversity among the Saudi Arabian population, most probably owing to the practice of consanguinity and the moderate sample size. Table 3 shows the distribution of YSTR alleles, their corresponding allele frequency, gene diversity (GD), haplotype diversity (HD), and FST or genetic distance. The maximum number of YSTR alleles (n = 11) was seen at the locus DYS 385b, followed by DYS 385a (n = 10) and DYS 635 (n = 8), indicating their high degree of polymorphism. The least polymorphic YSTR loci were DYS 389I, DYS 391, DYS 437, and YGATA-H4, with each locus having 4 alleles. The maximum HD (0.817) was observed at the locus DYS 458, followed by the loci DYS 385b (0.787) and DYS 392 (0.684).
The locus DYS 437 showed the least HD (0.155). The discrimination capacity (DC) calculated for 17 YSTR loci in the Saudi male population was 85.85%. In a recent report, Khurbani et al. (2018) studied a sample of 597 Saudi individuals from 5 geographic regions of Saudi Arabia using the 27-YSTR Yfiler® Plus kit and reported a DC of 95.3%. However, when they studied the same sample using the 17-YSTR Yfiler® kit, their population DC declined to 74.7%, which is considerably lower than the value reported in the present study (85.85%) using the same 17-YSTR Yfiler® kit. This may be due to the ethnic authenticity of our study sample compared with that of Khurbani et al. (2018), in which 15% of the Saudi volunteers were recruited from the UK.
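As a minimal sketch of how the haplotype counts above relate to one another, the following Python snippet tallies a list of haplotypes (one entry per individual) and reports the number of distinct haplotypes, the number of unique haplotypes, and two ratios. The function name `haplotype_summary` and the haplotype labels are hypothetical. The split of the 15 shared haplotypes into 12 pairs, 2 triples, and 1 haplotype carried by four individuals is an assumption chosen only so that the totals match the reported 125 individuals, 106 haplotypes, and 91 unique haplotypes; it is not taken from Table 2.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Summarise a sample of haplotypes (one hashable entry per individual)."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    n_individuals = len(haplotypes)
    n_distinct = len(counts)
    # Haplotypes carried by exactly one individual.
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "individuals": n_individuals,
        "distinct": n_distinct,
        "unique": n_unique,
        "shared": n_distinct - n_unique,
        "unique_over_distinct": n_unique / n_distinct,
        "distinct_over_individuals": n_distinct / n_individuals,
    }


# Hypothetical sample reproducing the reported totals (labels are made up).
sample = [f"U{i}" for i in range(91)]        # 91 unique haplotypes
sample += [f"P{i}" for i in range(12)] * 2   # assumed: 12 haplotypes seen twice
sample += [f"T{i}" for i in range(2)] * 3    # assumed: 2 haplotypes seen three times
sample += ["H23"] * 4                        # H23, carried by four individuals

summary = haplotype_summary(sample)
for key, value in summary.items():
    print(key, value)
```

With these assumed counts, the share of unique haplotypes among those observed is 91/106 (about 85.85%), the figure quoted above, while the ratio of distinct haplotypes to individuals is 106/125 (84.8%). Discrimination capacity is often computed as the second ratio, so it is worth stating which ratio is meant when comparing values across studies.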
In the present study, the YSTR locus DYS385b showed the highest gene diversity (GD) (0.807), followed by DYS458 (0.800) and DYS385a (0.686). The locus with the least GD was DYS437 (0.222), preceded by DYS392 (0.299) and DYS389I (0.355) (Table 3). The diversity of the Y chromosome is affected by factors such as the effective male population size, genetic drift, male behavior, marriage systems, and male patterns of migration (Jobling and Tyler-Smith 2003). Its range of polymorphism and associated mutational properties make the Y chromosome the best candidate for answering many forensic, anthropological, population genetic, and evolutionary questions (de Knijff 2000; Jobling and Tyler-Smith 2003). Previous studies suggest that Saudi Arabia has a strategic position between Asian and African populations (Luis et al. 2004). The genetic structure of Saudi Arabia has been modulated by gene flow from its Asian and African surroundings (Abu-Amero et al. 2009).
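The per-locus gene diversity values reported in Table 3 can be illustrated with a short Python sketch. It uses Nei's estimator, GD = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of allele i and n is the number of typed chromosomes. This form is commonly used for Y-STR data, but whether STRAF applies exactly this sample-size correction is an assumption here. The function name `gene_diversity` and the allele calls are hypothetical and are not taken from Table 3.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's gene diversity for one locus: n/(n-1) * (1 - sum of p_i squared)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two typed chromosomes are required")
    # Relative frequency of each observed allele.
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical allele calls at a single locus for ten males.
toy_locus = [15] * 5 + [16] * 3 + [17] * 2
toy_gd = gene_diversity(toy_locus)
print(round(toy_gd, 3))

# A locus with a single allele carries no diversity.
print(gene_diversity([14] * 10))
```

A locus where one allele dominates scores close to 0, while a locus with many alleles at similar frequencies approaches 1, which is the pattern separating DYS437 from DYS385b in the values above.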
A total of four null alleles appeared in our study: one each in the haplotypes H30, H31, and H64 at the locus DYS 458 and one in H75 at the locus DYS 456 (Table 2). A previous study has shown that the YSTR locus DYS 458 has the highest mutation rate (0.00814), followed by the locus DYS 456 with a mutation rate of 0.00735 (Chandler 2006). Similarly, in a worldwide collaborative study, 137 null alleles were identified at 17 of the 23 Y-STR loci. The occurrence of null alleles has been associated with the mutation rate of the locus in question.
DYS385a/b was also observed to be the most informative marker, with 21 complete alleles. In addition, it showed the microvariant allele 17.1 at DYS 385a, indicating a single-base-pair insertion or deletion within or outside the repeat region (Butler 2011). Such partial repeat variants, occurring at a low frequency, may be useful in understanding Y chromosome diversity and recent migrations.
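To make the microvariant notation concrete, here is a minimal Python sketch that splits an allele designation into complete repeats and leftover bases, following the usual convention that the digit after the decimal point counts the bases of an incomplete repeat. The function name `parse_allele` is hypothetical, and the default repeat unit length of 4 assumes a tetranucleotide locus; other loci would need a different value.

```python
def parse_allele(designation, repeat_unit_len=4):
    """Split an STR allele designation such as '17.1' into its parts.

    Returns (complete_repeats, extra_bases, repeat_region_length_in_bases).
    """
    whole, _, partial = str(designation).partition(".")
    complete_repeats = int(whole)
    extra_bases = int(partial) if partial else 0
    # An incomplete repeat must be shorter than one full repeat unit.
    if not 0 <= extra_bases < repeat_unit_len:
        raise ValueError(f"invalid partial repeat in {designation!r}")
    length = complete_repeats * repeat_unit_len + extra_bases
    return complete_repeats, extra_bases, length


for allele in ("17", "17.1", "18"):
    print(allele, parse_allele(allele))
```

Under that assumption, allele 17.1 is one base longer than allele 17 and three bases shorter than allele 18.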
The haplotypes seen in our studied regions of Saudi Arabia were compared with the published haplotype data of seven neighboring Arab populations using the YHRD database. As observed in the present study, the  (González et al. 2008; Flores et al. 2005). This was therefore the reason for its high genetic distance.

Conclusion
By providing population data on the genetic variation at 17 YSTR loci in a sample of the native Saudi male population (n = 125), an attempt has been made to develop an understanding of the genetic relationship between Saudi Arabia and the neighboring Arab populations. Our results show that the Saudi population is genetically closer to the Iraqi, Qena (Egypt), and Yemen (Sana) populations than to the Kuwaiti, Abu Dhabi (UAE), Bahrain, and Jordan populations. According to our findings, the Saudi population lacks patrilineal homogeneity across the entire region, being homogeneous in some places and partly heterogeneous in others (data not presented here). This may be due to the highly conserved social culture, the practice of consanguineous marriage in certain regions, and religious or historical migration to Makkah, Medina, and Jeddah. Unfortunately, because of the limited sample size from the different geographic regions of Saudi Arabia, independent forensic and population statistical analyses could not be performed. Further studies are, therefore, needed to establish precise patrilineal inheritance in the Saudi population and explore its relationship with neighboring Arab countries.