Mitochondrial DNA haplogroup study: residents of Sulaymaniyah city in the Iraqi Kurdistan Region may be genetically closer to European lineage

As native inhabitants of the Neolithic Fertile Crescent, Kurds have been included in several maternal lineage studies of Eurasian populations. However, no study has been performed on the Kurdish population of Sulaymaniyah city (latitude 35.566864 and longitude 45.416107). This study was carried out on a sample of Sorani Kurds living in Sulaymaniyah to identify population-related single nucleotide polymorphisms (SNPs) and patterns of maternal lineage. Thirty-six randomly selected, healthy, unrelated Kurdish subjects were enrolled, and whole mitochondrial DNA sequencing was performed. HaploGrep 2.0 was used for haplogroup identification and a neutrality test (Tajima’s D) for the determination of historical demography. When the outcomes were compared with previous studies of Kurds and neighbouring nations, the haplogroups identified in the study sample all belonged to the Western Eurasian haplogroups, with a predominance of haplogroup H. Whole mitochondrial DNA sequencing is superior to the traditional analysis of the non-coding (control) region. Our study indicates a stronger relation of the studied group to the European lineage than to their neighbouring nations.


Background
After the acceptance of the Out of Africa theory, it was concluded that the long-lasting settlement of anatomically modern humans in Eurasia started 60,000-70,000 years ago in the South Western Asian corridor (Wells et al. 2004; Roostalu et al. 2007), where the descendants of haplogroup N, a branch of macro-haplogroup L3, are today's main inhabitants of West Eurasia (Richards et al. 2000; Kivisild 2015).
Kurds are among today's inhabitants of West Eurasia and are the largest non-state nation in the world; they speak an Indo-European language. Some linguists even suggest an autochthonous pre-Indo-European origin of Kurds in their homeland in the Near East (Hennerbichler 2012). Furthermore, anthropological studies indicate that the forefathers of the Kurds were descendants of the first Neolithic natives of the Northern Fertile Crescent, who developed food production technologies, mainly in the Near East and Eurasia, with the first farmers then expanding towards Europe (Comas et al. 2000). Cayonu, Ganj Dara and Jarmo are three archaeological sites in Kurdistan where zoological and botanical remains of agriculture and domestication were found, dating back to 8000-10,000 years before Christ (YBC) (Gallego-Llorente et al. 2016; Zarei and Rajabi-Maham 2016). Additionally, Sumerian cuneiform writing from about 3000 BC mentioning the land of the Kurds, the "Land of the Kardas", is regarded as the earliest historical evidence of their existence in the area (Nasidze et al. 2005). The land of the Kurds, "Kurdistan", is located astride the Zagros Mountains of Iran, extending to the Taurus Mountains in Turkey and southwards across the Mesopotamian Plain. Notably, Kurds have remained geographically isolated and ethnically distinct despite various invasions and hostile attacks in the region (Zarei and Rajabi-Maham 2016).
Fadhl et al. Egypt J Forensic Sci (2021) 11:31
Moreover, Kurds are divided into four groups based on their language (Kurdish dialects). The first is the Northern Kurdish group (Kurmanji), spoken by the majority of Kurds in Turkey, Syria, Armenia, Azerbaijan and Georgia, by part of the Kurds in Iraq, and by a smaller group of Kurds in Iran. The second is the Central Kurdish group (Sorani), spoken by the majority of Kurds in Iran and Iraq. The third is the South Kurdish group (Kirmashani/Faili/Kalhuri), spoken by part of the Kurds in Iraq and Iran. Finally, the fourth group is Zazaki, spoken only by a smaller group in Turkey (Sheyholislami 2015).
Genetic data on the origin of Kurds are very limited, and almost all of them rely solely on hypervariable region sequencing. Furthermore, in most previous studies, Kurds were included only as collateral groups in larger population studies of the region. Surprisingly, all the previous maternal lineage studies showed a closer relation of Kurdish haplogroups to European descent than to the Middle Eastern lineage (Comas et al. 2000; Richards et al. 2000; Wells et al. 2004). Traditionally, population studies have used the non-coding (control) region for mitochondrial DNA sequence analysis. This region has been shown to have less aggregation of SNPs (Lan et al. 2019) and a higher rate of mutations (Kivisild 2015). Therefore, whole mitochondrial DNA sequencing gives more accurate results and a better initial assignment of haplogroups (van Oven and Kayser 2009). To the best of our knowledge, the current study is the first performed on a sample of Sorani Kurds from Sulaymaniyah city to identify variation in the maternal lineage using the whole mitochondrial DNA sequence instead of the traditionally used non-coding (control) region.

Sampling
This study was approved by the Ethical Committee of the Faculty of Medical Sciences/University of (BLINDED FOR PEER REVIEW) (Number 44), and verbal consent was obtained from the participants. A total of 36 blood samples were obtained from unrelated individuals within the centre of Sulaymaniyah city; their Kurdish ancestry was ascertained for three generations.

Mitochondrial DNA extraction, amplification and sequencing
Whole genomic DNA was extracted according to the manufacturer's instructions, using a DNA extraction kit (GeNet Bio, Yuseong-gu, Daejeon, South Korea). The purity and concentration of the extracted DNA were determined using a BioPhotometer (Eppendorf, Germany). The mitochondrial genome was amplified as four overlapping PCR fragments using four forward primers and four reverse primers (Supplement Table S1).
The amplified PCR products were purified using a PCR purification kit (Norgen Biotek, Schmon Parkway, Thorold, Canada), and a further 19 reverse primers were used for sequencing the amplified mtDNA fragments (Supplement Table S2).

Data analysis
The algorithm implemented in HaploGrep 2.0 was used for the identification of haplogroups (Kloss-Brandstätter et al. 2011). Historical demography was assessed with the Tajima's D neutrality test (Tajima 1989).
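For readers unfamiliar with the statistic, the following is a minimal sketch of how Tajima's D can be computed from an alignment, following the formulas in Tajima (1989). It is an illustration only and not the software used in this study: the function name `tajimas_d` is hypothetical, and the code assumes that the sequences are already aligned to equal length and contain no gaps or ambiguity codes.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D (Tajima 1989) for aligned, equal-length sequences.

    Assumption: no gaps or ambiguity codes; characters are compared as-is.
    """
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites in the alignment.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Sample-size constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # D compares pi with the Watterson estimate S / a1,
    # scaled by the estimated standard deviation of their difference.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

An excess of rare variants (for example, many singletons) makes the mean pairwise difference smaller than the Watterson estimate and yields a negative D, which is the pattern usually read as a signal of population expansion.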

Results
To identify the haplogroup variants of the Sulaymaniyah city residents, the whole mitochondrial genome (mtDNA) of 36 unrelated, healthy individuals was sequenced. Running the sequences through the HaploGrep 2.0 programme revealed eight haplogroups and sub-haplogroups (Table 1).
Western Eurasian haplogroups were the only groups detected in the studied samples. Haplogroup H was the most common (41.6%), followed by HV (19.4%), while macro-haplogroup N, which represents the dominant Western Eurasian haplogroups, and its descendants R0, U, T and J were less common in the current study. Three haplogroups (T1a, J1 and U1a) were singletons (Table 1). Furthermore, Tajima's D value was calculated to assess demographic expansion, and it was strongly negative (−2.155811, p < 0.01) (Table 2).
Tajima's D value in the current study was also compared with that of Kurdish populations in other studies. As shown in Table 3, Tajima's D values were negative and differed from zero in all the previous studies, but with different strengths. The values for Hawrami, Kalhori, Kurmanji, Sorani and Georgian Kurds were all within the limits of the current results [6,7], whereas for Iraqi Kurds, Iranian Kurds and Turkmen Kurds (Wells et al. 2004; Al-Zahery et al. 2013) the strength was lower than in the current study.
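The reported frequencies follow from simple proportions over the 36 samples. The sketch below illustrates the arithmetic only. The counts of 15 for H and 7 for HV are assumptions back-calculated from the reported percentages, not values taken from Table 1, and the remaining 14 samples are lumped under a hypothetical "other" label.

```python
def haplogroup_percentages(counts):
    """Convert per-haplogroup sample counts to percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples")
    return {hg: 100.0 * c / total for hg, c in counts.items()}


# Assumed counts (back-calculated, not from Table 1); total is 36.
assumed_counts = {"H": 15, "HV": 7, "other": 14}
percentages = haplogroup_percentages(assumed_counts)
```

Under these assumed counts, H is 15/36 (about 41.67%) and HV is 7/36 (about 19.44%), which agrees with the reported 41.6% and 19.4% if the former was truncated rather than rounded.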

Discussion
The Iraqi population consists of 75-80% Arabs and 20-25% others (Al-rashedi et al. 2015; AlRashedi et al. 2016; Lazim et al. 2020). Each population has a unique genetic structure, which is determined by the frequency of a particular sequence in a population. The frequency of variation in mtDNA D-loop regions can be a useful tool related to genetic diversity, human anthropology and forensic applications (Al-rashedi et al. 2015; AlRashedi et al. 2016).
Kurdistan involves parts of modern-day Iraq, Turkey, Iran and Syria, plus compact settlements in Georgia and Armenia (Rund et al. 1991; Comas et al. 2000; Hassanpour and Email 2021). Kurds from Sulaymaniyah city (latitude 35.566864 and longitude 45.416107) belong primarily to the Central Kurdish group (Sorani) and reside in the northeast of Iraq (Fig. 1) (Serin et al. 2016). Within haplogroup H, sub-haplogroup H2 was the most common; it is frequently seen in Eastern Europe and the Caucasus, although evidence suggests that it spread from Western Europe (Pereira et al. 2005; Brotherton et al. 2013). The other common haplogroup, HV, was observed in 19.4% of the sample. HV is the ancestral clade of H and V and originated between West and Central Asia; it reaches its highest incidence in the Iranian Plateau, Mesopotamia and the South Caucasus (Shamoon-Pour et al. 2019). It is recognized as a crucial component of early human spread in Eurasia. The incidence of haplogroup HV in the current sample is close to its prevalence in Iran (19-24%) (De Fanti et al. 2015) but inconsistent with the incidence in the Iraqi studies (6.4% and 10.6%) (Al-Zahery et al. 2013; Azzawi and Oleiwi 2013), indicating a closer genetic relation of the studied sample to the Iranians than to the Iraqis. Haplogroup HV is not common in Europe, although frequencies of about 0-10% have been reported, mainly in southern Europe (Italy and Spain) (De Fanti et al. 2015). Furthermore, the infrequent haplogroups in the current study were macro-haplogroup N, representing the dominant Western Eurasian haplogroups (Macaulay et al. 2005), and its descendants R0, U, T and J. Three haplogroups (T1a, J1 and U1a) were singletons (Table 1). Similar to previous studies (Wells et al. 2004; Zarei and Rajabi-Maham 2016), haplogroups V, X and K, which are part of the Western Eurasian haplogroups, as well as Eastern Eurasian and sub-Saharan African lineages, were not identified in the current study, probably because of the limited sample size. In this regard, the presence of different haplogroups, subclades and specific singletons in this study may indicate a demographic expansion of the city's population. The calculated Tajima's D value, which was strongly negative (−2.155811, p < 0.01), likewise indicates an excess of low-frequency polymorphisms consistent with a recent expansion in the size of the city's population (demographic expansion). As the comparison table (Table 3) shows, this result is compatible with the previously performed studies, with a minor difference in strength from those of Iraqi Kurds, Iranian Kurds and Turkmen Kurds (Wells et al. 2004; Al-Zahery et al. 2013); this may be explained by differences in sample size, the DNA sequence used and the number of segregating sites.
Although Iraq has one of the world's oldest cultural histories and an important geographic location, forensic reference data on mitochondrial DNA control regions in Iraqi populations are scarce (Al-Zahery et al. 2003; Jabbar and Al-Rashedi 2021). Al-Zahery et al. analysed mtDNA variation in 216 Iraqis and found that the majority carried haplogroups H, J, T and U (Al-Zahery et al. 2003). To explore the genetic structure of the Iraqi population and develop a forensic database, Alden sampled 1061 individuals from the cities of Anbar, Baghdad, Basra, Diyala, Najaf and Wasit in Iraq. The analyses found that Iraq is similar to other countries in the Middle East, particularly Iran and Turkey, and is more similar to Europe than to either Asia or Africa (Alden 2017). In an attempt to investigate the origin of the Basrah population (south of Iraq), Ohied and Al-Badran examined mitochondrial DNA variation by sequencing hypervariable segment 1 (HVS1) and determining haplogroup-specific sites. The most common haplogroup in the Basrah population (Zakaria et al. 2013) was H (17%), followed by J (11%) and the U and M haplogroups (9%). The values were within the range of gene diversity across the Middle East (Ohied and Al-Badran 2020).
The current study included a sample of Sulaymaniyah city residents, who belong to the Sorani group of ethnic Kurds and have geographical and cultural contact with both the Arabs of southern Iraq and the Iranians to the east. The identification of various haplogroups and sub-haplogroups, as well as singletons, in this study suggests that the population is expanding, with new mutations emerging possibly under the influence of environmental and climatic changes (Cheng et al. 2013). Still, it is worth mentioning that the current study, with its small sample size (a limitation imposed by whole mitochondrial DNA sequencing), was intended only to identify haplogroup variation among the city residents. We look forward to applying the same procedures in a study that includes a much larger share of the city's residents in the near future.

Conclusions
Whole mitochondrial DNA sequencing of 36 healthy residents of Sulaymaniyah city revealed eight Western Eurasian haplogroups, dominated by H, together with a strongly negative Tajima's D value, indicating a closer genetic relation of the city residents to the European lineage.