Anthropogenetic study of the Arabic-speaking population of Chaouia-Ouardigha (Morocco) based on autosomal STRs

Background The Chaouia-Ouardigha is a historical and ethno-geographic region of Morocco. Despite its historical relevance, this region has received little attention in population genetic studies. The present study aims to provide new data on the genetic background of the Chaouia-Ouardigha population, an Arabic-speaking North African population in west-central Morocco, and to examine the relationship between this population and other reference populations based on our dataset as well as previously published population data. Such genetic exploration will help reconstruct the past gene flows that have shaped the genetic makeup of this population. Methods We used the Investigator® IDplex Plus PCR amplification kit (Qiagen, Germany) to establish the allele frequencies and forensic statistical parameters of the 15 autosomal short tandem repeats (STRs) included in the kit in 153 randomly selected, healthy, unrelated individuals from the Chaouia-Ouardigha region. Results The Arabic-speaking Chaouia-Ouardigha population exhibits a high degree of genetic diversity, with heterozygosity values ranging from 0.670 to 0.879. The loci with the highest allele frequencies, D7S820 and TPOX, have the lowest heterozygosity values (0.670 and 0.672, respectively), resulting in lower power of exclusion (PE) and typical paternity index (TPI) values (PE = 0.427 and TPI = 1.66 for TPOX; PE = 0.442 and TPI = 1.72 for D7S820). Phylogenetic analysis showed that the Arabic-speaking population of Chaouia-Ouardigha is distinct from Sub-Saharan, East Asian, and Latin American populations. Conclusions Our study provides insights into the genetic connections of the Chaouia-Ouardigha population, highlighting how historical events, socio-cultural influences, and geographical proximity have contributed to shaping its genetic structure. 
It demonstrates the substantial impact of historical events and geographical closeness in fostering affinities between the Chaouia Ouardigha community and neighboring populations in North Africa, Southern Europe, and the Middle East. Additionally, the study has generated valuable data confirming the effectiveness of these 15 genetic markers for forensic applications across the broader population.


Background
The region of Chaouia-Ouardigha lies in the central Atlantic area, encompassing the land between the Oum Rbia River, the Cherrat Wadi, and the Tadla plains (Fig. 1). It spans approximately 16,571 square kilometers, or 2.4% of the national territory. In terms of origins, the Chaouia-Ouardigha people have been described as an intimate mixture of heterogeneous Amazigh elements, strongly Arabized and admixed, with a small proportion of Hilalian Arab blood. The Berghwata, Amazighs of the Masmouda confederation, are the original inhabitants of Chaouia-Ouardigha. These populations were predominantly settled in the region. However, in the late 12th century, the Almohads brought the Arab tribes of "Beni Hilal" and "Beni Soulaïm" to Chaouia-Ouardigha during the conquest of Spain. Subsequently, between the 13th and 14th centuries, the Merinids settled other Arab and Amazigh contingents in the region, particularly Zenetes, resulting in significant ethnic changes (Weisgerber and Amade 1935).
To understand the genetic structure of the Moroccan population, various anthropogenetic studies focusing on mtDNA sequences, ABO polymorphism, Alu insertions, and HLA polymorphism have been conducted on Amazigh-speaking and Arabic-speaking populations across different regions of the country (Brakez et al. 2001, Izaabel 2009, Dahbi et al. 2023, Coudray et al. 2007). Nevertheless, genetic data remain scarce for several Moroccan populations that played a significant role in the country's history, including the Arabic-speaking population of Chaouia-Ouardigha. This population lives in an area that separates the Amazigh populations of the Middle Atlas from the Arabic-speaking populations of the Doukkala plain and Rabat-Zemmour-Zaer.
Short tandem repeat (STR) loci, also referred to as microsatellites or simple sequence repeats (SSRs), are DNA sequences that contain a repeat motif of 2-6 bp and are characterized by a high level of relatively stable polymorphism, a dense and uniform chromosomal distribution, and short sequence lengths, which facilitate detection and analysis by PCR and sequencing (Zhan et al. 2018). These markers have been popular since their advent in the late 1980s, and despite growing competition from new genotyping and sequencing tools, the use of these versatile and cost-effective markers continues to increase, boosted by successive technical advances. Because of their high variability, low susceptibility to mutations, independence, and ease of simultaneous amplification, autosomal STRs have been the most common genetic markers in forensic applications, including paternity disputes and missing-person cases. They are also an efficient tool in anthropological research and have been used in phylogenetic analyses of populations from around the world (Butler 2006, Dumache et al. 2016, Li et al. 2018). The fine-scale resolution of forensic STRs may prove useful in delineating genetic differences and affinities between closely related ethnic groups (Mojtabavi Naeini et al. 2014, Wen et al. 2019). STR genotypes have previously been studied in the Moroccan population, but the Chaouia-Ouardigha population remains uncharacterized. Therefore, in this study, we genotyped an Arabic-speaking population sample (n = 153) from the Chaouia-Ouardigha region using the 15 autosomal STRs included in the Investigator® IDplex Plus PCR amplification kit (Qiagen, Germany). Our objective is to provide allele frequency data for the Chaouia-Ouardigha population and to analyze its degree of affinity with adjacent and global populations in order to understand the genetic structure of the Moroccan population.

Editorial policies and ethical considerations
Ethical clearance for this study was obtained from the Biomedical Research Ethics Committee (CERBC) of Casablanca. The committee operates under Law 28-13, 17/09/2015, on the protection of individuals involved in biomedical research (Minister of Health's Decision N°02/DRC/00, 03/12/2012, concerning biomedical research). This study was approved in accordance with the Declaration of Helsinki (2008 version). Prior to the data

Sampling and DNA extraction
To investigate the Arabic-speaking population, a linguistic survey was carried out in Chaouia-Ouardigha. Individuals were considered eligible for this study if they satisfied the following requirements: (i) they lived in the Chaouia region; (ii) there was no blood relationship between them; (iii) they had three grandparents, on both the maternal and paternal sides, born in the Chaouia region; and (iv) they used Arabic as their mother tongue.
Saliva samples were collected on FTA cards from a random sample of 153 healthy adults who met the eligibility criteria. To retrieve a sample from an FTA card, a 1.2 mm diameter disc impregnated with dry saliva was cut using an automated punching instrument (Gene punch BSD 600). The DNA trapped in the punch was subsequently extracted by washing the punch with millinic water (Automate Janus).

STR genotyping using capillary electrophoresis
PCR products were analyzed by capillary electrophoresis on an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA) with reference to the DNA size standard BTO (Qiagen, Germany) and the Investigator IDplex allelic ladder, in a total volume of 12 μl of master mix consisting of BTO standard and Hi-Di formamide (Thermo Fisher Scientific, Inc., Waltham, MA, USA).
Raw data were collected using the Data Collection software v2.0 (Applied Biosystems, USA) and analyzed using GeneMapper® ID-X Software version 1.0 (Thermo Fisher Scientific, Inc., Waltham, MA, USA) by comparing fragment sizes with the allelic ladder provided with the Investigator® IDplex Plus PCR amplification kit.

Quality control
All experimental procedures were performed in accordance with the laboratory's internal control standards. Negative and positive controls were employed for DNA extraction, DNA quantitation, PCR amplification, and capillary electrophoresis. All negative controls displayed an absence of amplified product, while positive controls were consistent with known genotypes.

Statistical Analysis
Data were analyzed with Arlequin software version 3.1 (Excoffier, Laval, and Schneider 2007), which was used to estimate allele frequencies and heterozygosity and to test for Hardy-Weinberg equilibrium at the population level. To account for multiple testing, Bonferroni's correction was applied, and only p-values below 0.003 were considered significant (0.05/15 ≈ 0.003) (Weir 1996). Allele frequencies and important forensic parameters, such as match probability (MP), power of discrimination (PD), power of exclusion (PE), and polymorphism information content (PIC), were calculated using PowerStats V1.21 (Tereba 1999).
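As a rough illustration of the quantities listed above, the following Python sketch computes them for a single locus from a list of genotypes. It relies on the standard textbook definitions and is not the code of Arlequin or PowerStats; the function name `locus_statistics` and the toy genotypes are assumptions introduced only for this example.

```python
from collections import Counter
from itertools import combinations


def locus_statistics(genotypes):
    """Summary statistics for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)

    # Allele frequencies, counted over the 2n sampled chromosomes.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    # Observed heterozygosity: share of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity with the usual small-sample correction.
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum_p2)

    # Polymorphism information content.
    pic = 1 - sum_p2 - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    # Match probability: sum of squared genotype frequencies; PD = 1 - MP.
    genotype_counts = Counter(tuple(sorted(pair)) for pair in genotypes)
    mp = sum((count / n) ** 2 for count in genotype_counts.values())

    return {"h_obs": h_obs, "h_exp": h_exp, "pic": pic, "mp": mp, "pd": 1 - mp}


# Bonferroni-corrected significance threshold for 15 loci at alpha = 0.05.
BONFERRONI_ALPHA = 0.05 / 15

# Hypothetical toy genotypes (repeat counts), not data from the study.
toy_genotypes = [(8, 8), (8, 11), (11, 11), (8, 11)]
toy_stats = locus_statistics(toy_genotypes)
```

With real data, the function would be called once per locus on the 153 genotypes, and each Hardy-Weinberg p-value would be compared with `BONFERRONI_ALPHA`.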

Phylogenetic analysis
Phylogenetic analysis was conducted on the allele frequencies of 18 populations from around the world. The tree was generated using the neighbor-joining method implemented in the PHYLIP 3.67 package (Felsenstein 2008).
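A neighbor-joining tree is built from a matrix of pairwise genetic distances between populations. The source does not state which distance was used, so the sketch below takes Nei's standard genetic distance purely as an assumption; the function names and the nested-dictionary layout for allele frequencies are likewise hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus name -> dict of allele -> frequency.
    Only loci present in both populations are used.
    """
    jx = jy = jxy = 0.0
    for locus in pop_x.keys() & pop_y.keys():
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())   # homozygosity in X
        jy += sum(p * p for p in fy.values())   # homozygosity in Y
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    # The per-locus averaging factor cancels in the ratio.
    return -math.log(jxy / math.sqrt(jx * jy))


def distance_matrix(populations):
    """Symmetric matrix of pairwise distances; populations: name -> frequencies."""
    names = sorted(populations)
    return names, [
        [nei_standard_distance(populations[a], populations[b]) for b in names]
        for a in names
    ]


# Hypothetical single-locus frequencies, not data from the study.
pop_a = {"TPOX": {8: 0.5, 11: 0.5}}
pop_b = {"TPOX": {8: 1.0}}
d_ab = nei_standard_distance(pop_a, pop_b)
```

The resulting matrix is the kind of input a neighbor-joining program expects; the tree-building step itself is left to dedicated software.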

STR marker characterization of the Chaouia-Ouardigha population
The Arabic-speaking Chaouia-Ouardigha population exhibits a high degree of genetic diversity, with heterozygosity values ranging from 0.670 to 0.879 (Table 1). The loci with the highest allele frequencies, D7S820 and TPOX, have the lowest heterozygosity values (0.670 and 0.672, respectively), resulting in lower power of exclusion (PE) and typical paternity index (TPI) values (PE = 0.427 and TPI = 1.66 for TPOX; PE = 0.442 and TPI = 1.72 for D7S820) (Table 2). In contrast, locus D18S51 exhibits the highest heterozygosity (0.879) and therefore the strongest potential for paternity testing, with the highest PE (0.762) and TPI (4.29). Table 2 shows that all markers have a very high power of discrimination (PD ≥ 0.856). D18S51 has the highest PD and polymorphism information content (PIC) values (0.971 and 0.870, respectively), while TPOX has the lowest (0.856 and 0.650, respectively).
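The link between heterozygosity, PE, and TPI can be made concrete with the formulas commonly given for PowerStats-style calculations: TPI = 1 / (2H) and PE = h²(1 − 2hH²), where h is the observed proportion of heterozygotes and H = 1 − h. The sketch below assumes these formulas (the source does not spell them out) and checks that the PE values quoted above follow from the quoted TPI values; the function names are hypothetical.

```python
def typical_paternity_index(h):
    """TPI from the observed heterozygote proportion h."""
    return 1.0 / (2.0 * (1.0 - h))


def power_of_exclusion(h):
    """PE from the observed heterozygote proportion h."""
    homozygous = 1.0 - h
    return h * h * (1.0 - 2.0 * h * homozygous * homozygous)


def heterozygosity_from_tpi(tpi):
    """Invert TPI = 1 / (2H) to recover h = 1 - H."""
    return 1.0 - 1.0 / (2.0 * tpi)


# (TPI, PE) pairs quoted in the text.
reported = {
    "TPOX": (1.66, 0.427),
    "D7S820": (1.72, 0.442),
    "D18S51": (4.29, 0.762),
}

# PE recomputed from the reported TPI, for comparison with the reported PE.
recomputed_pe = {
    locus: power_of_exclusion(heterozygosity_from_tpi(tpi))
    for locus, (tpi, _) in reported.items()
}
```

Under these assumptions the recomputed PE values agree with the reported ones to within rounding. The back-calculated heterozygosity for TPOX (about 0.70) is slightly above the 0.672 quoted in the text, which may reflect a difference between observed and expected heterozygosity; the source does not say.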
To explore the relationships between the different parameters, we conducted correlation analyses between each pair of parameters. In general, all parameters were positively correlated with one another. The strongest correlations were observed between the power of discrimination and the polymorphism information content, as well as between the power of exclusion and the typical paternity index.
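As a minimal sketch of such a pairwise analysis, the code below computes a Pearson correlation coefficient between PE and TPI. Pearson's r is an assumption (the source does not name the coefficient), and only the three loci whose values are quoted in the text are used, so the result is illustrative; the full Table 2 would be needed to reproduce the actual analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values quoted in the text, in the order TPOX, D7S820, D18S51.
tpi_values = [1.66, 1.72, 4.29]
pe_values = [0.427, 0.442, 0.762]

r_pe_tpi = pearson_r(tpi_values, pe_values)
```

For these three loci the coefficient is very close to 1, in line with the strong PE-TPI correlation reported above.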

Phylogenetic analysis
Phylogenetic analysis showed that the Arabic-speaking population of Chaouia-Ouardigha is distinct from Sub-Saharan, East Asian, and Latin American populations, but closely related to the populations of the Middle East and North Africa (Fig. 2).

Discussion
Short tandem repeats (STRs) are widely used as genetic markers in forensic and population genetic studies because of their high variability. These markers have strong discriminatory power and are used to determine kinship links between individuals and populations.
This study reveals a high polymorphism of the selected markers, as reported in other studies of the Moroccan population by El Ossmani (El Ossmani et al. 2007) and Coudray (Coudray et al. 2007). The combined use of the 15 STR markers included in the Investigator® IDplex Plus PCR amplification kit is therefore very efficient for discriminating between individuals (CPD > 99%) and excluding false paternity (CPE > 99%). These findings confirm the usefulness of these 15 loci for forensic casework in the general population, i.e., personal identification and paternity determination (Aguiar et al. 2021, Gao et al. 2018).
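The combined powers can be bounded from below using only values quoted in the Results. Assuming independent loci, the combined power is 1 − Π(1 − x_i) over the per-locus values x_i. The sketch below conservatively sets every locus to the lowest reported value (PD = 0.856 and PE = 0.427, both for TPOX, which the text implies are the minima); the function name is hypothetical.

```python
import math


def combined_power(per_locus_values):
    """Combined power over independent loci: 1 - prod(1 - x_i)."""
    return 1.0 - math.prod(1.0 - value for value in per_locus_values)


N_LOCI = 15

# Conservative lower bounds: every locus set to the lowest reported value.
cpd_lower_bound = combined_power([0.856] * N_LOCI)
cpe_lower_bound = combined_power([0.427] * N_LOCI)
```

Even under this worst-case assumption both bounds exceed 0.999, which is consistent with the CPD > 99% and CPE > 99% stated above.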
Based on the phylogenetic analysis, our study illustrates the impact of the historical genetic affinities that have arisen between various world populations (Nettle and Harriss 2003, Lan et al. 2022). These affinities have been shaped over time by socio-cultural, economic, and geopolitical changes, as well as by advances in technology, all of which are important factors as reported in other studies (Tandon et al. 2004, Cavalli-Sforza, Menozzi, and Piazza 1994). In our case, the Arabic-speaking sample from Chaouia-Ouardigha separated from the North Mediterranean populations and clustered with the Middle Eastern and North African populations, possibly because of a founder effect whose age played a crucial role in this repositioning. Indeed, a recent study of the Chaouia population (Arabic- and Amazigh-speaking) showed that it shares close genetic affinities with populations from North Africa, the Middle East, and Europe (Cheffi et al. 2023). This can also be attributed to the fact that the Arab founders of the Chaouia-Ouardigha population were Muslims who emigrated from the Middle East to North Africa (Morocco) and settled in the region towards the end of the 12th century, with the arrival of the Arab tribes "Beni Hilal" and "Beni Soulaïm" (Daghfous 2003). The populations of the Middle East and North Africa occupy a central position relative to populations from East Asia, the northern Mediterranean, Latin America, and sub-Saharan Africa, which reflects the significance of geographical and cultural (including religious) proximity. For example, India and Bangladesh show a certain affinity with populations from the Middle East and North Africa. This structure agrees well with the phylogenies reported in national and international studies of Amazigh- and Arabic-speaking populations in Morocco using the same 15 microsatellites (Coudray et al. 2007, Cheffi et al. 2023, Dahbi et al. 2023). Although they are not the most informative markers for ancestry inference, the STRs used in forensic applications contain enough information to distinguish, to some extent, between distinct geographic and linguistic groups within the Moroccan population and within a larger collection of northeast African populations.

Conclusions
To conclude, this study reports the allele frequencies and forensic statistical parameters of the Chaouia-Ouardigha population (Morocco) for the 15 STRs included in the Investigator® IDplex Plus PCR amplification kit. Interpopulation comparisons suggest that the Investigator® IDplex kit may perform well in intercontinental forensic population analysis and that it has great potential in forensic investigations. The study also suggests a parallelism between the established genetic map and the geographical distribution of the populations analyzed. Historical and cultural factors have played a significant role in shaping the genetic structure of the Chaouia-Ouardigha population, as evidenced by a possible founder effect that led to the repositioning of this population relative to North Mediterranean populations.

Fig. 1 Location of the Chaouia-Ouardigha region (highlighted in black on the map), where the samples were collected for the present study

Fig. 2 Phylogenetic tree of the 15 STRs in the Arabic-speaking population of Chaouia-Ouardigha on a global scale

Table 1
Allelic frequencies of the 15 STRs in the population of Chaouia-Ouardigha included in the Investigator ®

Table 2
Forensic parameters and heterozygosity of 15 STRs in the population of the present study. PD: power of discrimination, PE: power of exclusion, PIC: polymorphism information content, TPI: typical paternity index