Analysis of 27 Y-chromosomal STR loci of the Mestizo Peruvian population

Y-chromosome-specific short tandem repeat markers (Y-STRs) reside on the non-recombining portion of the Y chromosome and are paternally inherited. These properties make Y-STRs a useful tool in investigations of sexual assault and other violent crimes, paternity cases, genealogical testing, and evolutionary studies. Native populations reside on the Peruvian coast, in the mountains, and in the jungle; however, Mestizos are the largest population group in Peru. In this study, 27 Y-STR loci were typed in 290 unrelated Mestizo males, and 283 distinct haplotypes were observed. Gene diversities for each Y-STR marker ranged from 0.4271 to 0.8597. The haplotype diversity for the total sample was 0.9964. Multidimensional scaling and phylogenetic tree analysis identified genetic affinity between the Peruvian Mestizo population and other Peruvian populations (except the Jivaro, Awajun, and Shipibo-Conibo populations) and Bolivian Mestizo populations.


Background
Y-STR markers are inherited through the paternal lineage, are haploid, and, except in the pseudoautosomal regions, do not undergo recombination. These properties make Y-STRs a useful tool in sexual assault investigations, paternity testing, genealogical testing, and evolutionary studies (Roewer 2009).
The populations of Peru have undergone different migratory processes, which have contributed to Peruvian population variation and have produced native populations on the coast, in the mountains, and in the jungle, as well as populations with substantial genetic admixture. Because of this diversity, autosomal STR markers have been studied in different Peruvian populations (Delgado & Neyra 2018; Neyra et al. 2019, 2020; Neyra-Rivera et al. 2020, 2021a, b, 2022; Bermejo et al. 2022), and Y-STRs have been studied in native populations (Tineo et al. 2015; Neyra-Rivera et al. 2022, 2023). In the case of Y-STRs, the parameters analyzed include genetic diversity (GD, the probability that two randomly chosen individuals can be differentiated by a particular genetic marker or markers (Nei 1987)), haplotype diversity (HD, the probability that two randomly chosen individuals can be differentiated by the particular set of markers comprising a haplotype (Nei 1987)), random match probability (MP, the probability that two randomly selected individuals in a population share identical genotypes (Jones 1972)), discrimination capacity (DC, the probability that two randomly selected individuals will be genetically differentiated at a particular marker or markers (Smalldon and Moffat 1973)), haploid match probability (HMP, the probability that two randomly selected individuals will share the same haplotype), and pairwise genetic distance (RST, the correlation of allele sizes (rather than allelic states) between genetic markers sampled within populations (Excoffier 2001)). However, for the Peruvian Mestizo population there have been no studies with a large sample (i.e., more than 200 individuals), and available studies have typed at most 23 Y-STR markers. Only one study, with 83 individuals and 23 Y-STR markers (accession number YA005599), is represented in the YHRD database (Yhrd.org). The present investigation seeks to expand the available genetic data by typing a larger sample of Mestizos for 27 Y-STR markers and comparing these data with those of populations associated either geographically or historically to assess their relationships.

A 1.2 mm punch was taken from each card and amplified by direct PCR using the Yfiler™ Plus PCR Amplification Kit (Life Technologies) following the manufacturer's instructions. Amplified products were detected on the Applied Biosystems™ 3500xL Genetic Analyzer (Life Technologies) following the manufacturer's recommended protocol. After capillary electrophoresis, the data were imported into the GeneMapper® ID-X v1.5 software (Thermo Fisher Scientific 2015) to generate genetic profiles, which were analyzed according to the manufacturer's recommendations. The population data were deposited in the YHRD database (accession number: YA006020).

Statistical analysis
For 23 of the 27 STRs, allele frequencies were calculated with the PowerStats v1.2 software (Tereba 1999); for the DYS385a/b and DYF387S1 markers, "type" frequencies were calculated by the counting method. GD and HD were calculated as $GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$ and $HD = \frac{n}{n-1}\left(1 - \sum_i x_i^2\right)$, where $p_i$ is the frequency of the $i$th allele, $x_i$ is the frequency of the $i$th haplotype, and $n$ is the number of samples analyzed (Sandoval et al. 2013). MP was calculated as $MP = \sum_i x_i^2$; DC was determined by dividing the number of distinct haplotypes observed by the total number of haplotypes analyzed; and HMP was calculated as $HMP = 1 - HD$. Pairwise RST values and the corresponding p values between the Y-STR datasets of different populations were estimated by AMOVA, and the RST values were visualized on an MDS plot using the YHRD tools (Willuweit & Roewer 2007).
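The diversity formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the PowerStats implementation: it assumes each haplotype is a tuple of integer repeat counts at single-copy loci in a fixed order, and the function names and toy data are hypothetical. The same unbiased estimator serves for GD (applied to the alleles at one locus) and HD (applied to whole haplotypes), and HMP follows the text's definition, 1 − HD.

```python
from collections import Counter


def unbiased_diversity(items):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).

    Used for GD (items = alleles at one locus) and HD (items = haplotypes).
    """
    n = len(items)
    sum_sq = sum((c / n) ** 2 for c in Counter(items).values())
    return n / (n - 1) * (1 - sum_sq)


def haplotype_statistics(haplotypes):
    """Return HD, MP, DC and HMP as defined in the Statistical analysis text."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    mp = sum((c / n) ** 2 for c in counts.values())  # MP = sum of x_i^2
    hd = unbiased_diversity(haplotypes)
    return {
        "HD": hd,
        "MP": mp,
        "DC": len(counts) / n,  # distinct haplotypes / haplotypes analyzed
        "HMP": 1 - hd,          # definition used in the text
    }


# Hypothetical haplotypes: repeat counts at three single-copy loci
haplotypes = [(13, 14, 11), (13, 14, 11), (13, 15, 11), (14, 14, 10), (13, 14, 12)]
gd_per_locus = [unbiased_diversity([h[j] for h in haplotypes]) for j in range(3)]
stats = haplotype_statistics(haplotypes)
print([round(g, 4) for g in gd_per_locus])         # [0.4, 0.4, 0.7]
print({k: round(v, 4) for k, v in stats.items()})  # {'HD': 0.9, 'MP': 0.28, 'DC': 0.8, 'HMP': 0.1}
```

Multi-copy loci such as DYS385a/b would first need to be collapsed into a single "type" value per sample before being included in a haplotype tuple.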

Allelic frequencies and forensic statistical parameter analyses
This study analyzed 27 Y-STRs in 290 males collected across different departments of Peru. Results were obtained for all 27 Y-STR loci in all samples, and no null alleles were observed. Allele (or type) frequencies for each locus of the combined dataset are shown in Table 1. The DYS437 marker displayed the fewest alleles (3), while the DYS385a/b marker had the most types (45). GD values ranged from 0.4271 (DYS437) to 0.9415 (DYS385a/b), and only two Y-STR loci had values below 0.5 (DYS437 and DYS391). The two most diverse markers, based on GD, were DYS385a/b (0.9415) and DYF387S1 (0.9363).
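For multi-copy loci such as DYS385a/b and DYF387S1, the individual copies cannot be assigned to "a" or "b", so each sample's set of repeat counts is treated as one "type" and type frequencies are obtained by simple counting. The sketch below illustrates that counting method under the assumption that each genotype is given as a tuple of repeat counts; the function name and the DYS385a/b genotypes are hypothetical.

```python
from collections import Counter


def type_frequencies(genotypes):
    """Counting-method frequencies for a multi-copy Y-STR locus.

    Each genotype is a tuple of repeat counts for the copies, e.g. (11, 14).
    Sorting the counts makes (14, 11) and (11, 14) the same "type".
    """
    types = [",".join(str(a) for a in sorted(g)) for g in genotypes]
    n = len(types)
    # most_common() lists types from most to least frequent
    return {t: count / n for t, count in Counter(types).most_common()}


# Hypothetical DYS385a/b genotypes for five men
dys385 = [(11, 14), (14, 11), (13, 16), (11, 14), (13, 13)]
freqs = type_frequencies(dys385)
print(len(freqs), freqs)  # 3 {'11,14': 0.6, '13,16': 0.2, '13,13': 0.2}
```

The number of keys in the returned dictionary is the number of distinct types observed at the locus (45 for DYS385a/b in this study).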

Genetic distances
The RST values for genetic distances and the MDS plot are shown in Table 3 and Fig. 2, respectively, for the Mestizo population from Peru and the 14 reference population datasets.
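The study estimated RST by AMOVA with the YHRD tools. As an offline illustration only, the sketch below computes a two-population AMOVA ΦST in which the squared distance between two haplotypes is the sum over loci of squared repeat-number differences, the distance used for RST-type AMOVA. It assumes single-copy loci coded as integer repeat counts, omits the permutation test that yields p values, and may differ from YHRD's implementation (e.g., in handling multi-copy loci or intermediate alleles); the function names and data are hypothetical.

```python
def _ssd(haplotypes):
    """Sum of squared deviations of haplotype vectors from their mean.

    For squared Euclidean distances this equals AMOVA's SSD computed as
    (1 / 2n) * sum over all ordered pairs of squared distances.
    """
    n = len(haplotypes)
    loci = range(len(haplotypes[0]))
    mean = [sum(h[j] for h in haplotypes) / n for j in loci]
    return sum((h[j] - mean[j]) ** 2 for h in haplotypes for j in loci)


def pairwise_rst(pop_a, pop_b):
    """Two-population AMOVA estimate of R_ST from repeat-count haplotypes."""
    groups = [list(pop_a), list(pop_b)]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    ssd_within = sum(_ssd(g) for g in groups)
    ssd_among = _ssd(groups[0] + groups[1]) - ssd_within
    msd_among = ssd_among / (k - 1)
    msd_within = ssd_within / (n_total - k)
    # n0: effective sample size per population for unequal group sizes
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    sigma2_among = (msd_among - msd_within) / n0
    denom = sigma2_among + msd_within
    return sigma2_among / denom if denom else 0.0


# Hypothetical two-locus haplotypes for two small populations
pop_a = [(10, 14), (11, 14)]
pop_b = [(20, 14), (21, 14)]
print(round(pairwise_rst(pop_a, pop_b), 4))  # 0.99
```

Negative estimates can occur when populations are very similar or samples are small; such values are often treated as zero when building trees or MDS plots.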

Discussion
Molecular biology technologies offer enhanced capabilities for human identification. Forensic DNA typing plays an important role in some criminal investigations, and thus benefits society, by providing critical data for developing investigative leads to identify the donors of biological evidence, who may be perpetrators of crime, and for exonerating those who may be wrongly associated with biological evidence. To help assess the significance of a DNA association between a reference sample and an evidence sample, a population study of Peruvians was carried out by analyzing the 27 Y-STRs contained in a widely used commercial kit (i.e., the Yfiler™ Plus PCR Amplification Kit).
Because the Y-STRs reside on a non-recombining region of the Y chromosome and are therefore inherited together as a haplotype, HD is more meaningful than single-locus GD for comparing diversity among populations. The Y-STR HD was 0.9964, with 283 distinct haplotypes among the 290 individuals analyzed in the Peruvian population studied herein. This level of diversity is slightly lower than that of the Peruvian Aymara (0.9998, n = 90) and Peruvian Asháninca (1.000) populations and higher than that of the Peruvian Jungle population (0.9910, n = 141) (Tineo et al. 2015; Neyra-Rivera et al. 2022, 2023). The diversity value was also lower than those of the Mestizo populations of Spain (0.9995, n = 144), Colombia (0.9971, n = 137), and Bolivia (1.000, n = 100) (Martin et al. 2004; Vullo et al. 2015; Yunis et al. 2005). The differences from these Mestizo populations may be due in part to differences in sample size and in the number of markers typed (17 vs 27), and/or to stochastic variation.
The haplogroups carried by the males analyzed provide insight into the male lineages of the Peruvian Mestizo population. The most prevalent was haplogroup Q (56.18%), which was also the most common haplogroup in the Asháninca (99–100%), Aymara (86.52%), and La Selva (91.56%) populations. The lower proportion of haplogroup Q in the Mestizo population may be due to admixture that has occurred, and continues to occur, in Peru; for various reasons, members of native populations have migrated from their communities to Peruvian cities (Tineo et al. 2015; Neyra-Rivera et al. 2022, 2023).
The RST values were used to construct a UPGMA tree with the Molecular Evolutionary Genetics Analysis (MEGA) 6.06 software (Tamura et al. 2013). In the tree, the Mestizo population was closest to the Bolivian and other Peruvian populations, except the Chachapoya and Quechua populations (Fig. 3). In the MDS plot, the Mestizo population was closest to the Bolivian population (Bolivia, Boliviano) and farthest from the Jivaro and Awajun (Amazonas, Peru) and Shipibo-Conibo (Ucayali) populations (Fig. 2). These results suggest a high genetic affinity between the Mestizo population and the other Peruvian populations, except the Jivaro, Awajun, and Shipibo-Conibo populations. Additionally, the Jivaro, Awajun, and Shipibo-Conibo populations were distant from all groups analyzed, which may be due to genetic differences and/or small sample sizes (n ≤ 21) (Fig. 3).
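The UPGMA tree was built in MEGA 6.06 from the RST values. To show what that step does, here is a compact, hypothetical UPGMA (average-linkage) sketch in plain Python that returns a Newick string. It assumes a symmetric distance matrix and clips negative RST estimates to zero, which is an assumption of this sketch; MEGA's own implementation may treat ties and negative values differently. The labels and distances are made up.

```python
def upgma(labels, matrix):
    """Return a Newick string built by UPGMA clustering of a distance matrix."""
    # active clusters: id -> (newick string, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    # distances keyed by (smaller id, larger id); negative values clipped to 0
    dist = {
        (i, j): max(0.0, matrix[i][j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    }
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        height = d_ij / 2  # UPGMA places the new node at half the distance
        node = f"({newick_i}:{height - height_i:.4f},{newick_j}:{height - height_j:.4f})"
        del dist[(i, j)]
        # size-weighted average distance from the merged cluster to the others
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (node, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical pairwise R_ST values for four populations
labels = ["A", "B", "C", "D"]
rst = [
    [0.00, 0.02, 0.10, 0.12],
    [0.02, 0.00, 0.10, 0.12],
    [0.10, 0.10, 0.00, 0.04],
    [0.12, 0.12, 0.04, 0.00],
]
tree = upgma(labels, rst)
print(tree)  # ((A:0.0100,B:0.0100):0.0450,(C:0.0200,D:0.0200):0.0350);
```

Because UPGMA assumes a constant rate of divergence, every leaf ends up the same distance from the root, which is a simplifying assumption when interpreting such trees.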

Conclusions
In conclusion, 27 Y-STR loci were typed in 290 unrelated Mestizo males, providing a larger dataset for this population group. The data obtained have been deposited in the YHRD database (accession number: YA006020). This dataset is larger than any other available for the Mestizo population of Peru. For example, YHRD (Yhrd.org) contains 394 haplotypes for Peruvian Mestizos (394 with 12-locus Y-STR haplotypes, 229 with 17-locus haplotypes, 83 with 23-locus haplotypes, and, until the present study, none with 27-locus haplotypes), filed under accession numbers YA003399 and YA005599. A specific limitation of this study is the small sample sizes of some of the population groups compared, which can lead to errors in allele frequency estimates and in subsequent population genetic calculations. Despite this limitation, the analyses carried out herein show that the Y-STR haplotypes are highly polymorphic and thus can be useful in forensic human identification cases, provided that proper statistical methods that account for uncertainty are used, and may also be of value for studies on the history of human populations in Latin America.

Fig. 3 UPGMA tree based on RST estimates for the 17 Y-STR markers shared by 15 populations

Table 1
Allele frequencies and gene diversities of 27 Y-chromosome STR loci for the Mestizo population from Peru (n = 290). NA, number of alleles; GD, genetic diversity

Table 3
Pairwise RST estimates (below the diagonal) and corresponding p values (above the diagonal) for the Mestizo population from Peru and 14 reference population datasets
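Pairwise RST matrices like the one summarized in Table 3 were visualized by MDS with the YHRD tools. As a rough offline equivalent, the sketch below applies classical (Torgerson) MDS with NumPy; whether YHRD uses exactly this variant is not stated here, so treat it as an assumption. Negative RST values are clipped to zero and negative eigenvalues (non-Euclidean structure) are discarded; the function name and matrix are hypothetical.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    d = np.clip(np.asarray(dist, dtype=float), 0.0, None)  # negative R_ST -> 0
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop negative parts
    return eigvecs[:, top] * scale  # rows = populations, columns = MDS axes


# Hypothetical pairwise R_ST matrix for four populations
labels = ["A", "B", "C", "D"]
rst = [
    [0.00, 0.02, 0.10, 0.12],
    [0.02, 0.00, 0.10, 0.12],
    [0.10, 0.10, 0.00, 0.04],
    [0.12, 0.12, 0.04, 0.00],
]
coords = classical_mds(rst)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: {x:+.4f} {y:+.4f}")
```

MDS axes are only defined up to sign and rotation, so the orientation of a plot like Fig. 2 carries no meaning; only the relative distances between points do.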