Analysis of fortuitous matches in a STR genotype database from Mexico and its forensic efficiency parameters
© The Author(s) 2017
Received: 9 May 2017
Accepted: 12 November 2017
Published: 6 December 2017
Commercial kits for Human Identification (HID) purposes, based on 15 short tandem repeat (STR) regions, allow the resolution of most forensic and paternity cases. However, some pitfalls arise in situations such as identification of missing persons, disaster victims, and in motherless paternities. We describe nine fortuitous matches found during the inclusion of new DNA profiles and/or searching for missing persons within a Mexican STR database (N = 2000). For these cases, we estimated both the likelihood ratio (LR) and the modified LR according to National Research Council (NRC) recommendations (i.e., LRNRC).
In most cases, low LRNRC values (~1 or <1) and elevated presence of modal STR alleles (>50% loci) were indicative of fortuitous matches. We estimated a significant increase in the observed frequency of fortuitous matches in this database when using the PowerPlex 16 DNA-typing system. In addition, forensic efficiency parameters for this HID kit are reported for the State of Mexico based on a population sample selected from the STR database (n = 493). We describe the population structure and relationships, which detail comparative homogeneity among Mexican-Mestizos from the Central region, excluding the Western region.
The results suggest that additional / more powerful HID systems should be used when searching for relatives within STR databases in order to enable definitive conclusions, which are also required in cases of motherless paternity and massive disasters. The genetic structure described herein allows forensic geneticists to use alternative STR population data during statistical interpretation, given that most Mexican populations do not have their own STR databases.
Short tandem repeat (STR) loci are the most informative genetic markers used worldwide for human identification (HID) purposes. Although commercial kits that analyze 15 STRs allow solving most forensic and paternity cases, some pitfalls arise in situations such as the identification of disaster victims, missing persons, and in motherless paternity cases (Wurmb-Schwark et al. 2006; Coletti et al. 2008; Borovko et al. 2009; Li et al. 2012; Ziętkiewicz et al. 2012). In our experience, the lack of informative reference samples (first-degree relatives) is probably the most common problem when identifying unknown bodies. This is sometimes overcome by searching STR databases that include samples from missing persons and their relatives, criminals, victims, etc. (Álvarez-Cubero et al. 2012). A considerable number of countries have established local databases that include the 13 STR loci of the Combined DNA Index System (CODIS), the amelogenin locus for gender determination, and additional STRs depending on the HID kit employed (Collins et al. 2004). In 2011, the federal State of Mexico (located within the Central region, Mexico) established a civil and criminal STR database containing approximately 2000 DNA profiles analyzed with the commercial PowerPlex 16 System (Krenke et al. 2002). Although many Mexican populations have been analyzed with autosomal STRs to support forensic casework (Rubi-Castellanos et al. 2009; Salazar-Flores et al. 2015), only the following states/regions of the country have been studied with the PowerPlex 16 kit: Guanajuato, Veracruz, Nayarit, Yucatan, Mexico City, and the Western region (Rangel-Villalobos et al. 2010; González-Herrera et al. 2010; Ramírez-Flores et al. 2014; Martínez-Sevilla et al. 2016). Furthermore, the population structure and genetic relationships of the Mexican populations remain unexplored using PowerPlex 16 System.
Therefore, we analyzed this STR genotype database from the State of Mexico. The findings: i) report nine fortuitous matches detected during the addition of new DNA profiles to this genetic database, which suggest first-degree kinship; ii) evaluate the presence of modal alleles in these cases; iii) estimate the expected probability of finding these fortuitous matches; iv) estimate statistical parameters of forensic efficiency in this previously unstudied population (State of Mexico); and v) evaluate the structure and genetic relationships among Mexican populations previously studied with the PowerPlex 16 system.
Material and methods
Bones analyzed in this work were previously decalcified with EDTA 0.5 M. Bone slices were treated with a Proteinase K Digestion Solution prepared with Bone Incubation Buffer and Proteinase K solution at 21 mg/mL (Promega Corp. Madison, WI). Similarly, tissues (mainly muscle) were digested with Proteinase K and digestion buffer, whereas saliva swabs were spun for DNA extraction from the resulting pellets. The final step of DNA extraction employed the DNA IQ System according to the manufacturer’s instructions (Promega Corp., Madison, WI). Conversely, blood samples obtained from relatives, or reference samples included in the database, were placed on FTA cards and processed with FTA purification reagent (Whatman Inc., Clifton, NJ). One punch derived from FTA cards was used during PCR amplification as DNA sample. All individuals provided a signed, written informed consent in accordance with the ethical guidelines of the Helsinki Declaration. The anonymity of the recruited individuals was preserved. This project was authorized by the Ethical Committee Dirección de Servicios Periciales de la Fiscalía General de Justicia del Estado de México (PGJEM).
The PowerPlex 16 system was used according to the supplier’s instructions (Promega Corp. Madison, WI). Amplified products were run by capillary electrophoresis using the ABI Prism 310 Genetic Analyzer (Applied Biosystems, Foster City, CA). Allele calling was achieved by comparison with the allelic ladder provided in the kit, helped by GeneMapper software (version 3.2).
STR database creation
DNA profiles based on the 15 STRs constituting the PowerPlex 16 system were individually uploaded into the Genetics Platform (Server Unix Solaris, Architecture Sparc64, Motor Rdbms Oracle, Enterprise Edition) developed by Grupo Empresarial Iberoamericano (GEI) (http://geigen.mx). In accordance with international recommendations (http://enfsi.eu/documents/), the inclusion criteria and upload process of DNA profiles to the STR database included quality control verification of the laboratory’s procedures (ENFSI DNA Working Group 2014). The laboratory has participated in the quality control exercise organized by the Grupo Iberoamericano de Trabajo en Análisis de DNA (GITAD: http://www.aicef.net/). The STR genotype database primarily contains details on unknown bodies, people searching for missing relatives, and biological evidence from criminal cases. The sample’s origin was recorded in the database for classification purposes. We carefully selected a subpopulation of unrelated individuals to estimate forensic parameters (n = 493). For this purpose, we excluded genetic data from criminal samples, unknown bodies, and from individuals sharing surnames in order to avoid possible kinship in this subpopulation sample.
Likelihood ratios (LRs) representing paternity indexes with only one parent (motherless of fatherless) were computed either when the match was detected during the inclusion of new DNA profiles, or when searching missing persons within the STR database. Bayesian posteriori probabilities assuming equal prior probabilities (0.5) for the two alternative probabilities were considered for this purpose (Gjertson et al. 2007), using the allele frequencies reported here and the Familias 3 software (Kling et al. 2014). In addition, we estimated another LR, termed LRNRC, by applying a correction factor according to the size of the database (i.e., N = 2000). This follows Recommendation 5.1 of the National Research Council (NRC II), which was endorsed by the FBI’s DNA Advisory Board in the February 2000 recommendations on statistical approaches (National Research Council (NRC) Committee on DNA Forensic Science 1996). In order to check whether the observed cases showed agreement with expectations simply by chance, we estimated for each STR the matching probabilities between all possible genotypes sharing at least one allele. For this purpose, HWE and equilibrium linkage assumptions were applied for estimating genotype and DNA profile frequency, respectively. The combined matching probability was obtained by applying the product rule and the Bonferroni correction according to the sample size (N = 2000), using Microsoft Excel 2007.
Results and discussion
Observed cases of fortuitous matching
We report nine matches (presumed to be fortuitous) found during the search or inclusion of DNA profiles into the STR database of 2000 samples from the State of Mexico. According to our records, there were no biological relationships between these matching individuals.
Seven presumed fortuitous matches were detected between pairs of samples during the inclusion of DNA profiles into the STR database (Table 2). Comparison of surnames and/or available records indicated no biological relationship between matching individuals. Although in most cases (6/7; 85.7%) the low LR values allow deducing fortuitous kinship events (LRNRC < 2.2), one (case 7) displayed large LRs (LR = 2,320,000; LRNRC = 1160), suggesting a real biological relationship. Unfortunately, further biological and/or DNA samples were not available for additional genetic analyses (e.g., X-STRs, mtDNA, etc.) to confirm or discard this hypothesis. Again, low LRNRC values (~1 or <1) appear useful for establishing fortuitous matches.
Presence of modal alleles in fortuitous matching
Expected probability for sharing one/two allele(s) between two unrelated individuals for the Powerplex 16 kit in the state of Mexico database (N = 2000)
0.00030617 / 2000b = 1.153081 × 1017
(1 in 6,532,422 random comparisons)
(9 in 2001,000a)
Evaluating the probability of matching events in the database
Fortuitous matches potentially have serious legal implications, mainly in motherless paternity tests (Poetsch et al. 2006), and during the search for missing persons in STR databases, given that, frequently, few first-degree relatives are available for the test (Ge et al. 2011). This problem could be accentuated in Mexico by the growing number of missing and killed persons during the last years, along with clandestine graves discovered in different parts of the country. Therefore, we estimated, for individual and combined STR loci, the matching probability between all possible genotypes sharing at least one allele (p = 0.00030617). The corrected p-value according to the sample size (p = 0.00000015308) indicated that one fortuitous match between two DNA profiles is expected in 6,532,422 comparisons for this database (Table 3). This frequency differs from that observed in our study (p = 0.000004; Yate’s chi-square = 21.105). This finding is similar to motherless paternity cases described in Germany, where 26 non-STR mismatches were observed between 336 children empirically compared with 348 men (Poetsch et al. 2006). Although we could not identify a direct explanation for this increment, it might involve the presence of neighboring indigenous individuals with larger inbreeding coefficient, who are constantly incorporated into the Mexican-Mestizo populations. This hypothesis would be in agreement with descriptions of Native American ancestry throughout the country (Moreno-Estrada et al. 2014).
Concluding remarks regarding fortuitous matches
As could be expected, many recommendations of the International Society of Forensic Genetics (ISFG) for disaster victim identification (DVI) are useful to promote the correct identification of missing persons (Prinz et al. 2007). However, some critical ISFG recommendations to avoid fortuitous matches are listed briefly herein: #4) Multiple direct references and samples from first-degree relatives should be collected for each missing person; #6) Use of additional typing systems, such as mtDNA, Y-chromosomal STRs or SNP markers; #10) DNA based identification should whenever possible be anchored by anthropological and/or circumstantial data, a second identification modality, or multiple DNA references; #11) Use of LR that permit DNA results to be combined among multiple genetic systems or with other non-DNA evidence. LR threshold should be determined for when DNA data alone can suffice for an identification, which will be based on the size and circumstances of the event (Prinz et al. 2007). It must be noticed that there are difficulties to implement some recommendations in Mexico and probably in another developing countries. For instance, whereas DNA testing nowadays constitutes the main tool for human identification, the use of anthropological and circumstantial data has become scarcely employed and underappreciated. However, as described in case 2 (Table 1), these data would have been helpful to get stronger conclusions; thus, they should be available whenever possible. In brief, as can be noted, strict application of these ISFG recommendations for DVI would have avoided the fortuitous matches described herein for missing persons identification.
Forensic parameters of the STR database
In agreement with ISFG recommendations for DVI (Prinz et al. 2007), we report statistical parameters of forensic efficiency for the PowerPlex 16 kit when used in the State of Mexico population (Additional file 1: Table S1). The forensic parameters were estimated for each STR locus based on 493 unrelated DNA profiles carefully selected from the STR database according to the criteria above described. HWE test showed that only D7S820 was in disequilibrium after applying the Bonferroni correction (p < 0.0033). Similarly, only two LD cases were detected between D13S317/PENTA E and TH01/TPOX after exact tests (data not shown). Altogether, these isolated findings indicate that DNA profiles can be confidently estimated in the State of Mexico population. The combined power of discrimination (PD) and power of exclusion (PE) in this Mexican population were >99.9%, which is sufficiently reliable to solve most forensic and paternity cases, respectively.
In brief, results emphasize the importance of analyzing a sufficient number of relatives and/or HID systems to reach reliable conclusions when searching for relatives in STR databases for DVI, missing persons identification, and motherless paternity cases. When this is not possible, the concomitant presence of low LRNRC values (~1 or <1) and elevated presence of modal STR alleles should be analyzed to detect possible fortuitous matches. Interestingly, an increased frequency of fortuitous matches was observed in the studied Mexican STR database. Finally, we observed a relative homogeneity among Mexican-Mestizos of the Central region based on the PowerPlex 16 system.
We thank to COMECYT: 8BCD0013 for the Doctoral fellowship to J.A. Quiroz-Mercado.
This work was supported by the Dirección de Servicios Periciales de la Procuraduría General de Justicia del Estado de México (PGJEM).
Availability data and materials
The complete STR genotype database from the studied Mexican population is available upon request to the corresponding author.
Conception of the project. QMJA, JDH, MRR, RVH; Population sampling and DNA Extraction. QMJA, RRR, CMG, SHJC; Genotyping process. QMJA, RRR, CMG, SHJC; Population data analysis. QMJA, MSVM, RVH; Discussion of results and writing of the manuscript. QMJA, MSVM, RVH. All authors read and approved the final manuscript.
Ethics approval and consent to participate
All volunteers included in the population study signed a written informed consent according to the ethical guidelines of the Helsinki Declaration. The anonymity of the recruited individuals was preserved. This project was authorized by the Ethical Committee Dirección de Servicios Periciales de la Procuraduría General de Justicia del Estado de México (PGJEM).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Álvarez-Cubero MJ, Saiz M, Martínez-Gonzalez LJ, Alvarez JC, Eisenberg AJ, Budowle B, Lorente JA (2012) Genetic identification of missing persons: DNA analysis of human remains and compromised samples. Pathobiology 79(5):228–238View ArticlePubMedGoogle Scholar
- Borovko SR, Korban VV, Kritskaya SV, Yeumenenka SA (2009) Missing people: problems of identification of unknown bodies using DNA database. Forensic Sci Int Supp Ser 2(1):260Google Scholar
- Carracedo A, Butler JM, Gusmão L, Linacre A, Parson W, Roewer L, Schneider PM (2013) New guidelines for the publication of genetic population data. Forensic Sci Int Genet 7:217–220View ArticlePubMedGoogle Scholar
- Coletti A, Lancia M, Massetti S, Dobosz M, Carnevali E, Bacci M (2008) Considerations on a motherless paternity case with two related fathers: possible pitfalls. Forensic Sci Int Supp Ser 1(1):505–506Google Scholar
- Collins PJ, Hennessy LK, Leibelt CS, Roby RK, Reeder DJ, Foxall PA (2004) Developmental validation of a single-tube, amplification of the 13 CODIS STR loci, D2S1338, D19S433, and amelogenin: the AMPFℓSTR® Identifiler® PCR amplification kit. J Forensic Sci 49(6):1265–1277View ArticlePubMedGoogle Scholar
- ENFSI DNA Working Group DNA-database management review and recommendations 2014. (http://enfsi.eu/)Google Scholar
- Excoffier L, Laval G, Scheneider S (2005) Arlequin ver. 3. 0: an integrated software package for population genetic analysis. Evol Bioinformatics Online 1:47–50Google Scholar
- Ge J, Budowle B, Chakraborty R (2011) Choosing relatives for DNA identification of missing persons. J Forensic Sci 56(Suppl 1):S23–S28View ArticlePubMedGoogle Scholar
- Gjertson DW, Brenner C, Baur MP, Carracedo A, Guidet F, Luque JA et al (2007) ISFG: recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1(3–4):223–231View ArticlePubMedGoogle Scholar
- González-Herrera L, Vega-Navarrete L, Roche-Canto C, Canto-Herrera J, Virgen-Ponce D, Moscoso-Caloca G et al (2010) Forensic parameters and genetic variation of 15 autosomal STR loci in Mexican-Mestizo populations from the states of Yucatan and Nayarit. Open Forensic Sci J 3:57–63View ArticleGoogle Scholar
- Hill CR, Duewer DL, Kline MC, Coble MD, Butler JM (2013) US population data for 29 autosomal STR loci. Forensic Sci Int Genet 7(3):e82–e83View ArticlePubMedGoogle Scholar
- Kling D, Tillmar AO, Egeland T (2014) Familias 3 extensions and new functionality. Forensic Sci Int Genet 13:121–127View ArticlePubMedGoogle Scholar
- Krenke BE, Tereba A, Anderson SJ, Buel E, Culhane S, Finis CJ et al (2002) Validation of a 16-locus fluorescent multiplex system. J Forensic Sci 47(4):773–785View ArticlePubMedGoogle Scholar
- P.O. Lewis, D. Zaykin, Genetic data analysis, a computer program for the analysis of allelic data. Version 1.1, (2002). Free program distributed by the authors over the internet from the GDA home page. https://phylogeny.uconn.edu/software/ Google Scholar
- Li L, Ge J, Zhang S, Guo J, Zhao S, Li C et al (2012) Maternity exclusion with a very high autosomal STRs kinship index. Int J Legal Med 126(4):645–648View ArticlePubMedGoogle Scholar
- Martínez-Sevilla VM, Aguilar-Velázquez JA, Inclán-Sánchez A, Romero-Rentería O, Díaz-Navarro XX, Saiz M, Rangel-Villalobos H (2016) Parámetros forenses del sistema Powerplex® 21 Promega Corp en población mestiza del occidente de México. Rev Esp Med Legal 42(1):10–16Google Scholar
- Moreno-Estrada A, Gignoux CR, Fernández-López JC, Zakharia F, Sikora M, Contreras AV et al (2014) Human genetics. The genetics of Mexico recapitulates native American substructure and affects biomedical traits. Science 344:1280–1285View ArticlePubMedPubMed CentralGoogle Scholar
- National Research Council (NRC) Committee on DNA Forensic Science (1996) The evaluation of forensic DNA evidence. National Academic Press, Washington, DCGoogle Scholar
- Poetsch M, Lüdcke C, Repenning A, Fischer L, Mályusz V, Simeoni E et al (2006) The problem of single parent/child paternity analysis-practical results involving 336 children and 348 unrelated men. Forensic Sci Int 159(2–3):98–103View ArticlePubMedGoogle Scholar
- Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H, Schneider PM (2007) DNA Commission of the International Society for forensic genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3–12View ArticlePubMedGoogle Scholar
- Ramírez-Flores E, Saiz M, Villegas-Carmona D, Álvarez-Cubero MJ, Álvarez JC, Vega-Navarrete L (2014) Genetic variation of 24 STR loci in a Mexican-Mestizo population from Mexico DF. Forensic Sci Int Genet 10:e4–e6View ArticlePubMedGoogle Scholar
- Rangel-Villalobos H, Martínez-Sevilla VM, Aguilar-Velázquez JA, Martínez-Cortes G, Sosa-Macías M, Gonzalez-Martin A (2016) Importance of the geographic barriers to promote gene drift and avoid (pre and post-Columbian) gene flow in Mexican native groups: evidence from forensic STR loci. Am J Phys Anthropol 160(2):298–316View ArticlePubMedGoogle Scholar
- Rangel-Villalobos H, Rubi-Castellanos R, Morales-Vallejo ME, Molina-Araujo V, Licea-Cadena LA, Rizzo-Juárez S et al (2010) Admixture estimates and statistical parameters of forensic importance based on PowerPlex 16 system in Mexican-Mestizos from the states of Guanajuato (center) and Veracruz (east). Forensic Sci Int Genet 4(4):271–272View ArticlePubMedGoogle Scholar
- Rubi-Castellanos R, Martinez-Cortés G, Muñoz-Valle JF, González-Martín A, Cerda-Flores R, Rangel-Villalobos H (2009) Pre-Hispanic Mesoamerican demography approximates the present-day ancestry of Mestizos throughout the territory of Mexico. Am J Phys Anthropol 139(3):284–294View ArticlePubMedGoogle Scholar
- Salazar-Flores J, Zuñiga-Chiquette F, Rubi-Castellanos R, Alvarez-Miranda JL, Zetina-Hernández A et al (2015) Admixture and genetic relationships of Mexican-Mestizos regarding Latin American and Caribbean populations based on 13 CODIS-STRs. Homo 66(1):44–59View ArticlePubMedGoogle Scholar
- A. Tereba, Tools for analysis of population statistics. Profiles in DNA: Promega Corp., (2001)Google Scholar
- Wurmb-Schwark VN, Mályusz V, Simeoni E, Lignitz E, Poetsch M (2006) Possible pitfalls in motherless paternity analysis with related putative fathers. Forensic Sci Int 159(2–3):92–97View ArticleGoogle Scholar
- Ziętkiewicz E, Witt M, Daca P, Zebracka-Gala J, Goniewicz M, Jarząb B, Witt M (2012) Current genetic methodologies in the identification of disaster victims and in forensic analysis. J Appl Genet 53(1):41–60View ArticlePubMedGoogle Scholar