Applicability of Willems method for age estimation in Brazilian children: performance of multiple linear regression and artificial neural network

Dental age estimation of children may be necessary in the clinical and forensic fields. In the former, it may contribute to the investigation of dental development and biological maturation. In the latter, dental age estimation may support the Court in several circumstances, such as adoption and unidentified bodies of deceased children. This study aimed to apply Willems method for dental age estimation of children from Southeastern Brazil. The second aim of the study was to test the performance of the method modeled with multiple linear regression (MLR) and artificial neural network (ANN). The sample consisted of 1000 panoramic radiographs of female (n = 500) and male (n = 500) Brazilian children. The individuals were evenly distributed across ten 1-year age intervals from 6 to 15.99 years. Dental development was classified with the 7-teeth technique of Demirjian et al. (1973), followed by age calculation with Willems method (2001). The difference between chronological and estimated ages was quantified for the original Willems model and for the MLR and ANN models. For females and males, the overall difference found with Willems original model was 0.27 and 0.28, respectively. With MLR, the overall differences were 0.54 and 0.35; with ANN, they were 0.26 and 0.24, respectively. The ANN reduced the mean error by up to 100% in half of the female age predictions. The same phenomenon occurred in 1/3 of the males. Despite the improvements of the ANN model in specific age groups, the original Willems model performed similarly or better in 40% of the studied age intervals. All the models showed the worst age predictions in the interval between 15 and 15.99 years (p < 0.001). Willems method remains optimal and applicable 20 years after its original development. The ANN model might be an option for future improvements (depending on sex and age interval).


Background
After the development of the radiographic technique of Demirjian et al. (1973) for dental age estimation, several so-called validation studies were designed with populations worldwide (Lee et al. 2011; Nik-Hussein et al. 2011; Hegde et al. 2017; Moness Ali et al. 2019). Generally, these studies aimed to test the performance of the technique on samples other than the original (Nur et al. 2012). By comparing estimated and chronological ages, the authors detected constant overestimation (Jayaraman et al. 2013). Twenty-eight years later, Willems et al. (2001) revisited the technique by maintaining the staging system originally proposed and recalculating the maturity values attributed to each of the seven mandibular left permanent teeth. Over time, Willems method became a popular approach for dental age estimation in forensic practice (Wang et al. 2017). In 2017, a systematic literature review (Yusof et al. 2017) pooling together different populations showed that Willems method had an overall overestimation of 0.10 years (nearly a month). In 2020, a meta-analysis (Franco et al. 2020) dedicated to testing the performance of different methods in Brazilian children rated Willems the top method based on accuracy.
For the Brazilian population, more specifically, the method was previously confirmed as accurate, with mean underestimation of 0.17 and 0.38 among females and males, respectively (Franco et al. 2013). The population previously sampled, however, was restricted to the South region and was not balanced in its distribution by sex and age (Franco et al. 2013). The progressive refinement of existing techniques is justified as a means of improving their practical use. Willems et al. (2010), for instance, published alternative tables for the original Willems method (Willems et al. 2001). The new reference maturity values, combined for females and males, enabled dental age estimation in cases of unknown sex (Willems et al. 2010). Bedek et al. (2020) proposed a sequential reduction of the number of staged teeth, so that Willems method could be used even when some of the seven mandibular left permanent teeth were missing. Multiple linear regression (MLR) and artificial neural network (ANN) emerge as methodological possibilities to minimize error rates. To date, no studies with ideally balanced samples have been designed to improve Willems method with MLR and ANN, which represents an important gap to be investigated.
Based on the above, this study aimed to test the applicability and performance of Willems method in a large and balanced population of Brazilian children, and to test the performance of MLR and ANN modeling in minimizing error.

Study design and ethical aspects
This observational cross-sectional study was structured according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Initiative (Von Elm et al. 2014). Ethical approval was granted by the local committee of ethics in human research (clearance protocol: 29384620.7.0000.5374). The research steps respected the Declaration of Helsinki 2013. The radiographs used in this study were retrospectively collected from an existing image database of a private oral radiology center in Sao Paulo State, Southeastern Brazil. Hence, patients were not exposed to ionizing radiation for the purpose of the present study.

Sample and participants
The sample consisted of 1000 panoramic radiographs of Caucasian females (n = 500) and males (n = 500) in the age interval between 6 and 15.99 years (Table 1). Clinical records were reviewed to establish and assess the eligibility criteria. The inclusion criteria consisted of panoramic radiographs of Caucasian Brazilian children from the Vale do Paraiba region, in Southeastern Brazil. The exclusion criteria consisted of images showing maxillofacial lesions, bilateral missing and restored teeth in the mandible (except for third molars), fixed orthodontic appliances in mandibular teeth, bilateral root canal treatment, history of systemic diseases with impact on growth and development, history of facial bone fractures, presence of appliances for the fixation of maxillofacial trauma, and images that did not allow a clear visualization of the mandibular left permanent teeth. All the images were acquired with a Sirona Orthophos device (GmbH, Bensheim, Germany) with settings of 69 kV, 15 mA, and an acquisition time of 14.1 s.

Variables and settings
Sex, known chronological age (quantified from the difference between the date of image acquisition and the date of birth), and estimated age were the main variables in this study. Secondary variables considered for statistics were the eight developmental stages (from A to H) proposed by Demirjian et al. (1973). The radiographic staging system describes the development of the crown and roots of the seven mandibular left permanent teeth, excluding the third molar. The stages were later converted into estimated age using the sex-specific tables proposed by Willems et al. (2001), originally trained in a Belgian population. The method was chosen based on the following aspects: (I) a systematic review and meta-analysis from 2017 (Yusof et al. 2017) identified Willems method as reliable for age estimation in children; (II) another systematic review with meta-analysis from 2017 (Wang et al. 2017) confirmed that the method was appropriate for Caucasians; (III) a third systematic review and meta-analysis (Franco et al. 2020) revealed that Willems method had the best performance among Brazilian children; and (IV) a study among Brazilians (Franco et al. 2013) showed a small error when Willems method was applied in a population of Southern Brazilian children.
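As an illustration of how the chronological age variable can be derived, the minimal Python sketch below computes age in decimal years from the date of birth and the date of image acquisition and assigns the corresponding 1-year interval. The function names, the 365.25-day year, and the example dates are assumptions made for illustration only; the way decimal ages were actually computed is not reported here.

```python
from datetime import date


def chronological_age_years(birth: date, acquisition: date) -> float:
    """Age at image acquisition in decimal years.

    Assumption: an average year of 365.25 days is used for the conversion.
    """
    if acquisition < birth:
        raise ValueError("acquisition date precedes date of birth")
    return (acquisition - birth).days / 365.25


def age_interval(age: float) -> str:
    """Label the 1-year interval an age falls into, e.g. 7.42 -> '7-7.99'."""
    lower = int(age)
    return f"{lower}-{lower}.99"


# Hypothetical example dates (not taken from the study sample).
age = chronological_age_years(date(2010, 3, 15), date(2018, 9, 1))
print(round(age, 2), age_interval(age))
```

Grouping by the integer part of the age mirrors the ten intervals from 6-6.99 to 15-15.99 used in the sample description.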

Study bias and minimization
The mandibular left permanent teeth were classified into stages by a main examiner, an oral radiologist with 7 years of experience with 2D and 3D image analysis. To avoid examiner fatigue, no more than 20 radiographs were analyzed per day. Within an interval of 30 days from the main analysis, the examiner re-analyzed 10% (n = 100) of the sample to enable intra-examiner agreement calculation. An additional examiner, a forensic odontologist with 10 years of experience in practice and a history of age estimation assignments, was recruited to analyze the same 100 panoramic radiographs to enable the inter-examiner agreement test. The sample used during examiner agreement tests was selected from the main pool of radiographs using the free online tool www.random.org, with the numbering system of the radiographs set as the reference for randomization. The sample size (10%, n = 100) and the randomization tool used to test examiner agreement were set based on previous studies (Franco et al. 2013; Machado et al. 2018). Examiner agreement for staging each tooth was calculated with Cronbach's alpha and weighted kappa statistics. Image analyses were performed using a MacBook Pro notebook (Apple Inc., Cupertino, CA, USA) equipped with a 13.3″ screen (resolution: 2560 × 1600). The computer's original photo viewer was used with a maximum image magnification of 200%.
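The agreement procedure can be sketched as follows. This is a minimal illustration and not the study's actual computation: Python's pseudo-random generator stands in for www.random.org, the weighting scheme (linear weights) is an assumption because the text does not state which weights were used, and the two stage sequences are hypothetical readings of a single tooth.

```python
import random

STAGES = "ABCDEFGH"  # Demirjian stages treated as ordered categories


def draw_subsample(n_total=1000, n_draw=100, seed=2020):
    """Draw radiograph numbers without replacement (stand-in for random.org)."""
    return sorted(random.Random(seed).sample(range(1, n_total + 1), n_draw))


def weighted_kappa(first, second, categories=STAGES):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0.0:  # both readings use one and the same stage
        return 1.0
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical first and second readings of the same tooth in ten radiographs.
first_reading = list("DDEEFFGGHH")
second_reading = list("DEEEFFGHHH")
subsample_ids = draw_subsample()
kappa = weighted_kappa(first_reading, second_reading)
print(len(subsample_ids), round(kappa, 3))
```

Linear weights penalize a disagreement of one stage less than a disagreement of several stages, which suits an ordinal staging system; quadratic weights would be an equally plausible choice.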

Statistics
Data were initially treated by means of descriptive statistics of central tendency and dispersion. Mean age and standard deviation were calculated per 1-year age category (interval). Spearman's correlation between developmental stages and chronological/estimated ages was calculated. Mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) were calculated for each 1-year age interval from the difference between chronological and estimated ages. The statistical significance of the differences was assessed by means of Wilcoxon's test. The error of the method was depicted via Bland-Altman plots. All the tests were performed separately for females and males. In an attempt to reduce the error of the method, MLR and ANN analyses were performed. An ANN has the ability to learn relationships between input variables (independent, predictors) and output variables (dependent, outcome). This is achieved by applying algorithms that train the network with a training (or derivation) data set that consists of predictor variables and the known or associated results (Farhadian et al. 2019). In the linear regression analysis, the dental stages were first tested for their univariate effect. Next, they were included in a multivariate model (Hair et al. 2019).
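A multivariable linear model of this kind can be sketched as below. The sketch rests on several assumptions that are not taken from the study: stages are coded ordinally (A = 1 to H = 8), only two teeth are used for brevity, and the ages are synthetic values generated from a known linear rule so that the fit can be checked. The coefficients are obtained from the normal equations, which is adequate for a toy example but less numerically robust than the routines used by statistical packages.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Augmented matrix (X'X | X'y)
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                factor = A[k][c] / A[c][c]
                A[k] = [a - factor * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted age for one vector of stage codes."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


# Synthetic data: stage codes of two hypothetical teeth and ages generated
# from the made-up rule age = 2.0 + 0.5 * tooth1 + 1.0 * tooth2.
stages = [(4, 3), (5, 4), (6, 4), (6, 6), (7, 5), (8, 7), (8, 8)]
ages = [2.0 + 0.5 * s1 + 1.0 * s2 for s1, s2 in stages]

coef = fit_mlr(stages, ages)
print([round(c, 3) for c in coef], round(predict(coef, (7, 6)), 2))
```

The univariate step can be illustrated with the same function by passing a single predictor at a time, for example `fit_mlr([[s1] for s1, _ in stages], ages)`.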

Results
Descriptive statistics revealed that Willems method performed best in the age intervals of 10-10.99 and 11-11.99 for females, in which the ME between estimated and chronological ages was −0.06 and −0.09, respectively. Between the age intervals of 7-13.99, the ME was never worse than −0.25 (a difference of about 3 months between estimated and chronological ages). Among males, the method was able to reach ME = 0 in the age interval of 12-12.99. Between the age intervals of 7-14.99, the ME was never worse than −0.39 (a difference of under 5 months between estimated and chronological ages). The worst predictions were detected in the age intervals at the lower and upper limits of the sample, namely 6-6.99 and 15-15.99. Overestimations were predominant over underestimations (Table 2). The Bland-Altman plots for females and males showed most of the errors clustered close to zero and within error limits of 1 year (Fig. 1). For females and males, positive and statistically significant (p < 0.001) correlations were detected between chronological age, estimated age, and allocated developmental stages (Table 3).
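The error measures reported per age interval, together with the bias and limits of agreement underlying a Bland-Altman plot, can be computed as in the following minimal sketch. The sign convention (estimated minus chronological age, so that positive values mean overestimation) is an assumption, since the direction of the subtraction is not stated; the 1.96 multiplier for the limits of agreement is the conventional choice and is likewise assumed. The ages are hypothetical.

```python
import math
from statistics import mean, stdev


def differences(chronological, estimated):
    """Per-individual error; assumed convention: estimated - chronological."""
    return [e - c for c, e in zip(chronological, estimated)]


def error_metrics(chronological, estimated):
    """Mean error, mean absolute error and root mean squared error."""
    d = differences(chronological, estimated)
    return {
        "ME": mean(d),
        "MAE": mean(abs(v) for v in d),
        "RMSE": math.sqrt(mean(v * v for v in d)),
    }


def metrics_by_interval(chronological, estimated):
    """Error metrics for each 1-year interval of chronological age."""
    groups = {}
    for c, e in zip(chronological, estimated):
        chron, est = groups.setdefault(int(c), ([], []))
        chron.append(c)
        est.append(e)
    return {k: error_metrics(*groups[k]) for k in sorted(groups)}


def bland_altman_limits(chronological, estimated):
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    d = differences(chronological, estimated)
    bias, sd = mean(d), stdev(d)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical ages in years (not study data).
chron = [7.2, 7.8, 8.1, 8.6]
est = [7.5, 7.6, 8.4, 8.9]

per_interval = metrics_by_interval(chron, est)
lower, bias, upper = bland_altman_limits(chron, est)
print(per_interval)
print(round(lower, 2), round(bias, 3), round(upper, 2))
```

Reporting ME alongside MAE is informative because over- and underestimations cancel in the ME but not in the MAE.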
The MLR and ANN models behaved differently compared to the original Willems model. In general, the MLR performed better in reducing the ME of the original Willems model applied to Brazilian males. More specifically, within the age interval between 6 and 10.99 years, the model reduced the ME by close to 100%. In contrast, the ANN showed more balanced improvements between sexes. The most evident improvements were in the age interval from 7 to 8.99 years, in which the model reduced the ME by nearly 100% in females and males. Across the pooled age intervals (6-15.99) addressed in this study, the original Willems model performed better than the MLR in 40% of the intervals and better than the ANN in 60% of the intervals (Table 4).
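For readers unfamiliar with ANN modeling, the following toy example shows the general idea of training a network to map stage information to age. It is a sketch under stated assumptions and not the network used in this study: the architecture (one hidden layer of three tanh units), the learning rate, the number of epochs, the scaling of inputs and target, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def train_ann(X, y, hidden=3, epochs=500, lr=0.02, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient descent on MSE."""
    rng = random.Random(seed)
    n, n_in = len(X), len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(W2[j] * h[j] for j in range(hidden)) + b2

    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        squared = 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target
            squared += err * err
            gb2 += err
            for j in range(hidden):
                gW2[j] += err * h[j]
                back = err * W2[j] * (1.0 - h[j] ** 2)  # tanh derivative
                gb1[j] += back
                for k in range(n_in):
                    gW1[j][k] += back * x[k]
        losses.append(squared / n)  # MSE before this epoch's update
        step = 2.0 * lr / n
        b2 -= step * gb2
        for j in range(hidden):
            W2[j] -= step * gW2[j]
            b1[j] -= step * gb1[j]
            for k in range(n_in):
                W1[j][k] -= step * gW1[j][k]
    return (lambda x: forward(x)[1]), losses


# Synthetic data: three hypothetical stage scores scaled to 0..1 that rise with age.
rng = random.Random(1)
ages = [rng.uniform(6.0, 15.99) for _ in range(60)]
X = [[min(1.0, max(0.0, (a - 4.0 - 0.5 * tooth) / 10.0 + rng.gauss(0.0, 0.03)))
      for tooth in range(3)] for a in ages]
mu, sd = mean(ages), pstdev(ages)
y = [(a - mu) / sd for a in ages]  # standardized target

predict_scaled, losses = train_ann(X, y)
estimated_age = mu + sd * predict_scaled(X[0])  # back to years
print(round(losses[0], 3), round(losses[-1], 3), round(estimated_age, 2))
```

In practice, a held-out test set would be needed to compare such a model fairly against the original tables, because a network evaluated on its own training data will appear more accurate than it is.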

Discussion
Dental age estimation of children is usually needed in cases of adoption, unaccompanied minors in irregular migration, unknown deceased, and clinical diagnosis and treatment planning (Pradella et al. 2017; Sobieska et al. 2018). Testing methods trained in international populations is an essential step towards the external validation of forensic tools. This study aimed to test the applicability of Willems method in a population of Southeastern Brazil and to test the performance of method modeling with MLR and ANN. Willems method was itself designed as an improved version of Demirjian's age predictions (Demirjian et al. 1973). In short, Demirjian's approach, known for recurrent overestimation (Asab et al. 2011; Souza et al. 2015; Moness Ali et al. 2019), was recalculated to best fit the Belgian population. Over time, population-specific investigations have shown an overall optimal applicability of the method, with some restrictions to Asian populations (Wang et al. 2017). In 2013, a study with Southern Brazilian children tested the performance of Willems original model compared to a South-Brazilian model (Franco et al. 2013). With the original Willems model, the authors found a ME of −0.17 in females and −0.38 in males between estimated and chronological ages (Franco et al. 2013). The four-month difference between ages was regarded as a positive outcome of the method in its first application in Brazil (Franco et al. 2013). In the same study, the Brazilian model led to an ME improvement of 0.02 and 0.2 in females and males, respectively. The difference was not clinically significant enough to justify the proposition of a new method (Franco et al. 2013). In the present study, the combined ME values for females and males were −0.27 and 0.28. The difference between studies might be explained not only by the geographic origin of the samples but also, and mainly, by the unbalanced sample used by the previous authors (Franco et al. 2013).
In the present study, age- and sex-related bias was minimized by sampling the same number of individuals per age interval and sex. In practice, the performance of Willems original model shown in the present study confirms its applicability among children with age and sex similar to the sampled population.
From an international perspective, several systematic literature reviews (Wang et al. 2017; Yusof et al. 2017; Esan et al. 2017; Sehrawat and Singh 2017) corroborate the applicability of Willems method within global populations. Recent examples of international validation include South Africa (Willems et al. 2018), Kosovo (Kelmendi et al. 2018), and Kenya (Llano-Pérula et al. 2020). In 2017, a meta-analysis considered Willems method accurate, with mild overestimations and a mean difference from chronological age of 0.29 and 0.26 for females and males (Esan et al. 2017). Despite the relevant outcomes, the authors concluded by suggesting population-specific modeling for best application worldwide instead of unified standard approaches (Esan et al. 2017). The present study proposed statistical modeling via MLR and ANN. The proposed modeling did not improve age predictions uniformly across the different age intervals (10 intervals from 6 to 15.99 years for each sex). The overall ME of the pooled age intervals according to the MLR was 0.54 and 0.35 for females and males, respectively. The ANN, instead, decreased the ME of females and males to 0.26 and 0.24, close to international values (Esan et al. 2017) and very similar to the outcomes of the original Willems model used in the present study. It must be noted, however, that the accuracy of Willems method among older age intervals, namely 15-15.99, decreases as the seven permanent left mandibular teeth approach complete development, leaving little remaining development to assess at this point. The scientific literature (Ismail et al. 2018) has highlighted this phenomenon before. In the present study, the ME of MLR and ANN for the age interval of 15-15.99 was above 1 year for females and males, while Willems original model led to ME of 0.71 and 0.54, respectively. To obtain a clearer view of the outcomes and to avoid a false impression of high ME values, the ME for the age interval of 15-15.99 can be excluded.
By doing so, the pooled ME for females and males decreases to 0.47 and 0.25 using the MLR model; 0.14 and 0.16 using the ANN; and 0.22 and 0.26 using Willems original method, respectively.
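The effect of excluding one interval can be checked with simple arithmetic for the original Willems model, for which both the overall ME and the ME of the 15-15.99 interval are given in the text. The sketch below assumes that the pooled ME is the unweighted mean of the ten interval MEs, which is reasonable here because each interval holds the same number of individuals per sex; it also works from the rounded published values, so small discrepancies are expected. The function name is hypothetical.

```python
def pooled_without(overall_me, excluded_me, n_intervals=10):
    """Mean of the remaining interval MEs after dropping one interval.

    Assumption: all intervals have equal size, so the overall ME is the
    plain average of the interval MEs.
    """
    return (overall_me * n_intervals - excluded_me) / (n_intervals - 1)


# Rounded values reported for the original Willems model.
females = pooled_without(overall_me=0.27, excluded_me=0.71)
males = pooled_without(overall_me=0.28, excluded_me=0.54)
print(round(females, 3), round(males, 3))
```

Under these assumptions the calculation gives about 0.22 for females, matching the reported value, and about 0.25 for males against the reported 0.26, a gap that is compatible with rounding of the inputs.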
This study showed an improvement of dental age estimation with ANN modeling when the ME outcomes were pooled. Forensic odontologists must understand, however, that Willems original model remains superior in most age intervals among females, and in older age intervals among males (>12 years). In practice, these findings suggest that the original method is a reliable tool in forensic practice, with overall realistic estimates up to the age of 14.99 years. If the need for improvements in specific age intervals arises in practice, dedicated modeling with the ANN approach might be useful, especially in the age intervals where the original method is more limited.

Conclusions
Designed 20 years ago, Willems method remains reliable and applicable for dental age estimation among individuals of the Southeastern Brazilian population aged below 15 years. The method showed slightly smaller differences between chronological and estimated ages among females than among males. Attempted improvements by modeling with MLR and ANN were only optimal in specific age intervals depending on sex. In general, the ANN performed better than the MLR and similarly to the original approach.