Estimating the Returns to Education Using a Machine Learning Approach – Evidence for Different Regions

1 Introduction

With the emergence of the human capital theory, initially proposed by Smith (1776) and later popularized by Becker (1964), Mincer (1974), and Schultz (1960, 1961), the relationship between investment in human capital and income distribution has been a subject of continuous investigation. Economists have conceptualized human capital as the accumulation of human resources, including knowledge, skills, and personal characteristics, that contributes to the generation of economic value in the labor market (Acemoglu & Autor, 2011; Fogg, Harrington, & Khatiwada, 2018; Goldin & Katz, 2007). Formal education, as a significant component of human capital, plays a crucial role in determining social and economic success, and investments in education are associated with numerous future benefits.

Systematic attempts to estimate the social and economic impact of educational investment began in the 1950s and gave rise to the concept of rates of returns to education (Becker, 1962; Becker & Chiswick, 1966; Carnoy, 1967; Chiswick, 1969; Mincer, 1958). Estimating the private rate of returns to education, which refers to the financial earnings individuals expect to receive as a result of their educational investment, provides crucial insights. Two primary methods are commonly used to calculate the expected rates of returns to education:

The discounting method, introduced by Psacharopoulos and Patrinos (2018), utilizes the concept of the pure internal rate of returns to schooling. This method involves determining the discount rate that equalizes the stream of benefits from education to the stream of costs at a specific point in time.

The earnings function method, formalized by Mincer (1974), examines the rate of returns to schooling as the relative change in earnings resulting from an additional year of education.
The Mincer earnings function is widely employed in empirical research and allows economists to estimate the monetary returns to education. Its simplicity facilitates direct comparisons and enables individuals to assess the profitability of investing in education, thereby aiding their decision-making process regarding the optimal level of educational investment.

The estimation of rates of returns to education remains a prominent topic in the field of economics, leading to a substantial body of empirical research (Montenegro & Patrinos, 2013, 2014; Patrinos & Psacharopoulos, 2010; Psacharopoulos & Patrinos, 2004, 2018). Recent compilations of studies consistently reveal similar patterns: the global and private rates of returns to education average around 9% per year. Notably, returns tend to be higher in low- and middle-income economies compared to high-income economies. In addition, returns are highest at the tertiary and primary education levels. Furthermore, the findings indicate that women tend to experience higher returns than men, and the private sector of the economy benefits from greater returns compared to the public sector.

Ordinary Least Squares (OLS) is a widely utilized econometric algorithm for calculating the rates of returns to education. However, existing literature highlights the issue of high bias associated with OLS, as it often underfits the data by failing to capture the underlying patterns (Dangeti, 2017). Bias refers to the difference between the expected value of an estimator and the true value of the parameter being estimated. In recent years, state-of-the-art algorithms from the field of machine learning (ML) have gained prominence and found applications in econometrics. Among these algorithms, Support Vector Regression (SVR) (Basak, Pal, Ch, & Patranabis, 2007; Gani, Taleb, & Limam, 2010; Smola & Schölkopf, 2004) has shown promise for estimating the rates of returns to education.
SVR is a regression model within the ML framework that offers several advantages over OLS. These benefits include improved predictive performance, robustness to outliers, and lower computational requirements compared to other regression techniques. Furthermore, ML algorithms have the capacity to uncover hidden patterns and capture nonlinear effects that have a significant impact on the estimation of returns to education. By leveraging these advanced techniques, researchers can potentially obtain more accurate and comprehensive insights.

The objective of this analysis is to estimate the rates of returns to education for eight distinct countries across various regions, namely Australia, Germany, India, South Africa, South Korea, Switzerland, Tanzania, and the United States of America (USA). Moreover, we aim to assess the rates of returns to education for each country over the past decade, specifically from 2010 to 2018, utilizing the most recently available datasets. To ensure consistency, the data have been standardized in terms of age groups, education levels, employment statuses, and earnings, thereby accounting for the significant heterogeneity within each region. The estimation process applies SVR in conjunction with the widely recognized Mincer earnings function. In addition, we explore variations in returns between genders and across different educational levels, including primary, secondary, and tertiary education.

Through the utilization of state-of-the-art ML models trained on these rich and diverse datasets, this study aims to explore the intricate relationships between education and economic outcomes. By uncovering regional disparities and providing evidence-based insights, the research findings have significant implications for policy and institutional decision-making processes.
The article highlights the superior performance of ML algorithms over traditional methods, contributing to the growing body of research advocating for the integration of advanced statistical techniques in estimating returns to education. The potential impact of this study is profound, as it can revolutionize the evaluation of education’s impact by policymakers, educational institutions, and individuals. By considering the economic consequences of educational investment and spending, stakeholders can make informed choices regarding educational investments. This, in turn, can lead to the development of more effective policies and improved outcomes for individuals and society as a whole. The insights generated by this study provide essential guidance for optimizing resource allocation, enhancing educational systems, and driving socioeconomic progress.

The evaluation results reveal that Sub-Saharan Africa has the highest returns to education, averaging 18%. Healthy rates of returns are also observed in Australia (10%), Asia (10%), and America (11%), while Western Europe exhibits the lowest returns, averaging 7%. In addition, higher education yields the highest returns across all regions, followed by primary and secondary education. Women generally experience higher rates of returns compared to men, with rates of 10.6% and 10.1%, respectively. Over time, the returns to education demonstrate a modest decline of approximately 0.1% per year. In contrast, the average number of years of education exhibits an increase of 0.16 years annually (1% per year). A comparison between the conventional OLS method and the ML algorithm (SVR) demonstrates that the ML approach provides more robust and accurate estimates, along with superior predictive performance.

The structure of the article is as follows: Section 2 provides a review of previous estimates of rates of returns to education.
Section 3 explains the methodology employed for estimating rates of returns to education using the Mincer earnings equation and the ML approach. The datasets utilized for analysis are described in Section 4, along with an exploration of the patterns of education and earnings across the different regions. Section 5 presents the estimates of rates of returns to education. The final section discusses the results and highlights the policy implications derived from the findings.

2 Returns of Investment in Education in the World

The literature has amassed a substantial body of research that examines the relationship between earnings and education, with a particular focus on estimating the rates of returns to educational investment. In this section, we review the research evidence concerning the returns to education over the past decade, synthesizing the findings from various empirical works.

2.1 Rates of Returns to Education in Africa

The extensive review by Psacharopoulos and Patrinos (2018), which examined the global literature encompassing 1,120 estimates across 139 countries from 2004 to 2011, revealed that Sub-Saharan Africa exhibits one of the highest private returns to education (10.5%). These estimates were predominantly derived using the Mincer equation and the OLS algorithm. Montenegro and Patrinos (2014), after compiling data on the global returns to education from 139 economies spanning the years 1980–2013, discovered that Sub-Saharan Africa had the highest rates of returns (12.5%).

Depken, Chiseni, and Ita (2019) employed OLS and examined two waves of the National Income Dynamics Study to estimate the private returns to education in South Africa for the years 2010 and 2012.
The analysis revealed returns of approximately 18% per year, with higher returns observed for females compared to males. Similar estimates were obtained by Salisbury (2015) using data from the 2008 wave. By employing the Two-Stage Least Squares (2SLS) procedure on the 2012 wave, Biyase and Zwane (2015) reported a return rate of 47%.

Moreover, multiple studies have focused on estimating the rates of returns to education in Tanzania. The most recent estimate, compiled by Montenegro and Patrinos (2014), indicates a rate of 16.1% in 2011. Returns on higher education were found to be the highest (19.4%), while secondary and primary education yielded similar returns (15% and 14.6%, respectively). In addition, Peet, Fink, and Fawzi (2015) estimated the return rate for Tanzania in 2010 to be 11.1%, while Serneels, Beegle, and Dillon (2017) reported estimated returns of 8% for men and 10% for women in 2008. Further rates of returns for both countries are provided in Table A1.

2.2 Rates of Returns to Education in the USA

In the USA, there is a consistent effort to estimate the returns on investment in education. The Organisation for Economic Co-operation and Development (OECD) regularly publishes a report called “Education at a Glance,” which provides ongoing information about the state of education worldwide. These reports present data on education system finances, performance, and incentives to invest in education across OECD countries. Using data from the Programme for the International Assessment of Adult Competencies (PIAAC) in 2016, OECD (2019) estimated the private internal rate of returns to tertiary education for both men and women in the USA to be 20%. This estimate was 18% in 2015. Furthermore, Fogg et al. (2018) found that in 2011–2012, high school dropouts were projected to earn 16–17% less than high school graduates.
Conversely, bachelor’s degree holders earned 30% more than high school graduates, and master’s degree holders earned approximately 45% more than high school graduates. For additional estimates, please refer to Table A1.

2.3 Rates of Returns to Education in Asia

Significant research efforts have been undertaken to estimate the rates of returns to education in India and South Korea over the past decade. In the case of India, Sikdar et al. (2019) conducted the most recent study using data from 2011–2012. Employing the Mincerian equation, the study found insignificant relationships between wages and education level, estimating the financial returns to education at 1.5%. However, graduates from tertiary education typically obtain higher-paying jobs. Jacob et al. (2018), analyzing returns to various professions and degrees based on data from 2011–2012, found that medical graduates had the highest returns, followed by engineering graduates and professional postgraduates. Two additional studies (Agrawal, 2011; Rani, 2014), although based on older data from 2005, obtained rates of returns to education of 14% and 8.5%, respectively. Agrawal (2011) further examined the temporal change in returns to different levels of education and observed rising returns with higher education levels.

Turning to South Korea, estimates of the returns to education are primarily evaluated by the OECD using PIAAC data and the full discounting method. In 2016, private returns to tertiary education were estimated to be 22% for men, 20% for women, and 21% on average (OECD, 2019). Similarly, in 2015, returns to tertiary education averaged 22%, with men experiencing a rate of 25% and women 19% (OECD, 2018). As shown in Table A1, returns to education have increased from 2011 to 2016. Another study examining the Korean minority living in China, based on data from 2009–2010, revealed high returns to education compared to the Asian and world averages.
This explains the strong private demand for education among the ethnic Koreans in China (Mishra & Smyth, 2013).

2.4 Rates of Returns to Education in Australia

The rates of returns to education in Australia are primarily investigated and reported by the OECD. In 2016, the average returns to higher education were estimated to be 13.5%, showing similar values for both men and women (OECD, 2019). There was only a marginal change compared to the previous year’s returns, which stood at 12% in 2015 (OECD, 2018). Additional estimates from the OECD can be found in Table A1.

Other studies have examined the returns to education in Australia using the Mincerian equation. Montenegro and Patrinos (2014) obtained a rate of returns to education of 14.1% for the year 2010, while Mariotti and Meinecke (2011) estimated a rate of 8.3% for the same year. The disparity in these findings can be attributed to the utilization of different datasets in each analysis.

2.5 Rates of Returns to Education in Europe

European countries, particularly those in Western Europe, are characterized by relatively low average rates of returns to education, as reported by Psacharopoulos and Patrinos (2018) with a value of 7.3%. In the case of Germany and Switzerland, returns to education have been consistently studied and analyzed.

According to the OECD reports, the rates of returns to tertiary education in Germany were 15% in 2016 (OECD, 2019) and 11% in 2015 (OECD, 2018). Using the Mincerian earning function with OLS estimation, Montenegro and Patrinos (2014) calculated rates of returns for the years 2010, 2011, and 2012, resulting in estimates of 15.2%, 14.3%, and 14.5%, respectively. Similarly, employing a two-stage least squares (2SLS) approach, Mysíková and Večerník (2015) obtained estimates of 15.6% in 2010 and 15.5% in 2011.

For Switzerland, calculations revealed comparable rates of returns to tertiary education in 2016 and 2015, both amounting to 14% (OECD, 2018, 2019).
Previous calculations based on the Mincerian equation indicated higher returns, with estimates of 21% for the years 2010 and 2011 using the 2SLS method (Mysíková & Večerník, 2015). However, when employing OLS, the rates for 2011 and 2012 were approximately 12% (Montenegro & Patrinos, 2014).

The majority of studies included in this review rely on older data, with data collection concluding by 2012, except for those analyzed by the OECD, which includes the most recent data from 2016. It is important to note that the results provided by the OECD may have limitations due to the employed methodology, specifically the full discounting method. This approach is less commonly used in the economics literature, as most researchers typically adopt the standard and conventional approach based on the Mincer earnings function. In addition, studies based on the Mincer equation may occasionally fail to capture trends in different regions.

The present study focuses on examining the most recent data released in 2018 from diverse regions worldwide, including South Africa and Tanzania in Africa, the USA, South Korea and India in Asia, and Germany and Switzerland in Europe. These regions exhibit significant heterogeneity, and comparing their returns on educational investment can offer valuable insights. To enhance the accuracy and robustness of the estimation, an ML algorithm called SVR is employed in combination with cross-validation. This innovative approach allows for more reliable and precise estimations. The subsequent sections will provide detailed information about the empirical strategy and the data utilized for the analysis.

3 Empirical Specification

The methodological approach employed in this study involves estimating the returns to education by leveraging an ML technique that builds upon the Mincerian earning function.

3.1 Mincerian Earning Function

Estimates of the returns to education are commonly derived using the Mincer model (Mincer, 1974; Patrinos, 2016).
The Mincer model can be extended to estimate the returns to education at different levels of schooling. The standard Mincer model involves regressing the logarithm of hourly wage (or in some cases, weekly, monthly, or yearly wage) on a set of human capital variables. These human capital variables typically include total years of education, labor market experience, and experience squared. The Mincer equation can be expressed as follows:

\[ y_i = \log(w_i) = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \mu_i, \tag{1} \]

where $y_i = \log(w_i)$ denotes the natural logarithm of the earned wage for the individual $i$; $S_i \in \mathbb{R}$ constitutes the number of years of schooling; and $X_i \in \mathbb{R}$ represents the potential labor market experience. The labor market experience is calculated by subtracting from the age of the individual the number of years of schooling and the age at which the individual started schooling ($X_i = \text{Age}_i - S_i - 6$, with 6 being the average age for starting schooling). $X_i^2$ is the square of the potential experience, and $\mu_i$ is the random error term reflecting unobserved characteristics. Consequently, the coefficient of interest $\beta_1$ on years of schooling ($S_i$) can be interpreted as the private average rate of return to an additional year of education. The basic Mincer function assumes that the rate of return is the same for all levels of schooling and considers foregone earnings as the sole costs of education. The inclusion of the experience variable in the equation accounts for the expected positive relationship between earnings and experience. In addition, the experience squared term captures the potential non-linear relationship between earnings and experience.

To analyze the returns to different levels of education, an extended version of the Mincer model can be employed.
In this extended model, the continuous variable representing the total years of education ($S$) is transformed into a series of dummy variables: $E_{\text{p}}$ for primary education, $E_{\text{s}}$ for secondary education, and $E_{\text{t}}$ for tertiary education. The extended Mincer equation can be expressed as follows:

\[ \log(w_i) = \beta_0' + \beta_{\text{p}} E_{\text{p}_i} + \beta_{\text{s}} E_{\text{s}_i} + \beta_{\text{t}} E_{\text{t}_i} + \beta_2' X_i + \beta_3' X_i^2 + \mu_i'. \tag{2} \]

In the extended Mincer equation, the dummy variables $E_{\text{p}}$, $E_{\text{s}}$, and $E_{\text{t}}$ take a value of 1 when an individual has attained the respective level of education (primary, secondary, or tertiary). The reference level is represented by individuals with no education, and it is excluded from equation (2) to prevent matrix singularity.

Once the extended earning function has been computed, the private rates of return to each year of education for a specific level can be derived using the following approach:

\[ \text{rate}_{\text{p}} = \beta_{\text{p}} / S_{\text{p}}, \tag{3} \]

\[ \text{rate}_{\text{s}} = \beta_{\text{s}} / S_{\text{s}}, \tag{4} \]

\[ \text{rate}_{\text{t}} = \beta_{\text{t}} / S_{\text{t}}, \tag{5} \]

with $\text{rate}_{\text{p}}$, $\text{rate}_{\text{s}}$, and $\text{rate}_{\text{t}}$ being the private rate of returns to primary, secondary, and tertiary education, respectively. $S_{\text{p}}$, $S_{\text{s}}$, and $S_{\text{t}}$ stand for the average number of years spent in primary, secondary, and tertiary education.
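As a minimal sketch of equations (3)–(5), the per-year rate for each level is the dummy coefficient divided by the years spent at that level. The coefficient values and durations below are hypothetical placeholders chosen for illustration; they are not estimates from this article, and the function name `per_level_rates` is an assumption.

```python
def per_level_rates(beta, years):
    """Apply equations (3)-(5): rate_k = beta_k / S_k for each education level k."""
    return {level: beta[level] / years[level] for level in beta}

# Hypothetical coefficients on the dummies E_p, E_s, E_t of equation (2).
beta = {"primary": 0.30, "secondary": 0.72, "tertiary": 0.60}
# Illustrative durations S_p, S_s, S_t in years (assumed values).
years = {"primary": 3, "secondary": 6, "tertiary": 4}

rates = per_level_rates(beta, years)
for level, rate in rates.items():
    print(f"{level}: {100 * rate:.1f}% per year of schooling")
```

Keeping the durations in a separate mapping makes it easy to swap in a country-specific length for tertiary education without touching the coefficients.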
For convenience, 3 years are assigned for primary education, 6 years for secondary education, and 4–5 years for tertiary education. The 3 years represent the foregone earnings of primary-school-aged children (Psacharopoulos, 1985). The specific duration of tertiary education may vary depending on the higher education system of each country. For instance, higher education in Germany, Switzerland, and Tanzania typically requires an average of 5 years, while it is typically 4 years in Australia, South Korea, India, the USA, and South Africa.

The Mincer equation continues to be one of the most widely used models in econometrics for estimating returns to education. As highlighted by Montenegro and Patrinos (2014) and Patrinos (2016), estimates of returns to education derived from the Mincer equation exhibit stability and comparability. It is important to note that these estimates do not necessarily establish causality between education and earnings but rather indicate the conditional association between years of education and labor market incomes.

3.2 Machine Learning (ML) in Econometrics

ML, and artificial intelligence (AI) more broadly, refers to the field of study focused on computer algorithms that automatically learn and improve knowledge through experience. ML has found numerous applications in econometrics, including computational finance, higher education, and health. ML algorithms possess the capability to identify natural patterns and intricate structures within large datasets, often outperforming traditional algorithms. They have emerged as transformative technologies with immense potential in higher education. These technologies are being utilized in various aspects of the education system, such as personalized learning, intelligent tutoring systems, academic analytics, and administrative tasks.
By harnessing the power of ML and AI, higher education institutions and educators can enhance student engagement, improve learning outcomes, optimize resource allocation, and drive innovation in teaching and research methodologies. The application of ML and AI in higher education is a topic of increasing interest and has the potential to revolutionize the educational landscape (Bozkurt, Karadeniz, Baneres, Guerrero-Roldán, & Rodríguez, 2021; Zawacki-Richter, Marín, Bond, & Gouverneur, 2019).

In econometrics, traditional statistical algorithms are typically employed to estimate the parameters $\beta$ that characterize the relationship between an output variable $y$ and a set of input covariates $\mathbf{x}$. These algorithms aim to identify the underlying statistical associations and make inferences about the relationship between the variables. Traditional statistical algorithms are commonly used for such inference tasks. However, recent discussions by Mullainathan and Spiess (2017) have highlighted the application of ML algorithms in econometrics. These ML algorithms, such as least absolute shrinkage and selection operator (LASSO) regression and SVR as utilized in this article, offer alternative approaches to modeling and prediction. ML algorithms can provide more flexibility by handling complex relationships and capturing nonlinearities that may be challenging for traditional statistical algorithms.

3.2.1 How ML Generally Works

ML algorithms operate by taking a known set of input and output data $(\mathbf{x}, y)$ and constructing models that can make predictions $\hat{y} = f(\mathbf{x})$. During the training process, these algorithms aim to minimize a loss function $L(y, \hat{y})$, which quantifies the discrepancy between the predicted values and the true values in the training data. The objective is to find the model that minimizes the expected prediction loss $E_{y,\mathbf{x}}[L(y, \hat{y})]$ on new, unseen data.
This means that ML models are evaluated based on their performance on data that were not used in the model construction process, often referred to as out-of-sample data. This approach differs from traditional econometric models, where all available data are typically used for model estimation. By evaluating model performance on out-of-sample data, ML models are better able to generalize and make robust predictions. This mitigates the risk of overfitting, where models become too closely aligned with the training data and may not perform well on new data. Consequently, the estimated parameters, such as $\beta$, in ML models can provide more reliable and meaningful inferences for future predictions and deductions.

3.2.2 How an ML Algorithm is Set Up

Setting up an ML algorithm involves an essential step known as tuning, where the hyperparameters of the algorithm are adjusted to optimize the model’s performance. Hyperparameters differ from the parameters (e.g., $\beta$) estimated by the model itself during training and are set by the user. One commonly used technique for selecting hyperparameters is Grid Search (Ozdemir, 2016). Grid Search involves manually specifying a set of values for the hyperparameters and systematically evaluating the model’s performance with different combinations of these values.

When tuning the model to achieve optimal performance, a potential concern is overfitting, where the model becomes too closely aligned with the training data and fails to generalize well to new, out-of-sample data. To address this issue, a suitable approach is to utilize the cross-validation method (Hastie, Tibshirani, & Friedman, 2009). Cross-validation involves dividing the available data into multiple subsets or “folds.” The model is trained on a portion of the data and evaluated on the remaining fold, and this process is repeated multiple times, with each fold serving as the evaluation set once.
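The combination of Grid Search and k-fold cross-validation described above can be sketched in a few lines of plain Python. To stay self-contained, the sketch tunes the penalty of a one-variable ridge regression without intercept, which is only a stand-in for the SVR used in this article; all function names and the toy data are assumptions made for illustration.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_ridge_1d(xs, ys, lam):
    """Slope of a no-intercept ridge fit: sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k):
    """Mean squared error on the held-out fold, averaged over the k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope = fit_ridge_1d(train_x, train_y, lam)
        scores.append(sum((ys[i] - slope * xs[i]) ** 2 for i in fold) / len(fold))
    return sum(scores) / len(scores)

def grid_search(xs, ys, grid, k=5):
    """Return the hyperparameter value with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))

# Toy data (assumed): y is roughly 2x with a small deterministic wobble.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
best_lam = grid_search(xs, ys, grid=[0.0, 0.1, 1.0, 10.0, 100.0], k=5)
print("selected penalty:", best_lam)
```

Real data would normally be shuffled before the folds are formed; the contiguous split is kept here only so that the example is deterministic.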
This allows for a more robust assessment of the model’s performance and helps mitigate the risk of overfitting. By combining techniques such as Grid Search and cross-validation, ML algorithms can be effectively tuned to optimize performance while avoiding overfitting and ensuring generalizability to new, unseen data.

3.2.3 How an ML Model is Evaluated

Metrics such as the coefficient of determination ($R^2$) and Root Mean Square Error (RMSE) are commonly used to evaluate the predictive performance of ML models in regression problems (Hyndman & Koehler, 2006; Pelánek, 2015). $R^2$ measures the proportion of the variance in the outcome variable that can be explained by the model’s predictors. It quantifies how well the model captures the variability in the data. $R^2$ is defined as follows:

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \]

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ denotes the mean of the observed outcome. $R^2$ ranges from 0 to 1.

RMSE is the square root of the mean squared error between the predicted values ($\hat{y}_i$) and the observed values ($y_i$). It quantifies the dispersion of errors in the predictions. A smaller RMSE indicates that the predicted values are closer to the observed values. RMSE is calculated as follows:

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}. \]

For an excellent predictive ML model in regression problems, we generally aim for a high value of $R^2$ and a small value of RMSE.

3.3 SVR

SVR is a widely used ML model for regression tasks and has found applications in various fields, including time series analysis, financial prediction, and engineering analyses. SVR builds upon certain aspects of traditional linear regression but introduces some key differences in its approach.
While linear regression aims to minimize the sum of squared errors between the predicted and actual values, SVR focuses on minimizing the margin violation of a defined ϵ-insensitive zone around the true values. For a more comprehensive understanding of how SVR works and its underlying techniques, a detailed explanation and graphical illustrations can be found in references such as Bishop (2006) and Smola and Schölkopf (2004). These references delve into the mathematical foundations and algorithms behind SVR, providing a deeper insight into its methodology and applications.

Given a dataset with $n$ observations $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)$, $\mathbf{x}_i \in \mathbb{R}^d$, and $y_i \in \mathbb{R}$, the aim is to find a regression hyperplane

\[ f(\mathbf{x}) = \mathbf{x}^T \boldsymbol{\beta} + \beta_0, \]

which leads to minimal deviation from the observed values $y$ and, at the same time, is as flat as possible. Flatness means that small $\boldsymbol{\beta}$ are sought. The error function is then given as follows:

\[ \frac{1}{2} \sum_{i=1}^{n} (f(\mathbf{x}_i) - y_i)^2 + \frac{1}{2} \|\boldsymbol{\beta}\|^2. \]

Instead of minimizing the quadratic error function, in which each deviation from the real outcome is penalized, SVR allows sparse solutions with at most ϵ deviation (ϵ > 0). Thus, the quadratic error function is replaced by an ϵ-insensitive error function (Cortes & Vapnik, 1995), in which the absolute difference between the prediction and the real outcome less than ϵ is not considered an error.
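The contrast between the two error functions can be illustrated with a short sketch: deviations inside the ϵ-zone cost nothing, while larger ones are penalized by the amount they exceed ϵ, whereas the quadratic error penalizes every deviation. The function names and the sample values are assumptions made for illustration.

```python
def epsilon_insensitive_loss(prediction, target, eps):
    """Zero inside the eps-zone, otherwise the absolute deviation minus eps."""
    return max(0.0, abs(prediction - target) - eps)

def quadratic_loss(prediction, target):
    """One term of the quadratic error function: half the squared deviation."""
    return 0.5 * (prediction - target) ** 2

# Compare both losses for a few hypothetical predictions of a target of 1.0.
eps = 0.1
for prediction in (1.0, 1.05, 1.5, 0.2):
    e_loss = epsilon_insensitive_loss(prediction, 1.0, eps)
    q_loss = quadratic_loss(prediction, 1.0)
    print(f"prediction={prediction}: eps-insensitive={e_loss:.3f}, quadratic={q_loss:.3f}")
```

Because the ϵ-insensitive loss grows linearly rather than quadratically, a single large deviation contributes less to the total error than it would under the quadratic function.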
The ϵ-insensitive function, as illustrated in Figure 1, is given as follows:

\[ E(u) = E(f(\mathbf{x}_i) - y_i) = \begin{cases} 0, & \text{if } |f(\mathbf{x}_i) - y_i| \le \epsilon, \\ |f(\mathbf{x}_i) - y_i| - \epsilon, & \text{otherwise}. \end{cases} \]

Figure 1 ϵ-Insensitive error function (red) with the error increasing linearly vs quadratic error function (green).

To deal with the points for which $|f(\mathbf{x}_i) - y_i| > \epsilon$, two slack variables $\xi \ge 0$ and $\hat{\xi} \ge 0$ are introduced. Thus, as illustrated in Figure 2, $\xi_i > 0$ corresponds to a point for which $y_i > f(\mathbf{x}_i) + \epsilon$ and $\hat{\xi}_i > 0$ corresponds to a point for which $y_i < f(\mathbf{x}_i) - \epsilon$. Points lying inside the ϵ-tube correspond to $f(\mathbf{x}_i) - \epsilon \le y_i \le f(\mathbf{x}_i) + \epsilon$ and have $\xi_i = \hat{\xi}_i = 0$. Therefore, the optimization problem to solve in ϵ-SVR has the following form:

\[ \min \; C \sum_{i=1}^{n} (\xi_i + \hat{\xi}_i) + \frac{1}{2} \|\boldsymbol{\beta}\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \mathbf{x}_i^T \boldsymbol{\beta} - \beta_0 \le \epsilon + \xi_i, \\ \mathbf{x}_i^T \boldsymbol{\beta} + \beta_0 - y_i \le \epsilon + \hat{\xi}_i, \\ \xi_i, \hat{\xi}_i \ge 0. \end{cases} \]

Figure 2 ϵ-SVR with the ϵ-insensitive tube and slack variables $\xi$ and $\hat{\xi}$.

The constant $C > 0$ determines the trade-off between the flatness of the function $f$ and the amount up to which deviations larger than ϵ are tolerated. Both constants $C$ and ϵ represent the hyperparameters of the SVR, which have to be tuned by the user via Grid Search for optimal performance.

To solve the optimization problem, its dual formulation with the help of the Lagrange function is used (for more details, see Bishop, 2006).

4 Datasets and Patterns of Earnings and Education

The data used for
analysis in this study consist of recent national data collected between 2010 and 2018 from eight different countries located in diverse regions. The dataset includes a range of demographic characteristics such as age, gender, marital status, household size, number of children per household, location of the household, social and religious group, number of completed school years, and the individual’s relation to the household head. In addition, the dataset comprises income-related variables such as household income, consumption, household assets, and the source of income. Employment-related variables such as occupation, industry, and the number of hours worked are also included in the dataset. These rich and comprehensive data allow for a detailed analysis of the relationship between various socioeconomic factors and the returns to education across different countries and regions.

The data utilized in this study are primarily sourced from the Cross-National Equivalent File (CNEF); details on the CNEF project can be found at https://cnef.ehe.osu.edu/. The CNEF project is a collaborative initiative involving multiple individuals and institutions that collect panel survey data from nine different countries as of 2020. These countries include Australia, Canada, Germany, Great Britain, Japan, Korea, Russia, Switzerland, and the USA. The CNEF project harmonizes and equivalently defines variables across surveys, enabling researchers to compare social and economic outcomes consistently over time and across countries. For this study, data from five CNEF countries, namely Australia, Germany, Korea, Switzerland, and the USA, were accessible. In addition, data from two African countries, South Africa and Tanzania, as well as India, were also included to facilitate cross-regional comparisons.
By leveraging the CNEF dataset and incorporating data from diverse regions, this study aims to provide insights into the returns to education on a global scale and explore potential variations across countries and regions.

The analysis focuses on sub-samples of individuals who are employed either full-time or part-time and are aged 18–65 years. The following variables are included in the analysis: annual earnings, which encompass wages, salaries, bonuses, overtime, commissions, and earnings from self-employment, professional practice, or trade; gender; the number of hours worked in a year; and the number of years of education completed. To facilitate comparisons and ensure consistency, the earnings of each individual are converted into hourly earnings. Only individuals with hourly earnings higher than the national minimum hourly wage are included in the final sample. Labor market experience is calculated as the individual’s age minus the number of years of education minus 6. However, the maximum value for labor market experience is capped at 100 years to avoid extreme values. Education level is determined by the total number of years of education completed. Following the standard classification in the literature (e.g., Peet et al., 2015), primary education corresponds to 1–6 years of education, secondary education to 7–12 years, and tertiary education to 13 years and above. The description and summary statistics of each dataset, including earnings, education level, and labor market experience, are provided in Table 1. 
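The sample-construction rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper's own processing was done in R, the `Worker` record and all function names are hypothetical, positive annual hours stand in for "employed full-time or part-time", and the label "none" for zero years of education is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical record; field names are illustrative, not from the surveys."""
    age: int
    years_of_education: int
    annual_earnings: float  # local currency
    annual_hours: float     # hours worked in the year


def hourly_earnings(w: Worker) -> float:
    # Annual earnings converted to hourly earnings.
    return w.annual_earnings / w.annual_hours


def experience(w: Worker, cap: int = 100) -> int:
    # Potential experience: age minus years of education minus 6, capped as in the text.
    return min(w.age - w.years_of_education - 6, cap)


def education_level(years: int) -> str:
    # Classification by completed years: 1-6 primary, 7-12 secondary, 13+ tertiary.
    if years <= 0:
        return "none"  # assumption: label for no formal education
    if years <= 6:
        return "primary"
    if years <= 12:
        return "secondary"
    return "tertiary"


def in_sample(w: Worker, min_hourly_wage: float) -> bool:
    # Keep workers aged 18-65 who earn more than the national minimum hourly wage.
    return (
        18 <= w.age <= 65
        and w.annual_hours > 0
        and hourly_earnings(w) > min_hourly_wage
    )
```

For example, a 40-year-old with 12 years of education, 40,000 in annual earnings, and 2,000 annual hours would have hourly earnings of 20, experience of 22 years, and secondary education under these rules.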
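Furthermore, Figure 3 illustrates the relationship between earnings and age, stratified by education level.

Table 1: Summary statistics of the different datasets

| Region | Country (currency) | Year | Obs. | Men | Age: men | Age: women | Age: total | Educ.: men | Educ.: women | Educ.: total | Earnings: men | Earnings: women | Earnings: total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | South Africa (Rands) | 2017 | 5,498 | 51.1% | 36.5 | 37.9 | 37.2 | 11.3 | 11.9 | 11.6 | 54.8 | 41.5 | 48.3 |
| | | 2015 | 4,745 | 53.0% | 36.4 | 37.8 | 37.1 | 10.9 | 11.6 | 11.2 | 40.7 | 34.6 | 37.8 |
| | | 2012 | 2,978 | 52.4% | 37.5 | 38.5 | 38.0 | 10.9 | 11.9 | 11.4 | 38.8 | 34.0 | 36.5 |
| | | 2010 | 2,489 | 53.9% | 38.2 | 38.7 | 38.4 | 10.6 | 11.6 | 11.1 | 35.9 | 29.7 | 33.0 |
| | Tanzania (Shilling) | 2015 | 2,048 | 65.0% | 35.2 | 33.2 | 34.5 | 8.0 | 8.0 | 8.0 | 1,748 | 1,306 | 1,595 |
| | | 2013 | 3,056 | 66.0% | 33.8 | 32.3 | 33.3 | 8.1 | 8.0 | 8.1 | 1,598 | 966 | 1,384 |
| | | 2011 | 2,280 | 68.5% | 34.7 | 32.9 | 34.1 | 7.9 | 7.9 | 7.9 | 1,514 | 1,012 | 1,356 |
| America | USA (Dollars) | 2017 | 9,482 | 48.9% | 40.8 | 40.5 | 40.6 | 13.7 | 14.2 | 13.9 | 30.7 | 23.9 | 27.2 |
| | | 2015 | 8,728 | 48.3% | 41.0 | 40.8 | 40.9 | 13.7 | 14.2 | 14.0 | 29.9 | 22.6 | 26.2 |
| | | 2013 | 8,726 | 49.6% | 41.3 | 40.8 | 41.1 | 13.7 | 14.1 | 13.9 | 29.0 | 22.5 | 25.7 |
| | | 2011 | 8,594 | 49.0% | 41.4 | 40.8 | 41.1 | 13.7 | 14.0 | 13.9 | 30.5 | 22.2 | 26.2 |
| Asia | India (Rupee) | 2012 | 49,321 | 70.0% | 37.3 | 38.5 | 37.6 | 7.0 | 3.9 | 6.0 | 31.7 | 19.8 | 28.2 |
| | South Korea (Won) | 2018 | 6,675 | 61.2% | 45.5 | 44.3 | 45.1 | 13.7 | 13.2 | 13.5 | 18,160 | 12,379 | 15,919 |
| | | 2017 | 6,796 | 61.5% | 45.3 | 44.3 | 44.9 | 13.6 | 13.0 | 13.4 | 17,398 | 11,522 | 15,134 |
| | | 2016 | 6,638 | 62.1% | 45.2 | 44.1 | 44.8 | 13.5 | 13.0 | 13.3 | 16,589 | 10,943 | 14,448 |
| | | 2015 | 6,538 | 62.2% | 44.8 | 44.0 | 44.5 | 13.4 | 12.8 | 13.2 | 16,116 | 10,557 | 14,016 |
| | | 2014 | 6,279 | 62.4% | 44.9 | 44.1 | 44.6 | 13.3 | 12.6 | 13.0 | 15,286 | 10,350 | 13,430 |
| | | 2013 | 6,486 | 62.7% | 44.4 | 43.4 | 44.0 | 13.2 | 12.6 | 13.0 | 14,880 | 9,676 | 12,939 |
| | | 2012 | 6,522 | 63.4% | 44.2 | 43.1 | 43.8 | 13.1 | 12.4 | 12.9 | 14,248 | 9,333 | 12,448 |
| | | 2011 | 6,390 | 63.2% | 43.7 | 42.4 | 43.2 | 13.0 | 12.3 | 12.8 | 13,815 | 8,706 | 11,936 |
| | | 2010 | 6,383 | 63.6% | 43.6 | 41.9 | 42.9 | 12.9 | 12.3 | 12.7 | 12,890 | 8,151 | 11,165 |
| Australia | Australia (Dollars) | 2018 | 9,361 | 50.5% | 39.4 | 39.7 | 39.5 | 13.0 | 13.5 | 13.3 | 41.4 | 37.4 | 39.4 |
| | | 2017 | 9,268 | 50.5% | 39.4 | 39.7 | 39.6 | 13.0 | 13.5 | 13.2 | 40.4 | 37.2 | 38.8 |
| | | 2016 | 9,298 | 51.1% | 39.4 | 39.5 | 39.4 | 13.0 | 13.4 | 13.2 | 39.0 | 35.9 | 37.5 |
| | | 2015 | 9,215 | 51.5% | 39.4 | 39.6 | 39.5 | 13.0 | 13.4 | 13.2 | 38.7 | 35.0 | 36.9 |
| | | 2014 | 9,131 | 51.4% | 39.2 | 39.7 | 39.4 | 12.9 | 13.3 | 13.1 | 37.9 | 35.2 | 36.6 |
| | | 2013 | 9,128 | 51.5% | 39.2 | 39.4 | 39.3 | 12.9 | 13.2 | 13.1 | 38.2 | 35.6 | 36.9 |
| | | 2012 | 9,174 | 51.8% | 39.2 | 39.6 | 39.4 | 12.9 | 13.2 | 13.0 | 36.2 | 32.6 | 34.4 |
| | | 2011 | 7,198 | 51.6% | 39.0 | 39.4 | 39.2 | 12.7 | 13.0 | 12.9 | 35.1 | 32.5 | 33.8 |
| | | 2010 | 7,010 | 51.4% | 39.1 | 39.5 | 39.3 | 12.7 | 13.0 | 12.9 | 35.6 | 31.9 | 33.8 |
| Europe | Germany (Euros) | 2017 | 13,625 | 50.4% | 45.2 | 45.0 | 45.1 | 13.0 | 13.2 | 13.1 | 23.1 | 18.8 | 20.9 |
| | | 2016 | 11,809 | 50.5% | 45.0 | 44.6 | 44.8 | 12.9 | 13.2 | 13.1 | 22.6 | 18.5 | 20.5 |
| | | 2015 | 11,990 | 51.3% | 45.1 | 44.6 | 44.9 | 12.9 | 13.2 | 13.1 | 22.2 | 18.2 | 20.2 |
| | | 2014 | 12,674 | 52.1% | 44.6 | 44.1 | 44.4 | 12.9 | 13.2 | 13.1 | 22.2 | 18.1 | 20.2 |
| | | 2013 | 12,110 | 51.9% | 44.9 | 44.2 | 44.6 | 13.0 | 13.3 | 13.2 | 22.5 | 18.0 | 20.3 |
| | | 2012 | 12,801 | 52.6% | 44.6 | 43.8 | 44.2 | 13.0 | 13.3 | 13.1 | 21.9 | 17.8 | 20.0 |
| | | 2011 | 12,751 | 53.9% | 44.3 | 43.4 | 43.9 | 13.0 | 13.3 | 13.1 | 21.8 | 17.7 | 19.9 |
| | | 2010 | 11,954 | 54.6% | 43.4 | 42.5 | 43.0 | 13.0 | 13.3 | 13.1 | 21.2 | 17.6 | 19.6 |
| | Switzerland (Swiss Francs) | 2018 | 4,489 | 49.4% | 45.1 | 44.6 | 44.8 | 15.0 | 14.4 | 14.7 | 50.5 | 41.4 | 45.9 |
| | | 2017 | 4,463 | 49.4% | 45.4 | 44.8 | 45.1 | 15.1 | 14.3 | 14.7 | 50.7 | 41.8 | 46.2 |
| | | 2016 | 4,753 | 49.6% | 45.6 | 44.9 | 45.3 | 14.9 | 14.2 | 14.6 | 50.1 | 41.4 | 45.7 |
| | | 2015 | 5,283 | 50.0% | 45.5 | 44.5 | 45.0 | 14.8 | 14.1 | 14.5 | 50.3 | 41.6 | 45.9 |
| | | 2014 | 5,830 | 50.8% | 45.0 | 44.3 | 44.6 | 14.7 | 14.0 | 14.3 | 49.0 | 40.3 | 44.7 |
| | | 2013 | 3,510 | 49.5% | 44.7 | 44.2 | 44.4 | 14.7 | 13.9 | 14.3 | 48.4 | 40.4 | 44.3 |
| | | 2012 | 3,642 | 49.5% | 44.8 | 44.1 | 44.4 | 14.6 | 13.8 | 14.2 | 49.1 | 40.1 | 44.5 |
| | | 2011 | 3,707 | 50.4% | 44.5 | 43.6 | 44.1 | 14.5 | 13.8 | 14.1 | 48.8 | 40.2 | 44.6 |
| | | 2010 | 3,711 | 49.7% | 44.5 | 43.7 | 44.1 | 14.5 | 13.6 | 14.1 | 47.7 | 39.6 | 43.7 |

Notes: Age: mean age in years; Educ.: mean years of education; Earnings: mean hourly earnings in local currency.

Figure 3: Average hourly earnings by age in the different countries.

4.1 National Income Dynamics Study (NIDS) Data

The NIDS is a significant household panel study in South Africa, recognized as the first nationally representative and longitudinal study of its kind (Brophy et al., 2018). Since its initiation in 2008, NIDS has been conducting regular interviews and tracking over 28,000 individuals in 7,300 households across the country. The dataset encompasses five waves, from the years 2008, 2010, 2012, 2015, and 2017. For this study, the focus is on the four most recent waves, which align with the timeframe of 2010–2019. A balanced distribution between men and women is observed across the waves, with women generally being slightly older than men. The average years of schooling amount to approximately 11.5 years, and women tend to have higher educational attainment than men. In 2017, the average hourly earnings were recorded as 48.3 South African Rands, representing a 31% increase compared to 2010. However, a gender pay gap of 24.3% in favor of men was observed in that year. 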
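As a worked check, and taking the gender pay gap to be the difference between men’s and women’s mean hourly earnings expressed as a percentage of men’s earnings (the definition given in Section 4.9), the 2017 figure can be reproduced from the South African values in Table 1. The symbols $\bar{w}_m$ and $\bar{w}_f$ are notation introduced here for illustration only:

```latex
\[
\text{gap} = \frac{\bar{w}_m - \bar{w}_f}{\bar{w}_m} \times 100
           = \frac{54.8 - 41.5}{54.8} \times 100
           \approx 24.3\%
\]
```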
Moreover, there is a positive relationship between earnings and age, indicating that hourly earnings tend to increase with age. Individuals with higher education (38.0% of the sample) earned above the average earnings, while those with lower education earned below the average.

4.2 National Panel Survey (NPS) Data

The Tanzania NPS is a nationally representative household panel survey designed to collect information on the living standards of the Tanzanian population (National Bureau of Statistics, 2016). The NPS has been conducted since 2008 and consists of four waves (2009, 2011, 2013, and 2015). The survey includes approximately 5,000 households and around 10,000 individuals living within those households, who are interviewed and tracked regularly over the years. For this analysis, data from three waves (2011, 2013, and 2015) are used.

In the dataset, men are overrepresented, accounting for 65% of the data, while women comprise the remaining 35%. On average, men and women are of similar age and have an equal number of years of education, namely 8 years. Between 2011 and 2015, the mean hourly earnings increased from 1,356 to 1,595 Tanzanian Shillings, representing a growth of 15%. In 2015, the gender pay gap was approximately 25.3%, indicating a disparity in earnings favoring men. It is noteworthy that individuals with secondary education or lower attained similar average earnings, while those with higher education (7.3% of the sample) earned above the average.

4.3 Panel Study of Income Dynamics (PSID) Data

The PSID is a nationally representative longitudinal study focused on families and individuals in the USA. It was initiated in 1968 and has since continued to interview the same families and their descendants over nearly five decades, spanning from 1968 to 2017 (PSID, 2019). For this analysis, data covering the period from 2010 to 2017 are used, encompassing four waves (2011, 2013, 2015, and 2017). 
Each wave of data collection involves approximately 25,000 individuals from 9,000 households. In the PSID dataset, men and women are equally distributed and span various age groups. The average number of years of education remains relatively constant over the years, at approximately 14 years. Women tend to have slightly higher levels of education than men. In 2017, the average hourly earnings in the USA were 27.2 USD, representing a growth of about 4% compared to 2011. In addition, the average hourly earnings for men in 2017 were approximately 30.7 USD, with women earning on average about 22% less. Examining the relationship between age and hourly earnings reveals a positive association, as hourly earnings tend to increase with age and stabilize around the age of 40. Individuals with higher education (comprising 65.6% of the sample) have the highest, above-average hourly earnings, while those with no education attain the lowest hourly earnings.

4.4 India Human Development Survey (IHDS) Data

The IHDS is a nationally representative survey that covers a wide range of topics. It initially collected data during 2004–2005, and individuals from the same households were re-interviewed in 2011–2012 (Desai & Vanneman, 2018). For this analysis, we focus on the data from 2012. In the IHDS dataset, men constitute 70% of the sample, while women make up 30%. On average, both men and women are about 38 years old. The average years of schooling amount to approximately 6 years, with men having higher levels of education than women. In 2012, the average hourly earnings were around 28.2 Rupees. Women, on average, earned about 37.5% less than men. There is a positive relationship between earnings and age, indicating that earnings tend to increase with age. 
Individuals with higher education (10% of the sample) have the highest earnings, well above the average, while those with secondary education earn at or below the average level.

4.5 Korean Labor and Income Panel Study (KLIPS) Data

The KLIPS is a comprehensive labor-related panel survey that combines cross-sectional and time-series data in Korea (KLIPS, 2020). The survey is conducted annually, with a sample of approximately 5,000 urban households and 13,000 individuals. It has been conducted since 1998, with the latest wave (21) completed in 2018. For this article, the focus is on the data from the last decade (2010–2018). Each year’s data consist of approximately 60% men and 40% women, with both groups being roughly the same age, averaging 45 years. The average years of education show a slight increase over the years, from 12.7 in 2010 to 13.5 in 2018, with men generally having slightly higher levels of education than women. Over the years, the average hourly earnings in Korea have shown an increase of approximately 30%, rising from 11,165 Korean Won in 2010 to 15,919 Korean Won in 2018. However, there remains a gender pay gap of approximately 32%, favoring men. Individuals with higher education (accounting for 56.4% of the sample) have the highest earnings, indicating the importance of educational attainment for earning potential.

4.6 Household, Income and Labour Dynamics in Australia (HILDA) Data

The HILDA survey is a nationally representative longitudinal study that focuses on Australian households (Summerfield et al., 2019). The survey began in 2001 and currently consists of 18 waves of data, with the 18th release covering the period from 2001 to 2018 (waves 1–18). The dataset includes approximately 20,000 individuals from 9,000 households. For this analysis, the focus is on the data from the last decade (2010–2018). Men and women are equally distributed within the datasets, and their average age is approximately 39.5 years. 
The level of education is similar for men and women, amounting to about 13 years of education. Between 2010 and 2018, the average hourly earnings in Australia increased by 14.2%, rising from 33.8 to 39.4 Australian Dollars. In 2018, Australia’s gender pay gap was 9.7%. In addition, there is a positive relationship between age and hourly earnings, with earnings tending to increase with age and stabilizing around the age of 40. Individuals with higher education (accounting for 45.7% of the sample) are the highest earners, surpassing the average earnings. On the other hand, individuals with secondary or primary education tend to earn below the average.

4.7 Socio-Economic Panel (SOEP) Data

The German SOEP is a significant and long-running multidisciplinary household survey that includes approximately 15,000 households and 30,000 individuals (Markus, 2019). The survey has been conducted since 1984 and continues to interview many of the same families and individuals, with the latest data released in 2017. For this analysis, the focus is on data covering the period from 2010 to 2017. Men and women are equally represented within the datasets and have similar age distributions. The average schooling level is approximately 13 years of education. In 2017, the average hourly earnings in Germany were around 20.9 Euros, reflecting an increase of 6.2% compared to 2010. On average, women earned about 18.6% less than men in 2017. There is a positive relationship between age and hourly earnings, indicating that earnings tend to increase with age. Individuals with higher education (representing 44.3% of the sample) have the highest earnings, surpassing the average level. 
Conversely, individuals with secondary education tend to earn below the average level.

4.8 Swiss Household Panel (SHP) Data

The SHP is a large-scale, nationally representative longitudinal study conducted in Switzerland since 1999, collecting data on households and individuals (Voorpostel et al., 2020). The SHP dataset currently consists of 20 waves, covering the period from 1999 to 2018. Each year, approximately 12,000 individuals from 5,000 households are surveyed. For this analysis, the focus is on data from the last decade (2010–2018). The data in the SHP are equally distributed between men and women, with similar age distributions. The average years of education show a slight increase over the years, rising from 14.1 in 2010 to 14.7 in 2018. Men generally have slightly higher levels of education than women. In 2018, the average hourly earnings in Switzerland were approximately 45.9 Swiss Francs, reflecting a growth of 5% compared to 2010. On average, women earned about 18% less than men in terms of hourly earnings. There is a positive relationship between age and hourly earnings, indicating that earnings tend to increase with age. Individuals with higher education (comprising 59.6% of the sample) earned above the average and more than individuals with lower levels of education.

4.9 Patterns of Education and Earnings Across Regions

The analysis of education levels across regions reveals several patterns. European countries, Australia, South Korea, and the USA have the highest levels of education, as indicated in Table 2. This finding aligns with previous predictions made by the International Institute for Applied Systems Analysis (IIASA) (IIASA, 2015). On the other hand, India has the lowest level of education, with 30.7% of its population lacking formal education. This observation supports IIASA’s predictions as well. 
A surprising finding is that Tanzania has a very low percentage (0.2%) of its population with no education, in contrast to the approximately 20% predicted by the IIASA. This discrepancy may reflect improvements in educational access and policies in Tanzania. Countries such as the USA, South Korea, and Switzerland have high proportions of their populations attaining higher education, with rates exceeding 50%. In addition, there is a gender gap favoring women in terms of educational attainment and the population with higher education. This trend is observed in regions such as Africa, the USA, and Europe, which aligns with findings from the OECD (2019). However, this gender gap is generally not observed in Asian countries. These findings highlight the importance of considering regional differences in educational attainment and gender disparities in higher education.

Table 2: Pattern of education and earnings across regions

| | South Africa | Tanzania | USA | India | South Korea | Australia | Germany | Switzerland |
|---|---|---|---|---|---|---|---|---|
| Region | Africa | Africa | America | Asia | Asia | Australia | Europe | Europe |
| Year | 2017 | 2015 | 2017 | 2012 | 2018 | 2018 | 2017 | 2018 |
| Years of education | 11.6 | 8.0 | 13.9 | 6.0 | 13.5 | 13.3 | 13.1 | 14.7 |
| Population with no educ. | 1.7% | 0.2% | 0.2% | 30.7% | 0.1% | 0.1% | 0.0% | 0.0% |
| Population in higher educ. | 38.0% | 7.3% | 65.6% | 10.0% | 56.4% | 45.7% | 44.3% | 59.6% |
| Education gender gap (years) | −0.6*** | 0.0 | −0.5*** | 3.1*** | 0.5*** | −0.5*** | −0.2*** | 0.6*** |
| Gender gap in higher educ. | −8.4%*** | −0.3% | −9.7%*** | 3.4%*** | 7.0%*** | −13.8%*** | −4.1%*** | 6.0%*** |
| Gender pay gap | 24.3%*** | 25.3%*** | 22.1%*** | 37.5%*** | 31.8%*** | 9.7%*** | 18.6%*** | 18.0%*** |
| Gender pay gap in higher educ. | 26.5%*** | 2.5% | 27.8%*** | 10.3%*** | 24.6%*** | 19.3%*** | 24.8%*** | 19.4%*** |
| Pay gap (higher vs sec. educ.)¹ | 57.2%*** | 70.5%*** | 37.9%*** | 57.0%*** | 27.8%*** | 28.0%*** | 30.8%*** | 27.0%*** |

Notes: In the gender gap, men are taken as reference. educ.: education; sec.: secondary. ¹The pay gap is calculated between people who attained higher education (reference) and those in secondary education. ***p-value < 0.01.

The gender pay gap, defined as the difference in mean gross hourly earnings between men and women expressed as a percentage of men’s earnings, indicates gender inequality in hourly pay. Across regions, the gender pay gap favors men, with women earning, on average, 23% less than men. In Europe, the gender pay gap is around 18%, which is broadly consistent with the estimate of 14.8% across Europe in 2018 reported by Eurostat (2020). Australia has the lowest gender pay gap at 9.7%, while Asia has the highest at an average of 35%. These findings align with the observations of the International Labour Organization (https://www.ilo.org/global/about-the-ilo/multimedia/maps-and-charts/enhanced/WCMS_650829/lang--en/index.htm).

Even among individuals with higher education, the gender pay gap persists, with women earning on average about 19% less than men. In addition, when comparing individuals with higher education and those with secondary education, significant differences are observed. In Africa and India, the pay gap between these two groups is around 60%, while in other countries, it is around 30%. These findings are consistent with Livanos and Nunez (2012), who also found wage inequalities between individuals with higher education and those with secondary education.

5 Estimates on the Returns to Education

In this section, we present the estimates of the returns to education using the Mincerian equation (as described in Section 3.1) and the ML algorithm SVR. Specifically, we employ the SVR algorithm and tune the hyperparameters using the R programming language (Core R, 2020). The returns to education are computed separately for each country and survey year and then averaged across the regions. We estimate the returns for each level of education (primary, secondary, and tertiary) based on the extended Mincerian equation. In addition, we calculate the returns by gender. 
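The regional averaging step can be illustrated with a short Python sketch. It assumes an unweighted mean over country-year estimates, which is an assumption made here: the text says only that estimates are averaged across the regions. The input values are the country-year returns reported in Table 3, and the variable and function names are hypothetical.

```python
from statistics import mean

# Country-year returns to education, copied from Table 3.
returns_by_region = {
    "Africa": [
        0.179, 0.152, 0.144, 0.156,  # South Africa 2017, 2015, 2012, 2010
        0.206, 0.200, 0.207,         # Tanzania 2015, 2013, 2011
    ],
    "Asia": [
        0.082,                       # India 2012
        0.095, 0.103, 0.100, 0.111, 0.103,  # South Korea 2018-2014
        0.115, 0.117, 0.119, 0.122,         # South Korea 2013-2010
    ],
}


def regional_average(estimates):
    # Assumption: simple unweighted mean, rounded to three decimals as in Table 3.
    return round(mean(estimates), 3)


regional = {region: regional_average(vals) for region, vals in returns_by_region.items()}
print(regional)  # {'Africa': 0.178, 'Asia': 0.107}
```

Under this assumption the sketch reproduces the regional rows of Table 3 for Africa (0.178) and Asia (0.107).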
To evaluate the models, we employ cross-validated (out-of-sample) RMSE and coefficient of determination (R²) metrics. We conduct 200 simulations, which consist of 20 repetitions of 10-fold cross-validation, to obtain robust estimates. We compare the results obtained from SVR with those obtained from OLS regression. By using these evaluation metrics and comparing the results with OLS, we can assess the predictive performance and goodness of fit of the SVR models. This approach allows us to obtain reliable and robust estimates of the returns to education across countries and survey years.

5.1 Returns to Education

The returns to an additional year of education for each country are summarized in Table 3. In the first row, the total private rate of returns to another year of schooling is reported to be 10.4% for the period from 2010 to 2018. This result surpasses other findings reported by Psacharopoulos and Patrinos (2018) (9%), Patrinos (2016) (9.7%), and Montenegro and Patrinos (2014) (10.1%). Globally, the returns to education are highest in Africa, with a rate of 17.8%, significantly above the average. Healthy returns are also observed in America, Asia, and Australia, with rates of 11.5%, 10.7%, and 9.8%, respectively. On the other hand, the lowest returns are found in Europe, at only 7.2%. These findings highlight the variations in the returns to education across regions. Africa stands out with the highest returns, while Europe lags behind with the lowest returns. 
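The evaluation protocol described above (20 repetitions of 10-fold cross-validation, giving 200 out-of-sample RMSE and R² values) can be sketched as follows. This is a minimal illustration, not the authors' R code: the data are synthetic, a one-variable least-squares line stands in for the fitted model (an SVR would be swapped in at `fit_line`), and all names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    # Root mean squared error on held-out observations.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    # Coefficient of determination on held-out observations.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def repeated_kfold(n, k=10, repeats=20, seed=0):
    # Yield (train, test) index lists: `repeats` reshuffles, `k` folds each.
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def fit_line(x, y):
    # Stand-in model: least-squares fit of y = a + b * x.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def cross_validate(x, y, k=10, repeats=20, seed=0):
    # Return one (RMSE, R^2) pair per fold, i.e. k * repeats pairs.
    scores = []
    for train, test in repeated_kfold(len(x), k, repeats, seed):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [a + b * x[i] for i in test]
        truth = [y[i] for i in test]
        scores.append((rmse(truth, pred), r_squared(truth, pred)))
    return scores


# Synthetic example: log hourly wage rising by 0.10 per year of schooling plus noise.
rng = random.Random(42)
schooling = [rng.randint(6, 18) for _ in range(300)]
log_wage = [1.0 + 0.10 * s + rng.gauss(0, 0.4) for s in schooling]

scores = cross_validate(schooling, log_wage)
mean_rmse = sum(s[0] for s in scores) / len(scores)
mean_r2 = sum(s[1] for s in scores) / len(scores)
print(len(scores), round(mean_rmse, 3), round(mean_r2, 3))
```

Averaging over all 200 folds, rather than a single split, is what makes the reported metrics less sensitive to any one data partition.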
These results provide important insights into the economic value of education in different parts of the world.

Table 3: Average returns to education across regions

| Region / Country | Year | Years of education | Returns to education | Primary | Secondary | Tertiary | Male | Female |
|---|---|---|---|---|---|---|---|---|
| Total | 2010–2018 | 12.8 | 0.104 (0.003) | 0.103 | 0.064 | 0.120 | 0.101 | 0.106 |
| Africa | 2010–2017 | 9.9 | 0.178 (0.003) | 0.105 | 0.071 | 0.249 | 0.164 | 0.209 |
| South Africa | 2017 | 11.6 | 0.179 (0.001) | 0.199 | 0.094 | 0.201 | 0.172 | 0.203 |
| | 2015 | 11.2 | 0.152 (0.001) | 0.109 | 0.074 | 0.176 | 0.141 | 0.181 |
| | 2012 | 11.4 | 0.144 (0.004) | 0.118 | 0.060 | 0.179 | 0.135 | 0.175 |
| | 2010 | 11.1 | 0.156 (0.001) | 0.094 | 0.083 | 0.192 | 0.144 | 0.193 |
| Tanzania | 2015 | 8.0 | 0.206 (0.004) | 0.061 | 0.052 | 0.324 | 0.192 | 0.232 |
| | 2013 | 8.1 | 0.200 (0.004) | 0.075 | 0.072 | 0.342 | 0.180 | 0.234 |
| | 2011 | 7.9 | 0.207 (0.003) | 0.081 | 0.061 | 0.330 | 0.187 | 0.246 |
| America | 2011–2017 | 13.9 | 0.115 (0.004) | | 0.055 | 0.105 | 0.122 | 0.116 |
| USA | 2017 | 13.9 | 0.111 (0.004) | | 0.046 | 0.105 | 0.120 | 0.112 |
| | 2015 | 14.0 | 0.121 (0.005) | | 0.063 | 0.106 | 0.129 | 0.123 |
| | 2013 | 13.9 | 0.113 (0.002) | | 0.062 | 0.101 | 0.121 | 0.113 |
| | 2011 | 13.9 | 0.113 (0.002) | | 0.047 | 0.106 | 0.118 | 0.115 |
| Asia | 2010–2018 | 12.4 | 0.107 (0.004) | 0.083 | 0.055 | 0.132 | 0.088 | 0.109 |
| India | 2012 | 6.0 | 0.082 (0.002) | 0.083 | 0.055 | 0.155 | 0.071 | 0.080 |
| South Korea | 2018 | 13.5 | 0.095 (0.001) | | | 0.106 | 0.080 | 0.097 |
| | 2017 | 13.4 | 0.103 (0.003) | | | 0.120 | 0.082 | 0.111 |
| | 2016 | 13.3 | 0.100 (0.002) | | | 0.117 | 0.079 | 0.108 |
| | 2015 | 13.2 | 0.111 (0.002) | | | 0.134 | 0.090 | 0.115 |
| | 2014 | 13.0 | 0.103 (0.002) | | | 0.121 | 0.081 | 0.111 |
| | 2013 | 13.0 | 0.115 (0.007) | | | 0.135 | 0.097 | 0.112 |
| | 2012 | 12.9 | 0.117 (0.006) | | | 0.142 | 0.096 | 0.118 |
| | 2011 | 12.8 | 0.119 (0.004) | | | 0.140 | 0.102 | 0.112 |
| | 2010 | 12.7 | 0.122 (0.003) | | | 0.151 | 0.098 | 0.128 |
| Australia (region) | 2010–2018 | 13.1 | 0.098 (0.003) | | | 0.101 | 0.105 | 0.095 |
| Australia | 2018 | 13.3 | 0.103 (0.005) | | | 0.107 | 0.108 | 0.104 |
| | 2017 | 13.2 | 0.101 (0.002) | | | 0.102 | 0.108 | 0.098 |
| | 2016 | 13.2 | 0.096 (0.002) | | | 0.094 | 0.102 | 0.095 |
| | 2015 | 13.2 | 0.099 (0.002) | | | 0.099 | 0.106 | 0.097 |
| | 2014 | 13.1 | 0.103 (0.007) | | | 0.112 | 0.108 | 0.104 |
| | 2013 | 13.1 | 0.096 (0.002) | | | 0.100 | 0.101 | 0.096 |
| | 2012 | 13.0 | 0.093 (0.002) | | | 0.096 | 0.099 | 0.090 |
| | 2011 | 12.9 | 0.100 (0.002) | | | 0.106 | 0.108 | 0.094 |
| | 2010 | 12.9 | 0.091 (0.001) | | | 0.092 | 0.103 | 0.081 |
| Europe | 2010–2018 | 13.8 | 0.072 (0.002) | | | 0.072 | 0.076 | 0.065 |
| Germany | 2017 | 13.1 | 0.083 (0.002) | | | 0.084 | 0.090 | 0.078 |
| | 2016 | 13.1 | 0.082 (0.003) | | | 0.083 | 0.089 | 0.077 |
| | 2015 | 13.1 | 0.082 (0.003) | | | 0.084 | 0.090 | 0.075 |
| | 2014 | 13.1 | 0.081 (0.004) | | | 0.083 | 0.087 | 0.076 |
| | 2013 | 13.2 | 0.081 (0.003) | | | 0.082 | 0.090 | 0.072 |
| | 2012 | 13.1 | 0.079 (0.004) | | | 0.079 | 0.088 | 0.071 |
| | 2011 | 13.1 | 0.076 (0.002) | | | 0.076 | 0.086 | 0.066 |
| | 2010 | 13.1 | 0.084 (0.004) | | | 0.084 | 0.091 | 0.076 |
| Switzerland | 2018 | 14.7 | 0.061 (0.001) | | | 0.058 | 0.064 | 0.052 |
| | 2017 | 14.7 | 0.065 (0.001) | | | 0.062 | 0.067 | 0.055 |
| | 2016 | 14.6 | 0.064 (0.001) | | | 0.064 | 0.065 | 0.056 |
| | 2015 | 14.5 | 0.064 (0.001) | | | 0.063 | 0.064 | 0.057 |
| | 2014 | 14.3 | 0.064 (0.001) | | | 0.062 | 0.065 | 0.056 |
| | 2013 | 14.3 | 0.064 (0.001) | | | 0.067 | 0.060 | 0.059 |
| | 2012 | 14.2 | 0.066 (0.001) | | | 0.068 | 0.063 | 0.061 |
| | 2011 | 14.1 | 0.062 (0.001) | | | 0.062 | 0.061 | 0.054 |
| | 2010 | 14.1 | 0.065 (0.001) | | | 0.065 | 0.064 | 0.056 |

Notes: All columns are averages. Primary, Secondary, and Tertiary: average returns to that level of education; Male and Female: average returns to education by gender. Blank cells indicate values not reported.

5.1.1 Returns by Gender

The analysis reveals that, on average, returns to education are higher for women than for men, which aligns with the existing literature (e.g., Patrinos, Psacharopoulos, & Tansel, 2019; Psacharopoulos & Patrinos, 2018). The total rate of returns to another year of education is estimated to be 10.1% for men and 10.6% for women. In Africa and Asia, the returns to education for women are statistically significantly higher than for men (p-value below 0.01). This suggests that women benefit more from investing in education in these regions. However, in the USA, men’s returns to education are similar to those of women, and in Europe, men’s returns are even higher. This contrast aligns with the findings of Mendolicchio and Rhein (2014) for West European countries. Notably, Switzerland stands out with the lowest returns to education for men, at only 6%. On the other hand, Tanzania shows the highest returns to education for women, exceeding 20%.

5.1.2 Returns by Level of Education

Returns to different levels of education follow a specific pattern. Globally, higher education yields the highest returns, followed by primary education (only calculated in Africa and India), while secondary education has the lowest returns. These findings align with the existing literature (Montenegro & Patrinos, 2014; Psacharopoulos & Patrinos, 2018). 
On average, the returns to tertiary education amount to 12%, indicating that individuals who invest in higher education experience a larger increase in earnings than those with lower levels of education. Similarly, the returns to primary education are estimated to be 10%. In Africa, the returns to tertiary education are particularly high, reaching 24.9%, highlighting the significant economic benefits of pursuing higher education in this region. Returns to primary education in Africa amount to 10%, which still represents a positive return on investment. In contrast, Europe exhibits the lowest returns to higher education, at 7.2%, suggesting that the economic gains from pursuing tertiary education are lower in this region than in others.

5.1.3 Returns to Education Over Time

As displayed in Figure 4, the returns to education tend to decrease slightly over time, at a rate of approximately 0.1% per year. This finding suggests that the economic benefits individuals derive from investing in education may gradually decline over time. This observation is consistent with previous research (Montenegro & Patrinos, 2014; Patrinos, 2016; Peet et al., 2015; Psacharopoulos & Patrinos, 2018). In contrast to the declining returns to education, the average number of years of education has been increasing over time at a rate of approximately 0.16 years per year, or 1% per year. This finding implies that individuals are investing more in their education and acquiring higher levels of schooling as time progresses. The simultaneous increase in the average number of years of education and the decline in the returns to education may indicate a shift in the labor market’s demand for specific skills and qualifications. 
It suggests that while education remains important, other factors such as technological advancements, changing labor market dynamics, and skill requirements may influence the returns to education.

Figure 4: Returns to education and years of education over the last decade.

5.1.4 Returns to Education and Average Years of Education

The regression analysis displayed in Figure 5 suggests that there is a negative relationship between the average number of years of education and the returns to education. Specifically, the analysis indicates that for each additional year of average education, the returns to education decrease by 1.4 percentage points. This finding supports the notion that education responds to price signals in the economy. As the level of education increases within a given economy, the supply of educated individuals also increases. This increased supply can lead to a decrease in the returns to education, as a larger pool of individuals with similar levels of education competes for similar job opportunities. The result is consistent with the findings of Montenegro and Patrinos (2014), suggesting that the relationship between education and the returns to education is influenced by market dynamics and the supply of educated individuals.

Figure 5: Decrease of returns to education as education increases.

5.2 Comparison with OLS Returns

To compare the returns to education estimated by the SVR model and the OLS model, both models are applied to the same dataset using the same set of variables. The performance of each model is evaluated using the R² and RMSE metrics. By conducting 200 cross-validated simulations, the models’ performance is assessed on multiple subsets of the data, ensuring robust and reliable estimates of the models’ performance. 
This approach helps account for the variation in the results that could occur due to different data partitions.

According to the results presented in Table 4, the returns to education estimated using the SVR model are generally higher than those estimated using the OLS model. This suggests that the SVR model captures nonlinearities or patterns in the data that are not captured by the linear OLS model. In terms of predictive performance, the SVR models consistently outperform the OLS models. The SVR models achieve higher R² values, indicating that they explain a larger proportion of the variance in earnings. In addition, the SVR models have lower RMSE values, indicating that their predictions are closer to the observed values. These results suggest that the SVR model, being an ML algorithm, is better able to capture the complex relationships in the data, resulting in more accurate and robust estimates of the returns to education. Its ability to handle nonlinearity gives it an advantage over the traditional OLS model in this context. It is important to note that the performance of the SVR model may vary depending on the specific dataset and the choice of hyperparameters. 
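For reference, the two metrics are assumed here to follow their standard definitions, evaluated on the $m$ held-out observations of a cross-validation fold. This section does not restate them, so the notation ($m$, $\hat{y}_i$ for the predicted value, $\bar{y}$ for the mean of the held-out outcomes) is introduced for illustration only:

```latex
\[
\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}
\]
```

A lower RMSE and a higher R² both correspond to a smaller sum of squared prediction errors, which is why the two metrics rank SVR and OLS in the same order in Table 4.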
Nonetheless, the results presented in Table 4 suggest that the SVR model provides more reliable and accurate estimates of the returns to education than the OLS model in this particular analysis.

Table 4: Comparison of SVR and OLS estimated returns and cross-validated model performances

| Country | Rate of return: SVR | Rate of return: OLS | R²: SVR | R²: OLS | RMSE: SVR | RMSE: OLS |
|---|---|---|---|---|---|---|
| South Africa (2017) | 0.179 (0.001) | 0.172 (0.001) | 0.466 (0.033) | 0.344 (0.031) | 0.647 (0.022) | 0.679 (0.022) |
| Tanzania (2015) | 0.206 (0.004) | 0.189 (0.002) | 0.355 (0.046) | 0.282 (0.036) | 0.871 (0.051) | 0.899 (0.052) |
| USA (2017) | 0.111 (0.004) | 0.110 (0.001) | 0.252 (0.015) | 0.210 (0.019) | 0.544 (0.014) | 0.550 (0.017) |
| India (2012) | 0.082 (0.002) | 0.081 (0.000) | 0.386 (0.012) | 0.272 (0.007) | 0.591 (0.007) | 0.604 (0.007) |
| South Korea (2018) | 0.095 (0.001) | 0.084 (0.001) | 0.210 (0.014) | 0.175 (0.019) | 0.563 (0.016) | 0.568 (0.017) |
| Australia (2018) | 0.103 (0.005) | 0.070 (0.001) | 0.222 (0.014) | 0.179 (0.014) | 0.465 (0.018) | 0.468 (0.017) |
| Germany (2017) | 0.083 (0.002) | 0.076 (0.001) | 0.231 (0.012) | 0.192 (0.012) | 0.441 (0.012) | 0.442 (0.011) |
| Switzerland (2018) | 0.061 (0.001) | 0.061 (0.001) | 0.280 (0.028) | 0.258 (0.030) | 0.366 (0.020) | 0.371 (0.019) |

6 Conclusion and Policy Implications

This article contributes to the existing literature by examining the private rates of returns to education across different regions worldwide using recent data from 2010 to 2018. The analysis includes data from South Africa, Tanzania, the USA, India, South Korea, Australia, Germany, and Switzerland. In contrast to the traditional approach of using OLS, this study employs an ML technique called SVR to estimate the returns to education. The SVR approach allows for capturing nonlinear relationships and complex patterns in the data, potentially providing more accurate and robust estimates. The analysis is based on the Mincer earnings function, both the basic and extended versions, which consider factors such as years of education, labor market experience, and educational levels. 
The samples used in the analysis are carefully homogenized to ensure comparability across regions, considering variables such as age, level of education, employment status, and earnings. By applying the SVR approach and considering recent data, this study aims to provide updated and reliable estimates of the private rates of returns to education in different regions. The use of ML techniques and the careful selection of variables and samples enhance the understanding of the relationship between education and earnings in these specific contexts.

The findings of this study highlight the positive and significant returns to education globally, with an average rate of 10.4% for each additional year of education. However, the returns to education vary across regions, ranging from 7% in Europe to 18% in Africa. Africa stands out with the highest returns, which are roughly 1.5 to 1.8 times those in America, Asia, and Australia. One possible explanation for the higher returns to education in Africa could be the structure of the labor market. The labor market in Africa is often characterized by two segments: an informal sector and a formal sector. Individuals with lower levels of education or only primary education may face difficulties in accessing formal sector employment and may choose to remain unemployed or work in the informal sector. On the other hand, individuals with secondary or tertiary education tend to enter the formal sector, which offers higher returns to education. This segmentation of the labor market contributes to the higher returns observed in Africa. See Kuepié and Nordman (2016) for further discussion of the reasons behind the high returns to education in Africa. 
Understanding these factors is crucial for policymakers and stakeholders to design effective education and labor market policies that can enhance human capital development and improve economic outcomes in the region.

The finding that tertiary education yields the highest returns globally, followed by primary and then secondary education, challenges previous research results that often indicated higher returns for primary education. This finding has important implications for policymakers and educational institutions. First, policymakers should consider increasing investment in tertiary education. The higher returns associated with tertiary education suggest that individuals who obtain higher education qualifications are likely to have better employment opportunities and higher earning potential. This can contribute to economic growth and development. Therefore, allocating educational budgets to support tertiary education programs and institutions can be a strategic decision to meet the demand for highly skilled professionals in various sectors. However, it is also crucial to prioritize investments in primary education and ensure universal access to quality primary education worldwide. Primary education serves as the foundation for individuals’ educational journey and plays a fundamental role in shaping their future opportunities. Neglecting primary education could lead to widening inequalities and hinder the development of a skilled workforce. Furthermore, secondary education should not be overlooked, as it serves as a crucial bridge between primary education and tertiary education. It provides essential knowledge and skills that prepare individuals for higher education or vocational training. Emphasizing the importance of secondary education can help ensure a smooth transition to higher levels of education. 
While the finding of higher returns to tertiary education highlights the importance of investing in higher education, policymakers should maintain a balanced approach. Prioritizing universal access to quality primary education and recognizing the significance of secondary education can contribute to creating a comprehensive and inclusive education system that caters to the diverse needs of individuals and promotes overall societal development.

The observation that returns to education are generally higher for women than for men is important. It indicates that investing in education yields a greater payoff for women in terms of their earnings and economic outcomes. While it does not necessarily mean that women earn higher salaries than men, it underscores the significance of education for women and girls in improving their economic prospects. Ensuring that women have access to and complete at least primary and secondary education is crucial. Education equips women with valuable skills, knowledge, and opportunities, which can enhance their productivity and contribute to their overall well-being. By closing the gender gap in education enrollment and promoting educational opportunities for women, societies can empower women to participate fully in economic and social life. Moreover, investing in women’s higher education is an essential priority. Higher education has been shown to have significant returns, and enabling more women to pursue university education can lead to greater efficiency gains and reduce the gender pay gap. By providing women with access to higher education and supporting their educational aspirations, barriers to career advancement can be overcome, leading to more equitable economic outcomes and increased opportunities for women in various sectors. 
It is worth noting that education not only enhances women’s skills and productivity but also plays a role in reducing the gender pay gap by addressing factors such as discrimination, preferences, and societal circumstances that contribute to unequal earnings. By improving educational opportunities for women and promoting gender equality in education, societies can strive toward more inclusive and equitable economic systems. Investing in women’s education, from primary to higher education, is a critical step toward promoting gender equality, improving economic outcomes for women, and reducing the gender pay gap. Efforts should be directed at eliminating disparities in education enrollment, ensuring access to quality education for women and girls, and creating an enabling environment that supports their educational and professional advancement.

The use of ML techniques, such as SVR, in calculating returns to education has the potential to revolutionize the analysis and understanding of the economic value of education. ML algorithms offer advantages such as improved accuracy, better handling of complex and non-linear relationships, and the ability to incorporate a wide range of variables and factors into the analysis. By leveraging ML algorithms, policymakers, institutions, and students can gain valuable insights into the returns to education and make more informed decisions. This includes designing relevant and effective curricula, providing tailored career guidance, and allocating resources more efficiently. ML techniques can also assist in evaluating the effectiveness of education financing programs, such as student loans, by analyzing the potential returns on investment for different educational paths. While this study focuses on private returns to education, which directly benefit individuals in terms of financial outcomes, it is important to acknowledge the broader social benefits of education as well. 
Investigating the social returns to education would provide insights into the positive externalities and wider societal impacts that education brings, including improved productivity, social mobility, and reduced inequalities. Further research into social returns to education can help policymakers and stakeholders better understand the long-term benefits of investing in education and guide policy decisions that promote inclusive and equitable educational opportunities. By considering both private and social returns, a comprehensive understanding of the full value and impact of education can be achieved, leading to more effective educational policies and interventions. Overall, the integration of ML techniques in the analysis of returns to education opens up new avenues for research and policy development, enabling more data-driven and evidence-based decision-making in the field of education.


Open Education Studies , Volume 5 (1): 1 – Jan 1, 2023


Publisher: de Gruyter
Copyright: © 2023 the author(s), published by De Gruyter
ISSN: 2544-7831
DOI: 10.1515/edu-2022-0201

Abstract

1 Introduction

With the emergence of the human capital theory, initially proposed by Smith (1776) and later popularized by Becker (1964), Mincer (1974), and Schultz (1960, 1961), the relationship between investment in human capital and income distribution has been a subject of continuous investigation. Economists have conceptualized human capital as the accumulation of human resources, including knowledge, skills, and personal characteristics, that contributes to the generation of economic value in the labor market (Acemoglu & Autor, 2011; Fogg, Harrington, & Khatiwada, 2018; Goldin & Katz, 2007). Formal education, as a significant component of human capital, plays a crucial role in determining social and economic success, and investments in education are associated with numerous future benefits.

Systematic attempts to estimate the social and economic impact of educational investment began in the 1950s and gave rise to the concept of rates of returns to education (Becker, 1962; Becker & Chiswick, 1966; Carnoy, 1967; Chiswick, 1969; Mincer, 1958). Estimating the private rate of returns to education, which refers to the financial earnings individuals expect to receive as a result of their educational investment, provides crucial insights. Two primary methods are commonly used to calculate the expected rates of returns to education:

The discounting method, as described by Psacharopoulos and Patrinos (2018), utilizes the concept of the pure internal rate of returns to schooling. This method involves determining the discount rate that equalizes the stream of benefits from education to the stream of costs at a specific point in time.

The earnings function method, formalized by Mincer (1974), examines the rate of returns to schooling as the relative change in earnings resulting from an additional year of education. The Mincer earnings function is widely employed in empirical research and allows economists to estimate the monetary returns to education. 
Its simplicity facilitates direct comparisons and enables individuals to assess the profitability of investing in education, thereby aiding their decision-making process regarding the optimal level of educational investment.

The estimation of rates of returns to education remains a prominent topic in the field of economics, leading to a substantial body of empirical research (Montenegro & Patrinos, 2013, 2014; Patrinos & Psacharopoulos, 2010; Psacharopoulos & Patrinos, 2004, 2018). Recent compilations of studies consistently reveal similar patterns: the global private rates of returns to education average around 9% per year. Notably, returns tend to be higher in low- and middle-income economies compared to high-income economies. In addition, returns are highest at the tertiary and primary education levels. Furthermore, the findings indicate that women tend to experience higher returns than men, and the private sector of the economy benefits from greater returns compared to the public sector.

Ordinary Least Squares (OLS) is a widely utilized econometric algorithm for calculating the rates of returns to education. However, existing literature highlights the issue of high bias associated with OLS, as it often underfits the data by failing to capture the underlying patterns (Dangeti, 2017). Bias refers to the difference between the expected value of an estimator and the true value of the parameter being estimated. In recent years, state-of-the-art algorithms from the field of machine learning (ML) have gained prominence and found applications in econometrics. Among these algorithms, Support Vector Regression (SVR) (Basak, Pal, Ch, & Patranabis, 2007; Gani, Taleb, & Limam, 2010; Smola & Schölkopf, 2004) has shown promise for estimating the rates of returns to education. SVR is a regression model within the ML framework that offers several advantages over OLS. 
These benefits include improved predictive performance, robustness to outliers, and lower computational requirements compared to other regression techniques. Furthermore, ML algorithms have the capacity to uncover hidden patterns and capture nonlinear effects that have a significant impact on the estimation of returns to education. By leveraging these advanced techniques, researchers can potentially obtain more accurate and comprehensive insights.

The objective of this analysis is to estimate the rates of returns to education for eight distinct countries across various regions, namely Australia, Germany, India, South Africa, South Korea, Switzerland, Tanzania, and the United States of America (USA). Moreover, we aim to assess the rates of returns to education for each country over the past decade, specifically from 2010 to 2018, utilizing the most recently available datasets. To ensure consistency, the data have been standardized in terms of age groups, education levels, employment statuses, and earnings, thereby accounting for the significant heterogeneity within each region. The estimation process applies SVR in conjunction with the widely recognized Mincer earnings function. In addition, we explore variations in returns between genders and across different educational levels, including primary, secondary, and tertiary education.

Through the utilization of state-of-the-art ML models trained on these rich and diverse datasets, this study aims to explore the intricate relationships between education and economic outcomes. By uncovering regional disparities and providing evidence-based insights, the research findings have significant implications for policy and institutional decision-making processes. The article highlights the superior performance of ML algorithms over traditional methods, contributing to the growing body of research advocating for the integration of advanced statistical techniques in estimating returns to education. 
The potential impact of this study is profound, as it can revolutionize the evaluation of education’s impact by policymakers, educational institutions, and individuals. By considering the economic consequences of educational investment and spending, stakeholders can make informed choices regarding educational investments. This, in turn, can lead to the development of more effective policies and improved outcomes for individuals and society as a whole. The insights generated by this study provide essential guidance for optimizing resource allocation, enhancing educational systems, and driving socioeconomic progress.

The evaluation results reveal that Sub-Saharan Africa has the highest returns to education, averaging 18%. Healthy rates of returns are also observed in Australia (10%), Asia (10%), and America (11%), while Western Europe exhibits the lowest returns, averaging 7%. In addition, higher education yields the highest returns across all regions, followed by primary and secondary education. Women generally experience higher rates of returns compared to men, with rates of 10.6% and 10.1%, respectively. Over time, the returns to education demonstrate a modest decline of approximately 0.1% per year. In contrast, the average number of years of education exhibits an increase of 0.16 years annually (1% per year). A comparison between the conventional OLS method and the ML algorithm (SVR) demonstrates that the ML approach provides more robust and accurate estimates, along with superior predictive performance.

The structure of the article is as follows: Section 2 provides a review of previous estimates of rates of returns to education. Section 3 explains the methodology employed for estimating rates of returns to education using the Mincer earnings equation and the ML approach. The datasets utilized for analysis are described in Section 4, along with an exploration of the patterns of education and earnings across the different regions. 
Section 5 presents the estimates of rates of returns to education. The final section discusses the results and highlights the policy implications derived from the findings.

2 Returns of Investment in Education in the World

The literature has amassed a substantial body of research that examines the relationship between earnings and education, with a particular focus on estimating the rates of returns to educational investment. In this section, we provide a comprehensive review of the research evidence concerning the returns to education over the past decade. By synthesizing and analyzing the findings from various empirical works, we aim to provide an extensive understanding of the returns on educational investments during this period.

2.1 Rates of Returns to Education in Africa

Psacharopoulos and Patrinos (2018) extensively examined the global literature, encompassing 1,120 estimates across 139 countries from 2004 to 2011, and found that Sub-Saharan Africa exhibits one of the highest private returns to education (10.5%). These estimates were predominantly derived using the Mincer equation and the OLS algorithm. Montenegro and Patrinos (2014), after compiling data on the global returns to education from 139 economies spanning the years 1980–2013, discovered that Sub-Saharan Africa had the highest rates of returns (12.5%).

Depken, Chiseni, and Ita (2019) employed OLS and examined two waves of the National Income Dynamics Study to estimate the private returns to education in South Africa for the years 2010 and 2012. The analysis revealed returns of approximately 18% per year, with higher returns observed for females compared to males. Similar estimates were obtained by Salisbury (2015) using data from the 2008 wave. 
By employing the Two-Stage Least Squares (2SLS) procedure on the 2012 wave, Biyase and Zwane (2015) reported a return rate of 47%.

Moreover, multiple studies have focused on estimating the rates of returns to education in Tanzania. The most recent estimate, compiled by Montenegro and Patrinos (2014), indicates a rate of 16.1% in 2011. Returns on higher education were found to be the highest (19.4%), while secondary and primary education yielded similar returns (15% and 14.6%, respectively). In addition, Peet, Fink, and Fawzi (2015) estimated the return rate for Tanzania in 2010 to be 11.1%, while Serneels, Beegle, and Dillon (2017) reported estimated returns of 8% for men and 10% for women in 2008. Further rates of returns for both countries are provided in Table A1.

2.2 Rates of Returns to Education in the USA

In the USA, there is a consistent effort to estimate the returns on investment in education. The Organisation for Economic Co-operation and Development (OECD) regularly publishes a report called “Education at a Glance,” which provides ongoing information about the state of education worldwide. These reports present data on education system finances, performance, and incentives to invest in education across OECD countries. Using data from the Programme for the International Assessment of Adult Skills (PIAAC) in 2016, OECD (2019) estimated the private internal rate of returns to tertiary education for both men and women in the USA to be 20%. This estimate was 18% in 2015. Furthermore, Fogg et al. (2018) found that in 2011–2012, high school dropouts were projected to earn 16–17% less than high school graduates. Conversely, bachelor’s degree holders earned 30% more than high school graduates, and master’s degree holders earned approximately 45% more than high school graduates. 
For additional estimates, please refer to Table A1.

2.3 Rates of Returns to Education in Asia

Significant research efforts have been undertaken to estimate the rates of returns to education in India and South Korea over the past decade. In the case of India, Sikdar et al. (2019) conducted the most recent study using data from 2011–2012. Employing the Mincerian equation, the study found insignificant relationships between wages and education level, estimating the financial returns to education at 1.5%. However, graduates from tertiary education typically obtain higher-paying jobs. Jacob et al. (2018), analyzing returns to various professions and degrees based on data from 2011–2012, found that medical graduates had the highest returns, followed by engineering graduates and professional postgraduates. Two additional studies (Agrawal, 2011; Rani, 2014), although based on older data from 2005, obtained rates of returns to education of 14% and 8.5%, respectively. Agrawal (2011) further examined the temporal change in returns to different levels of education and observed rising returns with higher education levels.

Turning to South Korea, estimates of the returns to education are primarily evaluated by the OECD using PIAAC data and the full discounting method. In 2016, private returns to tertiary education were estimated to be 22% for men, 20% for women, and 21% on average (OECD, 2019). Similarly, in 2015, returns to tertiary education averaged 22%, with men experiencing a rate of 25% and women 19% (OECD, 2018). As shown in Table A1, returns to education have increased from 2011 to 2016. Another study examining the Korean minority living in China, based on data from 2009–2010, revealed high returns to education compared to the Asian and world averages. 
This explains the strong private demand for education among the ethnic Koreans in China (Mishra & Smyth, 2013).

2.4 Rates of Returns to Education in Australia

The rates of returns to education in Australia are primarily investigated and reported by the OECD. In 2016, the average returns to higher education were estimated to be 13.5%, showing similar values for both men and women (OECD, 2019). There was only a marginal change compared to the previous year’s returns, which stood at 12% in 2015 (OECD, 2018). Additional estimates from the OECD can be found in Table A1.

Other studies have examined the returns to education in Australia using the Mincerian equation. Montenegro and Patrinos (2014) obtained a rate of returns to education of 14.1% for the year 2010, while Mariotti and Meinecke (2011) estimated a rate of 8.3% for the same year. The disparity in these findings can be attributed to the utilization of different datasets in each analysis.

2.5 Rates of Returns to Education in Europe

European countries, particularly those in Western Europe, are characterized by relatively low average rates of returns to education, as reported by Psacharopoulos and Patrinos (2018) with a value of 7.3%. In the case of Germany and Switzerland, returns to education have been consistently studied and analyzed.

According to the OECD reports, the rates of returns to tertiary education in Germany were 15% in 2016 (OECD, 2019) and 11% in 2015 (OECD, 2018). Using the Mincerian earning function with OLS estimation, Montenegro and Patrinos (2014) calculated rates of returns for the years 2010, 2011, and 2012, resulting in estimates of 15.2%, 14.3%, and 14.5%, respectively. Similarly, employing a two-stage least squares (2SLS) approach, Mysíková and Večerník (2015) obtained estimates of 15.6% in 2010 and 15.5% in 2011.

For Switzerland, calculations revealed comparable rates of returns to tertiary education in 2016 and 2015, both amounting to 14% (OECD, 2018, 2019). 
Previous calculations based on the Mincerian equation indicated higher returns, with estimates of 21% for the years 2010 and 2011 using the 2SLS method (Mysíková & Večerník, 2015). However, when employing OLS, the rates for 2011 and 2012 were approximately 12% (Montenegro & Patrinos, 2014).

The majority of studies included in this review rely on older data, with data collection concluding by 2012, except for those analyzed by the OECD, which include the most recent data from 2016. It is important to note that the results provided by the OECD may have limitations due to the employed methodology, specifically the full discounting method. This approach is less commonly used in the economics literature, as most researchers typically adopt the standard and conventional approach based on the Mincer earnings function. In addition, studies based on the Mincer equation may occasionally fail to capture trends in different regions.

The present study focuses on examining the most recent data released in 2018 from diverse regions worldwide, including South Africa and Tanzania in Africa, the USA, South Korea and India in Asia, Australia, and Germany and Switzerland in Europe. These regions exhibit significant heterogeneity, and comparing their returns on educational investment can offer valuable insights. To enhance the accuracy and robustness of the estimation, an ML algorithm called SVR is employed in combination with cross-validation. This innovative approach allows for more reliable and precise estimations. The subsequent sections will provide detailed information about the empirical strategy and the data utilized for the analysis.

3 Empirical Specification

The methodological approach employed in this study involves estimating the returns to education by leveraging an ML technique that builds upon the Mincerian earning function.

3.1 Mincerian Earning Function

Estimates of the returns to education are commonly derived using the Mincer model (Mincer, 1974; Patrinos, 2016). 
The Mincer model can be extended to estimate the returns to education at different levels of schooling. The standard Mincer model involves regressing the logarithm of hourly wage (or in some cases, weekly, monthly, or yearly wage) on a set of human capital variables. These human capital variables typically include total years of education, labor market experience, and experience squared. The Mincer equation can be expressed as follows:

(1)  $y_i = \log(w_i) = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \mu_i,$

where $y_i = \log(w_i)$ denotes the natural logarithm of the earned wage for the individual $i$; $S_i \in \mathbb{R}$ constitutes the number of years of schooling; and $X_i \in \mathbb{R}$ represents the potential labor market experience. The labor market experience is calculated by subtracting from the age of the individual the number of years of schooling and the age at which the individual started schooling ($X_i = \text{Age}_i - S_i - 6$, with 6 being the average age for starting schooling). $X_i^2$ is the square of the potential experience, and $\mu_i$ is the random error term reflecting unobserved characteristics. Consequently, the coefficient of interest $\beta_1$ on years of schooling ($S_i$) can be interpreted as the private average rate of return to an additional year of education. The basic Mincer function assumes that the rate of return is the same for all levels of schooling and considers foregone earnings as the sole costs of education. The inclusion of the experience variable in the equation accounts for the expected positive relationship between earnings and experience. In addition, the experience squared term captures the potential non-linear relationship between earnings and experience.

To analyze the returns to different levels of education, an extended version of the Mincer model can be employed. 
In this extended model, the continuous variable representing the total years of education ($S$) is transformed into a series of dummy variables: $E_{\text{p}}$ for primary education, $E_{\text{s}}$ for secondary education, and $E_{\text{t}}$ for tertiary education. The extended Mincer equation can be expressed as follows:

(2)  $\log(w_i) = \beta_0' + \beta_{\text{p}} E_{\text{p}_i} + \beta_{\text{s}} E_{\text{s}_i} + \beta_{\text{t}} E_{\text{t}_i} + \beta_2' X_i + \beta_3' X_i^2 + \mu_i'.$

In the extended Mincer equation, the dummy variables $E_{\text{p}}$, $E_{\text{s}}$, and $E_{\text{t}}$ take a value of 1 when an individual has attained the respective level of education (primary, secondary, or tertiary). The reference level is represented by individuals with no education, and it is excluded from equation (2) to prevent matrix singularity.

Once the extended earning function has been computed, the private rates of return to each year of education for a specific level can be derived using the following approach:

(3)  $\text{rate}_{\text{p}} = \beta_{\text{p}} / S_{\text{p}},$

(4)  $\text{rate}_{\text{s}} = \beta_{\text{s}} / S_{\text{s}},$

(5)  $\text{rate}_{\text{t}} = \beta_{\text{t}} / S_{\text{t}},$

with $\text{rate}_{\text{p}}$, $\text{rate}_{\text{s}}$, and $\text{rate}_{\text{t}}$ being the private rate of returns to primary, secondary, and tertiary education, respectively. $S_{\text{p}}$, $S_{\text{s}}$, and $S_{\text{t}}$ stand for the average number of years spent in primary, secondary, and tertiary education. 
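As a rough illustration of equations (1) to (5), the following sketch fits both Mincer specifications by ordinary least squares on synthetic data. It is not the authors' estimation code: the simulated coefficients, the schooling thresholds used to build the level dummies (6, 12, and 16 years), the cumulative coding of those dummies, and the per-level durations passed in are all assumptions made for the example.

```python
import numpy as np


def simulate_workers(n=5000, seed=0):
    """Synthetic sample; every coefficient below is invented for illustration."""
    rng = np.random.default_rng(seed)
    schooling = rng.integers(0, 19, size=n)      # S_i in whole years, 0..18
    experience = rng.integers(0, 40, size=n)     # stands in for X_i = Age_i - S_i - 6
    log_wage = (1.0 + 0.10 * schooling + 0.04 * experience
                - 0.0006 * experience**2 + rng.normal(0.0, 0.3, size=n))
    return schooling, experience, log_wage


def fit_ols(regressors, y):
    """Least-squares coefficients; the intercept is stored at index 0."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def basic_mincer(schooling, experience, log_wage):
    """Equation (1): beta[1] estimates the return to one more year of schooling."""
    return fit_ols([schooling, experience, experience**2], log_wage)


def extended_mincer_rates(schooling, experience, log_wage,
                          thresholds=(6, 12, 16), durations=(3, 6, 4)):
    """Equations (2)-(5). Dummies are coded cumulatively (an assumption):
    each one equals 1 once the hypothetical schooling threshold is reached."""
    dummies = [(schooling >= t).astype(float) for t in thresholds]
    beta = fit_ols(dummies + [experience, experience**2], log_wage)
    levels = ("primary", "secondary", "tertiary")
    # rate_level = beta_level / S_level, as in equations (3)-(5)
    return {lvl: beta[k + 1] / durations[k] for k, lvl in enumerate(levels)}


if __name__ == "__main__":
    s, x, y = simulate_workers()
    print("return per year of schooling:", basic_mincer(s, x, y)[1])
    print("returns per year by level:", extended_mincer_rates(s, x, y))
```

With cumulative dummies each coefficient measures the earnings premium over the preceding level, which is what makes dividing by that level's own duration meaningful; a different dummy coding would require differencing the coefficients first.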
For convenience, 3 years are assigned for primary education, 6 years for secondary education, and 4–5 years for tertiary education. The 3 years represent the foregone earnings of primary-school-aged children (Psacharopoulos, 1985). The specific duration of tertiary education may vary depending on the higher education system of each country. For instance, higher education in Germany, Switzerland, and Tanzania typically requires an average of 5 years, while it is typically 4 years in Australia, South Korea, India, the USA, and South Africa.

The Mincer equation continues to be one of the most widely used models in econometrics for estimating returns to education. As highlighted by Montenegro and Patrinos (2014) and Patrinos (2016), estimates of returns to education derived from the Mincer equation exhibit stability and comparability. It is important to note that these estimates do not necessarily establish causality between education and earnings but rather indicate the conditional association between years of education and labor market incomes.

3.2 Machine Learning (ML) in Econometrics

ML, and artificial intelligence (AI) more broadly, refers to the field of study focused on computer algorithms that automatically learn and improve knowledge through experience. ML has found numerous applications in econometrics, including computational finance, higher education, and health. ML algorithms possess the capability to identify natural patterns and intricate structures within large datasets, often outperforming traditional algorithms. They have emerged as transformative technologies with immense potential in higher education. These technologies are being utilized in various aspects of the education system, such as personalized learning, intelligent tutoring systems, academic analytics, and administrative tasks. 
By harnessing the power of ML and AI, higher education institutions and educators can enhance student engagement, improve learning outcomes, optimize resource allocation, and drive innovation in teaching and research methodologies. The application of ML and AI in higher education is a topic of increasing interest and has the potential to revolutionize the educational landscape (Bozkurt, Karadeniz, Baneres, Guerrero-Roldán, & Rodríguez, 2021; Zawacki-Richter, Marín, Bond, & Gouverneur, 2019).

In econometrics, traditional statistical algorithms are typically employed to estimate the parameters $\beta$ that characterize the relationship between an output variable $y$ and a set of input covariates $\mathbf{x}$. These algorithms aim to identify the underlying statistical associations and make inferences about the relationship between the variables. However, recent discussions by Mullainathan and Spiess (2017) have highlighted the application of ML algorithms in econometrics. These ML algorithms, such as least absolute shrinkage and selection operator (LASSO) regression and SVR as utilized in this article, offer alternative approaches to modeling and prediction. ML algorithms can provide more flexibility by handling complex relationships and capturing nonlinearities that may be challenging for traditional statistical algorithms.

3.2.1 How ML Generally Works

ML algorithms operate by taking a known set of input and output data $(\mathbf{x}, y)$ and constructing models that can make predictions $\hat{y} = f(\mathbf{x})$. During the training process, these algorithms aim to minimize a loss function $L(y, \hat{y})$, which quantifies the discrepancy between the predicted values and the true values in the training data. The objective is to find the model that minimizes the expected prediction loss $E_{y,\mathbf{x}}[L(y, \hat{y})]$ on new, unseen data. 
This means that ML models are evaluated based on their performance on data that were not used in the model construction process, often referred to as out-of-sample data. This approach differs from traditional econometric models, where all available data are typically used for model estimation. By evaluating model performance on out-of-sample data, ML models are better able to generalize and make robust predictions. This mitigates the risk of overfitting, where models become too closely aligned with the training data and may not perform well on new data. Consequently, the estimated parameters, such as $\beta$, in ML models can provide more reliable and meaningful inferences for future predictions and deductions.

3.2.2 How an ML Algorithm is Set Up

Setting up an ML algorithm involves an essential step known as tuning, where the hyperparameters of the algorithm are adjusted to optimize the model’s performance. Hyperparameters differ from the parameters (e.g., $\beta$) estimated by the model itself during training and are set by the user. One commonly used technique for selecting hyperparameters is Grid Search (Ozdemir, 2016). Grid Search involves manually specifying a set of values for the hyperparameters and systematically evaluating the model’s performance with different combinations of these values.

When tuning the model to achieve optimal performance, a potential concern is overfitting, where the model becomes too closely aligned with the training data and fails to generalize well to new, out-of-sample data. To address this issue, a suitable approach is to utilize the cross-validation method (Hastie, Tibshirani, & Friedman, 2009). Cross-validation involves dividing the available data into multiple subsets or “folds.” The model is trained on a portion of the data and evaluated on the remaining fold, and this process is repeated multiple times, with each fold serving as the evaluation set once. 
This allows for a more robust assessment of the model’s performance and helps mitigate the risk of overfitting. By combining Grid Search and cross-validation, ML algorithms can be tuned to optimize performance while limiting overfitting and preserving generalizability to new, unseen data.

3.2.3 How an ML Model is Evaluated

Metrics such as the coefficient of determination (R²) and Root Mean Square Error (RMSE) are commonly used to evaluate the predictive performance of ML models in regression problems (Hyndman & Koehler, 2006; Pelánek, 2015). R² measures the proportion of the variance in the outcome variable that is explained by the model’s predictors. It is defined as follows:

$$R^{2}=1-\frac{\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}},$$

where $\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}$ denotes the mean of the observed outcome. R² is at most 1 and typically lies between 0 and 1; it can become negative when a model predicts out-of-sample data worse than the mean of the observed outcome does.

RMSE is the square root of the mean squared difference between the predicted values $\hat{y}_{i}$ and the observed values $y_{i}$, and it quantifies the dispersion of the prediction errors. A smaller RMSE indicates that the predicted values are closer to the observed values. RMSE is calculated as follows:

$$\text{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^{2}}.$$

For a good predictive ML model in regression problems, we generally aim for a high value of R² and a small value of RMSE.

3.3 SVR

SVR is a widely used ML model for regression tasks and has found applications in various fields, including time series analysis, financial prediction, and engineering analyses. SVR builds upon certain aspects of traditional linear regression but introduces some key differences in its approach.
While linear regression aims to minimize the sum of squared errors between the predicted and actual values, SVR penalizes only those deviations that fall outside a defined ϵ-insensitive zone around the true values. Detailed explanations and graphical illustrations of how SVR works, including its mathematical foundations and algorithms, can be found in Bishop (2006) and Smola and Schölkopf (2004).

Given a dataset with n observations $(\mathbf{x}_{1},y_{1}),\ldots,(\mathbf{x}_{n},y_{n})$, with $\mathbf{x}_{i}\in\mathbb{R}^{d}$ and $y_{i}\in\mathbb{R}$, the aim is to find a regression hyperplane

$$f(\mathbf{x})=\mathbf{x}^{T}\boldsymbol{\beta}+\beta_{0},$$

which leads to minimal deviation from the observed values y and, at the same time, is as flat as possible. Flatness means that a small norm $\|\boldsymbol{\beta}\|$ is sought. The error function is then given as follows:

$$\frac{1}{2}\sum_{i=1}^{n}(f(\mathbf{x}_{i})-y_{i})^{2}+\frac{1}{2}\|\boldsymbol{\beta}\|^{2}.$$

Instead of minimizing the quadratic error function, in which each deviation from the real outcome is penalized, SVR allows sparse solutions by tolerating deviations of at most ϵ (ϵ > 0). Thus, the quadratic error function is replaced by an ϵ-insensitive error function (Cortes & Vapnik, 1995), in which an absolute difference between the prediction and the real outcome of less than ϵ is not considered an error.
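
As a small numerical illustration of the contrast between the two error functions, the sketch below evaluates both on a handful of residuals. It is written in Python for illustration only; the residual values and the choice ϵ = 0.1 are assumptions and do not come from the study.

```python
def squared_error(residual):
    """Quadratic error term, 0.5 * u^2, which penalizes every deviation."""
    return 0.5 * residual ** 2

def eps_insensitive(residual, eps):
    """Zero inside the eps-tube, growing linearly with |u| outside it."""
    return max(0.0, abs(residual) - eps)

eps = 0.1  # assumed tube half-width, for illustration only
for u in [-0.30, -0.05, 0.0, 0.08, 0.25]:
    print(f"u = {u:+.2f}  quadratic = {squared_error(u):.4f}  "
          f"eps-insensitive = {eps_insensitive(u, eps):.4f}")
```

Residuals with |u| ≤ ϵ contribute nothing to the ϵ-insensitive error, which is why observations inside the tube do not influence the fitted function.
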
The ϵ-insensitive function, as illustrated in Figure 1, is given as follows:

$$E(u)=E(f(\mathbf{x}_{i})-y_{i})=\begin{cases}0, & \text{if } |f(\mathbf{x}_{i})-y_{i}|\le \epsilon,\\ |f(\mathbf{x}_{i})-y_{i}|-\epsilon, & \text{otherwise}.\end{cases}$$

Figure 1 ϵ-Insensitive error function (red) with the error increasing linearly vs quadratic error function (green).

To deal with the points for which $|f(\mathbf{x}_{i})-y_{i}|>\epsilon$, two slack variables $\xi\ge 0$ and $\hat{\xi}\ge 0$ are introduced. Thus, as illustrated in Figure 2, $\xi_{i}>0$ corresponds to a point for which $y_{i}>f(\mathbf{x}_{i})+\epsilon$, and $\hat{\xi}_{i}>0$ corresponds to a point for which $y_{i}<f(\mathbf{x}_{i})-\epsilon$. Points lying inside the ϵ-tube correspond to $f(\mathbf{x}_{i})-\epsilon\le y_{i}\le f(\mathbf{x}_{i})+\epsilon$ and have $\xi_{i}=\hat{\xi}_{i}=0$. Therefore, the optimization problem to solve in ϵ-SVR has the following form:

Figure 2 ϵ-SVR with the ϵ-insensitive tube and slack variables $\xi$ and $\hat{\xi}$.

$$\min\; C\sum_{i=1}^{n}(\xi_{i}+\hat{\xi}_{i})+\frac{1}{2}\|\boldsymbol{\beta}\|^{2}\quad\text{subject to}\quad\begin{cases}y_{i}-\mathbf{x}_{i}^{T}\boldsymbol{\beta}-\beta_{0}\le \epsilon+\xi_{i},\\ \mathbf{x}_{i}^{T}\boldsymbol{\beta}+\beta_{0}-y_{i}\le \epsilon+\hat{\xi}_{i},\\ \xi_{i},\hat{\xi}_{i}\ge 0,\end{cases}\qquad i=1,\ldots,n.$$

The constant C > 0 determines the trade-off between the flatness of the function f and the amount up to which deviations larger than ϵ are tolerated. Both constants C and ϵ represent the hyperparameters of the SVR, which have to be tuned by the user via Grid Search for optimal performance. To solve the optimization problem, its dual formulation with the help of the Lagrange function is used (for more details, see Bishop, 2006).

4 Datasets and Patterns of Earnings and Education

The data used for analysis in this study consist of recent national data collected between 2010 and 2018 from eight different countries located in diverse regions. The dataset includes a range of demographic characteristics such as age, gender, marital status, household size, number of children per household, location of the household, social and religious group, number of completed school years, and the individual’s relation to the household head. In addition, the dataset comprises income-related variables such as household income, consumption, household assets, and the source of income. Employment-related variables such as occupation, industry, and the number of hours worked are also included in the dataset. These rich and comprehensive data allow for a detailed analysis of the relationship between various socioeconomic factors and the returns to education across different countries and regions.

The data utilized in this study are primarily sourced from the Cross-National Equivalent File (CNEF). Details on the CNEF project can be found at https://cnef.ehe.osu.edu/. The CNEF project is a collaborative initiative involving multiple individuals and institutions that collect panel survey data from nine different countries as of 2020. These countries include Australia, Canada, Germany, Great Britain, Japan, Korea, Russia, Switzerland, and the USA. The CNEF project harmonizes and equivalently defines variables across surveys, enabling researchers to compare social and economic outcomes consistently over time and across countries. For this study, data from five CNEF countries, namely Australia, Germany, Korea, Switzerland, and the USA, were accessible. In addition, data from two African countries, South Africa and Tanzania, as well as India, were also included to facilitate cross-regional comparisons.
By leveraging the CNEF dataset and incorporating data from diverse regions, this study aims to provide insights into the returns to education on a global scale and explore potential variations across different countries and regions.

The analysis in this study focuses on sub-samples comprising individuals who are employed either full-time or part-time and fall within the age range of 18–65 years. The following variables are included in the analysis: annual earnings, which encompass wages, salaries, bonuses, overtime, commissions, and earnings from self-employment, professional practice, or trade; gender; the number of hours worked in a year; and the number of years of education completed. To facilitate comparisons and ensure consistency, the earnings of each individual are converted into hourly earnings. Only individuals with hourly earnings higher than the national minimum hourly wage are included in the final sample. The labor market experience variable is calculated as the individual’s age minus the number of years of education minus 6. However, the maximum value for labor market experience is capped at 100 years to avoid extreme values. Education level is determined based on the total number of years of education completed. Following the standard classification in the literature (e.g., Peet et al., 2015), primary education corresponds to 1–6 years of education, secondary education corresponds to 7–12 years of education, and tertiary education corresponds to 13 years of education and above. The description and summary statistics of each dataset are provided, including information on variables such as earnings, education level, and labor market experience (Table 1).
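
The sample construction rules above can be written out as a short sketch. This is an illustration in Python and not the code used for the study; the function names, the argument names, and the example values (including the minimum wage figure) are hypothetical, and the label "none" for zero years of schooling is an assumption that mirrors the "no education" group reported later.

```python
def education_level(years):
    """Map completed years of education to the levels used in the text."""
    if years <= 0:
        return "none"        # assumed label for people without schooling
    if years <= 6:
        return "primary"     # 1-6 years
    if years <= 12:
        return "secondary"   # 7-12 years
    return "tertiary"        # 13 years and above

def build_record(age, years_edu, annual_earnings, annual_hours,
                 min_hourly_wage, employed=True):
    """Return the derived variables, or None if the person is excluded."""
    if not employed or not (18 <= age <= 65) or annual_hours <= 0:
        return None
    hourly = annual_earnings / annual_hours
    if hourly <= min_hourly_wage:
        return None  # only earnings above the national minimum wage are kept
    experience = min(age - years_edu - 6, 100)  # capped at 100 as stated in the text
    return {
        "hourly_earnings": hourly,
        "experience": experience,
        "education_level": education_level(years_edu),
    }

# Hypothetical example person; the numbers are not survey data.
print(build_record(age=40, years_edu=12, annual_earnings=40000.0,
                   annual_hours=2000.0, min_hourly_wage=7.25))
```
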
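Furthermore, Figure 3 illustrates the relationship between earnings and age, stratified by education level.

Table 1 Summary statistics of the different datasets

| Region | Country (currency) | Year | Obs. | Men | Mean age: men | Mean age: women | Mean age: total | Mean years of education: men | Mean years of education: women | Mean years of education: total | Mean hourly earnings (local currency): men | Mean hourly earnings: women | Mean hourly earnings: total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | South Africa (Rands) | 2017 | 5,498 | 51.1% | 36.5 | 37.9 | 37.2 | 11.3 | 11.9 | 11.6 | 54.8 | 41.5 | 48.3 |
| | | 2015 | 4,745 | 53.0% | 36.4 | 37.8 | 37.1 | 10.9 | 11.6 | 11.2 | 40.7 | 34.6 | 37.8 |
| | | 2012 | 2,978 | 52.4% | 37.5 | 38.5 | 38.0 | 10.9 | 11.9 | 11.4 | 38.8 | 34.0 | 36.5 |
| | | 2010 | 2,489 | 53.9% | 38.2 | 38.7 | 38.4 | 10.6 | 11.6 | 11.1 | 35.9 | 29.7 | 33.0 |
| | Tanzania (Shillings) | 2015 | 2,048 | 65.0% | 35.2 | 33.2 | 34.5 | 8.0 | 8.0 | 8.0 | 1,748 | 1,306 | 1,595 |
| | | 2013 | 3,056 | 66.0% | 33.8 | 32.3 | 33.3 | 8.1 | 8.0 | 8.1 | 1,598 | 966 | 1,384 |
| | | 2011 | 2,280 | 68.5% | 34.7 | 32.9 | 34.1 | 7.9 | 7.9 | 7.9 | 1,514 | 1,012 | 1,356 |
| America | USA (Dollars) | 2017 | 9,482 | 48.9% | 40.8 | 40.5 | 40.6 | 13.7 | 14.2 | 13.9 | 30.7 | 23.9 | 27.2 |
| | | 2015 | 8,728 | 48.3% | 41.0 | 40.8 | 40.9 | 13.7 | 14.2 | 14.0 | 29.9 | 22.6 | 26.2 |
| | | 2013 | 8,726 | 49.6% | 41.3 | 40.8 | 41.1 | 13.7 | 14.1 | 13.9 | 29.0 | 22.5 | 25.7 |
| | | 2011 | 8,594 | 49.0% | 41.4 | 40.8 | 41.1 | 13.7 | 14.0 | 13.9 | 30.5 | 22.2 | 26.2 |
| Asia | India (Rupee) | 2012 | 49,321 | 70.0% | 37.3 | 38.5 | 37.6 | 7.0 | 3.9 | 6.0 | 31.7 | 19.8 | 28.2 |
| | South Korea (Won) | 2018 | 6,675 | 61.2% | 45.5 | 44.3 | 45.1 | 13.7 | 13.2 | 13.5 | 18,160 | 12,379 | 15,919 |
| | | 2017 | 6,796 | 61.5% | 45.3 | 44.3 | 44.9 | 13.6 | 13.0 | 13.4 | 17,398 | 11,522 | 15,134 |
| | | 2016 | 6,638 | 62.1% | 45.2 | 44.1 | 44.8 | 13.5 | 13.0 | 13.3 | 16,589 | 10,943 | 14,448 |
| | | 2015 | 6,538 | 62.2% | 44.8 | 44.0 | 44.5 | 13.4 | 12.8 | 13.2 | 16,116 | 10,557 | 14,016 |
| | | 2014 | 6,279 | 62.4% | 44.9 | 44.1 | 44.6 | 13.3 | 12.6 | 13.0 | 15,286 | 10,350 | 13,430 |
| | | 2013 | 6,486 | 62.7% | 44.4 | 43.4 | 44.0 | 13.2 | 12.6 | 13.0 | 14,880 | 9,676 | 12,939 |
| | | 2012 | 6,522 | 63.4% | 44.2 | 43.1 | 43.8 | 13.1 | 12.4 | 12.9 | 14,248 | 9,333 | 12,448 |
| | | 2011 | 6,390 | 63.2% | 43.7 | 42.4 | 43.2 | 13.0 | 12.3 | 12.8 | 13,815 | 8,706 | 11,936 |
| | | 2010 | 6,383 | 63.6% | 43.6 | 41.9 | 42.9 | 12.9 | 12.3 | 12.7 | 12,890 | 8,151 | 11,165 |
| Australia | Australia (Dollars) | 2018 | 9,361 | 50.5% | 39.4 | 39.7 | 39.5 | 13.0 | 13.5 | 13.3 | 41.4 | 37.4 | 39.4 |
| | | 2017 | 9,268 | 50.5% | 39.4 | 39.7 | 39.6 | 13.0 | 13.5 | 13.2 | 40.4 | 37.2 | 38.8 |
| | | 2016 | 9,298 | 51.1% | 39.4 | 39.5 | 39.4 | 13.0 | 13.4 | 13.2 | 39.0 | 35.9 | 37.5 |
| | | 2015 | 9,215 | 51.5% | 39.4 | 39.6 | 39.5 | 13.0 | 13.4 | 13.2 | 38.7 | 35.0 | 36.9 |
| | | 2014 | 9,131 | 51.4% | 39.2 | 39.7 | 39.4 | 12.9 | 13.3 | 13.1 | 37.9 | 35.2 | 36.6 |
| | | 2013 | 9,128 | 51.5% | 39.2 | 39.4 | 39.3 | 12.9 | 13.2 | 13.1 | 38.2 | 35.6 | 36.9 |
| | | 2012 | 9,174 | 51.8% | 39.2 | 39.6 | 39.4 | 12.9 | 13.2 | 13.0 | 36.2 | 32.6 | 34.4 |
| | | 2011 | 7,198 | 51.6% | 39.0 | 39.4 | 39.2 | 12.7 | 13.0 | 12.9 | 35.1 | 32.5 | 33.8 |
| | | 2010 | 7,010 | 51.4% | 39.1 | 39.5 | 39.3 | 12.7 | 13.0 | 12.9 | 35.6 | 31.9 | 33.8 |
| Europe | Germany (Euros) | 2017 | 13,625 | 50.4% | 45.2 | 45.0 | 45.1 | 13.0 | 13.2 | 13.1 | 23.1 | 18.8 | 20.9 |
| | | 2016 | 11,809 | 50.5% | 45.0 | 44.6 | 44.8 | 12.9 | 13.2 | 13.1 | 22.6 | 18.5 | 20.5 |
| | | 2015 | 11,990 | 51.3% | 45.1 | 44.6 | 44.9 | 12.9 | 13.2 | 13.1 | 22.2 | 18.2 | 20.2 |
| | | 2014 | 12,674 | 52.1% | 44.6 | 44.1 | 44.4 | 12.9 | 13.2 | 13.1 | 22.2 | 18.1 | 20.2 |
| | | 2013 | 12,110 | 51.9% | 44.9 | 44.2 | 44.6 | 13.0 | 13.3 | 13.2 | 22.5 | 18.0 | 20.3 |
| | | 2012 | 12,801 | 52.6% | 44.6 | 43.8 | 44.2 | 13.0 | 13.3 | 13.1 | 21.9 | 17.8 | 20.0 |
| | | 2011 | 12,751 | 53.9% | 44.3 | 43.4 | 43.9 | 13.0 | 13.3 | 13.1 | 21.8 | 17.7 | 19.9 |
| | | 2010 | 11,954 | 54.6% | 43.4 | 42.5 | 43.0 | 13.0 | 13.3 | 13.1 | 21.2 | 17.6 | 19.6 |
| | Switzerland (Swiss Francs) | 2018 | 4,489 | 49.4% | 45.1 | 44.6 | 44.8 | 15.0 | 14.4 | 14.7 | 50.5 | 41.4 | 45.9 |
| | | 2017 | 4,463 | 49.4% | 45.4 | 44.8 | 45.1 | 15.1 | 14.3 | 14.7 | 50.7 | 41.8 | 46.2 |
| | | 2016 | 4,753 | 49.6% | 45.6 | 44.9 | 45.3 | 14.9 | 14.2 | 14.6 | 50.1 | 41.4 | 45.7 |
| | | 2015 | 5,283 | 50.0% | 45.5 | 44.5 | 45.0 | 14.8 | 14.1 | 14.5 | 50.3 | 41.6 | 45.9 |
| | | 2014 | 5,830 | 50.8% | 45.0 | 44.3 | 44.6 | 14.7 | 14.0 | 14.3 | 49.0 | 40.3 | 44.7 |
| | | 2013 | 3,510 | 49.5% | 44.7 | 44.2 | 44.4 | 14.7 | 13.9 | 14.3 | 48.4 | 40.4 | 44.3 |
| | | 2012 | 3,642 | 49.5% | 44.8 | 44.1 | 44.4 | 14.6 | 13.8 | 14.2 | 49.1 | 40.1 | 44.5 |
| | | 2011 | 3,707 | 50.4% | 44.5 | 43.6 | 44.1 | 14.5 | 13.8 | 14.1 | 48.8 | 40.2 | 44.6 |
| | | 2010 | 3,711 | 49.7% | 44.5 | 43.7 | 44.1 | 14.5 | 13.6 | 14.1 | 47.7 | 39.6 | 43.7 |

Figure 3 Average hourly earnings by age in the different countries.

4.1 National Income Dynamics Study (NIDS) Data

The NIDS is a significant household panel study in South Africa, recognized as the first nationally representative and longitudinal study of its kind (Brophy et al., 2018). Since its initiation in 2008, NIDS has been conducting regular interviews and tracking over 28,000 individuals in 7,300 households across the country. The dataset encompasses five waves, specifically from the years 2008, 2010, 2012, 2015, and 2017. For this study, the focus is on the four most recent waves (2010, 2012, 2015, and 2017), which fall within the study period. A balanced distribution between men and women is observed across the different datasets, with women generally being slightly older than men. The average years of schooling amount to approximately 11.5 years, and women tend to have higher educational attainment compared to men. In 2017, the average hourly earnings were recorded as 48.3 South African Rands, representing a 31% increase compared to 2010. However, a gender pay gap of 24.3% in favor of men was observed in that year.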
Moreover, there exists a positive relationship between earnings and age, indicating that hourly earnings tend to increase with age. Individuals with higher education (38.0% of the sample) earned above the average, while those with lower education earned below the average.

4.2 National Panel Survey (NPS) Data

The Tanzania NPS is a nationally representative household panel survey designed to collect information on the living standards of the Tanzanian population (National Bureau of Statistics, 2016). The NPS has been conducted since 2008 and consists of four waves (2009, 2011, 2013, and 2015). The survey includes approximately 5,000 households and around 10,000 individuals living within those households, who are interviewed and tracked regularly over the years. For this analysis, data from three waves (2011, 2013, and 2015) are used.

In the dataset, men are overrepresented, accounting for about 65% of the data, while women comprise the remaining 35%. Men and women are of similar age, and both have 8 years of education on average. Between 2011 and 2015, the mean hourly earnings increased from 1,356 to 1,595 Tanzanian Shillings, representing a growth of 15%. In 2015, the gender pay gap was approximately 25.3%, indicating a disparity in earnings favoring men. It is noteworthy that individuals with secondary education or lower attained similar average earnings, while those with higher education (7.3% of the sample) earned above the average.

4.3 Panel Study of Income Dynamics (PSID) Data

The PSID is a nationally representative longitudinal study focused on families and individuals in the USA. It was initiated in 1968 and has since continued to interview the same families and their descendants over nearly five decades, spanning from 1968 to 2017 (PSID, 2019). For this analysis, data covering the period from 2010 to 2017 are used, encompassing four waves (2011, 2013, 2015, and 2017).
Each wave of data collection involves approximately 25,000 individuals from 9,000 households. In the PSID dataset, men and women are equally distributed and span various age groups. The average number of years of education remains relatively constant over the years, at approximately 14 years. Women tend to have slightly higher levels of education compared to men. In 2017, the average hourly earnings in the USA were 27.2 USD, representing a growth of about 4% compared to 2011. In addition, the average hourly earnings for men in 2017 were approximately 30.7 USD, which is on average 22% higher than women’s earnings. Examining the relationship between age and hourly earnings reveals a positive association, as hourly earnings tend to increase with age and stabilize around the age of 40. Individuals with higher education (comprising 65.6% of the sample) have the highest and above-average hourly earnings, while those with no education attain the lowest hourly earnings.

4.4 India Human Development Survey (IHDS) Data

The IHDS is a nationally representative survey that covers a wide range of topics. It initially collected data during the period of 2004–2005, and then individuals from the same households were re-interviewed in 2011–2012 (Desai & Vanneman, 2018). For this analysis, we focus specifically on the data from 2012. In the IHDS dataset, men constitute 70% of the sample, while women make up 30%. Men and women are both about 38 years old on average. The average years of schooling amount to approximately 6 years, with men having higher levels of education compared to women. In 2012, the average hourly earnings were around 28.2 Rupees. Men, on average, earned about 37.5% more than women. There is a positive relationship between earnings and age, indicating that earnings tend to increase with age.
Individuals with higher education (10% of the sample) have the highest earnings, well above the average, while those with secondary education earn at or below the average level.

4.5 Korean Labor and Income Panel Study (KLIPS) Data

The KLIPS is a comprehensive labor-related panel survey that combines cross-sectional and time-series data in Korea (KLIPS, 2020). The survey is conducted annually, with a sample of approximately 5,000 urban households and 13,000 individuals. The survey has been conducted since 1998, with the latest wave (21) completed in 2018. For this article, the focus is on the data from the last decade (2010–2018). Each year’s data consist of approximately 60% men and 40% women, with both groups being roughly the same age, averaging 45 years. The average years of education show a slight increase over the years, from 12.7 in 2010 to 13.5 in 2018, with men generally having slightly higher levels of education than women. Over the years, the average hourly earnings in Korea have shown an increase of approximately 30%, rising from 11,165 Korean Won in 2010 to 15,919 Korean Won in 2018. However, there remains a gender pay gap of approximately 32%, favoring men. Individuals with higher education (accounting for 56.4% of the sample) have the highest earnings, indicating the importance of educational attainment in earning potential.

4.6 Household, Income and Labour Dynamics in Australia (HILDA) Data

The HILDA survey is a nationally representative longitudinal study that focuses on Australian households (Summerfield et al., 2019). The survey began in 2001 and currently consists of 18 waves of data, with the 18th release covering the period from 2001 to 2018 (waves 1–18). The dataset includes approximately 20,000 individuals from 9,000 households. For this analysis, the focus is on the data from the last decade (2010–2018). Men and women are equally distributed within the datasets, and their average age is approximately 39.5 years.
The level of education is consistent for both men and women, amounting to about 13 years of education. Between 2010 and 2018, the average hourly earnings in Australia increased by 14.2%, rising from 33.8 to 39.4 Australian Dollars. In 2018, Australia’s gender pay gap was reported to be 9.7%. In addition, there is a positive relationship between age and hourly earnings, with earnings tending to increase with age and stabilizing around the age of 40. Individuals with higher education (accounting for 45.7% of the sample) are the highest earners, surpassing the average earnings. On the other hand, individuals with secondary or primary education levels tend to earn below the average.

4.7 Socio-Economic Panel (SOEP) Data

The German SOEP is a significant and long-running multidisciplinary household survey that includes approximately 15,000 households and 30,000 individuals (Markus, 2019). The survey has been conducted since 1984 and continues to interview many of the same families and individuals, with the latest data released in 2017. For this analysis, the focus is on data covering the period from 2010 to 2017. Men and women are equally represented within the datasets and have similar age distributions. The average schooling level is approximately 13 years of education. In 2017, the average hourly earnings in Germany were around 20.9 Euros, reflecting an increase of 6.2% compared to 2010. On average, men earned about 18.6% more than women in 2017. There is a positive relationship between age and hourly earnings, indicating that earnings tend to increase with age. Individuals with higher education (representing 44.3% of the sample) have the highest earnings, surpassing the average level.
Conversely, individuals with secondary education tend to earn below the average level.

4.8 Swiss Household Panel (SHP) Data

The SHP is a large-scale, nationally representative longitudinal study conducted in Switzerland since 1999, collecting data on households and individuals (Voorpostel et al., 2020). The SHP dataset currently consists of 20 waves, covering the period from 1999 to 2018. Each year, approximately 12,000 individuals from 5,000 households are surveyed. For this analysis, the focus is on data from the last decade (2010–2018). The data in the SHP are equally distributed between men and women, with similar age distributions. The average years of education show a slight increase over the years, rising from 14.1 in 2010 to 14.7 in 2018. Men generally have slightly higher levels of education compared to women. In 2018, the average hourly earnings in Switzerland were approximately 45.9 Swiss Francs, reflecting a growth of 5% compared to 2010. On average, men earned about 18% more than women in terms of hourly earnings. There is a positive relationship between age and hourly earnings, indicating that earnings tend to increase with age. Individuals with higher education (comprising 59.6% of the sample) earned above the average and more than individuals with lower levels of education.

4.9 Patterns of Education and Earnings Across Regions

The analysis of education levels across different regions reveals interesting patterns. European countries, Australia, South Korea, and the USA have the highest levels of education, as indicated in Table 2. This finding aligns with previous predictions made by the International Institute for Applied Systems Analysis (IIASA) (IIASA, 2015). On the other hand, India has the lowest level of education, with 30.7% of its population lacking formal education. This observation supports IIASA’s predictions as well.
A surprising finding is that Tanzania has a very low percentage (0.2%) of its population with no education, in contrast to the approximately 20% predicted by the IIASA. This discrepancy may reflect improvements in educational access and policies in Tanzania. Countries such as the USA, South Korea, and Switzerland have high proportions of their populations attaining higher education, with rates exceeding 50%. In addition, there is a gender gap favoring women in terms of educational attainment and the population with higher education. This trend is observed in regions such as Africa, the USA, and Europe, which aligns with findings from the OECD (2019). However, this gender gap is generally not observed in Asian countries. These findings highlight the importance of considering regional differences in educational attainment and gender disparities in higher education.

Table 2 Pattern of education and earning across regions

| | South Africa | Tanzania | USA | India | South Korea | Australia | Germany | Switzerland |
|---|---|---|---|---|---|---|---|---|
| Region | Africa | Africa | America | Asia | Asia | Australia | Europe | Europe |
| Year | 2017 | 2015 | 2017 | 2012 | 2018 | 2018 | 2017 | 2018 |
| Years of education | 11.6 | 8.0 | 13.9 | 6.0 | 13.5 | 13.3 | 13.1 | 14.7 |
| Population with no educ. | 1.7% | 0.2% | 0.2% | 30.7% | 0.1% | 0.1% | 0.0% | 0.0% |
| Population in higher educ. | 38.0% | 7.3% | 65.6% | 10.0% | 56.4% | 45.7% | 44.3% | 59.6% |
| Education gender gap (years) | −0.6*** | 0.0 | −0.5*** | 3.1*** | 0.5*** | −0.5*** | −0.2*** | 0.6*** |
| Gender gap in higher educ. | −8.4%*** | −0.3% | −9.7%*** | 3.4%*** | 7.0%*** | −13.8%*** | −4.1%*** | 6.0%*** |
| Gender pay gap | 24.3%*** | 25.3%*** | 22.1%*** | 37.5%*** | 31.8%*** | 9.7%*** | 18.6%*** | 18.0%*** |
| Gender pay gap in higher educ. | 26.5%*** | 2.5% | 27.8%*** | 10.3%*** | 24.6%*** | 19.3%*** | 24.8%*** | 19.4%*** |
| Pay gap (higher vs sec. educ.)¹ | 57.2%*** | 70.5%*** | 37.9%*** | 57.0%*** | 27.8%*** | 28.0%*** | 30.8%*** | 27.0%*** |

Notes: In the gender gap, men are taken as reference. educ.: education; sec.: secondary. ¹The pay gap is calculated between people who attained higher education (reference) and those in secondary education. ***p-value < 0.01.

The gender pay gap, which reflects the difference in mean gross hourly earnings between men and women expressed as a percentage of men’s earnings, indicates gender inequality in hourly pay. Across regions, the gender pay gap tends to favor men, with women earning, on average, 23% less than men. In Europe, the gender pay gap is around 18%, which is broadly in line with, though above, the estimate of 14.8% across Europe in 2018 as reported by Eurostat (2020). Australia has the lowest gender pay gap at 9.7%, while Asia has the highest gender pay gap at an average of 35%. These findings align with the observations of the International Labour Organization (https://www.ilo.org/global/about-the-ilo/multimedia/maps-and-charts/enhanced/WCMS_650829/lang--en/index.htm).

Even among individuals with higher education, the gender pay gap persists, with women earning on average 19% less than men. In addition, when comparing the pay gap between individuals with higher education and those with secondary education, significant differences are observed. In Africa and India, the pay gap between these two groups is around 60%, while in other countries, it is around 30%. These findings are consistent with the research conducted by Livanos and Nunez (2012), which also revealed wage inequalities between individuals with higher education and those with secondary education.

5 Estimates on the Returns to Education

In this section, we present the estimates of the returns to education using the Mincerian equation (as described in Section 3.1) and the ML algorithm SVR. Specifically, we employ the SVR algorithm and tune the hyperparameters using the R programming language (Core R, 2020). The returns to education are computed separately for each country and survey year and then averaged across the regions. We estimate the returns for each level of education (primary, secondary, and tertiary) based on the extended Mincerian equation. In addition, we calculate the returns by gender.
To evaluate the models, we employ cross-validated (out-of-sample) RMSE and coefficient of determination (R²) metrics. We conduct 200 simulations, consisting of 20 repetitions of 10-fold cross-validation, to obtain robust estimates, and we compare the results obtained from SVR with those obtained from OLS regression. This allows us to assess the predictive performance and goodness of fit of the SVR models across different countries and survey years.

5.1 Returns to Education

The returns to an additional year of education for each country are summarized in Table 3. In the first row, the total private rate of returns to another year of schooling is reported to be 10.4% for the period from 2010 to 2018. This result is slightly above the estimates reported by Psacharopoulos and Patrinos (2018) (9%), Patrinos (2016) (9.7%), and Montenegro and Patrinos (2014) (10.1%). Across regions, the returns to education are highest in Africa, with a rate of 17.8%, well above the average. Healthy returns are also observed in America, Asia, and Australia, with rates of 11.5, 10.7, and 9.8%, respectively. The lowest returns are found in Europe, at only 7.2%.
These results provide important insights into the economic value of education in different parts of the world.

Table 3 Average returns to education across regions

| Region / country | Year | Average years of education | Average returns to education | Average returns to primary education | Average returns to secondary education | Average returns to tertiary education | Average male returns to education | Average female returns to education |
|---|---|---|---|---|---|---|---|---|
| Total | 2010–2018 | 12.8 | 0.104 (0.003) | 0.103 | 0.064 | 0.120 | 0.101 | 0.106 |
| Africa | 2010–2017 | 9.9 | 0.178 (0.003) | 0.105 | 0.071 | 0.249 | 0.164 | 0.209 |
| South Africa | 2017 | 11.6 | 0.179 (0.001) | 0.199 | 0.094 | 0.201 | 0.172 | 0.203 |
| | 2015 | 11.2 | 0.152 (0.001) | 0.109 | 0.074 | 0.176 | 0.141 | 0.181 |
| | 2012 | 11.4 | 0.144 (0.004) | 0.118 | 0.060 | 0.179 | 0.135 | 0.175 |
| | 2010 | 11.1 | 0.156 (0.001) | 0.094 | 0.083 | 0.192 | 0.144 | 0.193 |
| Tanzania | 2015 | 8.0 | 0.206 (0.004) | 0.061 | 0.052 | 0.324 | 0.192 | 0.232 |
| | 2013 | 8.1 | 0.200 (0.004) | 0.075 | 0.072 | 0.342 | 0.180 | 0.234 |
| | 2011 | 7.9 | 0.207 (0.003) | 0.081 | 0.061 | 0.330 | 0.187 | 0.246 |
| America | 2011–2017 | 13.9 | 0.115 (0.004) | | 0.055 | 0.105 | 0.122 | 0.116 |
| USA | 2017 | 13.9 | 0.111 (0.004) | | 0.046 | 0.105 | 0.120 | 0.112 |
| | 2015 | 14.0 | 0.121 (0.005) | | 0.063 | 0.106 | 0.129 | 0.123 |
| | 2013 | 13.9 | 0.113 (0.002) | | 0.062 | 0.101 | 0.121 | 0.113 |
| | 2011 | 13.9 | 0.113 (0.002) | | 0.047 | 0.106 | 0.118 | 0.115 |
| Asia | 2010–2018 | 12.4 | 0.107 (0.004) | 0.083 | 0.055 | 0.132 | 0.088 | 0.109 |
| India | 2012 | 6.0 | 0.082 (0.002) | 0.083 | 0.055 | 0.155 | 0.071 | 0.080 |
| South Korea | 2018 | 13.5 | 0.095 (0.001) | | | 0.106 | 0.080 | 0.097 |
| | 2017 | 13.4 | 0.103 (0.003) | | | 0.120 | 0.082 | 0.111 |
| | 2016 | 13.3 | 0.100 (0.002) | | | 0.117 | 0.079 | 0.108 |
| | 2015 | 13.2 | 0.111 (0.002) | | | 0.134 | 0.090 | 0.115 |
| | 2014 | 13.0 | 0.103 (0.002) | | | 0.121 | 0.081 | 0.111 |
| | 2013 | 13.0 | 0.115 (0.007) | | | 0.135 | 0.097 | 0.112 |
| | 2012 | 12.9 | 0.117 (0.006) | | | 0.142 | 0.096 | 0.118 |
| | 2011 | 12.8 | 0.119 (0.004) | | | 0.140 | 0.102 | 0.112 |
| | 2010 | 12.7 | 0.122 (0.003) | | | 0.151 | 0.098 | 0.128 |
| Australia (region) | 2010–2018 | 13.1 | 0.098 (0.003) | | | 0.101 | 0.105 | 0.095 |
| Australia | 2018 | 13.3 | 0.103 (0.005) | | | 0.107 | 0.108 | 0.104 |
| | 2017 | 13.2 | 0.101 (0.002) | | | 0.102 | 0.108 | 0.098 |
| | 2016 | 13.2 | 0.096 (0.002) | | | 0.094 | 0.102 | 0.095 |
| | 2015 | 13.2 | 0.099 (0.002) | | | 0.099 | 0.106 | 0.097 |
| | 2014 | 13.1 | 0.103 (0.007) | | | 0.112 | 0.108 | 0.104 |
| | 2013 | 13.1 | 0.096 (0.002) | | | 0.100 | 0.101 | 0.096 |
| | 2012 | 13.0 | 0.093 (0.002) | | | 0.096 | 0.099 | 0.090 |
| | 2011 | 12.9 | 0.100 (0.002) | | | 0.106 | 0.108 | 0.094 |
| | 2010 | 12.9 | 0.091 (0.001) | | | 0.092 | 0.103 | 0.081 |
| Europe | 2010–2018 | 13.8 | 0.072 (0.002) | | | 0.072 | 0.076 | 0.065 |
| Germany | 2017 | 13.1 | 0.083 (0.002) | | | 0.084 | 0.090 | 0.078 |
| | 2016 | 13.1 | 0.082 (0.003) | | | 0.083 | 0.089 | 0.077 |
| | 2015 | 13.1 | 0.082 (0.003) | | | 0.084 | 0.090 | 0.075 |
| | 2014 | 13.1 | 0.081 (0.004) | | | 0.083 | 0.087 | 0.076 |
| | 2013 | 13.2 | 0.081 (0.003) | | | 0.082 | 0.090 | 0.072 |
| | 2012 | 13.1 | 0.079 (0.004) | | | 0.079 | 0.088 | 0.071 |
| | 2011 | 13.1 | 0.076 (0.002) | | | 0.076 | 0.086 | 0.066 |
| | 2010 | 13.1 | 0.084 (0.004) | | | 0.084 | 0.091 | 0.076 |
| Switzerland | 2018 | 14.7 | 0.061 (0.001) | | | 0.058 | 0.064 | 0.052 |
| | 2017 | 14.7 | 0.065 (0.001) | | | 0.062 | 0.067 | 0.055 |
| | 2016 | 14.6 | 0.064 (0.001) | | | 0.064 | 0.065 | 0.056 |
| | 2015 | 14.5 | 0.064 (0.001) | | | 0.063 | 0.064 | 0.057 |
| | 2014 | 14.3 | 0.064 (0.001) | | | 0.062 | 0.065 | 0.056 |
| | 2013 | 14.3 | 0.064 (0.001) | | | 0.067 | 0.060 | 0.059 |
| | 2012 | 14.2 | 0.066 (0.001) | | | 0.068 | 0.063 | 0.061 |
| | 2011 | 14.1 | 0.062 (0.001) | | | 0.062 | 0.061 | 0.054 |
| | 2010 | 14.1 | 0.065 (0.001) | | | 0.065 | 0.064 | 0.056 |

5.1.1 Returns by Gender

The analysis reveals that, on average, returns to education are higher for women than for men, which aligns with the existing literature (e.g., Patrinos, Psacharopoulos, & Tansel, 2019; Psacharopoulos & Patrinos, 2018). The total rate of returns to another year of education is estimated to be 10.1% for men and 10.6% for women. In Africa and Asia, the returns to education for women are statistically significantly higher than for men, with a p-value of less than 1%. This suggests that women benefit more from investing in education in these regions. However, in the USA, men’s returns to education are similar to those of women, and in Europe, men’s returns are even higher. This contrast aligns with the findings of Mendolicchio and Rhein (2014) when analyzing West European countries. Notably, Switzerland stands out with the lowest returns to education for men, at only 6%. On the other hand, Tanzania shows the highest returns to education for women, exceeding 20%.

5.1.2 Returns by Level of Education

Returns to different levels of education follow a specific pattern. Globally, higher education yields the highest returns, followed by primary education (only calculated in Africa and India), while secondary education has the lowest returns. These findings align with the existing literature (Montenegro & Patrinos, 2014; Psacharopoulos & Patrinos, 2018).
On average, the returns to tertiary education amount to 12%, indicating that individuals who invest in higher education experience a higher increase in earnings compared to those with lower levels of education. Similarly, the returns to primary education are estimated to be 10%. In Africa, the returns to tertiary education are particularly high, reaching 24.9%, highlighting the significant economic benefits of pursuing higher education in this region. Returns to primary education in Africa amount to 10%, which still represents a positive return on investment. In contrast, Europe exhibits the lowest returns to higher education, at 7.2%, suggesting that the economic gains from pursuing tertiary education are relatively lower in this region compared to others.

5.1.3 Returns to Education Over Time

As displayed in Figure 4, the returns to education tend to decrease slightly over time, with a rate of approximately 0.1% per year. This finding suggests that the economic benefits individuals derive from investing in education may gradually decline over time. This observation is consistent with previous research studies (Montenegro & Patrinos, 2014; Patrinos, 2016; Peet et al., 2015; Psacharopoulos & Patrinos, 2018). In contrast to the declining returns to education, the average number of years of education has been increasing over time at a rate of approximately 0.16 years per year or 1% per year. This finding implies that individuals are investing more in their education and acquiring higher levels of schooling as time progresses. The simultaneous increase in the average number of years of education and the decline in the returns to education may indicate a shift in the labor market’s demand for specific skills and qualifications.
This pattern suggests that, while education remains important, other factors such as technological advancements, changing labor market dynamics, and skill requirements may influence the returns to education.

Figure 4: Returns to education and years of education over the last decade.

5.1.4 Returns to Education and Average Years of Education

The regression analysis displayed in Figure 5 suggests a negative relationship between the average number of years of education and the returns to education. Specifically, for each additional year of education, the returns to education decrease by 1.4 percentage points. This finding supports the notion that education responds to price signals in the economy. As the level of education increases within a given economy, the supply of educated individuals also increases. This increased supply can lower the returns to education, because a larger pool of individuals with similar levels of education competes for similar job opportunities. The result is consistent with the findings of Montenegro and Patrinos (2014), suggesting that the relationship between education and the returns to education is influenced by market dynamics and the supply of educated individuals.

Figure 5: Decrease of returns to education as education increases.

5.2 Comparison with OLS Returns

To compare the returns to education estimated by the SVR model and the OLS model, both models are applied to the same dataset using the same set of variables. The performance of each model is evaluated using the R² and RMSE metrics. Performance is assessed over 200 cross-validated simulations, each on a different subset of the data, which yields robust and reliable estimates of each model’s performance.
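The comparison procedure described here can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the article's implementation: the data are synthetic and generated from a hypothetical Mincer-style equation, the SVR kernel and hyperparameters are placeholders, and 5-fold cross-validation repeated 40 times is only one way of obtaining 200 evaluations, since the exact resampling scheme is not specified in this section. The function names `make_synthetic_sample` and `compare_models` are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_sample(n=300, seed=0):
    """Hypothetical Mincer-style data; NOT the survey data used in the article."""
    rng = np.random.default_rng(seed)
    schooling = rng.integers(6, 21, size=n).astype(float)  # years of education
    experience = rng.uniform(0, 40, size=n)                # years of experience
    log_wage = (1.0 + 0.08 * schooling + 0.04 * experience
                - 0.0007 * experience**2 + rng.normal(0, 0.4, size=n))
    X = np.column_stack([schooling, experience, experience**2])
    return X, log_wage


def compare_models(X, y, n_splits=5, n_repeats=40, seed=0):
    """Cross-validated R2 and RMSE for SVR and OLS on identical folds."""
    # Assumed scheme: 5 folds x 40 repeats = 200 fits per model.
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    models = {
        # SVR is scale-sensitive, so features are standardized first.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1)),
        "OLS": LinearRegression(),
    }
    summary = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv,
                                scoring=("r2", "neg_root_mean_squared_error"))
        r2 = scores["test_r2"]
        rmse = -scores["test_neg_root_mean_squared_error"]  # flip sign back
        summary[name] = {"r2_mean": r2.mean(), "r2_sd": r2.std(),
                         "rmse_mean": rmse.mean(), "rmse_sd": rmse.std(),
                         "n_fits": len(r2)}
    return summary


if __name__ == "__main__":
    X, y = make_synthetic_sample()
    for name, stats in compare_models(X, y).items():
        print(name, {k: round(float(v), 3) for k, v in stats.items()})
```

Because the splitter is seeded, both models are scored on identical folds, so differences in mean R² or RMSE reflect the models rather than the partitions. The spread across the 200 fits indicates how much any single split could mislead.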
Repeating the cross-validation helps account for the variation in results that different data partitions can produce.

According to the results presented in Table 4, the returns to education estimated using the SVR model are generally higher than those estimated using the OLS model. This suggests that the SVR model captures nonlinearities or patterns in the data that the linear OLS model does not. In terms of predictive performance, the SVR models consistently outperform the OLS models. The SVR models achieve higher R² values, indicating that they explain a larger proportion of the variance in the dependent variable. In addition, the SVR models have lower RMSE values, indicating that their predictions are closer to the observed values, although the gap is small in several countries (for example, 0.441 versus 0.442 for Germany). These results suggest that the SVR model, being an ML algorithm, is better able to capture the complex relationships and patterns in the data, resulting in more accurate and robust estimates of the returns to education. Its ability to handle nonlinearity gives it an advantage over the traditional OLS model in this context. It is important to note that the performance of the SVR model may vary depending on the specific dataset and the choice of hyperparameters.
Nonetheless, the results presented in Table 4 suggest that the SVR model provides more reliable and accurate estimates of the returns to education than the OLS model in this particular analysis.

Table 4: Comparison of SVR and OLS estimated returns and cross-validated model performances

| Country (year) | Rate of return: SVR | Rate of return: OLS | R²: SVR | R²: OLS | RMSE: SVR | RMSE: OLS |
|---|---|---|---|---|---|---|
| South Africa (2017) | 0.179 (0.001) | 0.172 (0.001) | 0.466 (0.033) | 0.344 (0.031) | 0.647 (0.022) | 0.679 (0.022) |
| Tanzania (2015) | 0.206 (0.004) | 0.189 (0.002) | 0.355 (0.046) | 0.282 (0.036) | 0.871 (0.051) | 0.899 (0.052) |
| USA (2017) | 0.111 (0.004) | 0.110 (0.001) | 0.252 (0.015) | 0.210 (0.019) | 0.544 (0.014) | 0.550 (0.017) |
| India (2012) | 0.082 (0.002) | 0.081 (0.000) | 0.386 (0.012) | 0.272 (0.007) | 0.591 (0.007) | 0.604 (0.007) |
| South Korea (2018) | 0.095 (0.001) | 0.084 (0.001) | 0.210 (0.014) | 0.175 (0.019) | 0.563 (0.016) | 0.568 (0.017) |
| Australia (2018) | 0.103 (0.005) | 0.070 (0.001) | 0.222 (0.014) | 0.179 (0.014) | 0.465 (0.018) | 0.468 (0.017) |
| Germany (2017) | 0.083 (0.002) | 0.076 (0.001) | 0.231 (0.012) | 0.192 (0.012) | 0.441 (0.012) | 0.442 (0.011) |
| Switzerland (2018) | 0.061 (0.001) | 0.061 (0.001) | 0.280 (0.028) | 0.258 (0.030) | 0.366 (0.020) | 0.371 (0.019) |

6 Conclusion and Policy Implications

This article contributes to the existing literature by examining the private rates of returns to education across different regions worldwide using recent data from 2010 to 2018. The analysis includes data from South Africa, Tanzania, the USA, India, South Korea, Australia, Germany, and Switzerland. In contrast to the traditional approach of using OLS, this study employs an ML technique called SVR to estimate the returns to education. The SVR approach allows for capturing nonlinear relationships and complex patterns in the data, potentially providing more accurate and robust estimates. The analysis is based on the Mincer earnings function, both the basic and extended versions, which consider factors such as years of education, labor market experience, and educational levels.
The samples used in the analysis are carefully homogenized to ensure comparability across regions, considering variables such as age, level of education, employment status, and earnings. By applying the SVR approach to recent data, this study aims to provide updated and reliable estimates of the private rates of returns to education in different regions. The use of ML techniques and the careful selection of variables and samples enhance the understanding of the relationship between education and earnings in these specific contexts.

The findings of this study highlight the positive and significant returns to education globally, with an average return of 10.4% for each additional year of education. However, the returns to education vary across regions, ranging from 7% in Europe to 18% in Africa. Africa stands out with the highest returns, which are approximately twice as high as those in America, Asia, and Australia. One possible explanation for the higher returns in Africa is the structure of the labor market, which is often characterized by two segments: an informal sector and a formal sector. Individuals with little or only primary education may face difficulties in accessing formal sector employment and may remain unemployed or work in the informal sector. Individuals with secondary or tertiary education, on the other hand, tend to enter the formal sector, which offers higher returns to education. This segmentation of the labor market contributes to the higher returns observed in Africa. Kuepié and Nordman (2016) may provide further insight into the reasons behind the high returns to education in Africa.
Understanding these factors is crucial for policymakers and stakeholders to design effective education and labor market policies that can enhance human capital development and improve economic outcomes in the region.

The finding that tertiary education yields the highest returns globally, followed by primary and then secondary education, challenges previous research results that often indicated higher returns for primary education. This finding has important implications for policymakers and educational institutions. First, policymakers should consider increasing investment in tertiary education. The higher returns associated with tertiary education suggest that individuals who obtain higher education qualifications are likely to have better employment opportunities and higher earning potential. This can contribute to economic growth and development. Therefore, allocating educational budgets to support tertiary education programs and institutions can be a strategic decision to meet the demand for highly skilled professionals in various sectors. However, it is also crucial to prioritize investments in primary education and ensure universal access to quality primary education worldwide. Primary education serves as the foundation for individuals’ educational journey and plays a fundamental role in shaping their future opportunities. Neglecting primary education could lead to widening inequalities and hinder the development of a skilled workforce. Furthermore, secondary education should not be overlooked, as it serves as a crucial bridge between primary education and tertiary education. It provides essential knowledge and skills that prepare individuals for higher education or vocational training. Emphasizing the importance of secondary education can help ensure a smooth transition to higher levels of education.
While the finding of higher returns to tertiary education highlights the importance of investing in higher education, policymakers should maintain a balanced approach. Prioritizing universal access to quality primary education and recognizing the significance of secondary education can contribute to creating a comprehensive and inclusive education system that caters to the diverse needs of individuals and promotes overall societal development.

The finding that returns to education are generally higher for women than for men is an important observation. It indicates that investing in education yields a greater payoff for women in terms of their earnings and economic outcomes. While it does not necessarily mean that women earn higher salaries than men, it underscores the significance of education for women and girls in improving their economic prospects. Ensuring that women have access to and complete at least primary and secondary education is crucial. Education equips women with valuable skills, knowledge, and opportunities, which can enhance their productivity and contribute to their overall well-being. By closing the gender gap in education enrollment and promoting educational opportunities for women, societies can empower women to participate fully in economic and social life. Moreover, investing in women’s higher education is an essential priority. Higher education has been shown to have significant returns, and enabling more women to pursue university education can lead to greater efficiency gains and reduce the gender pay gap. By providing women with access to higher education and supporting their educational aspirations, barriers to career advancement can be overcome, leading to more equitable economic outcomes and increased opportunities for women in various sectors.
It is worth noting that education not only enhances women’s skills and productivity but also plays a role in reducing the gender pay gap by addressing factors such as discrimination, preferences, and societal circumstances that contribute to unequal earnings. By improving educational opportunities for women and promoting gender equality in education, societies can strive toward more inclusive and equitable economic systems. Investing in women’s education, from primary to higher education, is a critical step toward promoting gender equality, improving economic outcomes for women, and reducing the gender pay gap. Efforts should be directed at eliminating disparities in education enrollment, ensuring access to quality education for women and girls, and creating an enabling environment that supports their educational and professional advancement.

The use of ML techniques such as SVR to calculate returns to education has the potential to revolutionize the analysis and understanding of the economic value of education. ML algorithms offer advantages such as improved accuracy, better handling of complex and nonlinear relationships, and the ability to incorporate a wide range of variables and factors into the analysis. By leveraging ML algorithms, policymakers, institutions, and students can gain valuable insights into the returns to education and make more informed decisions. This includes designing relevant and effective curricula, providing tailored career guidance, and allocating resources more efficiently. ML techniques can also assist in evaluating the effectiveness of education financing programs, such as student loans, by analyzing the potential returns on investment for different educational paths. While this study focuses on private returns to education, which directly benefit individuals in terms of financial outcomes, it is important to acknowledge the broader social benefits of education as well.
Investigating the social returns to education would provide insights into the positive externalities and wider societal impacts that education brings, including improved productivity, social mobility, and reduced inequalities. Further research into social returns to education can help policymakers and stakeholders better understand the long-term benefits of investing in education and guide policy decisions that promote inclusive and equitable educational opportunities. By considering both private and social returns, a comprehensive understanding of the full value and impact of education can be achieved, leading to more effective educational policies and interventions. Overall, the integration of ML techniques in the analysis of returns to education opens up new avenues for research and policy development, enabling more data-driven and evidence-based decision-making in the field of education.

Journal

Open Education Studies, De Gruyter

Published: Jan 1, 2023

Keywords: human capital; returns to education; machine learning; support vector regression; policy

JEL codes: I2; J31; J24; O15
