“Determinants of labor productivity in the USA”

This study examines whether the labor productivity of the US population depends directly on public or private insurance coverage of people, employment level, life expectancy, spending on the public health system as a percentage of GDP, and spending on the public health system in natural terms. Empirical testing was carried out on US statistical data for 1987–2021 using a regression model fitted by backward stepwise selection (in Statgraphics software) and multivariate adaptive regression splines, MARS (in Salford Predictive Modeler software). The research hypothesis was confirmed for only two indicators: life expectancy and spending on the public health system in natural terms. Their impact on labor productivity appeared to be directly proportional. Spending on the public health system has the greater impact on the change in productivity (0.0058%), whereas life expectancy has a lesser effect (0.0047%). The study showed that the MARS model provides more objective and accurate results than the regression model fitted by backward stepwise selection. This conclusion is based on comparing the real data with the values modeled by both methods. The study proved that labor productivity in the USA grew yearly from 1987 to 2021 (the constant term in the MARS model's regression equation is +0.48428). To calculate the specific values of labor productivity for each year, a model was developed based on the optimal basis functions (automatically generated by the MARS model depending on the current values of life expectancy and spending on the public health system in natural terms).


INTRODUCTION
The labor productivity level depends on various indicators and factors. The combined and individual impact of these factors can positively or negatively affect the productivity level of a particular production process (Skrynnyk, 2023). Gardner et al. (2023) considered how international trade indicators affect labor productivity; the calculation was carried out using statistical methods. Atiyatna et al. (2021) applied the logistic regression method to search for factors influencing labor productivity and found that human capital, place of residence, gender, and working hours significantly affect labor productivity indicators. Such variables most often include the workforce's qualification level, the quality of the working environment, the availability of opportunities for further training, and access to modern technologies and the latest equipment (Sotnyk, 2012; Sheliemina, 2023). Important factors are also personnel management style, satisfaction with working conditions, employee motivation (Blašková et al., 2017; Kochmańska, 2019, 2021), and corporate culture (AL-Hashimi et al., 2023; Trippner, 2020). In addition, the quality of the working environment and the increase in labor productivity are affected by the environmental conditions of production, the regularity of the working day, week, and month, labor protection, and health insurance. Kornieieva et al. (2022) established the influence of several innovative development parameters on increasing labor productivity.
Labor productivity indicators significantly impact the micro- and macro-efficiency of economic processes (Lyeonov et al., 2021a, 2021c). These indicators demonstrate the volume of products produced per unit of labor input. The higher the labor productivity, the less working time is required to produce one production unit, reducing costs and increasing the producer's profit (Amato et al., 2022). In turn, business entities increase their investments, contributing to the economy's development (Dzwigol, 2019, 2021). One of the most important ways of supporting labor productivity is investing in new technologies, which helps to increase production efficiency (Kornieieva et al., 2022). In addition, investing in the education and training of workers is also an essential factor in maintaining high labor productivity (Yu et al., 2023). It helps improve the state of the economy and supports its growth.
On the other hand, enterprises with high indicators of labor productivity can use the accumulated economic benefit to invest in the latest research and improve the existing production. The introduction of innovative approaches to the working process is a significant factor that stimulates and accelerates economic growth in a separate business and the industry as a whole and positively impacts the country's economy. In particular, in the USA, despite the obstacles related to the consequences of COVID-19 in 2020 and 2021, in May 2022 the number of occupied jobs increased compared to the previous year (in professional and business services, the net number of new employees increased by 64 thousand people; in public services, by 56 thousand; in health care, by 52 thousand; in recreation and hospitality, by 48 thousand; and in construction, by 25 thousand).

LITERATURE REVIEW
Studying labor productivity is vital for every country, as its high level contributes to the growth of the country's competitiveness, the development of innovative technologies, and the improvement of the population's living standards (higher incomes, better living conditions, and access to social services).

The impact of insurance on labor productivity
Many scientists consider the impact of health insurance on economic processes. For example, Ho (2015) examined it through the prism of the impact of health insurance schemes on financing the needs of people in different countries. Spending on the public health system as a percentage of GDP can positively and negatively impact labor productivity, depending on the specific nature of this spending (Kuzior et al., 2022a). On the one hand, increased spending on the public health system and access to new medical technologies improve health, increasing labor productivity (Horváth & Gyenge, 2023). On the other hand, increased spending on the public health system may reduce the amount of money available for other investments, such as education and infrastructure, potentially leading to lower productivity. Moreover, an increase in spending on the public health system leads to an increase in taxes, which in turn decreases consumer spending and drags on economic growth (Letunovska et al., 2023). Branning and Vater (2016) described the US healthcare system and listed the main problems of the participating parties in providing healthcare services. Thus, various factors influence labor productivity, including health insurance, employment rates, life expectancy, and public health care expenditures. Scholars are also actively exploring the relationship between life expectancy, education and investment in skills, labor productivity, and the impact of healthcare spending on labor productivity and overall national development.
The study aims to check the hypothesis about the direct proportional impact of five indicators on the labor productivity level of the US population using a regression model fitted by backward stepwise selection and multivariate adaptive regression splines. These influencing indicators are public or private insurance coverage of people, employment level, life expectancy, spending on the public health system as a percentage of GDP, and spending on the public health system in natural terms.

METHODOLOGY
Using a regression model with backward stepwise selection allows removing the indicators that have the least impact on explaining the variance of the dependent variable (labor productivity) and, simultaneously, eliminating multicollinearity among these indicators. Therefore, this procedure leaves the most influential indicators for further analysis. Statistical criteria (p-value, F-test, t-statistics, coefficient of determination) are used to verify the statistical significance of the regression model of labor productivity dependence on the selected indicators.
At the same time, when using a regression model fitted by backward stepwise selection, it is assumed that the coefficient characterizing the impact of each predictor variable on the resulting indicator is a constant value.
However, more complex and nonlinear relationships among the predictor variables may not be accounted for in a regression analysis using a stepwise procedure. Furthermore, it is impossible to determine in advance the nature of the relationships among the predictor variables, which can be both linear and nonlinear. Given this, it is advisable to go beyond the regression analysis using a stepwise procedure and conduct a more in-depth study.
The study uses multivariate adaptive regression splines (MARS), as they allow for more complex and flexible forms that better reflect the true underlying relationships in the data.
Thus, the first part of this study presents calculations using a regression model fitted by backward stepwise selection, and the second part uses multivariate adaptive regression splines (MARS). The results are then compared.
The preparatory and mandatory stage of classical multiple regression and MARS models is to substantiate the statistical quality of the input feature space and to clean the data. Since the input sample represents indicators measured in different units, developing adequate regression and MARS models requires a data standardization procedure. At the same time, the quality of the performed calculations will depend on the quality of the selected normalization function. Such normalization methods as min-max normalization, Z-score normalization, decimal scaling, normalization by feature scaling, vector normalization, unit vector normalization, mean normalization, median normalization, max-abs normalization, and unit interval normalization are effective and widely used. Standardization that relies on measures of central tendency resistant to outliers is proposed to be carried out using a modified logistic normalization formula (formula (1)), which is used in many data analysis and machine learning algorithms, where K is the normalized value of the input variables. Further, to confirm the statistical quality of the model, a descriptive analysis was carried out in the Statgraphics software, making it possible to determine the basic numerical characteristics of the data, identify patterns in them, and generalize.
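To make the normalization step more concrete, the following minimal Python sketch implements two of the methods named above (min-max and Z-score) and, purely as an illustration, a median-centered logistic transform. The function `logistic_median` and its exact form are assumptions: the modified formula (1) is not reproduced in this text, so the sketch only shows how a median (md) and a maximum (mx) could enter a logistic normalization. The sample values are made up and are not the study's data.

```python
import numpy as np

def min_max(x):
    """Scale values linearly to the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """Center on the mean and scale by the (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def logistic_median(x):
    """Hypothetical median-centered logistic transform.

    ASSUMPTION: this is not the paper's formula (1); it only illustrates
    a logistic normalization built from the median (md) and maximum (mx).
    """
    x = np.asarray(x, dtype=float)
    md = np.median(x)
    mx = np.max(np.abs(x))
    return 1.0 / (1.0 + np.exp(-(x - md) / mx))

# Made-up illustrative values (years), not the study's data
life_expectancy = np.array([74.9, 75.6, 76.5, 77.4, 78.5, 78.9, 77.2])

mm = min_max(life_expectancy)
zs = z_score(life_expectancy)
lg = logistic_median(life_expectancy)
```

Whatever function is chosen, the point is the same: all indicators end up on a comparable, unit-free scale before the regression and MARS models are fitted.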
The regression model describing the impact of public or private insurance indicators, employment rate, average life expectancy, spending on the public health system as a percentage of GDP, and national expenditures on medical goods and services on labor productivity is proposed to be developed using the backward stepwise selection method. It is done in the applied statistical package Statgraphics, which immediately allows for the filtering out of indicators containing multicollinearity. Multiple regression backward selection (MRBS) is an iterative algorithm used to select a subset of variables from a set of predictor variables in a multivariate regression model. The backward stepwise selection (BSS) method is implemented in several steps. In the first step, a criterion for estimating model coefficients is determined, and a set of predictors and a dependent target variable are selected. Then, a correlation analysis is performed to determine the strength of the relationships between the predictors. In the next step, predictors with the strongest correlation with the target variable are selected. The last step is to develop a regression based on the selected predictors and to check its statistical quality and significance.
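The backward elimination idea can be sketched in a few lines of Python. This is a simplified illustration, not the Statgraphics procedure: it assumes ordinary least squares and removes, at each iteration, the predictor with the smallest absolute t-statistic until all remaining predictors exceed a critical value (2.03 is used here as an assumed threshold). The data are synthetic, and the names `ols_t_stats` and `backward_select` are hypothetical.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients and t-statistics (X already has an intercept column)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # coefficient standard errors
    return beta, beta / se

def backward_select(X, y, names, t_crit=2.03):
    """Drop the predictor with the smallest |t| until all remaining exceed t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t = ols_t_stats(design, y)
        t_pred = np.abs(t[1:])                # skip the intercept
        worst = int(np.argmin(t_pred))
        if t_pred[worst] >= t_crit:
            break                             # every remaining predictor is significant
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data: only the third and fifth predictors drive the response
rng = np.random.default_rng(0)
n = 35
X = rng.normal(size=(n, 5))
y = 0.46 * X[:, 2] + 0.58 * X[:, 4] + rng.normal(scale=0.05, size=n)
names = ["K1", "K3", "K4", "K5", "K6"]

selected = backward_select(X, y, names)
print(selected)
```

Production implementations usually rely on an F-to-remove or p-value rule and may also screen for multicollinearity; the t-statistic threshold is used here only to keep the sketch dependency-free.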
The statistical quality of the obtained model was tested using the Fisher and Student tests, the p-value (significance level), the coefficient of determination $R^2$, and the MAE (mean absolute error).
The next research stage is to develop multivariate adaptive regression splines on the most significant indicators that affect the outcome variable (level of labor productivity), obtained through the screening procedure. MARS models the nonlinear, non-parametric links between the independent variables (inputs) and the outcome variable (output) by fitting a set of piecewise linear functions. The algorithm divides the input space into regions associated with the output variable and then fits a linear model to each region. MARS allows for capturing complex nonlinear links between input and output variables and accurately fitting the data. Therefore, MARS is an extension of the well-known linear regression algorithm. It uses a machine learning algorithm to analyze the linear or nonlinear interaction between dependent and independent variables. Spline functions, i.e., piecewise linear basis functions that connect at breakpoints where the function can have a different slope, are used to define nonlinearity in this technology.
The algorithm uses a method based on finding optimal breakpoints for each variable and optimal combinations of variables, which can additionally reflect nonlinear links between variables. Using basis functions and performing data fitting allows MARS to create flexible models that accurately describe the latent links between variables.
In addition, MARS automatically selects variables and reduces model size, avoiding overfitting. Basis functions are chosen to fit the data best and then combined using a linear combination to create a model that captures the variation in the data. When the underlying relationships are nonlinear, this technique can be more accurate than classical linear regression. It is used to predict outcomes for given inputs and to understand interactions between variables (Friedman, 1991).
MARS builds a model that is a weighted sum of basis functions:

$$\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x), \tag{2}$$

where $B_i(x)$ is a basis function and $c_i$ is its constant coefficient. The basis function in MARS is a hinge function. It has the same form as the hinge loss used in machine learning for "maximum margin" classification, for example when training support vector machines (SVMs), where it measures the margin between the classifier prediction and the actual label and penalizes any cases where the margin is small. In MARS, however, the hinge function is not a loss but a piecewise linear building block defined as $\max(0, x - c)$ or $\max(0, c - x)$, where $c$ is a constant called the knot. Therefore, the MARS model automatically selects the form of the hinge functions, the variables, and their knot values, and also makes it possible to determine the links between two or more variables using the product of hinge functions.
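A short Python sketch may help show how hinge functions combine into a MARS-type prediction for a single input variable. The knot, the coefficients, and the helper names `hinge` and `mars_predict` are hypothetical and chosen only for illustration; they are not the values estimated in this study.

```python
import numpy as np

def hinge(x, knot, direction=+1):
    """max(0, x - knot) for direction=+1, max(0, knot - x) for direction=-1."""
    return np.maximum(0.0, direction * (np.asarray(x, dtype=float) - knot))

def mars_predict(x, intercept, terms):
    """Weighted sum of hinge basis functions for one input variable.

    terms: list of (coefficient, knot, direction) tuples.
    """
    y = np.full_like(np.asarray(x, dtype=float), intercept)
    for coef, knot, direction in terms:
        y = y + coef * hinge(x, knot, direction)
    return y

x = np.array([0.0, 0.5, 1.0])
# Hypothetical model: slope -1 on the left-hand hinge, slope +2 on the right-hand hinge
terms = [(2.0, 0.5, +1), (-1.0, 0.5, -1)]
pred = mars_predict(x, 1.0, terms)
print(pred)  # [0.5 1.  2. ]
```

The two mirrored hinges share the knot 0.5, so the fitted line changes slope exactly at that point, which is how MARS represents a piecewise linear relationship.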
The optimal MARS model is chosen as follows: 1) in the first step, MARS builds an excessively large model by adding "basis functions", the formal mechanism by which the intervals of the variables are defined. Basis functions are either transformations of one variable or interaction terms of several variables. The model becomes more flexible and complex when basis functions are added. This process continues until the user-specified maximum number of basis functions is reached; 2) in the second step, basis functions are removed one at a time, starting with those that contribute least to the model, until the optimal model is found according to the given test criterion. By allowing any arbitrary shape for features and their interactions, MARS can reliably track very complex data structures often hidden in high-dimensional data.
Thus, the backward pass checks the basis functions for overfitting using the generalized cross-validation (GCV) criterion.
The following logic is used to compare the performance of different subsets of models and select the best one: lower GCV values mean better results. Thus, GCV serves as a regularization method that accounts for the trade-off between model simplicity and model performance (Craven & Wahba, 1978):

$$\text{GCV} = \frac{\text{RSS}}{N \left(1 - \dfrac{\text{ENP}}{N}\right)^{2}}, \tag{3}$$

where RSS is the residual sum of squares (calculated by taking the sum of squares of the differences between the observed values of the response variable and the predicted values from the model); N is the number of observations. The effective number of parameters (ENP) is found as follows (Friedman, 1991):

$$\text{ENP} = \text{NMT} + \text{penalty} \cdot \frac{\text{NMT} - 1}{2}, \tag{4}$$

where NMT is the number of MARS terms; the penalty is from 2 to 4; $(\text{NMT} - 1)/2$ is the number of hinge function knots, so the formula penalizes the addition of knots.
Thus, the generalized cross-validation criterion (3) adjusts the training RSS to account for the flexibility of the model. Introducing a flexibility penalty is necessary because flexible models will fit the specific realization of the noise in the data rather than just the systematic structure of the data (Bottegal & Pillonetto, 2018).
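The GCV computation described above can be sketched in Python as follows. The sketch assumes the commonly cited form of the criterion, in which the RSS is divided by N(1 − ENP/N)² and ENP equals the number of MARS terms plus the penalty times the number of hinge knots, (NMT − 1)/2. The RSS value and the penalty of 3 are hypothetical inputs used only to show the arithmetic.

```python
def effective_params(n_terms, penalty=3.0):
    """ENP = NMT + penalty * (NMT - 1) / 2."""
    return n_terms + penalty * (n_terms - 1) / 2

def gcv(rss, n_obs, n_terms, penalty=3.0):
    """GCV = RSS / (N * (1 - ENP / N) ** 2); lower values are better."""
    enp = effective_params(n_terms, penalty)
    return rss / (n_obs * (1 - enp / n_obs) ** 2)

# Hypothetical comparison: same RSS, 35 observations, 5 terms versus 9 terms
small_model = gcv(rss=10.0, n_obs=35, n_terms=5)
large_model = gcv(rss=10.0, n_obs=35, n_terms=9)
print(small_model, large_model)
```

With an identical RSS, the larger model receives the higher GCV score, which is the mechanism by which the backward pass favors a simpler set of basis functions unless the extra terms reduce the RSS substantially.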

RESULTS AND DISCUSSION
Since each variable is measured in its own units, the input indicators must be brought to a comparable form for modeling. For this purpose, the normalization procedure (formula (1)) is applied. The results of the data normalization are shown in Table A1, Appendix A.
Labor productivity is one of the key factors characterizing economic development, the conditions and dignity of labor, and the country's sustainable development. Before developing a MARS model that determines the impact of predictor variables on this resulting variable, it is reasonable to carry out strict screening of variables that exhibit multicollinearity and to retain the most relevant ones. In the backward stepwise selection performed in the Statgraphics statistical package, the K2 indicator (level of labor productivity) was chosen as the result variable, and the independent variables are K1, K3, K4, K5, and K6.
The obtained regression is given by formula (5):

$$K2 = -0.000645177 + 0.458938 \cdot K4 + 0.579035 \cdot K6, \tag{5}$$

where K2 is the level of labor productivity, K4 is life expectancy, and K6 is spending on the public health system (in natural terms).
Thus, due to strict exclusion, two indicators are most relevant in terms of impact on labor productivity -average life expectancy and national expenditure on medical goods and services.
The model is statistically significant according to Fisher's and Student's tests, the R-squared value (99.6662%), and the P-value (0.000) (Tables 1 and 2). Table 1 shows the obtained coefficients of regression model (5) and the verification of its statistical significance using the Student's test, the standard error, and the P-value. The tabular value of the Student's criterion for 35 observations is 2.030 at a significance level of 5%; the obtained values are K2 = -5.0827, K4 = 10.0419, and K6 = 8.8626. Because the absolute value of each test statistic exceeds the critical value at the 5% significance level, the null hypothesis H0 (the regression parameter is not statistically significant) is rejected, and the corresponding regression parameter is considered statistically significant. The coefficient of determination (R-squared) of 99.6662% indicates a strong relationship between the studied variables. This value shows that almost all of the variation in the dependent variable can be described using linear regression. The adjusted coefficient of determination (R-squared adjusted for d.f.) of 99.6454% reflects the share of variance of the dependent variable that the model explains, corrected for the number of predictors. The standard error of the estimate (0.0202366) and the mean absolute error (MAE = 0.015321) also confirm the accuracy of the sample statistics and the statistical significance of model (5).
The value of the free term in regression equation (5) is negative, i.e., if the state does not invest in the health sector and life expectancy remains unchanged (at its average level in each American state), the level of labor productivity will decrease by 0.06%, i.e., by 0.000645177.
Average life expectancy (K4) and spending on the public health system in natural terms (K6) exert a positive, directly proportional influence on labor productivity. Thus, with an increase in the average life expectancy by one year, provided that the amount of expenses for medical goods and services remains unchanged, the level of labor productivity will increase by 0.0045% (by the amount of 0.458938). If, at the state level, the amount of spending on improved, modern, powerful medical technologies that improve the quality of medical services and medical care is increased by 1 million US dollars, it will lead to an increase in the level of labor productivity by 0.0057% (by the amount of 0.579035). This is logical because improving the population's health and well-being contributes to increasing the overall productivity and efficiency of the workforce (Elamir, 2020).
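As a small worked illustration, the linear model can be evaluated directly from the coefficients reported in the text (free term −0.000645177, K4 coefficient 0.458938, K6 coefficient 0.579035). The function name `predict_k2` and the input values below are hypothetical; the inputs are assumed to be normalized indicator values, not raw years or dollars.

```python
# Coefficients as reported in the text for the backward stepwise regression
INTERCEPT = -0.000645177
COEF_K4 = 0.458938   # life expectancy (normalized)
COEF_K6 = 0.579035   # public health spending in natural terms (normalized)

def predict_k2(k4, k6):
    """Linear prediction of the normalized labor productivity level K2."""
    return INTERCEPT + COEF_K4 * k4 + COEF_K6 * k6

# Hypothetical normalized inputs
base = predict_k2(0.5, 0.5)
shifted = predict_k2(0.6, 0.5)   # K4 raised by 0.1, K6 held constant
effect_of_k4 = shifted - base    # equals 0.1 * COEF_K4
print(base, effect_of_k4)
```

Because the model is linear, the effect of a change in one predictor is the same at every level of the other, which is exactly the constant-coefficient assumption that the MARS model later relaxes.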
Therefore, for a comprehensive assessment of the impact of average life expectancy (K4) and spending on the public health system in natural terms (K6) on labor productivity, it is proposed to develop a MARS model based on these indicators. The MARS model was developed in Salford Predictive Modeler 8 software using the following settings: 1) the initial stage of settings requires the definition of the target variable and predictor variables and the selection of the target type (regression), as well as the selection of the analysis engine for MARS regression splines;
2) the final step of the setup is choosing to search for basis functions and setting a limit of 40 basis functions, taking into account the relationship between the predictor variables.
The MARS modeling results are shown in Figure 1. During the construction process, 11 basis functions were created, among which four basis functions were automatically selected as the optimal number based on GCV evaluation (Figure 1).
Detailed statistical information on the basis functions is presented in Table 3.
The optimal multivariate adaptive regression spline model using four basis functions is given by formula (6) and in Table 4. The basis functions of MARS (6) are given in Table 4. Statistical characteristics of the optimal MARS model (6) are presented in Table 5. Thus, based on the results of the conducted research, a comparative table was created containing input statistical data of the labor productivity of the US population (OECD, n.d.). In addition, the values of regression model (5), obtained using strict screening of insignificant variables and describing the dependence of labor productivity (K2) on life expectancy (K4) and state spending on medical goods and services (K6), were calculated. The values of the MARS model (6), based on four basis functions, were also found. The results are presented in Table 6. Comparing the obtained results (Table 6) shows that the MARS model provides more accurate results. The conducted machine learning using intelligent data analysis based on a multivariate adaptive regression spline using the optimal number of four basis functions made it possible to determine the

CONCLUSION
The research aimed to check the hypothesis about the direct influence of five indicators on the labor productivity level of the US population; the hypothesis was confirmed for only two of them. Thus, it was proved that life expectancy and spending on the public health system in natural terms directly influenced the labor productivity level of the US population. The developed regression model was statistically significant according to the verification criteria (mean absolute error, coefficient of determination, p-value, Fisher's and Student's tests). Expenditures on the health care system have a greater impact on changes in labor productivity levels than life expectancy. The difference between them is 1%, which is quite logical, as the relationship between public healthcare spending and life expectancy is complex and multifaceted. For example, the population with better access to medical services is more likely to receive timely and effective treatment, thereby contributing to an increase in life expectancy. It should be noted that adequate government spending on healthcare helps control the spread of infectious diseases, treat chronic conditions in a timely manner, and provide preventive care, thereby increasing average life expectancy.
Comparing the results obtained using the regression model with backward stepwise selection and the multivariate adaptive regression spline indicates that the MARS model produces the more accurate forecast. In particular, in 2021, the labor productivity level calculated using the regression model with backward stepwise selection was 1.0533, whereas the MARS model gave 0.9517. The actual productivity level for this year, after normalizing the initial statistical data, was 0.9526. In 2020, the labor productivity level calculated using the regression model with backward stepwise selection was 1.0469, whereas the MARS model gave 0.9340; the actual normalized value was 0.9290. The productivity level modeled by MARS thus closely aligns with the actual data in both years, confirming that the MARS model provides more reliable and accurate results than the regression model.
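The comparison can be made explicit by computing the absolute errors from the figures quoted above. The following Python sketch uses only the 2020 and 2021 values reported in the text; the dictionary layout and variable names are illustrative choices.

```python
# Normalized labor productivity values as quoted in the text
reported = {
    2020: {"actual": 0.9290, "regression": 1.0469, "mars": 0.9340},
    2021: {"actual": 0.9526, "regression": 1.0533, "mars": 0.9517},
}

# Absolute error of each model against the actual normalized value
abs_errors = {
    year: {
        model: abs(values[model] - values["actual"])
        for model in ("regression", "mars")
    }
    for year, values in reported.items()
}

for year, errors in sorted(abs_errors.items()):
    print(year, {model: round(err, 4) for model, err in errors.items()})
```

For both years, the MARS error is well under 0.01, while the regression error is about 0.10 to 0.12, which is the basis for the accuracy claim in this paragraph.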
The study showed that labor productivity in the United States grew yearly during 1987–2021 (the constant term in the MARS model's regression equation is +0.48428). To calculate the specific values of labor productivity for each year, a model was developed based on the optimal basis functions (automatically generated by the MARS model depending on the current values of life expectancy and spending on the public health system in natural terms).
Based on this technique, it is possible to describe with high accuracy, in numerical terms, the influence of various variables on the resulting variable under study. Besides, a feature of the proposed methodology is its ability to identify and study complex dependencies between input and output data based on the training sample.
Papanicolas et al. (2018) analyzed the impact of spending on the healthcare system in the US and other high-income countries. Hartman et al. (2021) examined the impacts of increased public health spending in the US. V. Raghupathi and W. Raghupathi (2020) studied the relationship between US healthcare spending and macroeconomic indicators. Tran et al. (2017) examined this spending through the lens of its redistribution based on the cost-effectiveness of healthcare programs. Awojobi et al. (2023) investigated the socio-economic consequences of the quarantine caused by COVID-19 and found a subtle difference in the consequences for men and women. Ober and Karwot (2023) examined the impact of the pandemic on the functioning of enterprises, society, and government. COVID-19 has also changed the work environment toward digitalization (Kuzior et al., 2022b).
In formula (1), $x_i$ is the input value of the indicator, $i = 1, \ldots, 35$; md is the median of the input indicator; and mx is the maximum value of the input indicator.
The information base of the study contained statistical indicators for the United States of America from 1987 to 2021. They describe public or private insurance coverage of people (USA Facts, n.d.a), level of labor productivity (OECD, n.d.), level of population employment (U.S. Bureau of Labor Statistics, n.d.), average life expectancy (Macrotrends, n.d.), spending on the public health system as a percentage of GDP (USA Facts, n.d.b), and spending on the public health system in natural terms (USA Facts, n.d.c). According to Koibichuk et al. (2023), the values of these indicators (K1: public or private insurance coverage of people (USA Facts, n.d.a); K2: productivity level, calculated as real GDP per hour worked (OECD, n.d.); K3: employment level (U.S. Bureau of Labor Statistics, n.d.); K4: life expectancy, years (Macrotrends, n.d.); K5: spending on the public health system as a percentage of GDP (USA Facts, n.d.b); K6: spending on the public health system in natural terms, US dollars (USA Facts, n.d.c)) have already been considered to describe medical insurance as a stimulating factor that increases labor efficiency. In this study, it is recommended to use them for an in-depth analysis of the influence of other indicators on the level of labor productivity. Koibichuk et al. (2023) present the study's input statistical data.

Table 2. Analysis of variance

Table 5. Statistical characteristics of the optimal MARS model

Table 6. Comparison table of real values of labor productivity and predicted values by regression (5) and MARS (6)

Table 4. Basis functions