“Evaluating development prospects of smart cities: Cluster analysis of Kazakhstan’s regions”

This study examines Kazakhstan’s regions to identify places with the best potential for developing smart cities based on cluster analysis. To analyze the differentiation by the level of development, 17 regions of Kazakhstan are grouped according to 2020 data from the statistical bulletin of the Bureau of National Statistics of the Republic of Kazakhstan. The groups of regions with different indicator values were formed by agglomerative clustering using the single linkage, complete linkage, and Ward’s methods. In agglomerative clustering, the algorithm groups observations (here, regions) into clusters, and the indicators characterize each region’s level of innovative development. The clustering was performed in the RStudio software package. As a result, regions were grouped by their essential characteristics and their prospects were assessed; the most significant potential for developing and managing “smart cities” was found in Atyrau region, Almaty city, and Astana city. The remaining clusters include regions where favorable conditions for the development of innovations have not yet been formed and which require more resources and effort to build “smart cities.” Therefore, they should not be the first to implement this concept. They need a more balanced, integrated approach, ideally supported by experience in implementing the idea in more promising regions. In a sense, clustering also made it possible to identify potential (or even existing) innovation clusters in the regions of Kazakhstan. The study results can be used in developing government programs to form smart cities and in further studies of the potential of smart cities.


INTRODUCTION
Nowadays, the critical task is to create conditions for the development of cities of all types, ensuring the growth of their competitiveness and sustainable development of the country's territories. Moreover, the decisive role here is played by interactions and mutual assistance based on the most effective use of limited resources, primarily intellectual. This makes it pressing to create conditions for the development of modern cities as centers that advance the informational side of urban development through the formation of smart cities.
Therefore, city management is being rethought worldwide, and an increasing number of cities are moving to the concept of innovative city development. Developing countries that seek to increase their intellectual potential and ensure sustainable growth of territories are no exception. Smart cities make it possible to improve citizens' quality of life, reduce socio-economic inequalities, and make city management more effective. The COVID-19 pandemic has reinforced these changes and increased their relevance.
The main goal of a smart city is to improve residents' quality of life with the help of innovative technologies. Such technologies make urban space more effective in meeting the population's needs and support the most modern forms of urban upgrading to make citizens' lives comfortable and safe. Developing smart cities requires specific conditions that arise unevenly in different regions of the country. The further development of smart cities is highly dependent on innovation activity because it relies on adding new technological solutions to existing ones. Regions where these conditions are concentrated form "cores" with unique characteristics that can be distinguished using clustering.
Kazakhstan, as a developing country with many regions, is just beginning its transition to the digitalization of cities by introducing local solutions. However, some cities and regions of Kazakhstan already have an excellent infrastructure to implement the concept of a "smart" city. Thus, Kazakhstan needs to keep pace with global changes, such as the worldwide informatization of society, urbanization, and the increasing importance of cities.
Cluster analysis has clear advantages for this task. It makes it possible to identify promising areas for further sustainable development. It also helps determine the most suitable location for a smart city, the type of object required there, and the specific area within the cluster that is the most economically advantageous. Therefore, cluster analysis is used in this study as the basis for determining the level of development of regions and for grouping cities. It gives an objective picture and helps anticipate the direction of development of smart cities. This is especially important for the zoning of urban areas of Kazakhstan, which will allow rational use of their potential.
Many developing countries, Kazakhstan among them, are experiencing rapid urbanization. The proportion of urban residents is growing, and their demands for quality of life are becoming more sophisticated. The development of smart cities is one of the modern ways to meet these demands. Research on the potential for developing and managing smart cities in Kazakhstan has begun only very recently. Therefore, this research topic is relevant and requires a more detailed study.

LITERATURE REVIEW
Studies on the topic generally agree that smart cities improve the efficiency of placement and use of resources and enhance the population's quality of life by improving security, speeding up processes, and providing new services. Existing research on smart cities covers a wide variety of topics. All of them, however, can be divided into several areas. Some studies address the application of specific technologies in smart cities. They are completely out of the scope of this study. Others develop an understanding of the term "smart city" and classify existing cities accordingly. These will be covered briefly. The last group, which is of the most significant interest within the framework of this paper, attempts to identify and use factors for the implementation, development, and management of smart cities.
The history of smart cities began in 1974 when the first Big Data project was launched in Los Angeles. Further, Gibson et al. (1992) presented the information, ideas, and infrastructure that accelerate the creation of smart cities, fast systems, and global networks. Mahizhnan (1999) investigated the case of Singapore being a "smart city." Finally, Hall et al. (2000) proposed a "vision" of a smart city and were among the first to suggest a more or less integrated description of it. However, there is no consensus in academia about the nature and features of smart cities, so a researcher has to either adopt one preferred definition or resort to eclecticism.
Later research provides more profound insights into the formation and development of smart cities. Hollands (2008) explored smart cities by identifying essential issues: the assumption that "smart city" is a celebratory tag, the view that the tag is more marketing hype than a practical driver of infrastructure change, and the uncritical, developmental connotation of the term itself. Leydesdorff and Deakin (2011) emphasized that smart cities are a process of cultural reconstruction underpinned by policy, academic leadership, and corporate strategy. It was also noted that over the past two decades, megacities worldwide had been involved in initiatives to improve urban infrastructure and services aimed at improving environmental, social, and economic conditions and increasing the attractiveness and competitiveness of cities (Jong et al., 2015).
Despite the discussion on various concepts and theories, there is no consensus on a clear definition of the term "smart city" (Hortz, 2016). In general, smart cities extensively use information and communication technologies to help large cities build their competitive advantage (Angelidou, 2014). In the case of Doha, smart city practices are more of an interaction between urban technologies and knowledge economy activities (Conventz et al., 2015). At the same time, Pancholi et al. (2015) noted the practice of integrating Brisbane's smart technologies into good urban and spatial design practices.
A similar methodological approach to assessing smart cities using cluster analysis has been tested by Cantuarias-Villessuzanne et al. (2021). Muntean (2019) used cluster analysis to predict and solve the parking occupancy problem in Birmingham's smart city. This approach first groups the dataset to get the appropriate periods throughout the day and then predicts the data in those clusters. Safitri et al. (2020) conducted a cluster analysis of the smart city regions of Banda Aceh, Indonesia. Within the framework of that study, the similarities of the characteristics of each object in the regions of Aceh were determined. In addition, Srinivas and Hosahalli (2021) researched the MapReduce distributed computing environment based on clustering using K-means evolutionary computing for a smart city based on the Internet of Things. The use of cluster analysis in this paper evokes associations with innovation clusters since the concept of a smart city relies heavily on introducing innovations.
Brakman and van Marrewijk (2013) believe that the effect of clusters in cities lies in their impact on national and regional economies since they favorably affect the existing mechanisms in agglomerations. Van Klink and de Langen (2001) believe that a cluster of cities must go through a mandatory development life cycle of creating innovative products. A cluster emerges in the form of small companies that develop innovative products in smart cities. The cluster then grows with the arrival of new companies and highly qualified specialists who carry out innovative "smart" projects. The specialization of a cluster increases until it reaches maturity at a certain point. If companies in a cluster do not have time to adapt to changing economic conditions, then decline sets in.
According to Köcker and Müller (2015), the main objectives of the cluster policy are increasing labor productivity, the speed of development of innovative products, and the competitiveness of small and medium-sized enterprises in the region. Noiva et al. (2016) analyzed a dataset of 142 cities, including annual per capita water use. With these urban water supply and consumption measures, they conducted a hierarchical cluster analysis to identify relative similarities and distances between the 142 cases. Kubina et al. (2021) compared standards, implementation, and cluster models for smart cities in North America and Europe using cluster analysis. Finally, Héraud and Muller (2022) studied the interaction between smart cities and innovation clusters, as well as people involved in technology clusters, research centers, factory labs, living labs, etc.
Nazarova and Demianenko (2018) conducted a cluster analysis of the regions of Ukraine. According to the cluster analysis results, the regions are grouped into six clusters. The dynamics of the quantitative distribution of the regions of Ukraine according to the selected clusters were also analyzed. The study identified cores with a constant composition of regions and presented the characteristics of each cluster.
The analysis of Kazakhstan's "smart cities" and cluster analysis of the regions and cities have also been carried out in studies from Kazakhstan. Urdabayev and Turgel (2021) evaluated the applicability of the "Smart Aqkol" case in the development of "smart cities" in other cities in Kazakhstan. In addition, there is the "WeAlmaty" project implemented by the British Council, the city government of Almaty, JSC "City Development Center of Almaty," and Kazakhstan-British Technical University.
Aralbaeva and Berikbolova (2021) considered a cluster analysis of Kazakhstan's regions regarding the level of innovative development. In their opinion, one of the effective methods to manage the possibilities of sustainable development of cities is the cluster policy of Kazakhstan, with the help of which interconnected groups of suppliers and universities are formed (Kulanov et al., 2020). Cluster policy is an effective form of relations in the internal environment of the regions of the Republic of Kazakhstan, which has recently been a dominant component. The main task of cluster policy is to create favorable conditions for the development of the regional economy, depending on the category of the cluster and the strategy of the regions (Aralbaeva & Berikbolova, 2021). Satpayeva et al. (2020) proposed methodological tools based on a systematic approach using economic and statistical methods and the 5Ms concept. Furthermore, Mussabalina and Kireyeva (2019) believe that the topic of cluster development in Kazakhstan deserves special attention. Therefore, attempts have been made to maintain and develop a cluster policy aimed at the socio-economic development of Kazakhstan and its regions.
Based on the literature review, it can be concluded that there are studies devoted to the problems of the formation and management of the development of smart cities. Furthermore, different studies are dedicated to the use of cluster analysis to determine the level of development of regions and the grouping of cities. Thus, some studies are related to analyzing "smart cities" and existing clusters in Kazakhstan, and there are articles in which cluster analysis is used as a research method on similar topics. Nevertheless, very few studies use cluster analysis to study the regional environment to form a smart city. Moreover, until now, no research has used cluster analysis to zone the potential of the regions of Kazakhstan for developing "smart cities." The cluster policy for developing and managing smart city clusters makes it possible to increase innovation activity by strengthening small and medium-sized enterprises, focusing on a common strategic goal and innovation activity.
Therefore, this paper aims to study Kazakhstan's regions and identify places with the best potential for developing "smart cities" based on cluster analysis.
In agglomerative clustering, the algorithm groups observations into clusters, starting with as many clusters as there are observations and merging them one by one until a single cluster contains all observations. The closer the clusters are to each other, the sooner they merge. Thus, the general algorithm for applying the methods looks as follows. First, suitable quantitative indicators are selected, then the "distances" between all observations are calculated. Next, the observations are combined into clusters according to a specific criterion. It is the researcher's job to identify a set of clusters that can be meaningfully and usefully interpreted. The method does not imply a clear procedure for choosing the number of clusters.
Ward's method was first described by Ward (1963). The function $D(X, Y)$, which calculates the between-cluster distance, measures the increase in the "error sum of squares" (ESS) after combining two clusters $X$ and $Y$:

$$D(X, Y) = \mathrm{ESS}(XY) - \left[\mathrm{ESS}(X) + \mathrm{ESS}(Y)\right],$$

where $XY$ denotes the combined cluster and $\mathrm{ESS}(\cdot)$ is of the form:

$$\mathrm{ESS}(X) = \sum_{i=1}^{N_X} \left| x_i - \frac{1}{N_X} \sum_{j=1}^{N_X} x_j \right|^2,$$

where $N_X$ is the number of elements in the cluster $X$, and $x_i$ and $x_j$ are the elements of the cluster. The goal of this method is to choose, at each clustering step, the merge that minimizes $D(X, Y)$ (the increase in ESS at that step).
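The Ward criterion can be illustrated with a minimal Python sketch. It assumes the standard form of the criterion, D(X, Y) = ESS(XY) - [ESS(X) + ESS(Y)], with ESS taken as the sum of squared Euclidean distances from each element to the cluster mean. The function names and the sample points are hypothetical and are not taken from the study's data or from its RStudio code.

```python
def ess(cluster):
    """Error sum of squares: squared distance of each point to the cluster mean."""
    n = len(cluster)
    dims = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / n for d in range(dims)]
    return sum(sum((p[d] - mean[d]) ** 2 for d in range(dims)) for p in cluster)


def ward_distance(x, y):
    """Increase in ESS caused by merging clusters x and y."""
    return ess(x + y) - (ess(x) + ess(y))


# Hypothetical standardized observations with two indicators each.
x = [[0.0, 0.0], [2.0, 0.0]]
far = [[10.0, 0.0]]
near = [[3.0, 0.0]]

print(ward_distance(x, far))   # 54.0
print(ward_distance(x, near))  # smaller increase, so this merge is preferred
```

At each step, Ward's method would pick the pair of clusters with the smallest such increase, which in this toy example is the merge of `x` with `near`.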
The single and the complete linkage methods work differently. They still start from as many clusters as there are observations and end with only one, but the merging criteria differ from Ward's method. Both methods use Euclidean distance:

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2},$$

where $a_i$ and $b_i$ are the chosen variables of the respective observations $A$ and $B$, and $n$ is the number of variables.
At each step, the two clusters with the smallest distance are merged. The methods differ in how the new distance is calculated. For the single linkage, the distance is calculated between the two closest points of two clusters (i.e., the closest neighbors); for the complete linkage, the distance is calculated between the two farthest points of the two given clusters (i.e., the farthest neighbors) (Sharma & Neha, 2019).
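The difference between the two linkage rules can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure run in RStudio for the study: the function names, the stopping rule (merge until `k` clusters remain), and the one-dimensional sample points are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cluster_distance(c1, c2, points, linkage):
    """Closest-pair distance for single linkage, farthest-pair for complete."""
    pairwise = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return min(pairwise) if linkage == "single" else max(pairwise)


def agglomerate(points, k, linkage="single"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical observations on a line with slowly growing gaps.
points = [[0.0], [1.0], [2.1], [3.3], [4.6], [8.0]]

print(agglomerate(points, 3, "single"))    # [[0, 1, 2, 3], [4], [5]]
print(agglomerate(points, 3, "complete"))  # [[0, 1], [2, 3, 4], [5]]
```

In this toy example, single linkage keeps attaching the nearest neighbor to one growing cluster, while complete linkage forms more balanced groups, so the same data can yield different partitions at the same number of clusters.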
Before clustering, the data were standardized using z-scores to allow comparison of indicators with different units of measurement. The data source is the Bureau of National Statistics of the Agency for Strategic Planning and Reforms of the Republic of Kazakhstan. The objects to be clustered are 17 administrative units of the highest order of Kazakhstan, of which 14 are "regions" and 3 are "cities of republican significance." The selected set of indicators determines the level of innovative development of regions. Since the development of "smart cities" is closely related to digital technologies, it is possible to use the same indicators as an approximation. They are of two types: indicators of the region's innovation responsiveness (gross regional product, labor productivity, employment, fixed assets, return on assets, and ecological properties of production) and measures of innovation activity in the region (spending on R&D, the share of firms using innovations, production of innovative goods).
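The z-score standardization step can be sketched as below. The paper does not state which variant of the standard deviation was used, so the sample standard deviation (n - 1 denominator) is an assumption here; the function name and the two-column sample values are hypothetical and do not come from the statistical bulletin.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column (indicator) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation with an (n - 1) denominator.
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical indicators measured in very different units.
rows = [
    [100.0, 1.0],
    [200.0, 2.0],
    [300.0, 3.0],
]

print(zscore_columns(rows))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After this transformation both columns are on the same scale, so neither indicator dominates the Euclidean distance merely because of its unit of measurement.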

RESULTS
The clustering produced three dendrograms (Figures 1, 2, and 3), one for each method used. Visual analysis was the primary decision-making tool for cutting the dendrograms in this study, supplemented with considerations about the possible interpretation of the clusters. The red frames represent the selected clusters. The scale in the figures shows the Euclidean distance measured in standard deviations.

Single linkage
Figure 1 shows a dendrogram characteristic of the single linkage method. Most regions are connected into one large cluster, and the remaining clusters account for a small fraction of the observations. Here the dendrogram is easily cut into three clear clusters:
1. Astana city and Almaty city.
2. Atyrau region.
3. All other regions.
Source: Developed by the authors using the data of the Bureau of National Statistics (2021).

Table 1. Description and interpretation of the regional clusters of Kazakhstan
Source: Authors' elaboration.

No. Regions Maximum Minimum
Three clusters (the same for all three used methods)
The main result of the analysis is three clusters. The choice of method does not affect the composition of these clusters in any way, but it does affect the aggregation sequence.
"Long-term growth zone" is the largest of the clusters. It has the highest average return on fixed assets but the lowest average labor productivity, fixed assets, GRP, and R&D spending. It comprises regions that have almost no advantages for smart cities and need to develop some before implementing them. It is easy to see that both the strengths and the weaknesses of the cluster are of an economic nature, whereas its ecological and innovation indicators are somewhere in between. The highest average return on fixed assets means this cluster is somewhat promising for future investors. Still, it will be necessary to carefully manage policy to stimulate investments in R&D, IT, and infrastructure instead of, for example, mining. However, with only one strength and many weaknesses, this cluster is not the best choice to start the development of smart cities in Kazakhstan. The following clusters seem to be more suitable.
"Cores of smart cities" are two cities of republican significance: Almaty and Astana. They have developed industrial IT clusters with the highest average labor productivity. The most significant average spending on R&D and innovation activity would allow for easier development and introduction of "smart" technologies. The highest average GRP and employment indicate significant market capacities and purchasing power, which could support the demand for services in a smart city. On top of that, Astana and Almaty have the most emissions-efficient economies, although the volume of emissions is still the biggest among the clusters. The combination of these factors makes this cluster the most suitable for the pilot projects of smart cities in Kazakhstan.
"Innovative hub of the West" is, in fact, only one region, Atyrau. It is so different from all other clusters that its "closest neighbors" are only the cities from the "Cores of smart cities" cluster. Atyrau region has its own advantages: the highest labor productivity in the sectors "Professional and scientific or technology activity," "Processed industry," "Transport and storage," and "Construction"; the highest average value of fixed assets; and the highest volume of innovative goods produced. These indicate that there are firms capable of producing the products necessary for the development of smart cities. These strengths make the cluster suitable for smart cities as well, but the weaknesses here are more influential than in the previous cluster. In particular, this cluster has the lowest average employment and return on fixed assets.
Applying the complete linkage method or Ward's method leaves "Cores of smart cities" and "Innovative hub of the West" unchanged but picks out certain regions from the "Long-term growth zone":
• The complete linkage method gives an "Obsolete industries zone" cluster. This is a cluster of regions with non-innovative industry, which is also inefficient in terms of environmental pollutant emissions and has low labor productivity. The development of smart cities in these regions will require a complete overhaul. Significant attention will need to be paid to reducing pollution, stimulating entrepreneurial activity, and increasing labor productivity.
• With Ward's method, it is possible to distinguish a "Traditional industrial zone" cluster. Unlike the "Obsolete industries zone," this one has the advantage of the highest average return on fixed assets, and only a low ratio of GRP to emissions remains among the disadvantages. This cluster includes regions with enterprises that are efficient in terms of capital investments but still pollute the environment heavily. The efficiency of capital investments can be used to build a financial basis for developing smart cities.
Summing up, there are definitely three clusters. Changing the approach makes it possible to determine an additional cluster from the "Long-term growth zone" and interpret it meaningfully. The choice of specific clustering depends on management preferences. Regardless of the approach, the most promising clusters for developing smart cities are "Cores of smart cities" and "Innovative hub of the West," which already have conditions for implementing pilot projects. Their cases should further serve as an example for developing smart cities in the regions of the remaining cluster(s). It is important to note that the three regions best suited to developing "smart cities" are well distributed geographically: Astana city in the northern part of the country, Almaty city in the southern part, and Atyrau region in the western part.

DISCUSSION
Cluster analysis has not previously been used to assess the potential of smart cities in Kazakhstan. The fact that the cities of Astana and Almaty would be the most favorable for smart cities is not surprising since these are the two most developed cities in the country, being attractors for capital and migration. Surprisingly, the city of Shymkent, which became the third million-plus city in the country and several years ago received a "promotion" in the form of the status of a "city of republican significance," does not have such great potential. Despite the new position, its characteristics remain at the level of the least developed regions in the study context. At the same time, the Atyrau region turned out to be so distinctive that it needed to be singled out in a separate cluster, which could not be expected after a superficial study. The presence of a large group of regions with low potential is also not surprising. Still, the opportunity to single out a cluster in a "transitional state" was unexpected and made it possible to understand the capabilities of these regions better. Changing the clustering method affects the resulting clusters. However, two clusters are resistant to method changes. "Cores of smart cities" and "Innovative hub of the West" remain the same regardless of the method used. This may indicate the "robustness" of their uniqueness.
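One simple way to check which clusters are resistant to method changes is to keep only those whose membership is identical in every partition. The sketch below is an illustration only: the function name is hypothetical, and the "Region A" to "Region D" labels are placeholders rather than the actual composition of the remaining clusters, which is given in the dendrograms and Table 1.

```python
def stable_clusters(partitions):
    """Return clusters that appear with identical membership in every partition."""
    as_sets = [{frozenset(cluster) for cluster in partition} for partition in partitions]
    common = set.intersection(*as_sets)
    return sorted(sorted(cluster) for cluster in common)


# Placeholder partitions for the three methods (remaining regions are hypothetical).
single = [["Almaty city", "Astana city"], ["Atyrau"],
          ["Region A", "Region B", "Region C", "Region D"]]
complete = [["Almaty city", "Astana city"], ["Atyrau"],
            ["Region A", "Region B"], ["Region C", "Region D"]]
ward = [["Almaty city", "Astana city"], ["Atyrau"],
        ["Region A", "Region C"], ["Region B", "Region D"]]

print(stable_clusters([single, complete, ward]))
# [['Almaty city', 'Astana city'], ['Atyrau']]
```

Exact set equality is a deliberately strict criterion; a softer comparison of partitions would be a different design choice and is not attempted here.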
Future research may focus on the following areas:
• The most obvious is expanding the list of indicators for clustering. The list of indicators used in this paper is not exhaustive. For example, indicators of the development of political institutions were not included. Under a unitary autocracy, the assumption of only slight differences in political institutions across regions is quite reasonable; nevertheless, one may try to identify any differences and include them in the analysis. Expanding the list of indicators of technological development can also improve the investigation, but the research will face difficulties in collecting data.
• The next option is to change the administrative level. The present analysis is made for the highest level of the administrative system of Kazakhstan, which contains only regions and cities of republican significance. A change to a lower level could give more data points to single out individual cities or parts of regions in which smart city projects can be launched and efficiently managed.
• Using a different approach to clustering is an option as well. In this paper, hierarchical agglomerative methods are used. At the same time, the cluster analysis toolkit is much broader. It includes hierarchical divisive methods and a large group of non-hierarchical methods that can give other valuable results.

CONCLUSION
This study aimed to evaluate the features of the regions of Kazakhstan to highlight those having the best potential to develop and manage "smart cities." The analysis gave both expected and unexpected results. The study highlighted deep regional inequality affecting the potential to successfully develop and manage smart cities. There are three regions with the highest potential: Atyrau region, Almaty city, and Astana city. They have a lot of capital, high R&D and innovation activity, and high employment rates and labor productivity to support "smart cities" development. The remaining regions will demand much more resources and effort to build "smart cities" and thus should not be the first to implement the concept.
Cantuarias-Villessuzanne et al. (2021) analyzed the smart strategies of European cities and developed a clustering of smart cities based on the activities implemented by the cities. Lytras et al. (2019) researched the clustering of smart city services. Finally, Xiang et al. (2019) explored streaming data processing applications in a smart city using a cluster analysis approach.

Figure 1. Dendrogram of clusters of regions of Kazakhstan, single linkage

Figure 2. Dendrogram of clusters of regions of Kazakhstan, complete linkage

Figure 3. Dendrogram of clusters of regions of Kazakhstan, Ward's method

European scale with many relatively developed cities. At the same time, there have been no similar studies regarding the identification and development of smart cities in Kazakhstan.