HighTech and Innovation

Nowadays, machine learning methods are actively used to process big data. A promising direction is neural networks whose structure is optimized on the principles of self-configuration. Genetic algorithms are applied to solve this nontrivial problem. Most multicriteria evolutionary algorithms rank solutions using a procedure known as non-dominated sorting. However, the efficiency of the procedures for adding points and updating rank values in non-dominated sorting (incremental non-dominated sorting) remains low. This research therefore improves the performance of these algorithms, including under the condition of asynchronous calculation of the fitness of individuals. The relevance of the research stems from the fact that, although many scholars and specialists have studied the self-tuning of neural networks, no comprehensive solution to this problem has yet been proposed. In particular, algorithms for efficient non-dominated sorting under incremental and asynchronous updates in evolutionary multicriteria optimization have not been fully developed to date. To achieve this goal, a hybrid co-evolutionary algorithm was developed that significantly outperforms all of its constituent algorithms, including error backpropagation and genetic algorithms operating separately. The novelty of the obtained results lies in the fact that the developed algorithms have minimal asymptotic complexity. The practical value of the developed algorithms is that they make it possible to solve applied problems of increased complexity in practically acceptable time.


Introduction
Today, data, along with capital and labor, have become essential resources for ensuring the prosperity of society [1][2][3]. However, selecting important and useful data is a difficult task, for which machine learning methods are now actively applied [4]. Machine learning is a set of algorithms whose knowledge extraction improves with experience, i.e., with an increasing level of learning [5].
Machine learning techniques can be roughly classified as follows: supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, online learning, and active learning [6, 7]. Supervised learning is the most popular machine learning method, in which the learning system is structured around pairs of the form "an object and its corresponding label" [8]. The problem is to train an algorithm that forms a nontrivial relationship between object labels and their properties, or features. Here, the initial data consist of two separate, independent sets: a training set and a test set. Unsupervised learning, on the other hand, uses data without object labels; each object is described by a set of features, or by a set of distance metrics in the multidimensional feature space relative to the other objects in the sample [9][10][11]. In semi-supervised learning, only partially labeled data are used [12]. In online learning, learning objects arrive sequentially one after another, so the algorithm must process each object in turn, incrementally learning from the newly arrived object [13]. In active learning, the learning objects are fed to the algorithm in a deliberately selected sequence, which makes learning more effective [14, 15]; this approach is closely related to methods of experiment planning. In this paper, the emphasis is on supervised learning algorithms and classification tasks. Classification algorithms operate on discrete data sets and require a finite set of labels called classes. The task of the learning algorithm is to correctly and unambiguously assign an object to one of the given classes [15, 16].
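The supervised classification setting described above, a labeled training set, a separate test set, and a finite set of classes, can be illustrated with a minimal sketch. The paper's software is written in C#; the following is an illustrative Python toy (a nearest-centroid classifier on invented data, not the paper's method or data):

```python
def train(objects, labels):
    """Compute one centroid per class from the labeled training set."""
    sums, counts = {}, {}
    for x, y in zip(objects, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist2(centroids[y]))

# Separate training and test data, as in the supervised setting above.
train_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_y = ["A", "A", "B", "B"]
model = train(train_x, train_y)
print(predict(model, [0.1, 0.05]))  # a test object near class "A"
```

The key point is the separation of roles: `train` only ever sees labeled objects, while `predict` assigns an unseen object unambiguously to one of the finite set of classes.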
Classification techniques are inextricably linked with information important for industry; the data are usually represented as descriptions of the properties of the classification objects. Binkhonain & Zhao (2019) [17] presented a detailed overview of classification algorithms. One of the most promising classes of classification algorithms is neural networks, based on the principle of connecting many simple elements into an optimal structure [18]. The role of simple elements is played by so-called artificial neurons, noticeably simplified arithmetic models of the actual biological neuron. A neuron receives signals at its inputs; these are summed and passed through an activation function, producing the output signal (Figure 1a). A multilayer feedforward neural network [19] (Figure 1b) is an oriented graph with one-way directed edges. A significant disadvantage of neural networks is that they solve pattern recognition tasks in a black-box manner: it is almost impossible to trace logically how they assign a specific object to a class [20]. Problems may also arise with the choice of a suitable neural network structure, which is often predetermined by a method proven on the recognition task at hand. To configure the network parameters, automatic correction of parameters based on optimization algorithms is applied [21]. Often, neural network self-configuration is a multicriteria optimization problem [22], for whose solution evolutionary algorithms, a class of stochastic algorithms that simulate the process of natural evolution, are applied [23].
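The artificial neuron and the feedforward propagation just described can be sketched in a few lines (an illustrative Python sketch with hypothetical weights; the sigmoid activation is one common choice, not necessarily the one used in the paper):

```python
import math

def neuron(inputs, weights, bias):
    """Artificial neuron (Figure 1a): the input signals are summed with
    weights and passed through an activation function (here, a sigmoid)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # logistic activation in (0, 1)

def feedforward(layers, x):
    """Feedforward network (Figure 1b): signals propagate layer by layer
    along one-way directed edges; each layer is a list of (weights, bias)."""
    for layer in layers:
        x = [neuron(x, w, b) for w, b in layer]
    return x

# A tiny two-layer network with hypothetical weights.
net = [
    [([0.5, -0.6], 0.1), ([0.8, 0.4], -0.2)],  # hidden layer: 2 neurons
    [([1.0, -1.0], 0.0)],                      # output layer: 1 neuron
]
out = feedforward(net, [1.0, 0.5])
print(out)  # a single output signal in (0, 1)
```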
Genetic algorithms (GAs) are among the most widely used evolutionary search techniques [24]. Due to their intrinsic parallelism, these algorithms make it possible to find a set of Pareto-optimal solutions in a single algorithm run [25].
Many scientific papers are devoted to the self-configuration of neural network parameters by means of evolutionary search algorithms. For example, Lessmann et al. [23] configured the parameters of the Support Vector Machine (SVM) using a genetic algorithm, which provided more stable results than grid search. Gavrilescu et al. [24] used evolutionary algorithms to configure the properties of neural networks. Akhmedova & Semenkin [25] developed a collective algorithm combining various bionic techniques. Tan et al. [26] created a co-evolutionary algorithm for many-criteria optimization. Evolutionary algorithms are also widely applied to configure the parameters of machine learning techniques [27], fuzzy logic systems [28], and genetic programming [29]. These algorithms are relevant for solving various practical problems, such as human-machine interaction [30]. Therefore, special attention is paid to developing software systems that support high-quality interaction between human intelligence and personal computers [31][32][33]. Additional tasks of such interaction include automated monitoring of user data, detection of features of handwritten text and oral speech, and generation of appropriate system responses in a form comfortable for perception [34][35][36].
Although many scientists and specialists have investigated the self-configuration of neural networks, they have not yet proposed a comprehensive solution to this problem. Moreover, solving the practically essential problem of human-machine interaction based on neural networks requires developing methods of self-configuration for machine learning algorithms in general and neural networks in particular [37][38][39][40][41].
Advanced multicriteria evolutionary algorithms can be divided into three classes: algorithms based directly on the Pareto dominance relation (examples: NSGA-II [42], SPEA2 [43], PESA-II [44]); algorithms that optimize so-called indicators, functions that characterize the quality of the population as a whole with a single number (examples: IBEA [45], HypE [46]); and algorithms that reduce a multicriteria problem to several scalar problems (example: MOEA/D [47]). Hybrid algorithms such as NSGA-III [48] are also known. A significant part of the algorithms of the first class (and the hybrid algorithms) rank solutions using a procedure known as non-dominated sorting. The disadvantage is that this procedure has a relatively high computational complexity: most known algorithms for it have Θ(n²k) complexity, where n is the number of solutions and k is the number of objective functions.
Since this study does not address the objective functions themselves and, consequently, the correspondence between the vectors of decision variable values (the "genotype" of an individual) and the vectors of objective function values (the fitness of an individual), we will hereafter identify a solution (individual) with its fitness and consider only the latter, which allows us to treat solutions as points in the k-dimensional space of objective function values.
The non-dominated sorting procedure was proposed for use in multicriteria evolutionary algorithms as part of the NSGA algorithm [41, 49]. The same publication proposed an algorithm with Θ(n³k) time complexity and O(n) memory, where n is the number of points. In the next version of this algorithm, NSGA-II [42], the non-dominated sorting algorithm was improved to Θ(n²k) time and O(n²) memory, which partly contributed to the popularity of the NSGA-II algorithm. However, the complexity remains high; thus, in almost all algorithms using non-dominated sorting, it remains either the asymptotic bottleneck of the algorithm or one of its bottlenecks.
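The Θ(n²k)-time, O(n²)-memory procedure of NSGA-II can be sketched as follows (an illustrative Python sketch of the well-known fast non-dominated sorting scheme, assuming minimization of all objectives):

```python
def dominates(p, q):
    """p Pareto-dominates q (minimization): p is no worse in every
    objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_sort(points):
    """NSGA-II fast non-dominated sorting: Theta(n^2 k) time, O(n^2) memory.
    Returns the list of fronts, each front a list of indices into `points`."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # S_p: indices that p dominates
    dom_count = [0] * n                     # n_p: how many points dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)             # rank 0: the Pareto front
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:       # all its dominators are ranked
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]

pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
fronts = non_dominated_sort(pts)
print(fronts)  # [[0, 1, 2], [3], [4]]
```

The double loop over all point pairs is exactly where the Θ(n²k) cost arises, which motivates the more efficient and incremental variants discussed next.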
In this regard, several subsequent publications have proposed alternative, more efficient algorithms for non-dominated sorting. Most of them have Θ(n²k) worst-case time complexity and O(nk) memory [50]. However, the algorithm proposed by Jensen [51] and subsequently improved by Fortin et al. [52] has a time complexity of O(n(log n)^(k-1)) and is more efficient than the other algorithms for large numbers of points.
The question of the efficiency of non-dominated sorting as part of an incremental multicriteria evolutionary algorithm was raised by Nebro & Durillo [53], who showed that the incremental version of the NSGA-II algorithm has advantages in the quality of the solutions obtained compared to the classical one (under the same restriction on the number of fitness evaluations), but loses significantly in time. A team of scholars headed by Deb proposed the first specialized non-dominated sorting algorithm designed to efficiently recalculate ranks when inserting new points, known as Efficient Non-dominated Level Update (ENLU) [54]. Although it still requires Θ(n²k) time in the worst case per insert operation, in practice the insertion is often performed faster. However, even though this result was improved in subsequent research [55], the efficiency of the procedure for adding points and updating rank values in non-dominated sorting (incremental non-dominated sorting) remains low. In addition, these studies did not consider operation in the context of asynchronous fitness computation, when multiple threads insert points in parallel and independently of each other.
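The core idea of incremental level update can be illustrated with a simplified sketch in the spirit of ENLU (not the exact published algorithm): a new point is placed into the first level where nothing dominates it, and any points it dominates cascade down to later levels, so untouched levels keep their ranks.

```python
def dominates(p, q):
    """p Pareto-dominates q (minimization)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def insert_point(levels, p):
    """Simplified ENLU-style insertion: find the first level where no member
    dominates p, insert p there, then demote to the next level any members
    dominated by the newly arrived points, cascading downward.
    Worst case still Theta(n^2 k) per insertion."""
    k = 0
    while k < len(levels) and any(dominates(q, p) for q in levels[k]):
        k += 1
    moved = [p]
    while moved:
        if k == len(levels):
            levels.append(moved)            # a new last level
            return
        demoted = [q for q in levels[k]
                   if any(dominates(m, q) for m in moved)]
        levels[k] = [q for q in levels[k] if q not in demoted] + moved
        moved = demoted
        k += 1

levels = [[(1, 5), (5, 1)], [(4, 4)]]
insert_point(levels, (2, 2))   # joins level 0, demotes nothing
insert_point(levels, (0, 0))   # dominates everything: levels shift down
print(levels)
```

When insertions arrive from multiple asynchronous threads, a sketch like this would additionally need synchronization around the affected levels, which is precisely the setting the paper identifies as unsolved.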
The presented research aimed to improve the quality of information selection in solving neural network problems of human-machine interaction by creating a self-configuring co-evolutionary optimization algorithm. The co-evolutionary genetic algorithm for multicriteria optimization is developed on the basis of the standard genetic algorithms VEGA (Vector Evaluated Genetic Algorithm), SPEA (Strength Pareto Evolutionary Algorithm), and NSGA-2 (Non-dominated Sorting Genetic Algorithm). The developed hybrid co-evolutionary multicriteria optimization algorithm combines the advantages and eliminates the disadvantages of its constituent algorithms, thus significantly increasing its efficiency.

Materials and Methods
The authors employed analytical techniques, statistics, probability theory, evolutionary computation, machine learning, and the monitoring of hidden information patterns. Figure 2 shows the algorithm of the research methodology.
To perform the research on selecting informative features, the authors developed software in the C# language in Microsoft Visual Studio. It integrates evolutionary algorithms for multicriteria optimization, including VEGA, NSGA, SPEA, and the hybrid co-evolutionary algorithm.
VEGA is an algorithm proposed by Schaffer in the mid-1980s. It performs selection according to individual criteria, so that the percentage of individuals selected according to each criterion is identical. The NSGA algorithm, created by Srinivas and Deb [41], uses a Pareto dominance approach to calculate fitness and applies fitness sharing to keep the population distribution stable. In the SPEA methodology of Zitzler et al. [43], the non-dominated solutions found up to the current iteration are stored in an external set; the fitness of individuals is calculated from the standpoint of Pareto dominance, and clustering limits the number of external solutions stored in the set. The listed optimization methods have their pros and cons: VEGA converges faster but lacks a mechanism for uniformly distributing solutions along the Pareto front, while NSGA provides excellent coverage of the Pareto front, like some other techniques, at an increased computational cost. This paper presents a co-evolutionary genetic algorithm for multicriteria optimization (see Section 3.1) in which the investigated algorithms are combined into a group.
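VEGA's distinguishing feature, building the mating pool from equal-sized subgroups each selected on a single criterion, can be sketched as follows (an illustrative Python sketch; the use of binary tournaments here is an assumption for brevity, whereas Schaffer's original VEGA uses proportional selection):

```python
import random

def vega_selection(population, fitnesses, n_objectives):
    """VEGA-style selection: the mating pool is built from equal-sized
    subgroups, each chosen by comparing individuals on one criterion only,
    so the share of individuals selected per objective is identical."""
    pool, group = [], len(population) // n_objectives
    for m in range(n_objectives):
        for _ in range(group):
            a, b = random.sample(range(len(population)), 2)
            # minimization: keep the individual better on objective m
            winner = a if fitnesses[a][m] <= fitnesses[b][m] else b
            pool.append(population[winner])
    random.shuffle(pool)  # shuffle before crossover, as in Schaffer's VEGA
    return pool

random.seed(1)
pop = ["i0", "i1", "i2", "i3"]
fits = [(1, 9), (9, 1), (5, 5), (2, 8)]
pool = vega_selection(pop, fits, n_objectives=2)
print(pool)
```

The sketch makes VEGA's weakness visible: each subgroup rewards excellence on a single criterion, so nothing encourages a uniform spread along the Pareto front, exactly the drawback noted above.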

Figure 2. The research methodology
The study of the efficiency of evolutionary algorithms for multicriteria optimization relied on the recognition of handwritten numeric symbols. The primary data came from the Modified National Institute of Standards and Technology (MNIST) dataset, which includes sixty thousand training samples and ten thousand test samples of handwritten digits [43]. The MNIST dataset family contains five different handwritten digit recognition tasks:
- MNIST basic: numeric characters from zero to nine written by hand on a black background;
- MNIST rotated: images of the basic set rotated by a random angle;
- MNIST random background: digits placed over randomly generated background noise;
- MNIST background images: digits of the basic set applied over parts of a limited set of images;
- MNIST rotated background images: randomly rotated digits applied over parts of the images.

Creating a Multicriteria Optimization Algorithm SelfCOMOGA
Based on the VEGA, SPEA, and NSGA-2 methods, the authors developed a co-evolutionary genetic algorithm for multicriteria optimization called the Self-configuring Co-evolutionary Multi-objective Genetic Algorithm (SelfCOMOGA) [28], in which the previously listed algorithms work as a single group. This algorithm develops the self-configuring collective evolutionary approach proposed earlier by E.A. Sopov [29]. The authors chose VEGA, SPEA, and NSGA-2 because their different approaches to selecting individuals provide an opportunity to avoid stagnation. The main feature of the hybrid algorithm is monitoring the performance of the constituent algorithms to determine which is best suited to the particular problem. In the first stage, the constituent algorithms and their populations are initialized. In the second stage, the initialized SPEA, VEGA, and NSGA-2 methods run for a given number of iterations; this stage is referred to as the adaptation period and is one of the stages of the co-evolutionary algorithm SelfCOMOGA.
In the third stage, the co-evolutionary algorithm's efficiency is evaluated against the selected quality criteria. Each iteration of an algorithm yields several mutually non-comparable solutions, so no single solution can be used directly to compare the methods. The authors used the sum of the following criteria to monitor the individual algorithms within SelfCOMOGA:
- K1 is the proportion of non-dominated solutions. The pooled solutions of the co-evolving algorithms undergo non-dominated filtering; the best solutions are extracted from the sorted array, and the share contributed by each co-evolutionary method to this set is calculated;
- K2 is the uniformity of the distribution of the non-dominated solutions of each co-evolutionary method, computed as the variance of distances in the criteria space.
The fourth stage is the distribution of resources. The population size of the winning algorithm, determined by the sum of the criteria described above, increases at the expense of the population sizes of the other algorithms; the population size of each losing method decreases by a few percent of its stable size. In the fifth stage, the new populations of each method are filled with individuals from the general pool, selected using rank selection. The stopping criterion of the hybrid co-evolutionary algorithm is a predetermined number of iterations.
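The resource-distribution stage can be sketched as follows (an illustrative Python sketch; the step size and the guaranteed minimum population size are assumptions, since the paper only says "a few percent" and does not give concrete values):

```python
def redistribute(sizes, scores, step=2, min_size=10):
    """Stage four sketch: the subpopulation of the algorithm with the best
    combined score K1 + K2 grows, while each losing subpopulation shrinks
    by `step` individuals, never dropping below a guaranteed minimum.
    `step` and `min_size` are illustrative values, not from the paper."""
    winner = max(scores, key=scores.get)
    new_sizes = dict(sizes)
    for name in sizes:
        if name != winner and new_sizes[name] - step >= min_size:
            new_sizes[name] -= step
            new_sizes[winner] += step
    return new_sizes

sizes = {"VEGA": 30, "SPEA": 30, "NSGA-2": 30}
scores = {"VEGA": 0.4, "SPEA": 0.7, "NSGA-2": 0.5}  # combined K1 + K2
new_sizes = redistribute(sizes, scores)
print(new_sizes)  # SPEA wins this period and gains individuals
```

The minimum-size guard reflects the design intent stated above: losing algorithms shrink but are never eliminated, so a method that performs poorly early on can still recover in a later adaptation period.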

Analysis of Algorithm Efficiency
The efficiency of the developed co-evolutionary algorithm was evaluated in practice on ten multicriteria optimization test tasks.
The efficiency of the algorithms under consideration was compared using the IGD (inverted generational distance) metric

IGD(A, P*) = (1 / |P*|) Σ_{v ∈ P*} d(v, A),

where P* is the actual Pareto front, A is the Pareto front approximation produced by the optimization algorithm, v is a point of the actual Pareto front, and d(v, A) is the minimal distance from v to the points of A in the Euclidean metric. The smaller the IGD metric, the better the solution of the optimization problem.
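The IGD metric defined above is straightforward to compute (an illustrative Python sketch with an invented two-objective front, not one of the paper's test tasks):

```python
import math

def igd(true_front, approximation):
    """Inverted generational distance: the average, over the points v of the
    true Pareto front P*, of the minimal Euclidean distance from v to the
    approximation A produced by the optimizer. Smaller is better."""
    def d(v, front):
        return min(math.dist(v, a) for a in front)
    return sum(d(v, approximation) for v in true_front) / len(true_front)

p_star = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]   # true front P*
approx = [(0.0, 1.0), (1.0, 0.0)]               # approximation A
print(igd(p_star, approx))
```

Because the average runs over the true front rather than the approximation, IGD penalizes both poor convergence and poor coverage: here the approximation hits two points of P* exactly, but the uncovered middle point still contributes a nonzero distance.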
The SPEA, VEGA, NSGA-2, and SelfCOMOGA algorithms presented in this paper are stochastic; therefore, twenty runs of each algorithm were performed on each of the ten tests. For each run, the IGD metric value and the working time of the software implementation in seconds were recorded. Based on these statistics, the significance of the differences in the IGD metric between the studied methods was analyzed using the Wilcoxon test. Tables 1 and 2 show the results of applying the W-criterion used to determine the significance of the differences between the IGD metric of the SelfCOMOGA algorithm and those of the SPEA, VEGA, and NSGA-2 methods. The results show that in nine of the ten tests, the SelfCOMOGA algorithm statistically significantly (p = 0.05) outperforms the SPEA, VEGA, and NSGA-2 algorithms in the IGD metric, yielding to the SPEA algorithm in only one case. In running time, the SelfCOMOGA algorithm outperforms SPEA and NSGA-2, being significantly inferior only to the VEGA algorithm, which it noticeably exceeds in the IGD metric, the key indicator in this context. The obtained results confirm the rationality of the developed co-evolutionary algorithm and justify combining into one co-evolution the most commonly used evolutionary methods of multicriteria optimization with different selection technologies.

Creating and Integrating the Multicriteria Approach to Selecting Informative Features in the Process of Work
Selecting informative features is a challenging stage in modern machine learning, and the efficiency of the overall machine learning system directly depends on its quality. This research developed and studied a multicriteria approach to automatic feature selection (Figure 8).
The proposed feature selection method based on multicriteria optimization belongs to the so-called wrapper methods [34, 56, 57]. The presented development was compared with principal component analysis (PCA). The optimization-driven feature selection is organized as follows. The decision variables are binary vectors of length t, where t is the initial number of sample features. Each bit of the vector takes the value one or zero: one denotes selecting the corresponding feature for inclusion in the model, and zero means the feature is not selected. Optimization proceeds according to a pair of criteria: the classification accuracy is maximized, and the number of features is minimized. The authors applied the SPEA, VEGA, NSGA-2, and SelfCOMOGA algorithms with a low probability of crossover and mutation, computed from the population size |P| and a parameter k, where k = 3 in the completed experiments but can take other non-negative values.
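The binary mask encoding and the two criteria can be sketched as follows (an illustrative Python sketch; the function names, the toy accuracy function, and the reading of the probability formula as p = k / |P| are all assumptions, since the formula itself is not reproduced in the source):

```python
def evaluate(mask, accuracy_fn):
    """Bi-objective fitness of a binary feature mask: maximize the
    classification accuracy, minimize the number of selected features.
    Returned as a minimization pair (1 - accuracy, feature count)."""
    selected = [i for i, bit in enumerate(mask) if bit]
    return (1.0 - accuracy_fn(selected), len(selected))

def operator_probability(k, pop_size):
    """Low crossover/mutation probability, read here as p = k / |P|
    (k = 3 in the reported experiments; |P| is the population size)."""
    return k / pop_size

# Hypothetical wrapper accuracy: rewards a known informative subset {0, 2}.
def toy_accuracy(selected):
    return 0.9 if {0, 2} <= set(selected) else 0.6

t = 5                                 # initial number of features
mask = [1, 0, 1, 0, 0]                # binary decision vector of length t
obj = evaluate(mask, toy_accuracy)
print(obj)                            # approximately (0.1, 2)
print(operator_probability(3, 100))   # 0.03
```

In an actual wrapper run, `accuracy_fn` would retrain and validate the classifier on the selected feature subset, which is what makes wrapper methods accurate but computationally heavy.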
For the experiments on selecting informative features, the authors developed software in the C# language in Microsoft Visual Studio 2012, implementing the SPEA, VEGA, NSGA-2, and SelfCOMOGA multicriteria optimization methods. The section below presents the results of the numerical experiment on solving the multicriteria problem of selecting informative features for the available classifiers.

Studying the Efficiency of the Multicriteria Approach to Selecting Informative Features
The efficiency of the hybrid learning algorithm was studied on the handwritten digit detection task, with primary data taken from MNIST [43]. The experiment involved a convolutional neural network (CNN) with two convolutional layers, two subsampling layers, and a fully connected output layer. Table 3 presents the CNN parameters used in the experiments. The hybrid algorithm (HA) was run as the co-evolutionary HA with the characteristics shown in Table 4.

Table 3. CNN parameters for the handwritten digit recognition tasks

Number of feature maps on the 1st convolutional layer: 6
Number of feature maps on the 2nd convolutional layer: 12
Dimensionality of the convolution kernel of the 1st and 2nd convolutional layers: 5 × 5
Scale of the subsampling layers: 2
Number of neurons in the fully connected layer: 192
Number of learning epochs: 50

The authors compared the efficiency of CNNs trained by the error backpropagation (EBP) method and by the hybrid algorithm (HA plus EBP). The CNN trained by the genetic algorithm alone did not show noticeable efficiency because of the enormous dimension of the search space, so those results are omitted here. The analyzed criteria are classification accuracy and F-score. The efficiency of the approaches on the five previously described MNIST tasks is shown in Figure 9 for classification accuracy and in Figure 10 for the F-score. According to the experimental results, the hybrid algorithm for training the CNN substantially outperforms the error backpropagation method on four of the five tasks. The results also demonstrate that the effectiveness of the generalized method as a whole strongly depends on the initial data used and on the chosen scheme for merging the algorithms into a team at the final stage. Among the multicriteria optimization algorithms applied to feature selection and optimal team design, the algorithm that combines the advantages of the SPEA, NSGA-2, and VEGA algorithms turned out to be the most effective and, as shown, noticeably outperforms them. Thus, a self-configuring co-evolutionary algorithm for multicriteria optimization was developed that outperforms its component optimization algorithms both on multicriteria optimization test problems and on practical problems.
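The two comparison criteria above can be computed as follows (a minimal Python sketch; the per-class counts are hypothetical, not the paper's data, and macro averaging over the ten digit classes is the reading assumed here):

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-score: the beta-weighted harmonic mean of precision and recall,
    computed from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def macro_f1(per_class_counts):
    """Macro-averaged F1: average the per-class F1 values, so every digit
    class contributes equally regardless of its frequency."""
    return sum(f_score(*c) for c in per_class_counts) / len(per_class_counts)

counts = [(90, 5, 10), (80, 10, 15)]    # hypothetical (tp, fp, fn) per class
print(macro_f1(counts))
```

Unlike plain classification accuracy, the F-score stays informative when class errors are asymmetric, which is why the paper reports both criteria side by side in Figures 9 and 10.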

Conclusions
This research developed and studied in practice a hybrid learning algorithm for convolutional neural networks based on the specialized co-evolutionary optimization algorithm SelfCOMOGA and the method of error backpropagation. The hybrid co-evolutionary algorithm SelfCOMOGA simultaneously evolves several populations independently; while obtaining solutions, the constituent algorithms exchange the necessary information and solutions. SelfCOMOGA simultaneously uses three well-known methods of proven effectiveness (SPEA, VEGA, and NSGA-2); these algorithms were chosen because they embody the most rational techniques for selecting individuals and maintaining the diversity of the population as a whole. The authors carried out a comparative analysis of the primary algorithms and SelfCOMOGA on ten tests, monitoring performance on problems with two to three criteria to be optimized. The experiments established the superiority of the SelfCOMOGA algorithm over the individual algorithms comprising it; where a noticeable improvement is not achieved, efficiency remains at the same level.
Based on the proposed method, the authors developed software in the C# language in Microsoft Visual Studio, which was used as a basis for conducting experiments to optimize the developed techniques. The system combines VEGA, SPEA, NSGA-2, and SelfCOMOGA and makes it possible to test these algorithms on an extended set of multicriteria optimization tasks, considering the necessity of feature selection.
A hybrid training method for convolutional neural networks (CNNs) was created based on the combined use of the genetic algorithm (HA) and error backpropagation (EBP). The proposed algorithm is noticeably superior to the HA and EBP algorithms functioning autonomously on the handwritten digit detection task. A comparative analysis of the algorithms' efficiency was carried out on the criteria of classification accuracy and the F-score.
Thus, a hybrid co-evolutionary algorithm was developed that significantly outperforms all the algorithms included in it, including the error backpropagation algorithm and genetic algorithms operating separately. The scientific novelty of the obtained results lies in the fact that the developed incremental non-dominated sorting algorithms have minimal asymptotic complexity. The practical value of the developed algorithms is that they make it possible to solve applied problems of increased complexity in practically acceptable time.
Among the possible directions for further research, the following can be distinguished: the use of a multicriteria approach for designing ensembles of other classification algorithms; the application of other optimization algorithms for feature selection, classifier ensemble designing, and convolutional neural network pretraining; building ensembles of convolutional neural networks; and other deep learning algorithms.


Data Availability Statement
All data generated or analyzed during this study are included in this published article.

Funding
Selected findings of this work were obtained under the Grant Agreement in the form of subsidies from the federal budget of the Russian Federation for state support for the establishment and development of world-class scientific centers performing R&D on scientific and technological development priorities dated April 20, 2022, No. 075-15-2022-307.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.