Knowledge management overview of feature selection problem in high-dimensional financial data: cooperative co-evolution and MapReduce perspectives

  • Received August 30, 2019;
    Accepted November 20, 2019;
    Published December 26, 2019
  • Article Info
    Volume 17 2019, Issue #4, pp. 340-359
  • Cited by
    4 articles

This work is licensed under a Creative Commons Attribution 4.0 International License

The term “big data” describes the massive amounts of data generated by advanced technologies across different domains. It is commonly characterized by four Vs (volume, velocity, variety, and veracity), which indicate, respectively, the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and its accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features. Because of this high dimensionality, many domains, including finance, lack the analytic tools to mine the data for knowledge discovery. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm that follows a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
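
The ideas in the abstract can be illustrated with a small sketch. It is not the method of any paper surveyed in the article. It is a deliberately simplified, mutation-only variant of cooperative co-evolution, written under several assumptions: the toy data generator, the nearest-centroid fitness, the size penalty of 0.01 per feature, the static round-robin decomposition, and all function names (`make_data`, `accuracy`, `fitness`, `ccfs`) are hypothetical choices made for illustration. The "map" and "reduce" steps use Python's built-in `map` and `functools.reduce` on one machine and only mimic the shape of a MapReduce job.

```python
import random
from functools import reduce

def make_data(n_samples=60, n_features=12, relevant=(0, 5, 9), seed=0):
    """Toy data (assumption): the label depends only on the 'relevant' features."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if sum(row[j] for j in relevant) > 0 else 0 for row in X]
    return X, y

def accuracy(mask, X, y):
    """Nearest-centroid training accuracy using only features with mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(row, centre):
        return sum((row[j] - c) ** 2 for j, c in zip(cols, centre))

    hits = sum(1 for row, t in zip(X, y)
               if min(centroids, key=lambda k: dist(row, centroids[k])) == t)
    return hits / len(y)

def fitness(mask, X, y, penalty=0.01):
    """Objective: reward accuracy, penalize every selected feature."""
    return accuracy(mask, X, y) - penalty * sum(mask)

def ccfs(X, y, n_groups=3, pop_size=8, generations=10, seed=1):
    """Simplified cooperative co-evolution over binary feature masks."""
    rng = random.Random(seed)
    d = len(X[0])
    # Divide: static round-robin decomposition of the feature indices.
    groups = [list(range(g, d, n_groups)) for g in range(n_groups)]
    context = [1] * d                      # context vector: best full mask so far
    best = fitness(context, X, y)
    for _ in range(generations):
        for idx in groups:
            # Candidates change only this group's bits; the other groups
            # collaborate through the shared context vector.
            candidates = []
            for _ in range(pop_size):
                cand = context[:]
                for j in idx:
                    if rng.random() < 0.3:
                        cand[j] = 1 - cand[j]
                candidates.append(cand)
            # "Map": score every candidate independently (parallelizable).
            scored = list(map(lambda m: (fitness(m, X, y), m), candidates))
            # "Reduce": keep the single best-scoring candidate.
            top_score, top_mask = reduce(
                lambda a, b: a if a[0] >= b[0] else b, scored)
            if top_score > best:           # accept improvements only
                best, context = top_score, top_mask
    return context, best

X, y = make_data()
mask, score = ccfs(X, y)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print("penalized fitness:", round(score, 3))
```

Because candidates are accepted only when they improve the penalized fitness, the returned mask is never worse than selecting every feature. In a real MapReduce deployment the scoring step would be distributed across workers, whereas here it runs sequentially.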

    • Figure 1. General feature selection process
    • Figure 2. Overall categories of evolutionary computation for feature selection
    • Figure 3. A general architecture of cooperative co-evolutionary algorithm
    • Figure 4. An outline of cooperative co-evolutionary algorithm
    • Figure 5. A typical MapReduce workflow shuffled list
    • Figure 6. The basic flowchart of a MapReduce model
    • Figure 7. Feature selection techniques based on MapReduce
    • Table 1. Feature selection techniques based on cooperative co-evolution
    • Table 2. Feature selection techniques based on cooperative co-evolution and MapReduce