Big Data Fusion & Compression

Dr. Jim Fowler
fowler@ece.msstate.edu

Challenge

Many techniques traditionally used for analyzing data scale poorly to large datasets because of limits on computation and memory and because the algorithms may no longer be mathematically or statistically valid at that size. This project is focused on devising algorithmic solutions for big data analytics by targeting fundamental analysis tasks: dimensionality reduction (the reduction of the size of the dataset before analysis), information mining (the discovery of patterns in data), multi-sensor fusion (the combining, or fusing, of data arising from disparate sources), and uncertainty quantification (the characterization of uncertainty in analysis results due to inherent mathematical limitations of the algorithms applied in conjunction with the data used).

Current Practice

Reducing dimensionality has long played a critical role in the analysis of big data, yet existing dimensionality-reduction techniques tend to be computationally complex and difficult to extend to very large datasets. Additionally, combining the results from multiple sensors can often improve performance because of the additional information contained in the different data types. Again, however, existing techniques for such multi-sensor fusion are difficult to scale to large data sizes. Finally, the detection and prediction of events is often grounded in the analysis of patterns arising from a combination of measurements or observations, system states and high-level prior domain knowledge, but observing these patterns in very large datasets is hindered by computational and mathematical limitations.

Technical Approach

We are developing an understanding of datasets whose size precludes traditional approaches. We are exploring novel methods for dimensionality reduction and fusion of heterogeneous big data sources. The focus is on automated procedures for learning and analyzing patterns and features from the data at unprecedented scales, with the ultimate goal of understanding the inherent nature of the datasets.

Combining basic science, applied research and advanced development tasks, we are developing approaches to:

Fuse multiple heterogeneous data sources while simultaneously reducing dimension via random projection, in which the mapping from the high-dimensional space to a lower-dimensional space is chosen at random
Extract useful information residing in big data with high spatial, temporal and spectral dimensions
Detect and classify targets from multi-sensor sources such as thermal, multispectral, infrared, radar and hyperspectral
Quantify the degree of consensus among multiple heterogeneous data sources regarding the same event (past, present or predicted)
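
As a rough illustration of the first item, the sketch below fuses two feature matrices by concatenating them and applying a single Gaussian random projection. It is only a minimal sketch under stated assumptions, not the project's actual method: the function name random_projection_fuse, the synthetic "hyperspectral" and "thermal" arrays, the simple concatenation step and the choice of k = 400 are all hypothetical and chosen for illustration.

```python
import numpy as np

def random_projection_fuse(sources, k, seed=0):
    """Concatenate per-source feature matrices and project them to k dimensions.

    sources: list of arrays, each of shape (n_samples, d_i)
    Returns an array of shape (n_samples, k).
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    stacked = np.hstack(sources)  # shape (n_samples, sum of d_i)
    d = stacked.shape[1]
    # The mapping is chosen at random: i.i.d. Gaussian entries, scaled by
    # 1/sqrt(k) so squared distances are preserved in expectation.
    projection = rng.standard_normal((d, k)) / np.sqrt(k)
    return stacked @ projection

# Synthetic stand-ins for two sensors observing the same 50 samples (assumed data).
rng = np.random.default_rng(1)
hyperspectral = rng.standard_normal((50, 2000))
thermal = rng.standard_normal((50, 500))

fused = random_projection_fuse([hyperspectral, thermal], k=400)

# Compare one pairwise distance before and after projection.
original = np.hstack([hyperspectral, thermal])
d_orig = np.linalg.norm(original[0] - original[1])
d_proj = np.linalg.norm(fused[0] - fused[1])
print(fused.shape)      # (50, 400)
print(d_proj / d_orig)  # close to 1 for moderately large k
```

The projection matrix here does not depend on the data, so it can be generated once and applied to the data in chunks, which is one reason random projection is attractive at large scale. Real heterogeneous sources would likely need per-source normalization before concatenation; that step is omitted here.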

Impact

The ever-expanding proliferation of sensors, coupled with their increasing diversity in sensing modality, gives this work significant potential for impact in a large array of practical applications. It is anticipated that dimensionality reduction, sensor fusion and multi-sensor analytics will be fundamental to next-generation algorithms for understanding big data, and that these algorithms will touch most, if not all, of the technological challenges of the next decade.