Graph Walk Optimization

Dr. Ioana Banicescu
ioana@cse.msstate.edu

Data collection and analysis are rapidly changing the way the scientific, national security, and business communities operate. Data analytics applications, especially those involving graph analytics, have received increased attention over the years. The performance of these applications is often essential, and sometimes critical, to achieving the objectives of the domains that rely on them. Many research efforts have therefore sought to optimize that performance. These optimizations include improving per-core performance, increasing the scalability of execution in parallel and distributed environments, and handling large, dynamically changing data sets. They are especially important for applications involving big data. For example, discovering the complex relationships within large, real-time graph-based data, often comprising billions of edges and bounded by security time constraints, requires scalable and efficient graph-based query processing.

This project focuses on optimizing the performance of algorithms that address these challenges: improving per-core performance, increasing the scalability of big data analytics executed in parallel and distributed environments, and handling large, dynamically changing data sets.

The primary goal of this project is to address the increasing scale of graph-based data informatics at a lower cost to performance by providing efficient and robust solutions for the resilient execution of graph analytic queries in parallel and distributed environments. A graph walk is the fundamental operation for answering graph-based queries over large-scale data in applications from a multitude of domains, such as large-scale infrastructure, social networks, network intrusion detection, network reliability, and knowledge discovery. Consequently, improving the performance of the graph walk improves the scalability of graph analytic queries. For a graph walk, we investigate and employ various computation scheduling algorithms that take into account the reliability of the nodes of the computing cluster, in addition to their heterogeneity (availability). In particular, this project considers the potential role of resilient graph walk scheduling in leveraging big data to address various cyber security challenges and to ensure that the needed information is available for effective preparation and a robust, timely response to public emergencies, thereby enabling cyber resilience.
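
As a purely illustrative sketch, and not the scheduling algorithms investigated in this project, the idea of a graph walk whose work is divided according to node availability and reliability might look as follows. The function names graph_walk and weighted_chunks, the (availability, reliability) scores in the range (0, 1], and the choice to weight each computing node by the product of the two scores are all assumptions made for this example.

```python
from collections import deque


def graph_walk(adjacency, start):
    """Breadth-first walk over an adjacency-list graph.

    Returns the vertices reachable from `start` in visit order.
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


def weighted_chunks(num_tasks, workers):
    """Split `num_tasks` among workers in proportion to a weight.

    `workers` maps a worker name to (availability, reliability).
    The weight availability * reliability is an assumption of this
    sketch, not a formula taken from the project description.
    Returns a dict of task counts that sums to `num_tasks`.
    """
    weights = {name: a * r for name, (a, r) in workers.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one worker needs a positive weight")

    # Ideal fractional share per worker, rounded down to whole tasks.
    shares = {name: num_tasks * w / total for name, w in weights.items()}
    counts = {name: int(share) for name, share in shares.items()}

    # Hand out the tasks lost to rounding, largest remainder first.
    leftover = num_tasks - sum(counts.values())
    by_remainder = sorted(
        shares, key=lambda name: shares[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    cluster = {"n1": (1.0, 1.0), "n2": (0.5, 1.0), "n3": (1.0, 0.5)}
    print(graph_walk(graph, "a"))
    print(weighted_chunks(8, cluster))
```

In this sketch a fully available, fully reliable node receives twice the work of a node that is either half available or half reliable. A scheduler of the kind described above would typically also adapt these shares while the walk is running, which this static example does not attempt.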