The proposed innovation is Spark-RS, an open source software project that enables GPU-accelerated remote sensing workflows on an Apache Spark distributed computing cluster. Current state-of-the-art parallel systems such as Hadoop and Spark offer horizontally scalable analytics and reduced costs for enterprises, but they were not built to natively consume and process large remote sensing raster datasets. GPUs, for their part, can vastly accelerate image processing operations, and some open source projects have arisen that showcase hybrid Hadoop/GPU computing. However, there are no mature open source projects that utilize GPUs within Spark (an eventual replacement for MapReduce), and none that were built to process large remote sensing imagery. Filling this gap is the primary role of Spark-RS. Spark-RS contains three primary components. The first is a parallel image loading component that quickly loads large multi-band imagery into a Spark cluster. The second is a remote sensing library for Spark applications; it provides an API for reading and writing large images and wraps many common image operations from existing open source and NASA-built remote sensing libraries. The third is a GPU management library for Spark that simplifies and abstracts the use of GPUs within a Spark application.
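As an illustration of the parallel loading idea only, the sketch below shows one way a large raster could be cut into fixed-size pixel windows, so that each window can be read by a separate task. It is not the Spark-RS implementation: the names `TileWindow` and `tile_windows`, and the choice of square tiles, are assumptions made for this example.

```python
from typing import Iterator, NamedTuple


class TileWindow(NamedTuple):
    """Pixel window of one tile inside a larger raster (hypothetical type)."""
    row_off: int  # first row of the tile
    col_off: int  # first column of the tile
    height: int   # number of rows in the tile
    width: int    # number of columns in the tile


def tile_windows(rows: int, cols: int, tile_size: int) -> Iterator[TileWindow]:
    """Yield windows that cover a rows x cols raster without overlap.

    Tiles on the bottom and right edges are clipped to the raster bounds,
    so they may be smaller than tile_size.
    """
    if rows <= 0 or cols <= 0 or tile_size <= 0:
        raise ValueError("rows, cols and tile_size must be positive")
    for row_off in range(0, rows, tile_size):
        for col_off in range(0, cols, tile_size):
            yield TileWindow(
                row_off,
                col_off,
                min(tile_size, rows - row_off),
                min(tile_size, cols - col_off),
            )
```

In a PySpark application, a list of such windows could be distributed with `SparkContext.parallelize` and mapped over a tile-reading function (hypothetical here), so that each task reads only its own window of every band rather than the whole image.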
Potential NASA Commercial Applications: The datasets listed in this SBIR's description, and their corresponding applications, are all potential candidates for use with Spark-RS, since they involve large multi-spectral and hyper-spectral raster-based observations. These include HyspIRI, JPSS-1, NPP, SDO, MRO, MERRA, MERRA2, and Landsat, among many others. Thus, any NASA datacenter that has a Hadoop-based cluster stands to benefit from Spark-RS.