GPU-based clusters have proliferated in recent years, driven by the push to build more computationally powerful systems and to improve the FLOPS/dollar and FLOPS/Watt of high-performance computing (HPC) systems. Many major vendors now support this technology, and such systems are becoming increasingly common everywhere from university research labs to the Top500 supercomputer list. Taking advantage of these systems, however, requires learning a new programming paradigm: GPU programming. In this project, we propose developing tools that make programming massive GPU clusters transparent to developers, allowing them to access the clusters' extreme computational power without significant additional effort. Specifically, we propose developing dense and sparse linear algebra libraries that are optimized for the underlying GPU hardware but are called by the user through a standard, high-level interface. This work will build on our NASA-funded and commercially successful CULA libraries, a set of GPU-accelerated dense linear algebra libraries that run on single GPUs. More recently, we have begun adding sparse linear algebra libraries to this package and prototyping their transition to multiple GPUs located in a single node. The proposed effort will scale this technology to massive GPU clusters, making the power of such systems easily accessible to all programmers.