A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase II

Completed Technology Project

Project Introduction

Modeling and simulation on high-end computing systems have grown increasingly complex in recent years as both models and computer systems continue to advance. Most coding and debugging time is spent not on defining the problem physics but on balancing computations across multiple heterogeneous devices, handling data communication, managing distributed memory systems, and providing fault tolerance. The resulting programs are often barely readable, because the details of the work being performed are obscured by the hardware-specific setup and communication code that dominates the codebase. Worse, the code that balances computation, manages data communication, and provides fault tolerance is re-implemented in each piece of an application, even though it performs the same tasks in every section. This makes software harder to maintain and upgrade, and it hinders porting to new hardware platforms as they become available. The time spent improving, modifying, or debugging these device-specific code paths and common code sections could be better spent improving kernel performance or adding new features.

To separate physical science from computing science, we are developing a solution that decouples the problem definition from the platform-specific implementation details. The computation is divided into distinct tasks, each of which takes defined input data and produces output data. These tasks are then connected into a task graph by defining their dependencies on one another. The task graph describing a particular code can then be used to automatically manage data and schedule work across heterogeneous devices without further user intervention. To make use of new hardware, the user therefore need only port the tasks that can take advantage of it; all required scheduling, data management, and synchronization are handled automatically.
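
The task-graph idea described above can be illustrated with a minimal sketch. This is not the project's framework or API: the `Task` class, the `run_task_graph` function, and the example task names are hypothetical, and dependencies are inferred here by matching each task's named inputs to other tasks' named outputs, which is an assumption about how dependencies might be declared. The sketch runs serially on a single device; a real scheduler would also place tasks on heterogeneous devices and move data between them.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical task: consumes named inputs, produces one named output."""
    name: str
    func: Callable[..., object]
    inputs: Tuple[str, ...]
    output: str


def run_task_graph(tasks: List[Task], initial_data: Dict[str, object]):
    """Run tasks in dependency order; return the data store and the order used."""
    # Map each data item to the name of the task that produces it.
    producers = {t.output: t.name for t in tasks}
    # A task depends on the producer of each of its inputs. Inputs with no
    # producer are expected to be present in initial_data.
    graph = {
        t.name: {producers[key] for key in t.inputs if key in producers}
        for t in tasks
    }
    by_name = {t.name: t for t in tasks}
    data = dict(initial_data)
    # Assumption: a simple serial schedule stands in for a device-aware scheduler.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        task = by_name[name]
        args = [data[key] for key in task.inputs]
        data[task.output] = task.func(*args)
    return data, order


# Tasks are listed out of order on purpose; the graph determines execution order.
tasks = [
    Task("combine", lambda y, z: y + z, ("y", "z"), "result"),
    Task("scale", lambda x: 2 * x, ("x",), "y"),
    Task("offset", lambda x: x + 1, ("x",), "z"),
]
data, order = run_task_graph(tasks, {"x": 3})
print(order[-1], data["result"])  # prints "combine 10"
```

In this sketch the user code only defines tasks and their data; ordering is derived from the graph. Under the same assumptions, porting to new hardware would mean replacing the `func` of individual tasks while the graph and the scheduling logic stay unchanged.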
Primary U.S. Work Locations and Key Partners

Project Closeout

Organizational Responsibility

Project Management

Project Duration

Technology Maturity (TRL)

Technology Areas

Target Destinations
