{"project":{"acronym":"","projectId":89603,"title":"A Scheduling-Based Framework for Efficient Massively Parallel Execution","primaryTaxonomyNodes":[{"taxonomyNodeId":10846,"taxonomyRootId":8816,"parentNodeId":10844,"level":3,"code":"TX11.6.2","title":"Automated Exascale Software Development Toolset","definition":"The Automated Exascale Software Development Toolset provides automated, exascale application performance monitoring, analysis, tuning, and scaling.","exampleTechnologies":"Auto parallelizing compiler for shared-memory computers","hasChildren":false,"hasInteriorContent":true}],"startTrl":3,"currentTrl":6,"endTrl":6,"benefits":"These tools could be used to reduce software development and maintenance time and improve the computational performance and scalability of a variety of high-performance computing applications. Specifically, we intend to focus initially on applying this technology to GEOS-5 for earth modeling; the framework can also benefit other earth modeling packages. Another application area of this technology is CFD solvers such as Fun3D and OVERFLOW. It can also be an enabler for the High-End Computing Capability (HECC) project, by enhancing both usability and performance of applications able to take advantage of heterogeneous compute architectures. Additionally, it permits more flexibility in hardware design and purchasing for high-end computing systems by reducing the effort required to port applications to new hardware architectures, such as GPUs and Xeon Phis.
Most HPC software will be able to benefit from this technology, particularly applications meant to scale to large computer systems and/or target heterogeneous hardware configurations. Expected application domains include electromagnetics simulations, computational chemistry, oil and gas exploration, and financial modeling. They also include any domain that involves large-scale sparse linear algebra operations, large-scale image processing, and other physics-based and multi-disciplinary modeling applications.","description":"Modeling and simulation on high-end computing systems has grown increasingly complex in recent years as both models and computer systems continue to advance. The majority of coding and debugging time is not spent defining the problem physics but instead in balancing computations between multiple heterogeneous devices, handling communication of data, managing distributed memory systems, and providing fault-tolerance. Often, the resulting programs are barely readable as the details of the work being performed are obscured by hardware-specific setup and communication code that dominates a program's codebase. Even worse, the code used to balance computation, manage data communication, and provide fault-tolerance is re-implemented in each piece of an application even though it performs the same tasks across those sections of the software. This makes software more difficult to maintain and upgrade, and hinders porting to new hardware platforms as they become available. The time spent improving, modifying, or debugging these device-specific code paths and common code sections could be better spent improving kernel performance or adding new features. To address the problem of separating physical science from computing science, we are developing a solution that decouples the problem definition from the platform-specific implementation details. 
This is accomplished by dividing the computation into distinct tasks, each of which takes some defined input data and produces some output data. These tasks can then be connected into a task graph by defining their dependencies on each other. This task graph describing a particular code can then be used to automatically manage data and schedule work across heterogeneous devices without requiring further user intervention. Therefore, to make use of new hardware, the user need only port any tasks that might take advantage of the new hardware, and all scheduling, data management, and synchronization required are handled automatically.","startYear":2016,"startMonth":4,"endYear":2020,"endMonth":2,"statusDescription":"Completed","principalInvestigators":[{"contactId":507198,"canUserEdit":false,"firstName":"Evenie","lastName":"Chao","fullName":"Evenie Chao","fullNameInverted":"Chao, Evenie","primaryEmail":"chao@emphotonics.com","publicEmail":true,"nacontact":false}],"programDirectors":[{"contactId":206378,"canUserEdit":false,"firstName":"Jason","lastName":"Kessler","fullName":"Jason L Kessler","fullNameInverted":"Kessler, Jason L","middleInitial":"L","primaryEmail":"jason.l.kessler@nasa.gov","publicEmail":true,"nacontact":false}],"programExecutives":[{"contactId":215154,"canUserEdit":false,"firstName":"Jennifer","lastName":"Gustetic","fullName":"Jennifer L Gustetic","fullNameInverted":"Gustetic, Jennifer L","middleInitial":"L","primaryEmail":"jennifer.l.gustetic@nasa.gov","publicEmail":true,"nacontact":false}],"programManagers":[{"contactId":62051,"canUserEdit":false,"firstName":"Carlos","lastName":"Torrez","fullName":"Carlos Torrez","fullNameInverted":"Torrez, Carlos","primaryEmail":"carlos.torrez@nasa.gov","publicEmail":true,"nacontact":false}],"projectManagers":[{"contactId":96166,"canUserEdit":false,"firstName":"Daniel","lastName":"Duffy","fullName":"Daniel Q Duffy","fullNameInverted":"Duffy, Daniel 
Q","middleInitial":"Q","primaryEmail":"daniel.q.duffy@nasa.gov","publicEmail":true,"nacontact":false},{"contactId":461333,"canUserEdit":false,"firstName":"Theresa","lastName":"Stanley","fullName":"Theresa M Stanley","fullNameInverted":"Stanley, Theresa M","middleInitial":"M","primaryEmail":"theresa.m.stanley@nasa.gov","publicEmail":true,"nacontact":false}],"coInvestigators":[{"contactId":2950,"canUserEdit":false,"firstName":"Adam","lastName":"Markey","fullName":"Adam Markey","fullNameInverted":"Markey, Adam","primaryEmail":"A.R.Markey@Gmail.Com","publicEmail":true,"nacontact":false}],"website":"","libraryItems":[{"file":{"fileExtension":"pdf","fileId":295315,"fileName":"briefchart","fileSize":62089,"objectId":291840,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"60.6 KB"},"files":[{"fileExtension":"pdf","fileId":295315,"fileName":"briefchart","fileSize":62089,"objectId":291840,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"60.6 KB"}],"id":291840,"title":"Briefing Chart","description":"A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase II Briefing Chart","libraryItemTypeId":1222,"projectId":89603,"primary":false,"publishedDateString":"","contentType":{"lkuCodeId":1222,"code":"DOCUMENT","description":"Document","lkuCodeTypeId":341,"lkuCodeType":{"codeType":"LIBRARY_ITEM_TYPE","description":"Library Item Type"}}},{"caption":"A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase II","file":{"fileExtension":"png","fileId":296031,"fileName":"SBIR_2015_2_BC_S5.01-8794","fileSize":46493,"objectId":292559,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library 
Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"45.4 KB"},"files":[{"fileExtension":"png","fileId":296031,"fileName":"SBIR_2015_2_BC_S5.01-8794","fileSize":46493,"objectId":292559,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"45.4 KB"}],"id":292559,"title":"Briefing Chart Image","description":"A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase II","libraryItemTypeId":1095,"projectId":89603,"primary":false,"publishedDateString":"","contentType":{"lkuCodeId":1095,"code":"IMAGE","description":"Image","lkuCodeTypeId":341,"lkuCodeType":{"codeType":"LIBRARY_ITEM_TYPE","description":"Library Item Type"}}},{"caption":"Final Summary Chart Image","file":{"fileExtension":"png","fileId":297784,"fileName":"1583256121047","fileSize":49612,"objectId":294317,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"48.4 KB"},"files":[{"fileExtension":"png","fileId":297784,"fileName":"1583256121047","fileSize":49612,"objectId":294317,"objectType":{"lkuCodeId":889,"code":"LIBRARY_ITEMS","description":"Library Items","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"objectTypeId":889,"fileSizeString":"48.4 KB"}],"id":294317,"title":"Final Summary Chart Image","description":"Final Summary Chart Image","libraryItemTypeId":1095,"projectId":89603,"primary":true,"publishedDateString":"","contentType":{"lkuCodeId":1095,"code":"IMAGE","description":"Image","lkuCodeTypeId":341,"lkuCodeType":{"codeType":"LIBRARY_ITEM_TYPE","description":"Library Item 
Type"}}}],"transitions":[{"transitionId":67742,"projectId":89603,"partner":"Other","transitionDate":"2016-04-01","path":"Advanced From","relatedProjectId":33744,"relatedProject":{"acronym":"","projectId":33744,"title":"A Scheduling-Based Framework for Efficient Massively Parallel Execution","startTrl":3,"currentTrl":4,"endTrl":4,"benefits":"Our proposed technologies will directly improve the performance, resource utilization, productivity, and fault tolerance of many high-performance NASA solver codes. Codes such as GEOS, ModelE, Snowflake, OVERFLOW, and FUN3D can achieve both short- and long-term benefits by adopting our proposed task-based methodology and scheduling tools. Our proposed technology is truly broad-based and has no domain-specific requirements. It can benefit every NASA application that utilizes this task-based framework. In the short term, these tools will be able to improve current hardware utilization and performance. Additionally, productivity will increase as the main focus of the codes will become the underlying science rather than the parallelization and communication needed to implement a high-performance code base. Another short-term benefit is improved fault tolerance, as a scheduled task-based approach will be able to recover from a scenario in which communication with a node is lost. Long-term benefits focus on future-proof algorithms and increased scalability for many of the aforementioned applications. By decoupling the algorithms from the underlying hardware implementation, we will provide a framework that allows for rapid adaptation to new architectures. Scalability will also improve as the proposed scheduler will automatically distribute work to any additional nodes that are added to the cluster.
Our scheduled task-based methodology has applications outside of NASA. Since our proposed technology does not have any domain-based restrictions, it can be utilized by any software system wishing to increase utilization and performance. Math libraries for heterogeneous architectures such as CULA, Arrayfire, NMath, and others will easily scale and work across platforms. They will also be able to easily scale performance to multiple GPUs in both workstation and cluster environments. Additionally, we will be able to solve problems where there is insufficient memory on the GPU to hold the entire problem. Other non-NASA domains where our technology is immediately applicable include, but are not limited to, molecular dynamics, financial analysis, and graphical ray tracing. All of these domains are easily expressible as interconnected tasks and can therefore realize all the aforementioned benefits.","description":"The barrier to entry for creating efficient, scalable applications for heterogeneous supercomputing environments is too high. EM Photonics has found that the majority of the coding and debugging time is not spent defining the problem physics but instead on balancing computation between multiple heterogeneous devices, handling communication of data, and managing distributed memory systems. The time spent improving, modifying, or debugging device-specific code paths and common code sections could be better spent improving kernel performance or adding new features. To address the problem of separating physical science from computing science, we have been developing a solution that decouples the problem definition from the platform-specific implementation details by expressing algorithms as a series of tasks and data dependencies and handing them off to a managed runtime that efficiently partitions and schedules the problem tasks for execution. 
We have proven this technique in the field of linear algebra, and in this project we will bring these benefits to mission-critical NASA solvers. In this SBIR, we will construct a powerful system that, by virtue of decoupling algorithms from dispatch and execution, will be suited for both current and upcoming computer architectures. Writing a new application will require only an understanding of the algorithm to be implemented; the system abstracts away the details of heterogeneous resource management and scheduling, thereby removing this responsibility from the scientists who develop this software. Our solution will provide future compatibility, as going to a new version of the same hardware involves no changes and adding new hardware types will require only writing specialized computational kernels. Higher performance is attained because the scheduler will adjust the software's execution based on factors such as the hardware availability and its current performance, as well as the run-time characteristics of the program's execution.","startYear":2015,"startMonth":6,"endYear":2015,"endMonth":12,"statusDescription":"Completed","website":"","program":{"acronym":"SBIR/STTR","active":true,"description":"
The NASA SBIR and STTR programs fund the research, development, and demonstration of innovative technologies that fulfill NASA needs as described in the annual Solicitations and have significant potential for successful commercialization. If you are a small business concern (SBC) with 500 or fewer employees or a non-profit RI such as a university or a research laboratory with ties to an SBC, then NASA encourages you to learn more about the SBIR and STTR programs as a potential source of seed funding for the development of your innovations.
The SBIR and STTR programs have 3 phases:
The SBIR and STTR Phase I contracts last for 6 months with a maximum funding of $125,000, and Phase II contracts last for 24 months with a maximum funding of $750,000 - $1.5 million.
Opportunity for Continued Technology Development Post-Phase II:
The NASA SBIR/STTR Program currently has in place two initiatives for supporting its small business partners past the basic Phase I and Phase II elements of the program that emphasize opportunities for commercialization. Specifically, the NASA SBIR/STTR Program has the Phase II Enhancement (Phase II-E) and Phase II eXpanded (Phase II-X) contract options.
Please review the links below to obtain more information on the SBIR/STTR programs.
Provides an overview of the SBIR and STTR programs as implemented by NASA
Provides access to the annual SBIR/STTR Solicitations containing detailed information on the program eligibility requirements, proposal instructions and research topics and subtopics
Schedule and links for the SBIR/STTR solicitations and selection announcements
Federal and non-Federal sources of assistance for small business
Search our complete archive of awarded project abstracts to learn about what NASA has funded
Still have questions? Visit the program FAQs
","programId":73,"responsibleMd":{"acronym":"STMD","canUserEdit":false,"city":"","external":false,"linkCount":0,"organizationId":4875,"organizationName":"Space Technology Mission Directorate","organizationType":"NASA_Mission_Directorate","naorganization":false,"organizationTypePretty":"NASA Mission Directorate"},"responsibleMdId":4875,"stockImageFileId":36648,"title":"Small Business Innovation Research/Small Business Tech Transfer"},"lastUpdated":"2024-1-10","releaseStatusString":"Released","viewCount":363,"endDateString":"Dec 2015","startDateString":"Jun 2015"},"infoText":"Advanced from another project within the program","infoTextExtra":"Another project within the program (A Scheduling-Based Framework for Efficient Massively Parallel Execution)","dateText":"April 2016"},{"transitionId":67741,"projectId":89603,"transitionDate":"2020-02-01","path":"Closed Out","closeoutDocuments":[{"title":"Final Summary Chart","file":{"fileExtension":"pdf","fileId":307134,"fileName":"1583256157573","fileSize":153992,"objectId":67741,"objectType":{"lkuCodeId":1841,"code":"TRANSITION_FILES","description":"Transition Files","lkuCodeTypeId":182,"lkuCodeType":{"codeType":"OBJECT_TYPE","description":"Object Type"}},"fileSizeString":"150.4 KB"},"transitionId":67741,"fileId":307134}],"infoText":"Closed out","infoTextExtra":"","dateText":"February 2020"}],"primaryImage":{"file":{"fileExtension":"png","fileId":297784,"fileSizeString":"0 Byte"},"id":294317,"description":"Final Summary Chart Image","projectId":89603,"publishedDateString":""},"responsibleMd":{"acronym":"STMD","canUserEdit":false,"city":"","external":false,"linkCount":0,"organizationId":4875,"organizationName":"Space Technology Mission Directorate","organizationType":"NASA_Mission_Directorate","naorganization":false,"organizationTypePretty":"NASA Mission Directorate"},"program":{"acronym":"SBIR/STTR","active":true,"description":"The NASA SBIR and STTR programs fund the research, development, and demonstration of innovative 
technologies that fulfill NASA needs as described in the annual Solicitations and have significant potential for successful commercialization. If you are a small business concern (SBC) with 500 or fewer employees or a non-profit RI such as a university or a research laboratory with ties to an SBC, then NASA encourages you to learn more about the SBIR and STTR programs as a potential source of seed funding for the development of your innovations.
The SBIR and STTR programs have 3 phases:
The SBIR and STTR Phase I contracts last for 6 months with a maximum funding of $125,000, and Phase II contracts last for 24 months with a maximum funding of $750,000 - $1.5 million.
Opportunity for Continued Technology Development Post-Phase II:
The NASA SBIR/STTR Program currently has in place two initiatives for supporting its small business partners past the basic Phase I and Phase II elements of the program that emphasize opportunities for commercialization. Specifically, the NASA SBIR/STTR Program has the Phase II Enhancement (Phase II-E) and Phase II eXpanded (Phase II-X) contract options.
Please review the links below to obtain more information on the SBIR/STTR programs.
Provides an overview of the SBIR and STTR programs as implemented by NASA
Provides access to the annual SBIR/STTR Solicitations containing detailed information on the program eligibility requirements, proposal instructions and research topics and subtopics
Schedule and links for the SBIR/STTR solicitations and selection announcements
Federal and non-Federal sources of assistance for small business
Search our complete archive of awarded project abstracts to learn about what NASA has funded
Still have questions? Visit the program FAQs
","programId":73,"responsibleMd":{"acronym":"STMD","canUserEdit":false,"city":"","external":false,"linkCount":0,"organizationId":4875,"organizationName":"Space Technology Mission Directorate","organizationType":"NASA_Mission_Directorate","naorganization":false,"organizationTypePretty":"NASA Mission Directorate"},"responsibleMdId":4875,"stockImageFileId":36648,"title":"Small Business Innovation Research/Small Business Tech Transfer"},"leadOrganization":{"canUserEdit":false,"city":"Newark","congressionalDistrict":"Delaware 00","country":{"abbreviation":"US","countryId":236,"name":"United States"},"countryId":236,"external":true,"linkCount":0,"organizationId":2592,"organizationName":"EM Photonics, Inc.","organizationType":"Industry","stateTerritory":{"abbreviation":"DE","country":{"abbreviation":"US","countryId":236,"name":"United States"},"countryId":236,"name":"Delaware","stateTerritoryId":16},"stateTerritoryId":16,"ein":"841314204 ","dunsNumber":"071744143","uei":"CA3NJCKCTN63","naorganization":false,"organizationTypePretty":"Industry"},"supportingOrganizations":[{"acronym":"GSFC","canUserEdit":false,"city":"Greenbelt","country":{"abbreviation":"US","countryId":236,"name":"United States"},"countryId":236,"external":false,"linkCount":0,"organizationId":4947,"organizationName":"Goddard Space Flight Center","organizationType":"NASA_Center","stateTerritory":{"abbreviation":"MD","country":{"abbreviation":"US","countryId":236,"name":"United States"},"countryId":236,"name":"Maryland","stateTerritoryId":3},"stateTerritoryId":3,"naorganization":false,"organizationTypePretty":"NASA Center"}],"statesWithWork":[{"abbreviation":"DE","country":{"abbreviation":"US","countryId":236,"name":"United States"},"countryId":236,"name":"Delaware","stateTerritoryId":16},{"abbreviation":"MD","country":{"abbreviation":"US","countryId":236,"name":"United 
States"},"countryId":236,"name":"Maryland","stateTerritoryId":3}],"lastUpdated":"2024-1-10","releaseStatusString":"Released","viewCount":549,"endDateString":"Feb 2020","startDateString":"Apr 2016"}}