Large-scale numerical simulations, as typified by climate models, space weather models, and the like, typically involve nonlinear governing equations in discretized form, subject to initial and/or boundary conditions. Such simulations may be coupled, with the output of one simulation providing input data to another. Executing a simulation may require significant wall-clock time as well as memory and CPU resources, and it may involve networked components and depend on further resources such as input data files or temporary local storage for intermediate data products. With the rise of collaboratories and of interdisciplinary, multi-investigator scientific projects, distributed, networked, and coupled scientific simulations are becoming more common. Moreover, problem complexity may require that multiple sets of parameters in the problem space be investigated, necessitating multiple simulation runs. Because these runs extend over long periods and involve networked components and networked resource usage, setting up and monitoring them is non-trivial and increasingly time-intensive. Such activity can waste a researcher's time, yet the runs must be set up and then monitored, because crashes, missing components, permission problems, network problems, and similar failures do occur. Our innovation is a self-regulating, autonomic, agent-based framework that can manage simulation runs.