Data-intensive computing is not a problem unique to IT companies like Google. Infrastructure and data analysis tools that support Data-Intensive Scalable Computing (DISC) are becoming a competitive advantage even for non-IT companies, allowing them to roll out new products and services faster and more cheaply. For example, Wal-Mart sells ~300 million items every day at 6000 stores worldwide, and the data warehouse that supports its business is as large as 4 PB. Scalable and efficient data analysis tools are vital for managing its supply chain, analyzing market trends, and devising pricing strategy. A simple data-mining 'discovery' from its own dataset, such as 'send-formula-coupon-to-diaper-buyer', can be a huge marketing success. Our solution will help non-IT companies replicate Google's success. Many science disciplines at NASA are likewise data-intensive in nature. Many of NASA's computing environments are based on technologies from 20 years ago and are thus insufficient to support growing data and computation demands. The outcome of our research will help NASA reengineer its data-intensive applications using Google's search as a blueprint, not only from the user-experience perspective but also from the infrastructure and programming perspectives. We are aware that reinvention in this area carries high risk. Therefore, we choose to reuse proven technology and provide our innovative solutions as value-added services and libraries. By using our toolset, powered by Google's engine (as implemented in open-source software), NASA's scientists can do much more data analysis than just a search over a large dataset.