The proposal addresses NASA's need to enable scientific discovery and the topic's requirements for: processing large volumes of data, commonly available on the Internet, into useful information; intelligent search of large, distributed data archives, and data discovery through searches of heterogeneous data sets and architectures; and search agents that support the use of NASA data. A precondition for data discovery in large distributed data environments is the accurate and consistent characterization of the data stored in the archives. Characterizing data accurately and consistently requires an enterprise policy and process for tagging data with metadata. Our proposed Taxonomy Enabled Discovery system (TED) provides a process and technology that assist and automate the generation and harvesting of metadata. The approach employs a highly innovative taxonomy management platform that combines linguistic, statistical, machine learning, and advanced visualization techniques, is enhanced with NASA data, and supports open metadata standards and a grid architecture. We demonstrate the feasibility of our approach with a prototype in a NASA NTRS OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) environment.
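The metadata harvesting step in an OAI-PMH environment can be sketched as follows. This is a minimal illustration and does not describe TED's actual implementation. The endpoint URL, the function names, and the sample record are hypothetical. The protocol details are taken from OAI-PMH 2.0:

* the `ListRecords` verb,
* the `oai_dc` metadata prefix,
* `resumptionToken` paging, where the token is an exclusive argument.

The sketch parses a harvested page and collects each record's Dublin Core titles and subjects, which are the raw material a taxonomy tool would classify.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the unqualified Dublin Core schema.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    A resumptionToken is an exclusive argument, so a follow-up request
    sends only the verb and the token, without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, token); each record is (identifier, titles, subjects)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        titles = [e.text for e in dc.findall("dc:title", NS)] if dc is not None else []
        subjects = [e.text for e in dc.findall("dc:subject", NS)] if dc is not None else []
        records.append((ident, titles, subjects))
    # An absent or empty resumptionToken means the list is complete.
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


# Hypothetical one-record response page, used instead of a live endpoint.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample report</dc:title>
          <dc:subject>Aerodynamics</dc:subject>
          <dc:subject>Wind tunnels</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE)
    print(records)
    print(list_records_url("https://example.org/oai", token))
```

A harvester would loop in the following way:

1. Fetch the URL from `list_records_url`.
2. Parse the response.
3. Repeat with the returned token until the token is `None`.

Keeping parsing separate from fetching lets the classification logic be tested without network access.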