Advanced Information Systems Technology

Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access (MUDROD)

Completed Technology Project

Project Description

We propose to mine and utilize the combination of Earth Science dataset metadata, usage metrics, and user feedback to objectively extract relevance for improved data discovery and access across a NASA Distributed Active Archive Center (DAAC) and other data centers. As a point of reference, the Physical Oceanographic Distributed Active Archive Center (PO.DAAC) aims to provide datasets that help scientists select the Earth observation data that best fit their needs in various aspects of Physical Oceanography. The TRL 5 data relevance mining technology, developed by George Mason University (GMU), NASA, and the U.S. Geological Survey (USGS) to support the Geosearch operation and contributed as open source through GeoNetwork, will be improved and tested within the PO.DAAC's metadata-centric discovery system, reaching TRL 7 upon completion of the project.

This project will focus on the following objectives and activities:

- Integrating and interfacing the data relevance mining and utilization system to include the functionality of a) dataset relevance reasoning based on Jena, an open source semantic reasoning engine, b) dataset similarity calculation, c) recommendations based on dataset metadata attributes and user workflow patterns, and d) ranking results based on similarity between user search terms and dataset usage contexts.
- Leveraging the PO.DAAC data science expertise and user communities to a) capture the ocean science data context and record dataset relevance metrics in triple stores, b) analyze and mine user search and download patterns, c) test the developed system in an experimental environment, and d) integrate the system into the PO.DAAC testbed and test the feasibility of integration for open usage and feedback.
- Laying the groundwork for an objective mining and extraction service for data relevance with other data search and discovery systems, such as ECHO and the GEOSS clearinghouse, and for data sharing across NASA and non-NASA data systems.

The proposed technology has the potential to enhance the NASA Earth Science data discovery experience by enabling scientists to discover and select the datasets most relevant to their scope of interest more efficiently and objectively.

TRL: Existing Data Relevance Mining and Usage/TRL 5. We expect the technology to exit at TRL 7 over the two years of research.

Keywords: PO.DAAC, Data Relevance, Mining, Reasoning, Ranking, Recommendation
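
As a rough illustration of ranking results by similarity between user search terms and dataset descriptions, the sketch below scores each dataset against a query using cosine similarity over word counts. This is not the project's actual method; the dataset names, descriptions, and the functions `tokenize`, `cosine`, and `rank` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical dataset identifiers and descriptions (not real PO.DAAC records).
DATASETS = {
    "sst_l4": "sea surface temperature global daily analysis",
    "ssh_alt": "sea surface height altimetry global",
    "wind_scat": "ocean surface wind vector scatterometer",
}


def tokenize(text):
    """Lowercase the text and split it into whitespace-separated terms."""
    return text.lower().replace(",", " ").split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, datasets):
    """Return (dataset name, score) pairs sorted from most to least similar."""
    q = Counter(tokenize(query))
    scored = [
        (name, cosine(q, Counter(tokenize(desc))))
        for name, desc in datasets.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank("sea surface temperature", DATASETS):
        print(f"{name}: {score:.3f}")
```

A production system would presumably combine such a text score with the other signals named in the description (usage metrics, user feedback, semantic reasoning); this sketch covers only the term-similarity part.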
