The Data Locator project is a search engine for finding GFS meteorological data at ESRL / GSD. Some of this data resides on our public file system, some on our mass store, and some on NCDC's Nomads HTTP server, where it is also indexed by NCDC's Nomads THREDDS server. The metadata extracted from these four sources is inserted into our MySQL database, making it searchable via a web client or a web service. The data flow looks like this:
GFS Data Flow Diagram

The Data Locator software consists of three main components.
- Four Java Spider Programs
  - Nomads THREDDS Spider opens an HTTP connection to an NCDC Nomads THREDDS catalog and walks through the catalog XML, extracting the relevant metadata and inserting it into the MySQL database.
  - Nomads HTTP Spider scans the Nomads HTTP server for files that match the spec (e.g. *.sanl, *.sfcanl), extracts what information it can from the filename (e.g. start Z in hours) and the file information (e.g. file size in MB), and inserts it into the MySQL database.
  - Public File System Spider scans the directories and files below a top-level directory, extracts what information it can from the filename (e.g. start Z in hours) and the system file information (e.g. file size in MB), and inserts it into the MySQL database.
  - Mass Store Spider scans the directories and files below a top-level directory of the mass store, extracts what information it can from the filename (e.g. start Z in hours) and the system file information (e.g. file size in MB), and inserts it into the MySQL database.
- Data Locator Web Service - this Java web service accepts calls from client programs (or other web services) and searches the metadata in the MySQL database for matches. It then returns a list of matching file paths or URLs to the requested data. For details on how to invoke this web service, look here.
- Data Locator Web Client - this HTML and JSP based web application provides a web form for specifying search criteria (e.g. the catalog(s) to search, the date(s) and time(s)), and then invokes the Data Locator Web Service to find meteorological datasets that match the search criteria. It displays the matching datasets as links on the web page.
Web Client Process Diagram
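
The file-based spiders described above (Nomads HTTP, Public File System, Mass Store) all follow the same pattern: find files matching the spec, then read metadata from the filename and the file information. The following is a minimal sketch of that pattern for a local directory tree, not the project's actual spider code. The class name `FileSpiderSketch`, the `FileMeta` holder, and the assumed filename convention (a cycle token such as `t06z` somewhere in the name) are hypothetical. The database insert is left out, and the sketch uses `java.nio.file`, which requires a newer Java release than the Java 6 listed for the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FileSpiderSketch {

    // ASSUMPTION: filenames carry a cycle token like "t06z" and end in
    // .sanl or .sfcanl. The real naming convention may differ.
    private static final Pattern GFS_NAME =
            Pattern.compile(".*t(\\d{2})z.*\\.(sanl|sfcanl)$");

    /** Metadata recovered for one file; the field names are illustrative. */
    public static final class FileMeta {
        public final String path;
        public final int startHourZ;
        public final double sizeMb;

        FileMeta(String path, int startHourZ, double sizeMb) {
            this.path = path;
            this.startHourZ = startHourZ;
            this.sizeMb = sizeMb;
        }
    }

    /** Returns the start hour (Z) parsed from the filename, or -1 if it does not match. */
    public static int parseStartHourZ(String fileName) {
        Matcher m = GFS_NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Walks the tree below topDir and collects metadata for matching files. */
    public static List<FileMeta> scan(Path topDir) throws IOException {
        List<FileMeta> found = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(topDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip directories and special files
                }
                int hour = parseStartHourZ(p.getFileName().toString());
                if (hour < 0) {
                    continue; // filename does not match the spec
                }
                double sizeMb = Files.size(p) / (1024.0 * 1024.0);
                found.add(new FileMeta(p.toString(), hour, sizeMb));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Scans the directory given as the first argument, or the current directory.
        Path top = Paths.get(args.length > 0 ? args[0] : ".");
        for (FileMeta m : scan(top)) {
            System.out.printf("%s startZ=%02d size=%.2f MB%n",
                    m.path, m.startHourZ, m.sizeMb);
        }
    }
}
```

In a real spider, each `FileMeta` would be written to the MySQL database instead of being printed.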

Software Flow Diagram
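
The catalog walk performed by the Nomads THREDDS Spider can be illustrated with a short sketch. A THREDDS catalog is an XML document of nested `dataset` elements, and leaf datasets carry a `urlPath` attribute. The example below collects those paths from catalog XML that has already been fetched. It is an illustration under assumptions, not the project's code: it uses the JDK's built-in DOM parser rather than JDOM so that it stays self-contained, and the class name `ThreddsCatalogSketch` and the sample paths are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ThreddsCatalogSketch {

    /** Returns the urlPath of every dataset element that has one, in document order. */
    public static List<String> datasetUrlPaths(String catalogXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // THREDDS catalogs declare a default namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(catalogXml.getBytes(StandardCharsets.UTF_8)));

            List<String> paths = new ArrayList<>();
            // "*" matches any namespace, so this finds nested datasets at every depth.
            NodeList datasets = doc.getElementsByTagNameNS("*", "dataset");
            for (int i = 0; i < datasets.getLength(); i++) {
                Element ds = (Element) datasets.item(i);
                if (ds.hasAttribute("urlPath")) { // container datasets have no urlPath
                    paths.add(ds.getAttribute("urlPath"));
                }
            }
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse catalog XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical catalog fragment; the real Nomads catalog layout may differ.
        String xml =
                "<catalog xmlns=\"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0\">"
              + "<dataset name=\"example-collection\">"
              + "<dataset name=\"a\" urlPath=\"example/gfs.t00z.sanl\"/>"
              + "<dataset name=\"b\" urlPath=\"example/gfs.t00z.sfcanl\"/>"
              + "</dataset></catalog>";
        for (String path : datasetUrlPaths(xml)) {
            System.out.println(path);
        }
    }
}
```

A full spider would additionally fetch the catalog over HTTP, follow catalog references, and insert the extracted metadata into the database.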

Libraries and Software Used
- Tomcat 6
- MySQL 5
- Axis 2
- THREDDS Data Server 3.1701
- Java 6
- JDOM
- SQLExecutor 1.41
- commons-httpclient-3.1.jar
- NCDC Nomads Server

Software Documentation
- Source code Javadocs
- Ant build.xml