Introduction

Metadata Crawler (aka “Crawler”) is an open source tool designed to automate the discovery and creation of metadata records. If configured to read from a PostgreSQL database such as your SDW, it will create a metadata record in Gemini 2.3 format for each spatial table that it finds.
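
To give a feel for the discovery step, the sketch below shows the kind of query involved, assuming a PostGIS-enabled SDW in which spatial tables are registered in the geometry_columns view. The connection details and the use of psycopg2 are illustrative assumptions, not Crawler's actual implementation.

    # Illustrative only: how spatial tables can be discovered in a PostGIS SDW.
    import psycopg2

    conn = psycopg2.connect(host="sdw.example.internal", dbname="sdw",
                            user="crawler", password="secret")
    with conn, conn.cursor() as cur:
        # PostGIS registers every spatial table in the geometry_columns view.
        cur.execute("""
            SELECT f_table_schema, f_table_name, type, srid
            FROM geometry_columns
            ORDER BY f_table_schema, f_table_name
        """)
        for schema, table, geom_type, srid in cur.fetchall():
            print(f"{schema}.{table}: {geom_type}, EPSG:{srid}")
    conn.close()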

...

  • Calculated by Crawler: auto-calculated values for each dataset.

  • Spreadsheet: individual values for each metadata record. See below for more details.

  • Default: placeholder values for all records. You can change these later on a per-record basis if you need to. (A sketch of how these three sources might combine follows this list.)
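
As a purely hypothetical illustration of how the three sources could combine for a single record (the field names and the precedence rule below are assumptions, not Crawler's actual behaviour):

    # Hypothetical sketch: spreadsheet values override defaults, and
    # calculated values (e.g. the bounding box) always win. Field names
    # are invented for illustration.
    DEFAULTS = {"maintenance_frequency": "notPlanned", "abstract": "TBC"}

    def build_record(calculated: dict, spreadsheet: dict) -> dict:
        record = dict(DEFAULTS)
        record.update(spreadsheet)  # individual values you supply
        record.update(calculated)   # auto-calculated per dataset
        return record

    print(build_record({"bbox": "-5.7,49.9,1.8,55.8"},
                       {"abstract": "Adopted highways, surveyed 2021."}))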

Spreadsheets

To enable you to complete the manual fields listed in the table above, we will auto-generate two spreadsheets for you, populated with each metadata record title and the fields you need to complete. To make data entry easier, the fields are split across the two spreadsheets: one contains contact details and organisational responsibility, while the other contains the abstracts and the remaining fields. Where a field takes a controlled value, such as the maintenance update frequency, lookups are provided.
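
For illustration, a row in the contacts spreadsheet might look like the following. The column names are assumptions rather than the actual template; "pointOfContact" stands in for a controlled-value lookup.

    title,contact_name,contact_email,responsible_organisation,role
    "Roads (Adopted Highways)",GIS Team,gis@example.gov.uk,Example Council,pointOfContact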

Workflow

  1. We derive the spreadsheets based on the spatial tables within your SDW, and send them to you for completion.

  2. Crawler runs in your VPC and creates metadata records as XML files in an AWS S3 bucket.

    1. At this point, the fields populated are those calculated by Crawler, and the default ones.

  3. GeoNetwork harvests the records from the S3 bucket and assigns them to your metadata portal.

  4. You send the spreadsheets as CSV email attachments to an email address we provide, associated with a second AWS S3 bucket.

  5. Python scripts on the GeoNetwork server extract the CSV files from the emails and use their values to update the metadata records with the individual values (a sketch of this step follows the list).

  6. You check the records in your metadata portal and make any further changes that you need to.
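
As a rough sketch of step 5 above, the snippet below applies a spreadsheet-supplied abstract to one Gemini record. The file layout, the CSV column names, and the update mechanism are all assumptions about how such scripts might work, not the actual implementation.

    # Hypothetical sketch of step 5: copy CSV values into Gemini XML records.
    import csv
    import xml.etree.ElementTree as ET

    GMD = "http://www.isotc211.org/2005/gmd"
    GCO = "http://www.isotc211.org/2005/gco"
    ET.register_namespace("gmd", GMD)
    ET.register_namespace("gco", GCO)

    def apply_abstract(xml_path, abstract):
        """Write one spreadsheet-supplied abstract into a metadata record."""
        tree = ET.parse(xml_path)
        # In ISO 19139, which Gemini 2.3 profiles, the abstract lives at
        # gmd:abstract/gco:CharacterString.
        elem = tree.find(f".//{{{GMD}}}abstract/{{{GCO}}}CharacterString")
        if elem is not None:
            elem.text = abstract
            tree.write(xml_path, xml_declaration=True, encoding="utf-8")

    with open("abstracts.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            # Assumed layout: one XML file per record, named after its title.
            apply_abstract(f"records/{row['title']}.xml", row["abstract"])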

Automating the Crawler Workflow

After your metadata catalog has been pre-populated with metadata, Crawler can be run as a scheduled task to pick up new tables or changes to existing ones (such as a bounding box change).

For subsequent updates, Crawler will pick up new tables and changes to existing tables, but it can’t handle deletions (metadata records for deleted tables should be retired explicitly in GeoNetwork).
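
Because of this limitation, it is worth periodically checking for orphaned records. A minimal sketch, assuming spatial tables are listed in the PostGIS geometry_columns view and that you can enumerate the source table name behind each record (both assumptions about your setup):

    # Hypothetical helper: find records whose source table no longer exists,
    # so they can be retired manually in GeoNetwork.
    import psycopg2

    def orphaned_records(conn, record_tables):
        """Return record table names with no matching spatial table in the SDW."""
        with conn.cursor() as cur:
            cur.execute("SELECT f_table_name FROM geometry_columns")
            live = {name for (name,) in cur.fetchall()}
        return set(record_tables) - live

    conn = psycopg2.connect(host="sdw.example.internal", dbname="sdw",
                            user="crawler", password="secret")
    print(orphaned_records(conn, {"roads", "old_boundaries"}))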

...