Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Numberedheadings
number-formatdecimal
skip-headingsHNaN
start-numbering-with3
h1[h1.decimal].
h2[h1.decimal].[h2.upper-latin]
h3
h4
h5
enabledtrue
h6
start-numbering-atH1

Loading Data to Postgres

Data can be loaded into Postgres in a number of ways - this module covers:

  • Tools in QGIS

  • The COPY command in SQL

  • The GDAL ogr2ogr tool from the command line

QGIS

If you have QGIS installed, it provides a GUI-based means of loading data to a Posgres database. There are a few ways of getting data into Postgres using QGIS.

Load data with DB Manager

This approach gives the the greatest degree of control over the conversion from your source format to Postgres.

  1. In QGIS, go to the Browser panel, and navigate to one of the layers in the database - for example general > neighbourhoods

  2. Right-click on the layer in the Browser, and select Export Layer > To File…, then save the layer as a GeoPackage in your Home folder - this creates a sample file-based layer for you to load back into the database

  3. Remove all layers from the Layers panel (right-click > Remove Layer)

  4. Go to Database > DB Manager, and navigate to your database

  5. Click the Import Layer/File button to load the Import vector layer dialog

  6. Using the … button to the right of the Input field, navigate to the file you just created

  7. Under Output table, select the training schema, and give the table an appropriate name

  8. In the Options section, you can leave most of the defaults in place, but note:

    1. Use Primary key to set a primary key for the table - the default is id

    2. The Geometry column holds details of the geometry for each feature - the default name for the column is geom

    3. You can use Target SRID to change the SRID of the data - i.e. to reproject it

    4. Replace destination table (if exists): will overwrite a table with the same name

    5. Convert field names to lower case: is good practice, as otherwise field names need double quotes in SQL statements

    6. Create spatial index: will make your spatial queries run faster

  9. When the table has been loaded successfully, refresh the connection in the Browser and check the table is there - double-click to load it

  10. Navigate the file-based version of the data you created earlier in the Browser, and double-click it to load it into the Layers panel - it should be identical to the database version

Load data using the Browser

The QGIS Browser panel also contains functionality to load and manage data in the database.

  1. In the Layers panel, click on the GeoPackage version of the data you have been working with, and drag it into the Browser panel, dropping it over one of the schemas

  2. The layer will be loaded into the database with the default settings from the DB Manager dialog

  3. Right-click on the new layer to see the options available for layer management in the Browser

Import CSV to Postgres

The Copy command can be used to import Comma Separated Value files (CSV) / text files to and from PostgreSQL. 

  1. In QGIS, create a CSV file we can use to import

    1. In the Browser, navigate to general > llsoa_2011_ew and double-click to load the data into QGIS

    2. Zoom in to a small area so that only about 50 features are visible (this will save time for the export)

    3. Select the layer in the Layers panel and right-click > Export Laery > To File…, set the Format to Comma Separated Value [CSV], under Extent click Map Canvas Extent, and save the layer to your Home folder with a suitable name

  2. In Windows Explorer, navigate to the layer and double-click on it to open it as a spreadsheet - note that there is no geometry column

  3. In pgAdmin, look at the source table and make a note of the datatype of each field (omitting wkb_geometry)

  4. Create a table in the general schema called llsoa_2011_clip, and use the Columns tab of the Create - Table dialog to create columns (apart from wkb_geoemtry) using the field names in the spreadhsheet (in lower case, with no spaces), in the order they appear, and with the same datatype as in the source table

  5. Got to Tools > PSQL Tool in the pgAdmin menu to open a PSQL window

  6. Use the \COPY command to copy the data from the CSV file into the blank table

    1. Paste or type the following code in the PSQL prompt, editor, adjusting the file path/name if necessary, the press Enter

    2. The double backslashes in the file path ensure the file path is recognised. CSV sets the mode, and HEADER ensures the first line is not included as this contains header information

    3. You will get a confirmation, for example COPY 220, indicating that 220 rows were loaded into the table

  7. Look at the table in pgAdmin to see that it was successfully populated

ogr2ogr

The GDAL library contains an ogr2ogr executable which will convert between most spatial formats, including PosgresQL/PostGIS, Shapefile, MapInfo and GeoPackage. It can be run from the command line with appropriate parameters - see the GDAL site for full details.

It’s also possible to use ogr2ogr from QGIS, so in this exercise we will use QGIS to create a command line, then run it in the command prompt.

Load data with ogr2ogr

  1. In QGIS, find the Export to PostgreSQL (available connection) tool in the Processing Toolbox - make sure you are using the one in the GDAL folder

  2. In the dialog:

    1. Set the Database to your database connection

    2. Set Input Layer as the GeoPackage layer you created earlier

    3. Set the Schema and Table to a location in your database

    4. Leave other options as defaults

    5. Scroll to the bottom of the dialog to see the command line that QGIS has created, and copy it

  3. Click Run to load the data

  4. Check in the QGIS Browser that the data has loaded correctly

  5. In AppStream, open the OSGeo4W Shell application - this will give you a command prompt

  6. Copy the command line from the QGIS dialog at the prompt, and press Enter to run it - this will give you the same result as running from QGIS

...

skip-headingsh3, h4
h1[h1.decimal].
start-numbering-with2
h2[h1.decimal].[h2.upper-latin]

GDAL/OGR og2ogr

A versatile command line took which will translate many different formats, including shapefile, .TAB and GML. There are some GUI interfaces that support ogr2ogr (e.g. http://www.ogr2gui.ca), and QGIS uses it as its translation engine.

A full list of supported formats is at http://www.gdal.org/ogr_formats.html.

Desktop GIS

If you are already using desktop GIS with support for PostGIS, it's usually easiest to use this, though bear in mind that some GIS products are more successful at doing this than others.

  • QGIS has several methods for uploading data into PostGIS. Probably the best is DB Manager plugin - this supports loading files as well as existing layers, and provides control over how the data will be saved to the database. However you can also drag and drop in the QGIS Browser panel.

  • MapInfo uses Easy loader.  Data loaded is referenced in a custom map catalogue table, and a connection to it from MapInfo can then be made. Data loaded through this method is not immediately stored, so that other GIS and spatial tools can access it. Then datasets loaded though other methods are not available spatially in MapInfo until they have been referenced in the MapInfo map catalogue table.

  • Cadcorp can be used to load most formats directly into PostGIS, using PostGIS’s standard method of referencing spatial tables. Cadcorp requires data to adhere to strict rules, and it is sometimes difficult in Cadcorp to access data in PostGIS that has been loaded using other tools. Data needed to be used by Cardcorp should therefore be loaded using Cadcorp if possible.

  • ArcGIS does not support loading data into PostGIS, although it is possible using ArcSDE, as PostGIS can be used as an alternative backend database server for ArcSDE. You can view spatial data in PostGIS from ArcGIS version 10 and later using the Add Query Layer tool.

Copy

The Copy command is a command-line operated tool to import/export non-spatial data between PostgreSQL and CSV or other text formats.

  • copy to (export data to file)

  • copy from (import data from file)

Connecting to PostGIS from QGIS

...

Open QGIS

...

Add the PostGIS layer by right clicking on Postgis in the data browser, or by selecting new connection in the add data→PostGIS window.

...

Click Test Connect, then OK if the connection is successful

...

Select the connection and click Connect

...

Select the table you need, then click Add

The table will be added as a layer in the map.

Spatial Tables

The specific characteristics of spatial tables are important. These include:

  • Geometry field and datatype

  • Unique id/primary key

  • Spatial Indexes

Geometry fields

PostGIS expands PostgreSQL by introducing the datatype GEOMETRY, which holds spatial information in the OpenGIS standard: WKB (Well Known Binary) format. WKB is not easily interpreted by eye, so it can be converted to WKT (Well Known Text) to display the geometry in textual form, using the functions ST_AsText() or ST_AsEWKT().

WKB GEOMETRY

WKT GEOMETRY

PostGIS also supports 3D geometry using EWKB and EWKT.

The name of the geometry field can vary, and different methods of loading data use different names - e.g. the_geom, geometry, wkb_geometry. It's recommended to use the same name for all tables, and the datatype must always be geometry.

Type Modifier

Typemod is a type modifier, used to modify data types - for example on variable length character columns, they specify the number of character available on the data type.

description VARCHAR(255)

For geometry columns, PostGIS uses type modifiers to define:

  • The geometry type

  • The coordinate reference system

  • Dimension

These are written as follows:

Geometry_name GEOMETRY([datatype][(Z)], [SRID]

Example 2D: wkb_geometry GEOMETRY(Point, 27700)

Example 3D: wkb_geometry GEOMETRY(PointZ, 27700)

These modifiers specify:

  • the geometry type: Point

  • the dimension:  Z, which includes 3rd dimension

  • the spatial reference identifier (SRID): 27700

Only geometry that meets these criteria can be inserted or updated into the table.

...

Type modifiers

Type modifiers replace constraints in PostGIS 1.5.

Unique Identifier

Every table should have a unique identifier field, with unique non-null values. It will normally be introduced by the importing tool as a serial or integer datatype, and be set as the primary key. However any unique value can also be set as a primary key.

If a primary key does not exist, you will not be able edit data in pgAdmin, and there may be problems viewing and editing in GIS applications.

Geometry Types

PostGIS geometry data types reflect OpenGIS standard definitions for spatial information. The three simplest types of geometry are points, lines and polygons. The diagram below shows how these are represented in WKT format:

For example:

POINT(2 3) LINESTRING(1 2, 3 3, 3 1) POLYGON((1 2, 3 3, 3 1, 1 2))
or

POLYGON((571178 6337246,571178 6402217,598061 6402217,598061 6337246,571178 6337246))

Collection Geometries

Collection geometries are groups of geometries that locally belong together. There is a collection type for each geometry type. GeometryCollections enable different geometry types to be stored together as one logical grouping.

  • Multipoints

  • Mulitlines

  • Multipolygons

  • Geometry Collections

  • Curved Geometries

  • 3D Geometries

Spatial Reference Systems

PostGIS follows EPSG (European Petroleum Survey Group) numberings for spatial reference system identification. 

Commonly used SRIDs (Spatial Reference IDs) used in the UK are:

  • SRID 27700: OSGB 1936 British National Grid

  • SRID 4326: WGS 84 Longitude/Latitude

  • SRID 3857: Web Mercator

Geometry Columns

PostGIS uses the geometry_columns view in the public schema to list metadata, based on and associated with the geometry field in the spatial tables. This view is automatically created when the PostGIS extension is added to the database. When a spatial table is created using a standard process (for example one of the ones listed above), it will automatically appear in the geometry_columns view.

The geometry_columns view is used in clients such as desktop GIS to generate a list of spatial tables without having to examine all tables for geometry types etc, so it acts as a catalogue of spatial data. Some clients will not open a spatial table if it does not appear in the view.

geometry_columns fields

...

Field name

...

Purpose

...

f_table_catalog

...

The name of the database

...

f_table_schema

...

The table schema

...

f_table_name

...

The table name

...

f_geometry_column

...

The name of the geometry column in the table

...

coord_dimension

...

Either 2(D) or 3(D)

...

srid_integer

...

The SRID (e.g 27700)

...

type

...

The geometry type (e.g POINT or GEOMETRY)

Geography Datatype

The geography datatype provides native support for spatial features represented on geographic coordinates - i.e. spherical coordinates expressed in angular units (degrees). The basis for the geometry type is a plane. On the plane the shortest path between two objects is a straight line. This means calculations on geometries can be calculated using cartesian maths and straight line vectors.

The geography type is based on a sphere. In the sphere the shortest path between two objects is an arc. This means that calculations on geographies (e.g. areas, distances, lengths, intersections) must be calculated on the sphere, using more complex maths. For accurate measurements, the calculations must take the actual shape of the earth into account, and as this is more complex, there are few functions defined for the geography type, and only WGS 84 (SRID:4326) is supported.

While it's useful to be aware of the geography type, this course will focus on geometry types. The table below compares the two.

...

Geometry

...

Geography

...

Basis is a plane

...

Basis is a sphere

...

Shortest path between two objects is a straight line

...

Shortest path between two objects is an arc

...

Calculations using simple maths and straight line vectors

...

Calculations (areas, distances, lengths, intersections, etc.) must be calculated on the sphere

...

Supported in all SRS

...

Limited and only lat/long SRSs are supported. 4326 is the default and if no SRID is specified, 4326 is assumed.

Indexes

A database index is a data structure that improves the speed of data retrieval operations on a database table, at the cost of slower writes and increased storage space. Any field in a database can be indexed and more than one field can be involved in a single index. While indexes improve efficiency in search, they slow down update operations.

Normally there should be a spatial index on the geometry field to improve the efficiency of rendering, analysis and searching of spatial data. Most import tools provide an option to create an index when the data is loaded, and it is good practice to do so. The most common type of spatial index is a GIST index, but other indexes include B-Tree type.

Loader for MasterMap 

Astun's Loader is a set of Python scripts which make use of OGR 1.9 to transform GML or KML data to other formats. It is principally use to load Ordnance Survey MasterMap data to PostGreSQL. The source data is pre-processed using Python to make it suitable for loading with OGR, or to add value by deriving attributes.

Although Loader was originally written to load Ordnance Survey OS MasterMap Topographic Layer data in GML/GZ format, it has been used to load other GML and KML data.

Loader is an open source tool, at https://github.com/AstunTechnology/Loader.