Load large CSV dataset into PostgreSQL

Overview

When you have a large CSV file that you wish to load into the Astun Spatial Data Warehouse (SDW), the best option is to use psql to run a batch file to load the data, as a Workflow Spatial Data Transformation Task doesn't work well for large files.

We are not sure of the exact CSV size at which it becomes better to switch to psql COPY, but it tends to be used with larger datasets as it is faster and simpler.

Step-by-step guide

  • Open Notepad++ and enter the following, replacing the relevant information to match your requirements (a sketch of the script is shown after these bullets).

  • In our scenario we have the latest Bin Collection data, bins_2020.csv, that we want to upload to the public schema of the SDW. Note: if you wish to upload the data to a different schema then you need to prefix the table with the schema name, e.g. schema.tablename.
  • First we need to drop the table if it already exists.
  • Then create the table with the same number of fields that exist in the CSV file.
  • Now we want to copy the data from bins_2020.csv file, which is comma delimited, into the table.
  • Save this with a .sql extension and call it, say, import_bins_2020.sql.
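
The sketch below shows roughly what import_bins_2020.sql could contain. The column names and types, and the CSV path, are illustrative assumptions; replace them with the fields that actually exist in your CSV file. It uses psql's \copy meta-command, which reads the file from the machine running psql; a server-side COPY ... FROM would also work if the file is accessible to the PostgreSQL server and your user has the required privileges.

    -- Drop the table if it already exists
    DROP TABLE IF EXISTS public.bins_2020;

    -- Create the table with the same number of fields as the CSV file
    -- (column names and types below are illustrative assumptions)
    CREATE TABLE public.bins_2020 (
        uprn            bigint,
        address         text,
        bin_type        text,
        collection_day  text
    );

    -- Copy the comma-delimited data into the table; the path is an example,
    -- and HEADER assumes the CSV has a header row. \copy must be on one line.
    \copy public.bins_2020 FROM 'C:\Temp\bins_2020.csv' WITH (FORMAT csv, HEADER true)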

  • Next open iShare Studio and under Workflow expand the Jobs node.
  • Select Unassigned Tasks and create a New Task > Program Task.
  • For the Program entry, browse to the PostgreSQL interactive terminal executable, which you will find in the installation folder for PostgreSQL, e.g. C:\Program Files\PostgreSQL\n.n\bin\psql.exe where n.n is the version of Postgres you have installed.
  • For the Additional Parameters add the following, changing the database name (-d) and file path (-f) as required (an example is sketched below).
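
As a rough illustration, the Additional Parameters entry might look like the line below. The username, database name and script path are assumptions to be replaced with your own values; -U, -d and -f are the standard psql switches for the user, the database and the SQL file to run. Depending on how authentication is set up, you may also need a pgpass file (or the PGPASSWORD environment variable) so the Task can run without prompting for a password.

    -U postgres -d sdw -f "C:\Workflow\Scripts\import_bins_2020.sql"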

  • Run the Task to make sure it works.
  • Save the Task if you want to keep it for future use.