Best Practice for Data Workflows / Traffic in iShare Enterprise Environment
There are significant changes in iShare v6 use of databases.
The Spatial Data Warehouse (SDW) and the iShare GIS application database ('iShareData') from iShare v5 have been combined into the ‘sdw’ database. This hosts all data accessed by iShare GIS, QGIS, GeoServer plus the iShare GIS gazetteer.
The iShare Maps application database ('iShareData') has been renamed as the ‘maps’ database. This hosts all data used by iShare Maps.
The security model has been strengthened meaning there are several new database roles used for the transfer of data, application data access and database administration.
A member of the sdw_admin group is used to conduct administration of the sdw database such as creating schemas and roles used by QGIS connections.
The isharedata_gis role, a member of sdw_admin, is used to transfer data & create schemas from the Studio ETL to the sdw database for iShare GIS.
The isharewebservice_gis role is used by iShare GIS to consume data from the sdw database
The isharedata_maps role is used to administrate & transfer data to the maps database for iShare Maps.
The isharewebservice_maps role is used by iShare Maps to consume data from the maps database.
If you have trouble identifying these roles in iShare v6 an Astun consultant will be able to confirm the details of these roles to you.
Studio ETL Activity
Studio ETL is the location for the bulk of a customer’s scheduled workflow activity. There might be some exceptions to this such as the download of external data or the scheduled operation of My Alerts.
Studio ETL should be configured to access a mail server in case of workflow failure. The Studio software on the GIS and Map servers is configured to use Amazon’s Simple Email Service. To confirm Studio has been configured to use a mail server: Select Workflow from within Studio and select Edit > Notification Setup and the mail server details will be visible. Select ‘Test’ for Studio to send a test email.
Scheduling
Studio ETL uses standard Windows Task Scheduler to organise the scheduling of jobs that are defined in the Studio ETL workflow.
When creating a schedule for a job bear in mind that there is an option to keep the schedule in line with UTC rather than GMT / BST. To switch to UTC edit the task in Task Scheduler and edit the trigger. There is an ‘Synchronize across time zones’ option which must be checked to align with UTC.
Servers that Astun host receive a backup every night. This can occur between midnight and 2am UTC. Avoid this time period when scheduling as jobs can not be guaranteed to complete during this time.
Database Connections
In a typical iShare Enterprise scenario the Studio ETL will be required to transfer data to both the sdw database for iShare GIS and the maps database for iShare Maps. To complete these tasks the Studio ETL must have the following connections defined under Workflow:
A connection to the sdw database using isharedata_gis.
A connection to the maps database using isharedata_maps.
Data used by iShare GIS and iShare Maps
In cases where the same data set is needed for both iShare GIS and iShare Maps, we recommend that the data is duplicated into both the Spatial Data Warehouse, for iShare GIS consumption, and the maps database for iShare Maps. We don’t recommend iShare Maps reading data directly from the Spatial Data Warehouse for security reasons.
If data needs to be manipulated before being used by iShare Maps this can be done in the Spatial Data Warehouse and then transferred from the Spatial Data Warehouse to the maps database through scheduling.
Spatial Data Transfers
Spatial Data Transfer tasks have an option to Skip Failures. In iShare v6.0.5 and below this is enabled by default. Unless it is necessary to transfer remaining data after an issue with an individual feature it is strongly advised to disable this option. Data transfers with Skip Failures disabled will be significantly faster.
Non-Spatial Data Transfers
In previous iterations of iShare this would be handled by external Data Share Connections. While that is still possible we’d recommend that external Data Share Connections are replaced by Spatial Data Transformation tasks with the Geometry option set to None.
Data Share Connections preserve the source column names in PostgreSQL. However Spatial Data Transfers will, by default, convert columns to lower case with underscores. If the column names of the source data needs to be preserved check the Preserve Case option.
A standard Data Share Connection can only transfer data to the identified ‘Data Share Connection’. This is typically defined as the Spatial Data Warehouse. If a non-spatial dataset is required in both the Spatial Data Warehouse and the maps database an additional task is required to duplicate the data from one to the other. When using Spatial Data Transformation tasks there are no restrictions on the destination of the transfer and these additional steps can be avoided.
Spatial Data Transfer tasks handle a large number of data sources and data types including non-spatial RSS feeds and Microsoft Excel documents. See https://gdal.org/en/latest/drivers/vector/ for the range of drivers supported.
Data Share Connection Synchronisations create backups / last updated copies of the data. These backups take up space and are not deleted automatically.
A Data Share Connection should only be used for transferring data when it is not possible to use a Spatial Data Transformation task. Data Share Connections still hold their other function which is to provide access to data via the iShare API. This is commonly seen in iShare Maps for instance returning the nearest street address for a given coordinate pair.
Administration of the SDW
Administration of the SDW should occur through Studio ETL or a PostgreSQL client such as PGAdmin 4 or DBeaver.
Administration of the SDW should be conducted by a member of the sdw_admin database group. Each customer should be provided with at least one sdw_admin account. The sdw_admin account can be used to create additional sdw_admin accounts if the responsibility of SDW administration is to be shared.
Running SQL Tasks in Studio ETL
Studio ETL’s unassigned tasks are a useful place to store repeatable tasks such as creating schemas or new users. To run any of the following SQL commands from Studio ETL create a new SQL by right-clicking on Workflow > Jobs > Unassigned Tasks and selecting New Task. Choose ‘Stored Procedure’ from the Task Type dialog.
Within the task dialog, select '- Workflow ' from the Function drop-down and then select wkf_executecmd as the function. Enter the SQL in the parameter text box.
Creation of Users
These are database roles that individual users will use to connect a desktop GIS, such as QGIS, directly to the Spatial Data Warehouse.
In pgAdmin 4 > Servers > Choose a connection which uses an sdw_admin role.
In a query window run the following, changing user name and password appropriately.
SELECT astun.sdw_new_user('<new_user', '<new_password>', false); GRANT CONNECT ON DATABASE sdw TO <new_user>;
Creation of SDW Admin Users
To create a SDW Admin user it is the same method as creating a standard user - just change the third parameter, ‘makeadmin’, to true. If you also want the user to be able to pass on the ability to grant connect to other users when you need to add the WITH GRANT OPTION e.g.
SELECT astun.sdw_new_user('<new_admin_user', '<new_password>', true);
GRANT CONNECT ON DATABASE sdw TO <new_admin_user> WITH GRANT OPTION; Deleting Users
To delete a user they must first have their permission to connect to databases revoked and then the role can be dropped.
REVOKE CONNECT ON DATABASE sdw FROM <retired_user>;
DROP ROLE <retired_user>;Creation of Schemas in the SDW database
In pgAdmin 4 > Servers > Choose a connection which uses a sdw_admin role.
In a query window run the following. The first parameter is the schema name, the second is the ‘staff access’ parameter. If ‘staff access’ is set to true then all connections to the sdw will be able to access the contents of the schema. If set to false then the schema’s contents are inaccessible until a role inherits access from either the schema viewer or schema editor role.
SELECT astun.sdw_new_schema('<new_schema>', true); GRANT <new_schema>_owner TO editor;Apply the following to allow iShare GIS to access the contents of the schema.
GRANT <new_schema>_viewer TO ishare_reader;If the schema contains data that will be edited through iShare GIS apply the following statement.
GRANT <new_schema>_editor TO isharewebservice_gis;
Views using Multiple Schemas
If a view in the Spatial Data Warehouse uses tables from multiple schemas, run the following SQL to allow the view to access the source table.
GRANT <schema_source>_viewer TO <schema_with_view>_owner;Administration of the maps Database
Administration of the maps database can occur through Studio ETL or a PostgreSQL client such as PGAdmin 4 or DBeaver.
Administration of the maps database should be conducted by the isharedata_maps role.
Creation of Schemas in the maps Database
The creation of schemas in the maps database is conducted through a single SQL task.
In PGAdmin4 > Servers > Choose a connection that uses the isharedata_maps role.
Create a new Query window and enter the following command.
SELECT ishare.wkf_create_schema('<new_schema>', 'isharewebservice_maps');
In sdw, the create function is astun.sdw_new_schema. In maps, the create function is ishare.wkf_create_schema.
Data Access by iShare GIS
All Layers & Internal Data Share Connections in iShare GIS should use a connection to the Spatial Data Warehouse that uses the isharewebservice_gis role.
Data Access by iShare Maps
All Layers & Internal Data Share Connections in iShare Maps should use a connection to the maps database that uses the isharewebservice_maps role.
Any External Data Share Connections in iShare Maps should use the isharedata_maps role when connecting to the maps database. This can be determined by selecting the Data Share Connections node in Studio Maps and inspecting the configuration of the Admin connection.
Test Applications & Databases
The sdw_test database is the application database for iShare GIS Test. The maps_test database is the application database for iShare Maps Test.
As Spatial Data Warehouse database sizes can be quite large for our customers the recommendation is that the iShare GIS Test application uses the sdw database for layers. To do this there must be a Workflow Connection defined in Studio for iShare GIS Test that refers to the sdw database. It is still possible to load data into sdw_test database for testing before deploying to the sdw database but for the majority of published datasets it is unnecessary to duplicate the data in the sdw_test database.
Database sizes for iShare Maps tend to be a lot smaller than the SDW so for iShare Maps this is less important.
Traffic
Access to iShare Maps, a public website, should be available through the Internet, using the HTTPS protocol on port 443. A public DNS entry is required.
Access to iShare GIS, the internal application, should be routed through the VPN where possible, using the HTTPS protocol on port 443. Typically iShare GIS is authenticated via a SAML compliant Single Sign-On mechanism such as Microsoft’s Entra ID. If it is not possible to use iShare GIS via a VPN then agreed whitelist range can be agreed.
Studio ETL and Desktop clients such as QGIS should access the Spatial Data Warehouse via a VPN. If a VPN is not possible then the Spatial Data Warehouse should be configured to enforce encrypted connections.
It is recommended that an internal DNS entry be created for the Spatial Data Warehouse such as sdw.council.gov.uk. All clients that use direct database connections such as Studio ETL and all QGIS users can then use the internal DNS rather than an IP address. If the IP address were to ever change any connections will still work providing a correction is made to the DNS.