The Dynamic DataSource Plugin Framework

Overview

DataSources are used to import non-spatial data into the dedicated iShare database in PostgreSQL and to integrate that data with iShare output in My House and Publisher. Because the import is run as a scheduled task, the data is only ever as current as the last run of that task. In addition, because DataSources expect to retrieve the entirety of a given dataset, there is no way to retrieve data from sources that expose only a small subset of the data at any one time, e.g. a webservice that only returns data for a particular UPRN.

Dynamic DataSources do not yet work with Publisher.

Requirements

Restrictions

  • Data must be retrieved from the original source whenever the DataSource is queried.
  • This new DataSource type must be configured through entries in the normal DataSources.xml document, just as the existing types are.
  • It must be accessed inside iShare in exactly the same way as all other DataSource types (i.e. the code must fork only where it currently does for the other Astun.iShareData.Lib.DataSourceTypes).
  • It should be possible to create a new Dynamic DataSource on a customer site for some custom source of data, such as a webservice created or implemented by the customer.
  • DataSources return iShareDataSets, which are essentially wrappers around .NET DataSets.

Design Overview

The key concept is to have a 'plugin' system whereby many different types of live data requests are presented to iShare in the same way.

The core work is writing the framework for Dynamic DataSource plugins, the interfaces between this framework and the rest of iShare, and the configuration UI in Studio. Python has been chosen for the implementation of the framework because it can be written and executed on a customer site without installing any additional utilities (Python 2.7 is already installed as part of iShare). Each DataSource plugin will keep some configuration details in a file and may optionally combine field values with templates (e.g. an HTML template); these files will not be edited from Studio.

Dynamic DataSource plugin

A plugin must output .NET DataSet-style XML in response to a request for data consisting of a field name and a value to match.

Plugins must have all resources contained within a single directory. These always include a configuration file.

Plugins will have internal names that should be unique among all plugins. However, because new plugins can be created in situ, and by end-users who will probably not be aware of all extant plugins, this cannot be absolutely guaranteed.

Each plugin will allow for the definition of a display name that can be set for each instance of the plugin. Display names are not guaranteed to be unique in any way.

The plugin configuration should contain a unique field definition and must contain a list of field names with at least one entry.
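The configuration file format is not fixed here. Since this document refers to sections (Settings, Overrides, Test) and options, a minimal sketch assuming an INI-style file read with Python 3's configparser might look like the following. Apart from unique_field in Settings, every option name (uri, fields, display_name, field, value) and the helper load_config are hypothetical; the checks mirror the rules above (at least one field name, and a unique field that is one of them).

```python
import configparser

# Hypothetical plugin configuration; only the section names and the
# unique_field option come from the spec, the rest is illustrative.
SAMPLE_CONFIG = """
[Settings]
uri = https://example.org/service
unique_field = uprn
fields = uprn
    address
    collection_day

[Overrides]
display_name = Bin collections (example)

[Test]
field = uprn
value = 100000000001
"""


def load_config(text):
    """Parse the configuration and enforce the field rules from the spec."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # Multi-line option values come back newline-separated; drop blank lines.
    fields = [f.strip() for f in config.get("Settings", "fields").splitlines() if f.strip()]
    if not fields:
        raise ValueError("configuration must list at least one field name")
    # The unique field is recommended ("should"), so it may be absent.
    unique_field = config.get("Settings", "unique_field", fallback=None)
    if unique_field is not None and unique_field not in fields:
        raise ValueError("unique_field must be one of the listed fields")
    return config, fields, unique_field


config, fields, unique_field = load_config(SAMPLE_CONFIG)
print(fields)        # ['uprn', 'address', 'collection_day']
print(unique_field)  # uprn
```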

Each plugin + configuration combination should be uniquely identified so that, for example, the same webservice plugin code can be used twice with different URIs. This should be done by hashing the plugin's internal name together with those parts of the configuration that functionally distinguish one instance from another. The display name would not normally be a good choice for inclusion, as it does not functionally distinguish one instance from another.
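As a sketch of the hashing idea: the internal name and the values of the identity options are joined and hashed, so two instances that differ only in URI get different identifiers, while an option that is not part of the identity (or the display name) does not affect the result. SHA-1, the "|" separator and the helper plugin_id are assumptions; this document does not say which hash or separator is used. A separator is used so that, for example, "ab" + "c" and "a" + "bc" do not produce the same string.

```python
import hashlib


def generate_id(identity):
    """Hash an identity string into a stable identifier (SHA-1 assumed)."""
    return hashlib.sha1(identity.encode("utf-8")).hexdigest()


def plugin_id(name, settings, identity_options):
    """Combine the internal name with the values of the identity options."""
    # Values are taken in identity_options order, so the same
    # configuration always produces the same identifier.
    parts = [name] + [settings[option] for option in identity_options]
    return generate_id("|".join(parts))


a = plugin_id("webservice", {"uri": "https://example.org/a", "timeout": "10"}, ["uri"])
b = plugin_id("webservice", {"uri": "https://example.org/b", "timeout": "10"}, ["uri"])
c = plugin_id("webservice", {"uri": "https://example.org/a", "timeout": "30"}, ["uri"])
print(a != b)  # True: different URIs are different instances
print(a == c)  # True: timeout is not an identity option
```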

All plugins must provide the following core methods and properties:

  • query(field, value)
    • returns records where field is equal to value
  • preview()
    • returns twenty records, intended for use with Studio
    • there are no expectations as to which twenty records are returned, or in which order they should be displayed
  • test()
    • proves that the plugin is working
    • values for the test can be supplied in the plugin configuration
    • should return at least one record in DataSet XML as proof
  • fields
    • newline (\n) separated list of the one or more field names expected in each DataSet XML record
    • this may come from source or could simply be specified in plugin configuration
    • these do not have to correspond to fields in the actual source data
  • id
    • unique identifier of plugin and current configuration (see above)
    • the base Python DynamicDataSource class provides both a standard id property function and a generate_id class method
  • name
    • the default value for this should be specified in the plugin code but can be overridden by an instance
  • display_name
    • the default value for this should be specified in the plugin code but can be overridden by an instance
    • this is intended to be used as the default value for the display name of the DataSource configuration in iShare Studio
  • unique_field
    • one of the names listed in fields that can be used to uniquely identify records in the DataSet
  • __NAME
    • the class default for name
  • __DISPLAY_NAME
    • the class default for display_name
  • __IDENTITY
    • a class-specific sequence of option names from the Settings section of the configuration file; by default, their values are combined with name and then passed to the class method generate_id(identity) to create the unique identifier for the plugin configuration

All plugins must currently be implemented as Python classes, and the class must be called DataSource.
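A hypothetical plugin showing only the plugin-specific members might look like the following. It is deliberately stand-alone (it does not subclass DynamicDataSource) and, to stay short, returns lists of dicts rather than DataSet XML; a real plugin would serialise its records with to_dataset_xml. The in-memory RECORDS list stands in for a real source such as a webservice, and all field names and values are illustrative.

```python
from itertools import islice

# Hypothetical records standing in for a live source such as a webservice.
RECORDS = [
    {"uprn": "1001", "address": "1 High Street", "collection_day": "Monday"},
    {"uprn": "1002", "address": "2 High Street", "collection_day": "Tuesday"},
    {"uprn": "1003", "address": "3 High Street", "collection_day": "Monday"},
]


class DataSource(object):
    """Stand-alone sketch of the plugin-specific members (no base class)."""

    FIELDS = ("uprn", "address", "collection_day")

    @property
    def fields(self):
        # Newline-separated, as the interface requires.
        return "\n".join(self.FIELDS)

    @property
    def unique_field(self):
        return "uprn"

    def query(self, field, value):
        # Records whose field equals value; a real plugin queries its source here.
        return [r for r in RECORDS if r.get(field) == value]

    def preview(self):
        # At most twenty records, with no guarantee of which or in what order.
        return list(islice(RECORDS, 20))

    def test(self):
        # Test values would normally come from the Test section of the configuration.
        return self.query("uprn", "1001")


plugin = DataSource()
print(plugin.query("collection_day", "Monday"))  # the records for 1001 and 1003
print(plugin.fields.split("\n"))                 # ['uprn', 'address', 'collection_day']
```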

DynamicDataSource class

This class is intended as the base class for all Dynamic DataSource plugins. It is not meant to be instantiated itself.

Class methods
  • to_dataset_xml(name, column_names, data)
    • this is a helper method to create DataSet XML
    • name will be the name of the record-level node in the output XML
    • column_names must be a sequence of field names that will be used as the column-level node names; these must be keys in the data mappings
    • data must be, or provide, an iterable that yields one mapping object per iteration (e.g. a list of dicts); each of these mappings corresponds to a record-level node in the output
  • generate_id(identity)
    • a common method for hashing a string (identity); it is used by id and is unlikely to be needed unless implementing an override of id
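A minimal sketch of to_dataset_xml using the standard-library xml.etree.ElementTree, written as a plain function rather than a class method. The root element name NewDataSet is an assumption (it is the default DataSetName of a .NET DataSet), as is the choice to write None as an empty element.

```python
import xml.etree.ElementTree as ET


def to_dataset_xml(name, column_names, data):
    """Build DataSet-style XML: one <name> node per record, one child per column."""
    root = ET.Element("NewDataSet")  # assumed root element name
    for record in data:
        row = ET.SubElement(root, name)
        for column in column_names:
            value = record[column]
            # Write None as an empty element rather than the text "None".
            ET.SubElement(row, column).text = "" if value is None else str(value)
    # The default ASCII serialisation escapes any non-ASCII characters as
    # character references, so decoding as ASCII is safe.
    return ET.tostring(root).decode("ascii")


print(to_dataset_xml("Collection", ["uprn", "day"], [{"uprn": "1001", "day": "Monday"}]))
# <NewDataSet><Collection><uprn>1001</uprn><day>Monday</day></Collection></NewDataSet>
```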

Instance methods and properties
  • __init__
    • default class initialisation: loads the configuration file and applies override values; child classes should normally call this using super()
  • _apply_overrides()
    • an 'internal' method that applies options from the Overrides section in the configuration file
    • sets 'internal' property values for the name, the display name and the identity options
  • test()
    • calls query(field, value) using the values of the appropriate options from the Test section of the configuration file
  • id
    • loads the values of the identity options from the Settings section of the configuration file and combines them with name to create a string, which is passed to the generate_id(identity) class method to create a unique identifier for the class instance
  • name
    • returns value of __NAME or its override
  • display_name
    • returns value of __DISPLAY_NAME or its override
  • unique_field
    • returns the value of the unique_field option in the Settings section of the configuration file

NotImplemented

The following are defined in the base class because they are core to Dynamic DataSources, but they raise NotImplementedError exceptions since their implementation is always plugin-specific.

  • query(field, value)
  • preview()
  • fields
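Putting the members listed above together, a sketch of the base class might look like this. It assumes an INI-style configuration read with Python 3's configparser and, for brevity, takes the configuration text rather than a file path. One Python detail matters here: a name such as __NAME written inside the DataSource class body is name-mangled to _DataSource__NAME, so the base class cannot read it as self.__NAME; the hypothetical helper _class_default looks it up under the concrete class's name instead. SHA-1, the "|" separator and the Overrides option names (name, display_name, identity) are assumptions.

```python
import configparser
import hashlib


class DynamicDataSource(object):
    """Sketch of the base class; each plugin subclasses it as DataSource."""

    def __init__(self, config_text):
        # The real class loads the plugin's configuration file.
        self._config = configparser.ConfigParser()
        self._config.read_string(config_text)
        self._apply_overrides()

    def _class_default(self, attr):
        # __NAME in the DataSource body is stored as _DataSource__NAME.
        return getattr(self, "_%s__%s" % (type(self).__name__, attr))

    def _apply_overrides(self):
        get = self._config.get
        self._name = get("Overrides", "name", fallback=self._class_default("NAME"))
        self._display_name = get("Overrides", "display_name",
                                 fallback=self._class_default("DISPLAY_NAME"))
        # Hypothetical override format: whitespace-separated option names.
        identity = get("Overrides", "identity", fallback=None)
        self._identity = identity.split() if identity else self._class_default("IDENTITY")

    @classmethod
    def generate_id(cls, identity):
        # SHA-1 is an assumption; the spec only says the identity is hashed.
        return hashlib.sha1(identity.encode("utf-8")).hexdigest()

    @property
    def id(self):
        values = [self._config.get("Settings", option) for option in self._identity]
        return self.generate_id("|".join([self.name] + values))

    @property
    def name(self):
        return self._name

    @property
    def display_name(self):
        return self._display_name

    @property
    def unique_field(self):
        return self._config.get("Settings", "unique_field")

    def test(self):
        return self.query(self._config.get("Test", "field"),
                          self._config.get("Test", "value"))

    # Core to every plugin but always plugin-specific.
    def query(self, field, value):
        raise NotImplementedError

    def preview(self):
        raise NotImplementedError

    @property
    def fields(self):
        raise NotImplementedError


class DataSource(DynamicDataSource):
    __NAME = "example"
    __DISPLAY_NAME = "Example data"
    __IDENTITY = ("uri",)

    def query(self, field, value):
        # Echoes the request; a real plugin would call its source here.
        return [{field: value}]


CONFIG = """
[Settings]
uri = https://example.org/service
unique_field = uprn

[Overrides]
display_name = Local example

[Test]
field = uprn
value = 1001
"""

ds = DataSource(CONFIG)
print(ds.name)          # example
print(ds.display_name)  # Local example
print(ds.test())        # [{'uprn': '1001'}]
```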

Dynamic DataSource framework

Plugins will be loaded from one or more local 'collection' folders which contain plugin sub-folders. (In future this may be extended to use remote plugin collections from web services.)

The framework will collate plugins from all collections and advertise a single list of unique plugin configurations (using the identifier supplied by each plugin).
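The collation step can be sketched as follows, assuming a plugin is any sub-folder of a collection that contains an __init__.py (as the description of the call command implies) and that, when two plugins report the same identifier, the first one found wins. get_id is a hypothetical callable that loads the plugin at a path and returns its id; the usage passes os.path.basename only to keep the example self-contained.

```python
import os
import tempfile


def find_plugin_folders(locations):
    """Yield sub-folders of each collection folder that contain an __init__.py."""
    for location in locations:
        for entry in sorted(os.listdir(location)):
            path = os.path.join(location, entry)
            if os.path.isfile(os.path.join(path, "__init__.py")):
                yield path


def collate(locations, get_id):
    """Return {id: path}, keeping the first plugin found for each identifier."""
    plugins = {}
    for path in find_plugin_folders(locations):
        # Identical configurations share an id and are advertised only once.
        plugins.setdefault(get_id(path), path)
    return plugins


# Usage: two collections that both contain a "bins" plugin folder.
root = tempfile.mkdtemp()
for collection in ("site", "core"):
    for plugin in ("bins", "planning"):
        folder = os.path.join(root, collection, plugin)
        os.makedirs(folder)
        open(os.path.join(folder, "__init__.py"), "w").close()

# Stand-in id function: the folder name, so "bins" collides across collections.
found = collate([os.path.join(root, "site"), os.path.join(root, "core")], os.path.basename)
print(sorted(found))  # ['bins', 'planning']
```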

The framework app's primary interface is the command line:

List available datasources

    python DynamicDataSource.py list [-o OUTFILE] location [location ...]

Where location is the parent folder of one or more Dynamic DataSource modules. The list is written to standard output, or to OUTFILE if -o is used.

The list is a DataSet XML document containing one or more <DataSource> nodes, each with <id>, <path>, <display_name>, <unique_field> and <fields> children (or no nodes at all, if no Dynamic DataSources are found). The fields will be a comma-separated list of field names.
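A sketch of building this list document with xml.etree.ElementTree, including the conversion from a plugin's newline-separated fields property to the comma-separated form used here. The NewDataSet root element name and the shape of the input dicts are assumptions.

```python
import xml.etree.ElementTree as ET


def list_xml(plugins):
    """Build the list output: one <DataSource> node per unique plugin configuration."""
    root = ET.Element("NewDataSet")  # assumed root element name
    for p in plugins:
        node = ET.SubElement(root, "DataSource")
        for tag in ("id", "path", "display_name", "unique_field"):
            ET.SubElement(node, tag).text = p[tag]
        # Plugins expose fields newline-separated; the list output uses commas.
        ET.SubElement(node, "fields").text = ",".join(p["fields"].split("\n"))
    return ET.tostring(root).decode("ascii")


out = list_xml([{"id": "3f2a", "path": "/plugins/bins", "display_name": "Bin collections",
                 "unique_field": "uprn", "fields": "uprn\naddress"}])
print(out)
```

With no plugins found, list_xml([]) returns an empty <NewDataSet /> document.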

Call a datasource

    python DynamicDataSource.py call (--query FIELD VALUE | --test | --id | --unique_field | --fields | --name | --display_name) location

Where location is the path to a Dynamic DataSource module (i.e. the folder containing its __init__.py). Exactly one of the options in parentheses must be given.
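The two usage lines map naturally onto argparse sub-commands and a required, mutually exclusive group of options. This is a hypothetical reconstruction of the argument parsing only, not the actual DynamicDataSource.py code; the dest name outfile and the function build_parser are illustrative.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="DynamicDataSource.py")
    commands = parser.add_subparsers(dest="command")

    # list [-o OUTFILE] location [location ...]
    list_cmd = commands.add_parser("list", help="list available DataSources")
    list_cmd.add_argument("-o", dest="outfile", metavar="OUTFILE")
    list_cmd.add_argument("location", nargs="+")

    # call (--query FIELD VALUE | --test | ...) location
    call_cmd = commands.add_parser("call", help="call a DataSource")
    action = call_cmd.add_mutually_exclusive_group(required=True)
    action.add_argument("--query", nargs=2, metavar=("FIELD", "VALUE"))
    for flag in ("--test", "--id", "--unique_field", "--fields", "--name", "--display_name"):
        action.add_argument(flag, action="store_true")
    call_cmd.add_argument("location")
    return parser


args = build_parser().parse_args(["call", "--query", "uprn", "1001", "plugins/bins"])
print(args.query, args.location)  # ['uprn', '1001'] plugins/bins
```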

iShare Core

DataSource data is retrieved for MyHouse/MyMaps through the iShareDataSet.Query() method. Anything outside this method should ideally not be touched as part of this work. The current types of DataSource are differentiated in the InternalQuery method, so this should also be where the Dynamic DataSource type is handled.

The plugin framework app will live alongside the Workflow and data sync apps and its path should be configured in the same way.

If a plugin query fails, a message can be returned in place of the expected data. This could be done by returning a DataSet with one DataTable containing one field, with the message as its content and an undisplayable display name (e.g. "__").
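If this fallback were built on the Python side, a sketch might look like this, assuming the message (for example the HelpInfo text) is passed in by the caller. The table name Message, the NewDataSet root element and the helpers message_dataset and safe_query are hypothetical. Catching every Exception is deliberate here, since any failure should produce the message rather than an error.

```python
import xml.etree.ElementTree as ET


def message_dataset(message, table="Message", field="__"):
    """One DataTable with one field whose name ("__") iShare would not display."""
    root = ET.Element("NewDataSet")  # assumed root element name
    row = ET.SubElement(root, table)
    ET.SubElement(row, field).text = message
    return ET.tostring(root).decode("ascii")


def safe_query(plugin, field, value, message):
    """Run plugin.query, returning the message DataSet if the query raises."""
    try:
        return plugin.query(field, value)
    except Exception:
        return message_dataset(message)


class BrokenPlugin(object):
    def query(self, field, value):
        raise IOError("webservice timed out")


print(safe_query(BrokenPlugin(), "uprn", "1001", "This data is currently unavailable"))
# <NewDataSet><Message><__>This data is currently unavailable</__></Message></NewDataSet>
```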

DataSource XML

An example DataSource node configuring a Dynamic DataSource might look like this:

    <DataSource>
      <Name>[Plugin.ID]</Name>
      <DisplayName>[Plugin.DisplayName]</DisplayName>
      <HelpInfo show="yes">This data is currently unavailable</HelpInfo>
      <QueryField>[Plugin.UniqueField?]</QueryField>
      <FilterQueryOrderField></FilterQueryOrderField>
      <Type Internal="False" DSType="DS_Dynamic">
        <DSN/>
        <SQL/>
      </Type>
      <Currency>0</Currency>
      <Results>
        <Fields FieldsOrder="[Plugin.UniqueField],[Plugin.Field1],[Plugin.Field2],[Plugin.Field3],[Plugin.Field4]">
          <Field>[Plugin.UniqueField]</Field>
          <Field>[Plugin.Field1]</Field>
          <Field>[Plugin.Field2]</Field>
          <Field>[Plugin.Field3]</Field>
          <Field>[Plugin.Field4]</Field>
        </Fields>
        <LookupDatasources />
      </Results>
    </DataSource>

The points to note compared to existing types are:

  • the DSType attribute of the Type node is present and has the new value "DS_Dynamic"
  • the DSN and SQL nodes are empty and there are no other children of Type
  • the Currency node has a value of "0" signifying that the data is always retrieved from the original source
  • HelpInfo has been repurposed (again!) to contain the message to show if the plugin query fails; it has a new show attribute, which indicates whether the message should be shown to the user
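The points above can be checked mechanically, for example when testing generated DataSources.xml entries. A minimal sketch with xml.etree.ElementTree; check_dynamic_node is a hypothetical helper, and it only checks that a show attribute is present, since the allowed values beyond "yes" are not specified.

```python
import xml.etree.ElementTree as ET


def check_dynamic_node(xml_text):
    """Return a list of problems with a Dynamic DataSource node (empty if none)."""
    node = ET.fromstring(xml_text)
    problems = []
    type_node = node.find("Type")
    if type_node is None or type_node.get("DSType") != "DS_Dynamic":
        problems.append('Type/@DSType must be "DS_Dynamic"')
    elif sorted(child.tag for child in type_node) != ["DSN", "SQL"] or any(
            (child.text or "").strip() for child in type_node):
        problems.append("Type must contain only empty DSN and SQL nodes")
    if (node.findtext("Currency") or "").strip() != "0":
        problems.append('Currency must be "0"')
    help_info = node.find("HelpInfo")
    if help_info is None or help_info.get("show") is None:
        problems.append("HelpInfo needs a show attribute")
    return problems


GOOD = """<DataSource>
  <Name>[Plugin.ID]</Name>
  <HelpInfo show="yes">This data is currently unavailable</HelpInfo>
  <Type Internal="False" DSType="DS_Dynamic">
    <DSN/>
    <SQL/>
  </Type>
  <Currency>0</Currency>
</DataSource>"""

print(check_dynamic_node(GOOD))  # []
```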

Studio Configuration

The main aim for configuration in Studio is to require as little user input as possible. To this end, Studio should set the Display Name to the value supplied by the plugin (without enforcing it; the user may change it) and default the query field list to the plugin-supplied unique field hint. The "Name" field is set by the plugin and is always the plugin's identifier, so it should not be editable by the user. See Configure a Dynamic Datasource.