Mosaic thousands of raster images

Requirement (Problem)

You have 1000's of raster images and you want to create a single mosaic (a complete stitched map) of those images.

Image courtesy of Stack Exchange


Example

A customer had over 3800 ECW images of their area which needed to be viewed at a variety of scales.  The key difference here is the need to view at different scales - if that requirement isn't needed then a simple Raster Tile Index can be built. 

Why?

To build a single image in ECW format from 3800 ECW files, where the average filesize is between 1-2 Mb would result in an ECW file approximately 7.5 Gb.  This guesstimate is taken from 3800*2/1024) - and the assumption that the new overviews that are built within the ECW file would be made up from the spare space when up to 2Gb for each file.  Regardless of the output size the requirements would be for a an ERDAS or MrSID license to write a file over 500Mb (£1000,s) 

Solution

Potential options include using PostGIS Raster functionality (Database) or GDAL tools (file based)

The following uses GDAL tools to build a Virtual Raster dataset (similar to a tileindex via gdalbuildvrt) of all the files and then an overview image (via gdaladdo) that software that is GDAL enabled (MapServer, GeoServer, QGIS etc) can access - alternative programs would best be served via a WMS feed from MapServer or GeoServer.

Step-by-step guide

The following steps were required for the customer project and will serve you well to follow

  1. Check all files are valid
  2. Create Virtual Dataset (VRT)
  3. Create Overviews
  4. Load in mapping software (MapServer example)

GDAL with ECW support

We're presuming that you have GDAL installed with ECW support - the ECW SDK may need to be downloaded. On Linux you'll need to recompile GDAL for the support, on Windows you'll need to move the ECW plugin into the GDAL directory.

Check all files are valid

Before embarking on the build of the Overviews, which can take hours to process, it is good to double check that all files are valid.  We found issues  (possible file corruption) with 2 ECWs from the 3800 which although a very small amount bombed out (stopped) the Overviews build.  Here we're going to run a script against ALL the ECW files and try translating them to GeoTIFF.  This example is a Linux bash script but an equivalent windows batch file or powershell script wouldn't be difficult to write.

Script and check all Rasters
# loop through all the ECW (*.ecw) in the script directory (paths could easily be added i.e. /iShareData/Workshop/Raster/Aerial/*.ecw)
for i in *.ecw; do
    # Process $i and translate to a new file of the same name with a .tif extension
    gdal_translate $i $i.tif
	# now delete the file
    rm $i.tif
done

The use of gdal_translate essential as it will do a full read of the files via GDAL and will detect any corruption.  gdalinfo will only scan the header of the raster image and not do a full read.

Execute from the command line
 bash at_check_ecw_valid.sh 2> ecw_errors.txt | grep ERROR
  • bash - to execute the script
  • at_check_ecw_valid.sh - script name
  • 2> - only pipe through the errors to the output file
  • ecw_errors.txt - output file
  • | grep ERROR - linux to only show the error output on screen that contains ERROR (possibly not needed with 2> where 2 specifies only errors)

Create Virtual Dataset (VRT)

The VRT is an XML file, similar to a TileIndex, that references all the ECW files holding the extent, projection etc without processing or translating the files.

Execute from command line
gdalbuildvrt all_ecw.vrt *.ecw -a_srs "EPSG:27700"
  • gdalbuildvrt - The GDAL utility to exefute

  • all_ecw.vrt - the output vrt file name
  • *.ecw - the input raster images
  • -a_srs - force an output projection
  • "EPSG:27700" - British National Grid projection

Note regarding reproject

the reprojection to EPSG:27700 (British National Grid - BNG) was needed as our data was supplied with BNG coordinate references but without a projection system. If all your images are in the same projection then this can be ignored - to check a similar script to the above gdal_translate one would be written having a grep command to see if NOT British National Grid

Create Overviews

TIFF supports the creation of “overviews” within the file, which is basically a down sampled version of the raster data suitable for use at lower resolutions.

Execute from the command line
gdaladdo -ro --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR --config INTERLEAVE_OVERVIEW PIXEL --config BIGTIFF_OVERVIEW YES _all_ecw.vrt 2 4 8 16
  • gdaladdo - GDAL utility to execute
  • -ro -  option to force the generation of external overview
  • --config - various config parameters
    • COMPRESS_OVERVIEW JPEG - the compressed overviews to be stored in JPEG format
    • PHOTOMETRIC_OVERVIEW YCBCR - the photometric interpretation
    • INTERLEAVE_OVERVIEW PIXEL - interleave by PIXEL not BAND
    • BIGTIFF_OVERVIEW YES - essential if you think the output file will be greater than 4Gb (TIFF has a limit of 4Gb)
  • _all_ecw.vrt - input vrt file
  • 2 4 8 16 - overview samples

Reference

Referenced from following blog entry at Linfiniti and MapServer and GeoServer documentation from Boundless

Load in mapping software (MapServer example)

The following example shows a layer snippet using the VRT and is referenced in the same way as a standard Raster image or Raster tileindex.  GDAL does the magic of looking for the relevant .ovr file or the source ECW images.

MapServer Layer snippet
	LAYER 
		NAME "aerial1999"
		STATUS DEFAULT
		TYPE RASTER
		METADATA
		  "wms_title" 			"Aerial Photography 1999-2000"
		  "wms_srs"             "EPSG:27700"
		  "wms_name"            "aerial1999"
		  "wms_server_version"  "1.1.1"
		  "wms_format"          "image/png"
		END
		DATA "Raster\Aerial_Photographs_1999-2000\_all_ecw.vrt"
	END