CS97 — GIS tools

GDAL | C++ GDAL Wrapper
GDAL is a toolkit for reading, writing, and converting a number of raster and vector formats. I installed GDAL 1.3.2 on the lab machines. GDAL has a number of utilities, but the three I use most and the three you will likely use most are gdalinfo, gdal_translate, and gdalwarp. I'll show some basic usage of each tool below.
For these examples, I'll use a sample data set of Delaware County, PA downloaded from the USGS Seamless Data Distribution System. This data is 3 arc second data (about 90 meter resolution) from the Shuttle Radar Topography Mission (SRTM).

Using gdalinfo

sample data delco.zip
> unzip delco.zip
> gdalinfo delco/delco
Driver: AIG/Arc/Info Binary Grid
Size is 504, 361
Coordinate System is:
        SPHEROID["WGS 84",6378137,298.257223563,
Origin = (-75.628333,40.095833)
Pixel Size = (0.00083333,-0.00083333)
Corner Coordinates:
Upper Left  ( -75.6283333,  40.0958333) ( 75d37'42.00"W, 40d 5'45.00"N)
Lower Left  ( -75.6283333,  39.7950000) ( 75d37'42.00"W, 39d47'42.00"N)
Upper Right ( -75.2083333,  40.0958333) ( 75d12'30.00"W, 40d 5'45.00"N)
Lower Right ( -75.2083333,  39.7950000) ( 75d12'30.00"W, 39d47'42.00"N)
Center      ( -75.4183333,  39.9454167) ( 75d25'6.00"W, 39d56'43.50"N)
Band 1 Block=504x4 Type=Float32, ColorInterp=Undefined
  Min=-61.000 Max=227.000
  NoData Value=-3.40282e+38

A couple of things to point out... Size is given in columns, rows, so this is a 504 columns x 361 rows data set. The Driver field tells you what kind of raster format this is. Arc/Info Binary grid is pretty common. seamless.usgs.gov gives you only these file formats. The Coordinate system data you can ignore temporarily. Eventually, we will need to project or transform some data sets from one projection to another, and this coordinate system info is useful for that. Note however that UNIT is set to degree meaning this data is unprojected and pixel coordinates represent lat/long coordinates. The Pixel Size is 0.0008333 degrees or 3 arc-seconds. This roughly corresponds to 90 meter pixels. The corner coordinates of the data are also given in projected and lat/long coordinates. Since the projection for this data is lat/long, the numbers are the same. The min and max elevations are also given. This elevation is stated in meters. How do I know? Because the USGS site tells me. It really should be included in the data file somewhere, perhaps as an additional readme, but it isn't.

A word about nodata:

Raster formats are rectangular arrays of data, but sometimes we don't have a measured value for every cell in the array, or we have irregularly shaped data that doesn't fill up a rectangular bounding box. In these cases, we can write a special NoData value for that pixel. Some data formats will report the nodata value through gdalinfo, while some do not. Conceptually if you are working with data that has nodata values, you should ignore them. They are not real data, so don't try to do things like average a real data value with a no data value. You can pollute you real data this way. When designing your own tools, you should decide if nodata will be a problem, and if you need to know the nodata value, your programs should have a way to ask the user what the nodata value is if you can not infer it directly from your data.

Using gdal_translate

gdalinfo gives you some basic stats about a data set, but that may not be very useful if you actually want to use the data. GRASS has a utility called r.in.gdal that can import this data into a GRASS database and then you can view it, but gdal_translate offers some more lightweight processing. For example, you can convert the data to a PNG file

gdal_translate -ot Byte -of PNG delco/delco delco.png
delaware county SRTM

Note the dark spot slightly southwest of the center of the image. Can you find this feature in google maps? Can you find Springton Lake in the image above? At 90 meter resolution and a greyscale png, it will be hard to find any discernible Swarthmore feature besides Crum creek in this image.

A more useful format for you as a programmer, might be to convert the data to an ASCII grid. Gdal_translate can do that too.

gdal_translate -ot Float32 -of AAIGrid delco/delco delco.txt

This generates and ASCII file that you can open in an editor (for small files like this) or write a simple driver to read in ASCII input. The file has a six line header as follows:

head -6 delco.txt

ncols        504
nrows        361
xllcorner    -75.628333345281
yllcorner    39.795000003346
cellsize     0.000833333333
NODATA_value -3.4028234663852885981e+38
Followed by the pixel values in row major order from left to right.

Using gdalwarp

lat/long is a pretty bad way of showing geographic data. If you used unprojected data for the whole world, Alaska would look really weird. Even the conterminous US looks pretty wacky unprojected. Plus if I tell you I moved 0.25 degrees west and .05 degrees north and started at 40 degrees north, how far did I move? You have no idea, do you? That's why we like to project data so that we can measure distances in feet or meters. You should work with projected data, but seamless gives you unprojected. No problem, we'll re-project with gdalwarp. There are many many projections to choose from. For simplicity, I recommend using Universal Transverse Mercator or UTM. Some of the North Carolina data I have is in NC state plane feet, which is OK too. To successfully warp, you will need to know your UTM zone (number not letter) for your data. You can find out that info on a chart of UTM zones. Eastern PA is pretty much in the center of zone 18. Enough chat, more warp!

gdalwarp -tps -rcs -t_srs "+proj=utm +zone=18" delco/delco delco_utm.gtiff

By default, gdalwarp converts the data to a GeoTIFF output format. You can convert the output to a PNG or ASCII grid using gdal_translate

gdal_translate -ot Byte -of PNG delco_utm.gtiff delco_utm.png

Delaware County, PA UTM

Note that the data is more square because degrees of longitude up at 40 North are noticeably smaller in raw meters than a degree of longitude. You can measure distances on the UTM map using Euclidean distance, but on a lat/long map, you need to use spherical distance formulas. Yecch, math is hard.

I said the pixel size in the unprojected data was roughly 90 meters. With the projected data, you can find out the real pixel size in meters using gdalinfo. What is it?

C++ GDAL Wrapper
I have some C++ code that I wrote that is a simple interface to GDAL that allows you to open any raster readable in GDAL and write a raster in ESRI EHdr format which GDAL and GRASS can both understand. This tool is handy for reading a number of raster formats and getting the data into a format useable to you (an array of pixels, a matrix, etc..) without having to export data to ASCII all the time and writing your own ASCII driver. If you are interested, let me know.