Geographic Information Systems (GIS) often store continuous surfaces that represent physical properties such as elevation, temperature, or water depth, which can be approximated to a reasonable level of accuracy by smooth, single-valued functions of the form z=F(x,y). Such a surface is often stored as a regular grid, or raster, of data values. When measuring the surface physically, it is usually impossible to collect data for every point on the surface. Instead, scientists collect a sample S of N spot measurements of the form (x,y,z) and reconstruct the continuous surface z=F(x,y) by interpolating the points in S.
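To make the reconstruction step concrete, the following is a minimal sketch of one standard interpolation scheme, inverse-distance weighting (IDW), used to evaluate z=F(x,y) on a regular grid from a scattered sample S. The `Point` struct, the search radius, and the grid extent are illustrative assumptions, not taken from any particular GIS package.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

struct Point { double x, y, z; };

// Estimate the surface value at (qx, qy) from all samples within `radius`,
// weighting each sample by the inverse of its squared distance to the query.
double idw(const std::vector<Point>& samples, double qx, double qy, double radius) {
    double num = 0.0, den = 0.0;
    for (const Point& p : samples) {
        double dx = p.x - qx, dy = p.y - qy;
        double d2 = dx * dx + dy * dy;
        if (d2 > radius * radius) continue;   // ignore samples outside the radius
        if (d2 < 1e-12) return p.z;           // query point coincides with a sample
        double w = 1.0 / d2;
        num += w * p.z;
        den += w;
    }
    return den > 0.0 ? num / den : std::nan(""); // no data: mark grid cell as no-data
}

int main() {
    // A tiny illustrative sample S; a lidar data set would hold millions of points.
    std::vector<Point> samples = {
        {0.0, 0.0, 10.0}, {1.0, 0.0, 12.0}, {0.0, 1.0, 11.0}, {1.0, 1.0, 13.0}
    };
    // Evaluate F on a 5x5 grid over the unit square.
    const int cells = 5;
    for (int row = 0; row < cells; ++row) {
        for (int col = 0; col < cells; ++col) {
            double x = col / double(cells - 1);
            double y = row / double(cells - 1);
            std::cout << idw(samples, x, y, 2.0) << (col + 1 < cells ? ' ' : '\n');
        }
    }
}
```

In practice the interpolation method varies (nearest neighbor, natural neighbor, splines, and others are common), but the overall pattern is the same: each grid cell is assigned a value computed from nearby sample points.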
We are particularly interested in data sets collected using lidar, a remote sensing method that collects highly accurate and dense point samples measuring the height of the terrain. Given a set S of N lidar points, we want to construct a gridded digital elevation model (DEM) of the terrain represented by S. Because lidar point sets are dense, and thousands of points can be collected in seconds, lidar data sets often consist of millions or billions of points. Processing such large point sets poses a number of computational challenges.
[Figure: Input lidar points shown in red (left) and the corresponding DEM (right), with color representing height. For clarity, only one in every hundred lidar points is shown at left. The region shown is a small area outside Hillsborough, NC, along the Eno River.]
For the extremely large point sets common in lidar applications, not all input points can reside in the main memory of a computer simultaneously; they must instead reside on larger but much slower disks. In this case the transfer of data between disk and main memory (also called I/O), rather than computation, often becomes the performance bottleneck. We must therefore consider I/O-efficient methods for constructing the segmentation, finding points in neighboring segments, and building the grid DEM.
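As a rough illustration of the I/O-aware style of processing (a generic sketch, not the specific segmentation method described here), one common out-of-core strategy is to stream the points once, scatter them into per-tile files on disk, and then build the DEM one tile at a time, loading only that tile and its neighbors into memory so interpolation near tile edges still sees nearby points. The file naming, tile size, and `Point` layout below are assumptions for illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct Point { double x, y, z; };

// Pass 1 helper: append one point to the binary file for the tile containing it.
// A real implementation would buffer writes instead of reopening files per point.
void bin_point(const Point& p, double tile_size) {
    long tx = static_cast<long>(p.x / tile_size);
    long ty = static_cast<long>(p.y / tile_size);
    std::string name = "tile_" + std::to_string(tx) + "_" + std::to_string(ty) + ".bin";
    if (std::FILE* f = std::fopen(name.c_str(), "ab")) {
        std::fwrite(&p, sizeof(Point), 1, f);
        std::fclose(f);
    }
}

// Pass 2 helper: read an entire tile file back into memory (empty if no points fell in it).
std::vector<Point> load_tile(long tx, long ty) {
    std::vector<Point> pts;
    std::string name = "tile_" + std::to_string(tx) + "_" + std::to_string(ty) + ".bin";
    if (std::FILE* f = std::fopen(name.c_str(), "rb")) {
        Point p;
        while (std::fread(&p, sizeof(Point), 1, f) == 1) pts.push_back(p);
        std::fclose(f);
    }
    return pts;
}

int main() {
    const double tile_size = 1000.0;  // tile width in the same units as x and y

    // Pass 1: stream the (possibly huge) point set once, scattering points to tile files.
    bin_point({  12.0,   7.0, 101.5}, tile_size);
    bin_point({1500.0, 200.0,  98.2}, tile_size);

    // Pass 2: process each tile together with its eight neighbors, so only a
    // small, bounded working set is ever held in main memory at once.
    for (long ty = 0; ty <= 1; ++ty) {
        for (long tx = 0; tx <= 1; ++tx) {
            std::vector<Point> working;
            for (long dy = -1; dy <= 1; ++dy)
                for (long dx = -1; dx <= 1; ++dx) {
                    std::vector<Point> nb = load_tile(tx + dx, ty + dy);
                    working.insert(working.end(), nb.begin(), nb.end());
                }
            std::printf("tile (%ld,%ld): %zu points available for interpolation\n",
                        tx, ty, working.size());
        }
    }
}
```

The key property is that both passes read and write the point data in large sequential chunks, keeping the number of disk transfers low even when the full data set is far larger than main memory.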
[Figure: DEM of the Neuse River basin in North Carolina, derived from 390 million lidar points.]