Small benchmark with PostGIS and MS SQL-Server Spatial
We're doing a small performance benchmark with PostGIS and Microsoft
SQL-Server Spatial.We took some Tiger data and are doing queries like shown below.
Any comments about these and/or suggestions about additional queries?
- Roland
Syntax with curly brackets as table placeholders:
1. Loading data and generating indices
SELECT *
INTO {dataset}
FROM {original dataset}
WHERE geom.STIntersects(@poly) = 1ALTER TABLE {dataset}
ADD PRIMARY KEY CLUSTERED (ID ASC)CREATE SPATIAL INDEX {dataset}_sidx
ON {dataset} ([geom])
2. Non-spatial selection query:SELECT Count(*)
FROM {dataset lines} l
WHERE l.roadflg='Y'
3. Spatial query I:a)
SELECT Count(*)
FROM {dataset points} p WITH(INDEX(geom_sidx))
WHERE p.geom.STIntersects(@poly) = 1b)
SELECT Count(*)
FROM {dataset polygons} pg WITH(INDEX(geom_sidx))
WHERE pg.geom.STIntersects(@poly) = 1c)
SELECT Count(*)
FROM {datset lines} l WITH(INDEX(geom_sidx))
WHERE l.geom.STIntersects(@poly) = 1
4. Spatial query II:a)
SELECT COUNT(*)
FROM {dataset points} p WITH(INDEX(geom_sidx))
WHERE p.geom.STDistance(@point) <= xb)
SELECT COUNT(*)
FROM {dataset polygons} pg WITH(INDEX(geom_sidx))
WHERE pg.geom.STDistance(@point) <= xc)
SELECT COUNT(*)
FROM {dataset lines} l WITH(INDEX(geom_sidx))
WHERE l.geom.STDistance(@point) <= x
5. Join of a linestring and a polygon tableSELECT SUM(pg.geom.STLength())
FROM {dataset lines} l
JOIN {dataset polygons} pg WITH(INDEX(geom_sidx))
ON l.geom.STDistance(pg.geom) <= 10
WHERE l.railflg = 'Y'- Changed TypeRoland Krummenacher Tuesday, November 24, 2009 6:02 PM
All Replies
- SQL Server 2008 and PostGIS implement very similar spatial functionality (not surprising, seeing as they are both based on the same OGC standards).
If you're doing performance comparison between the two databases, it's all going to come down to the indexes that exist on your spatial data, and how the query engine uses those indexes.
PostGIS uses R-Tree indexing of spatial data on top of of Generalised Search Tree (GiST). GiST indexing only works when you use bounding box operators in your query, e.g.
Depending on what version of PostGIS you're using, the bounding box operators might be implicity included when you use a predicate involving a geometry method such as STIntersects, but you may have to still explicitly use them.SELECT the_geom FROM geom_table WHERE the_geom && 'BOX3D(90900 190900, 100100 200100)'::box3d AND ST_Distance(the_geom, ST_GeomFromText('POINT(100000 200000)', -1)) < 100
I'm not sure, but I don't believe that PostGIS allows you to alter any parameters of the created index on a spatial column - it's either there or its not.
SQL Server 2008, in contrast, uses spatial indexes based on a multi-level grid system. You can explicitly set the bounding box covered by the index, the grid resolution at each of four levels of nested grids, and the cells-per-object limit applied to the index. Changing these settings can have a dramatic effect on the primary filter efficiency and internal filter efficiency of a spatial query, and hence the overall speed of performance.
For your test suite of queries above, I would seriously recommend that you create different test cases involving different spatial indexes (maybe not every combination, but at least try LOW, MEDIUM, and HIGH resolution at each level of the grid, and adjusting the bounding box to provide a tight fit around the data. )
Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290