Tuning Spatial Point Data Queries in SQL Server 2012
Written by: Ed Katibah, Milan Stojic, Michael Rys, Nicholas Dritsas
Technical reviewers: Chuck Heinzelman
Spatial Point Data queries require particular tuning efforts to enhance performance and improve overall application through put. SQL Server 2012 introduces a few key new features and improvements
to assist you in that goal.
Below, we go over a few key best practices and suggestions on how to achieve that.
To create an index on a spatial column, the table must provide a primary key. SQL Azure extends this requirement by requiring that each table have a clustered primary key.
Spatial query execution with an index contains two parts: primary filter (spatial index lookup) and secondary filter (original spatial predicate). So with the queries like “STDistance() < x”,
SQL Server will:
The secondary filter can be avoided by using Filter(@)=1 operation instead of STIntersects(@x)=1. The query performance can be much better but false positives are possible in the result. If the spatial query is not too large,
for HHHH spatial index, tolerance will be around 100-200m when using the Geography data type.
STDistance(@x) < @range operator can be replaced with
If returning some false positives is not an option, IO performance of the base table can still be optimized.
Any spatial index is already optimized to minimize IO. Cells are places in a specific pattern so that spatially closer cells sit physically close to each other. The following picture illustrates the cell pattern used (a
Hilbert space-filling curve is used in SQL Server).
Therefore, the reading pattern for an index seek over spatial index might look like:
However, as the join to the base table is needed to get actual spatial objects, the IO pattern for the clustered index seek (the base table) might look like:
This IO pattern will result in reading many more pages from the disk and will have high overall impact on spatial query performance.
So, in order to get better IO performance for the clustered index seek, the Primary Key needs to be correlated with spatial index cell pattern. This can be achieved by adding point coordinates components to the clustered
For our sample data, here is how the original table containing the point column was specified:
[id] [int] NOT
[type] [int] NOT
Spatial indexes reference the primary key to access the base table rows. Query performance can often be significantly enhanced by creating a clustered index on the primary key which incorporates the individual ordinates
of the underlying point coordinate. This technique allows the primary key to be “spatially” ordered thereby minimizing the number of disk seeks required to fetch the rows of base table data requested by the spatial index.
On occasion, the individual point coordinate ordinates can be found in their own columns in the base table along with the column of points (spatial object). When this is the case, the individual point ordinates can be referenced
directly in the clustered primary key DDL. When this is not the case, the table schema can be expanded to include two persisted computed columns which contain the individual ordinates of the underlying point coordinates.
It is important to note that the new point ordinate columns must be the first two columns referenced in the clustered primary key. For geometry coordinates, they should be in the x, y order. For geography data they should
be in latitude, longitude order.
For the sample data table, it is necessary to expand the schema as follows, creating two new computed persisted columns using spatial methods and redefining the clustered primary key index with these two new columns (note
the use of the built-in Lat and
Long methods to extract the individual ordinates):
[geog] [geography] NOT
[geo_lat] AS ([geog].[Lat]) PERSISTED
NOT NULL, -- *
[geo_lon] AS ([geog].[Long])
NULL, -- *
* not null due to use in primary key
NOTE: Persistence of the x,y / lat,long points for the spatial object as the first columns in the clustered key adds 16 bytes to the key, which is then duplicated in both the base table as well
as the spatial index itself, as well as every non-clustered index added to the table. The regular guidance for clustered index keys is to keep them as small as possible. Even though this could potentially improve spatial index performance, users should be
aware of the potential negative consequences.
The default spatial index has traditionally been recommended with grids all set to MEDIUM based on a general spatial workload. This was original index definition:
SPATIAL INDEX [table_geog_sidx]
=(LEVEL_1 = MEDIUM,
Spatial query performance can often be improved significantly by selecting spatial index options which differ from default settings for specific spatial shapes and distributions. In the case of point data, it has been
found in most, if not in all, cases that spatial indexes with all grid levels set to HIGH outperform other configurations. Since we are dealing with point data, the CELLS_PER_OBJECT setting is irrelevant and can be set to any legal value (1-8192) without
effect. Here is the spatial index reconfigured to grid levels with the optimal setting for points, all HIGH:
SPATIAL INDEX [table_geog_sidx]
Consider using a spherical Earth model if using geography data type
If you are using SQL Azure or SQL Server 2012 and your accuracy requirements for the spatial results do not require that an ellipsoidal model be used, additional query performance can potentially be gleaned by using a spherical
model as the basis for the underlying spatial coordinate system due to the simpler mathematics involved in calculations. Additionally, the
STDistance(), STLength() and ShortestLineTo() methods are optimized to run faster on a sphere than on an ellipsoid.
In SQL Server 2012 and SQL Azure, there is a new entry for a unit sphere in the
sys.spatial_reference_systems view. This new entry has a spatial reference identifier (SRID) = 104001 and is listed as a unit sphere.
The native units of measure output when using the unit sphere are radians. To compute measures in real world values, such as length and area in meters, simply multiply results by the radius of the desired output sphere
In the case of our sample data, the original coordinate system was specified as World Geodetic Reference System of 1984 (WGS84). In SQL Server and SQL Azure, this is identified as spatial reference identifier (SRID) 4326.
This is the most common coordinate system for use with the geography data type (most commercial spatial data uses these coordinate systems, as do GPS receivers). For many web mapping programs which reference WGS84 (Bing Maps, Google Maps, etc.), a radius of
6,378,137 meters is used for their underlying spherical spatial reference system (“Spherical Mercator”).
To update the SRID for each point object to the unit sphere reference system in the sample spatial table, the following T-SQL can be used (note the use of the
To test these tuning recommendations, a customer-driven scenario was used along with their data. To provide an idea of the scope of the query, the following stored procedure is provided as an example:
/****** Object: StoredProcedure [dbo].[TEST] Script Date: 12/13/2011 3:14:50 PM ******/
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
--set statistics time on
--set statistics IO on
SET NOCOUNT ON;
DECLARE @Location geography,
AS CO ON CO.country_id
SET @location = geography::Point(@location.Lat, @location.Long,
SELECT GI.geo_id, GI.geo_info.STDistance(@Location)
AS GI with(nolock,
INNER JOIN dbo.product_hotels
For discussion clarity, the following T-SQL, pre-optimization, captures the core spatial query used in the stored procedure (the SQL Server spatial methods are highlighted). This corresponds to the T-SQL illustrated in
the light blue box, above.
SELECT p.geo_id, p.geo_info.STDistance(@Location)
AS p WITH(nolock,
INDEX(SIndx_geography_informations)) –- original, non-optimized index
< 10000 -– 10KM (10,000 meters)
When the coordinate system was changed from SRID=4326 to SRID=104001, it was necessary to rewrite the query as follows to accommodate the coordinate system units change and assure that the new, optimized spatial index was
SELECT GI.geo_id, GI.geo_info.STDistance(@Location)
FROM Points_UnitSphere AS GI
INDEX(geo_info_HHHH_sidx)) –- optimized index
* 6378137 <
10000 – 10KM
Adding the multiplication in the WHERE clause causes the spatial index to be ignored in the query plan (in fact, the query compiler complains about the use of the spatial index hint, when written in this fashion). To alleviate
this issue, the query can be rewritten as follows:
SELECT p.id, p.info.STDistance(@Location)
FROM Points_UnitSphere AS p
WHERE GI.geo_info.STDistance(@Location) <
Quick summary of the potential techniques to improve spatial point data performance:
Prior to optimization, use of the SQL Server STDistance method in the SELECT and WHERE clauses returned an average performance of 30 milliseconds.
After optimization (new table with point clustered primary key, new point index and use of the sphere-optimized methods), use of the SQL Server STDistance method in the SELECT and WHERE clauses returned an average performance
of 18 milliseconds.
Execution Time in ms (average*)
Original Spatial Query
Optimized Spatial Query
* each query for this test was executed 1,000 times for determination of the average time
Excellents samples. Tks
Indeed a real treasue.
- still working with the "World" database that you provided us with at the SQLSkills Immersion Event in Bellevue for testing and demos :-)