Monday, 24 November 2008

Adventures in Spatial Data - Part 1

As part of my job at Content Master, I get to play around with some pretty cool new technologies. Unfortunately, I usually need to do this to a fairly tight deadline; and often at the end of a project I'm left feeling that I'd like to have a bit more time to dig a little deeper. Recently I completed some marketing collateral and Hands-On Labs on Spatial Data in SQL Server 2008 for the SQL Server Marketing team at Microsoft, and of all the technologies and products I've looked at recently, I find myself fascinated with this one. I've always been impressed with technology that's actually useful, so I've decided going to play around with spatial data to see what I can come up with. I should warn you, I'm not a GIS specialist, but I love that I can now incorporate geographical data into SQL Server databases - even with SQL Server Express Edition.

As a basic context for learning about this stuff, I've decided to create a simple Web-based application that combines spatial data in SQL Server 2008 with Virtual Earth. Specifically, it will track the travels of Beanie - a small white bear that we've had in our family for a number of years and who gets taken with us wherever we go (it's a long story!). Here's Beanie:

The Web application is available at http://www.graemesplace.com/beanietracker.htm, and I'll spend the next couple of blog articles dissecting it (and probably making changes to it as I go).

OK, so let's cover a few basics first. Spatial data support is new in SQL Server 2008, and takes the form of two new data types: geometry and geography. The first of these is designed to handle planar spatial data (i.e. points, lines and shapes on a flat surface), while the second is for geodetic spatial data (i.e. points, lines, and shapes on the surface of a spheroid - specifically the planet Earth). The data types ar quite similar to one another, and expose many of the same methods and properties - key differences between them include:
  • The geometry type uses arbitrary x and y coordinates to define spatial elements within a bounded flat area, while the geography type uses longitude and latitude coordinates to define spatial elements on the surface of the Earth.
  • The geometry type uses basic pythagorean mathematics to calculate distances between points, while the geography type accounts for the curvature of the Earth when calculating distances.
Since I want to integrate my data with Virtual Earth, it makes most sense to use the geography type for Beanie's travel data.

The first thing I need to do is to define the database objects for my application. For this aplication, I want to store the locations that Beanie has visited as a longitude/latitude point (along with information about the location and a photograph of Beanie there - just in case you thought I was making this up!)
So, here's the definition for the Photos table:
CREATE TABLE Photos
([PhotoID] int IDENTITY PRIMARY KEY,
[Description] nvarchar(200),
[Photo] varbinary(max),
[Location] geography)
GO


Note the Location column, which is defined as geography. All of the locations in this table will be simple long/lat points rather than lines or shapes, but the geography type will store any valid geography instance.

To insert a location, I can choose from a number of ways to parse the spatial data - depending on the format I have it in. For example, here's an INSERT statement that inserts a geography instance from a text-based decription of a point:
INSERT INTO Photos
([Description], Photo, Location)
VALUES
('Beanie in Las Vegas',
(SELECT * FROM OPENROWSET(BULK N'C:\BeanieTracker\3.JPG', SINGLE_BLOB) As [Photo]),
geography::STPointFromText('POINT (-115.1765 36.1132)', 4326))

Note the use of the STPointFromText static method. This parses the spatial text description, which must specify coordinates in longitude - latitude order. Note also the number 4326 specified as the Spatial Reference ID (SRID) - this is a code that determines the specific geodetic model that the coordinates are based on (in this case, 4326 indicates the WGS84 spheroid model used by GPS systems. There are a few different models, largely because it's difficult to be exact about what shape the Earth actually is! There's a useful article on this
here that might help if you're interested in learning more about ellipsoid models.)

Spatial data is commonly available in Geographic Markup Language (GML) format - an XML syntax for describing geospatial elements, and you can also insert data in this format as shown here:

INSERT INTO Photos
([Description], Photo, Location)
VALUES
('Beanie at Edinburgh Castle',
(SELECT * FROM OPENROWSET(BULK N'C:\BeanieTracker\2.JPG', SINGLE_BLOB) As [Photo]),
geography::GeomFromGml
('<Point xmlns="http://www.opengis.net/gml">
<pos>55.9486 -3.2005</pos>
</Point>'
, 4326))


Note the use of the GeomFromGml static method in this code, and also note that this time the coordinates are in latitude - longitude order. The reason for the inconsistancy stems from a late-breaking change in spatial data support in SQL Server 2008 (see
Isaac Kunen's blog article about this).

The next thing I need in the application is another table in which to store details about countries. The reason I need this is that I want to be able to find out how many locations Beanie has visited within each country, and show them visually on a map. I'll store the shapes of the countries as polygons - in terms of the database schema, I can use another geography column, since the data type can handle polygons and lines as well as points. here's the table definition:
CREATE TABLE Countries
(CountryID INT IDENTITY PRIMARY KEY,
CountryName nvarchar(255),
CountryShape geography)


To insert the polygon data, I can use the same basic approaches I used for the long/lat points in the Photos table. All I need to do is to describe the polygon as a sequence of points, as shown in this example:
INSERT INTO Countries
(CountryName, CountryShape)
VALUES
('France',
geography::STPolyFromText
('POLYGON
((
2.109375 51.01375465718818,
0.615234375 50.17689812200107,
-0.439453125 49.439556958940855,
-1.2744140625 49.837982453084834,
-2.1533203125 49.696061819115634,
-2.28515625 48.893615361480194,
-4.9658203125 48.83579746243093,
-4.5703125 47.635783590864854,
-3.1640625 47.487513008956554,
-1.6259765625 46.07323062540835,
-1.3623046875 44.809121700077355,
-1.845703125 43.32517767999296,
0.615234375 42.682435398386204,
3.0322265625 42.42345651793833,
4.2626953125 43.421008829947254,
6.2841796875 42.74701217318067,
7.6025390625 43.7393520791547,
6.943359375 44.37098696297173,
6.6357421875 45.182036837015886,
6.5478515625 46.49839225859763,
5.9765625 46.13417004624326,
6.328125 46.6795944656402,
7.119140625 47.57652571374621,
8.2177734375 48.951366470947725,
6.5478515625 49.32512199104001,
4.833984375 50.064191736659104,
4.21875 49.89463439573421,
4.1748046875 50.3734961443035,
2.98828125 50.62507306341437,
2.109375 51.01375465718818
))', 4326)
)

As before, I can also import the data in GML format:

INSERT INTO Countries
(CountryName, CountryShape)
VALUES
('Egypt',
geography::GeomFromGml
(
'<Polygon xmlns="http://www.opengis.net/gml">
<exterior>
<LinearRing>
<posList>
31.57853542647338 24.9609375
31.128199299111959 24.697265625
30.524413269923986 24.9609375
30.221101852485983 24.609375
29.152161283318915 24.9609375
21.94304553343818 24.9609375
21.983801417384697 34.1015625
23.140359987886114 35.6396484375
27.9361805667694 34.43115234375
29.535229562948469 34.892578125
31.278550858946531 34.21142578125
31.222197032103196 32.32177734375
1.597252561706661 31.1572265625
30.939924331023441 28.76220703125
31.57853542647338 24.9609375
</posList>
</LinearRing>
</exterior>
</Polygon>'
, 4326)
)



Again, note the difference in the ordering of the longitude and latitude coordinates between the two methods. Also note that you must describe the polygon as a sequence of coordinates that enclose the shape you want to store in a counter-clockwise direction. The reason for this requirement is to avoid ambiguity about the part of the Earth's surface you want to include in the shape and the rest of the Earth's surface. For example, look at the following picture:



Now, does this show a small red circle on a gray sphere, or does it show a large gray area that covers most of a red sphere? By using the convention of describing polygons by enclosing them in a counter-clockwise direction, you can be unambiguous about which shape you're describing. Additionally, if you try to define a shape that is larger than a hemisphere, the geography type will raise an exception - and this is often happens when you describe polygon in the wrong direction by mistake. See this article in Ed Katibah's blog for an interesting discussion about this.
OK, back to our database - and an interesting problem. Some countries consist of more than one landmass. In other words, they include multiple polygons. Fortunately, the geography data type can handle this scenario as shown in the following code, which creates a geography instance for the United States that includes the mainland US, Alaska, and Hawaii:
INSERTINTO Countries
(CountryName, CountryShape)
VALUES
('United States of America',
geography::STMPolyFromText
('MULTIPOLYGON(
((-123.07402588363084 37.767367310117017,
-121.94097917688177 36.490937923858858,
-120.37149732826121 34.045890710439785,
-117.16155430516534 32.715680633728539,
-115.13671875 32.54681317351514,
-110.390625 31.203404950917395,
-108.6328125 31.653381399664,
-106.47549795243778 31.733844355633792,
-103.7109375 29.075375179558346,
-101.56221503336745 29.807382670042525,
-100.89817721206964 29.366207091173063,
-99.501983817857962 27.530904416002304,
-97.505157798460743 25.866221792519891,
-97.149468867243158 25.943457571654395,
-97.176960378652467 27.675151290420871,
-93.894849576786243 29.732981419413576,
-90.506037559706073 29.06303206938912,
-89.12109375 30.353916372297035,
-86.1328125 30.27804437780013,
-85.2099609375 29.649868677972303,
-83.95751953125 29.99300228455108,
-81.780293310096368 24.5583430193,
-80.134852769446283 25.792769876534258,
-80.101951906484146 26.931995972872219,
-80.606873856214719 28.079387728938791,
-80.537540582081022 28.470839206510636,
-80.749057521104547 32.189103634193479,
-75.984925610664732 35.373785353022406,
-71.591597672350787 41.189858770922925,
-70.3056301459883 41.801225588436182,
-66.967327818020152 44.812878222881466,
-69.154285648166763 47.454989805995105,
-71.538670536472367 44.99637100669522,
-74.735724992604418 45.028448990132183,
-77.250621135570626 43.995140406452222,
-79.064274837925424 43.10662863009243,
-78.878439231461826 42.88544317671419,
-82.652583599187864 41.76926713330969,
-84.347939004022578 46.518294248124974,
-89.245498827623535 48.381706457487851,
-102.54842574981625 49.00235529594778,
-122.7588943299908 48.976728632317268,
-123.14601579176878 48.165524140067781,
-123.43134011415688 48.118720268780265,
-124.73518224557731 48.391678533676071,
-123.07402588363084 37.767367310117017)),
((-141.15234375 59.88893689676585,
-141.50390625 70.08056215839737,
-156.68701171875 71.43417616304582,
-168.3984375 68.84766505841037,
-164.8828125 66.99884379185184,
-169.1015625 65.58572002329472,
-164.70703125 63.704722429433225,
-169.1015625 60.23981116999892,
-162.59765625 58.17070248348609,
-171.5625 51.83577752045248,
-153.6328125 56.072035471800866,
-148.0078125 59.44507509904714,
-141.15234375 59.88893689676585)),
((-159.169921875 22.411028521558702,
-159.85107421875 22.380555501421533,
-160.400390625 21.80030805097259,
-158.26904296875 21.238182425982312,
-157.08251953125 20.80747157680652,
-156.741943359375 20.478481600090564,
-156.02783203125 20.262197124246533,
-156.2255859375 19.673625561844392,
-155.819091796875 18.781516724349703,
-154.632568359375 19.383704634148017,
-155.8740234375 20.848545148787234,
-157.8955078125 21.749295836732088,
-159.2138671875 22.27893059841188,
-159.169921875 22.411028521558702))
)', 4326)
)
Note the use of the STMPolyFromText method, which includes a text description of three polygons. Of course, you can also use GML for this, as shown in the following code to define two polygons for the United Kingdom of Great Britain and Northern Ireland:

INSERT INTO Countries
(CountryName, CountryShape)
VALUES
('United Kingdom',
geography::GeomFromGml
('<MultiSurface xmlns="http://www.opengis.net/gml">
<surfaceMembers>
<Polygon>
<exterior>
<LinearRing>
<posList>
58.813741715707813 -5.09765625
58.539594766640484 -6.6796875
57.704147234341931 -7.998046875
56.022948079627454 -6.767578125
55.379110448010472 -6.15234375
55.07836723201514 -5.361328125
54.162433968067809 -3.779296875
53.540307391500221 -3.251953125
53.44880683542759 -4.7021484375
52.696361078274485 -4.98779296875
52.749593726741139 -4.24072265625
52.308478623663355 -4.2626953125
51.944264879028765 -5.38330078125
51.440312757160115 -5.07568359375
51.508742458803319 -3.955078125
51.358061573190916 -3.5595703125
51.303145259199056 -3.1640625
51.206883394865621 -4.482421875
50.007739014636869 -5.99853515625
49.908787000867136 -5.0537109375
50.247204901392671 -4.24072265625
50.12057809796007 -3.69140625
50.471490851399558 -2.13134765625
50.583236614805884 -0.72509765625
50.861444110589233 1.12060546875
51.23440735163458 1.64794921875
51.577069537225668 1.0546875
52.1874047455997 1.77978515625
52.816043191549333 1.82373046875
53.054421865461023 0.615234375
52.948637884883205 0.3076171875
53.278353017531821 0.41748046875
54.610254981579146 -0.615234375
54.711928848406139 -1.12060546875
55.615589025267482 -1.51611328125
56.084297562061408 -2.63671875
55.986091533808384 -3.2080078125
56.022948079627454 -3.69140625
56.059769479106571 -3.22998046875
56.279960831728459 -2.548828125
56.389583525613055 -2.79052734375
57.112385007934009 -2.021484375
57.680660029772348 -1.73583984375
57.715885127745032 -3.8232421875
57.879816455278409 -3.7353515625
57.879816455278409 -3.955078125
58.562522728537338 -2.8125
58.813741715707813 -5.09765625
</posList>
</LinearRing>
</exterior>
</Polygon>
<Polygon>
<exterior>
<LinearRing>
<posList>
55.354135310210573 -7.31689453125
55.028022112992517 -7.4267578125
54.762670400254947 -7.646484375
54.711928848406139 -7.91015625
54.635697306063854 -7.93212890625
54.610254981579146 -7.7783203125
54.482804559582554 -8.173828125
54.1109429427243 -7.53662109375
54.188155481071512 -7.20703125
54.380557368630654 -7.03125
54.213861000644926 -6.85546875
54.188155481071512 -6.7236328125
54.046489113355761 -6.70166015625
54.033586335210849 -6.08642578125
54.393352223845881 -5.38330078125
54.800684867322332 -5.55908203125
54.863962939854758 -5.82275390625
55.216490131689788 -6.1083984375
55.241552035652518 -6.87744140625
55.354135310210573 -7.31689453125
</posList>
</LinearRing>
</exterior>
</Polygon>
</surfaceMembers>
</MultiSurface>
'
,4326)
)

After you've inserted your spatial data, you can query it in Transact-SQL. SQL Server Management Studio includes a pretty cool visualizer that lets you see your spatial data projected as a map:

You can retrieve spatial data in Binary, Text, or GML format as shown by the following queries and results:

SELECT [Description], [location]
FROM Photos

Results:
Beanie at Edinburgh Castle 0xE6100000010C287E8CB96BF94B40B4C876BE9F9A09C0
Beanie in Las Vegas 0xE6100000010C1FF46C567D0E42409EEFA7C64BCB5CC0


SELECT [Description], [location].ToString()
FROM Photos

Results:
Beanie at Edinburgh Castle POINT (-3.2005 55.9486)
Beanie in Las Vegas POINT (-115.1765 36.1132


SELECT [Description], [location].AsGml()
FROM Photos

Results:
Beanie at Edinburgh Castle<Point xmlns="http://www.opengis.net/gml"><pos>55.9486 -3.2005</pos></Point>
Beanie in Las Vegas<Point xmlns="http://www.opengis.net/gml"><pos>-115.1765 36.1132<pos></Point>

Of course, if the spatial support in SQL Server 2008 only enabled you to store and retrieve spatial objects, it would be of limited use. After all, you could just as easily store this data as text or XML. The real advantage of spatial data support is the ability to perform spatial operations on the data, and ask location-related questions such as "How many times has Beanie been in each country?" (or "What are the sales volumes for each sales region?" if you prefer) and "How far is this place from Beanie's home?"(or "Where's the nearest store to a customer's current location?"). Here are a couple of code samples that shpow how these kinds of query can be performed by using the instance methods of the geography data type.
-- Function to get the distance of any point from Beanie's home in Dunfermline, Scotland
CREATE FUNCTION GetDistanceFromHome (@location geography)
RETURNS float
AS
BEGIN
DECLARE @home geography;
SET @home = geography::STPointFromText('POINT(-3.408304452896118 56.06143472072096)', 4326)
DECLARE @dist float;
SET @dist = ROUND(@home.STDistance(@location)/1000, 2)
RETURN @dist
END
GO
-- Stored procedure to return all photos and their locations
CREATE PROCEDURE GetPhotosGML
AS
-- Return the location geography data as GML
SELECT [PhotoID], [Description], [Location].AsGml() As PhotoGML,
(SELECT dbo.GetDistanceFromHome(Location)) AS DistanceFromHome
FROM Photos
GO

EXEC GetPhotosGML

Results:
PhotoIDDescriptionPhotoGMLDistanceFromHome
2Beanie at Edinburgh Castle<Point xmlns="http://www.opengis.net/gml"><pos>55.9486 -3.2005</pos></Point>18.05
3Beanie in Las Vegas<Point xmlns="http://www.opengis.net/gml"><pos>36.1132 -115.1765</pos></Point>7940.86


SELECT c.CountryName, Count(p.PhotoID) Visits
FROMPhotos p, Countries c
WHERE p.Location.STIntersects(c.CountryShape) = 1
GROUP BY c.CountryName
Results:
CountryName Visits
Egypt 2
France 1


So, hopefully you've enjoyed this little exploration of spatial data in SQL Server 2008. The database code for the Beanie Tracker application is available from
http://cid-ddad9079cff45619.skydrive.live.com/self.aspx/Public/BeanieTracker.zip if you want to play with it yourself. You'll need SQL Server 2008 (you can get the Express Edition for free from http://www.microsoft.com/sqlserver/2008/en/us/express.aspx).
Next time, I'll describe the Web application and how to visualize your spatial data with the Virtual Earth map control.

No comments: