The Internet has become a collection of resources meant to appeal to a large general audience. Although this multitude of information has been a great boon, it also has diluted the importance of geographically localized information. Offering the ability for Internet users to garner information based on geographic location can decrease search times and increase visibility of local establishments. Similarly, user communities and chat-rooms can be enhanced through knowing the locations (and therefore, local times, weather conditions and news events) of their members as they roam the globe. It is possible to provide user services in applications and Web sites without the need for users to carry GPS receivers or even to know where they themselves are.
Geolocation by IP address is the technique of determining a user's geographic latitude, longitude and, by inference, city, region and nation by comparing the user's public Internet IP address with known locations of other electronically neighboring servers and routers. This article presents some of the reasons for and benefits of using geolocation through IP address, as well as several techniques for applying this technology to an application, Web site or user community.
Why Geolocation?
The benefits of geolocation may sound complex, but a simple example may help illustrate the possibilities. Consider a traveling businessman currently on the road to San Francisco. After checking into his hotel, he pulls out his laptop and hops onto the wireless Internet access point provided by the hotel. He opens his chat program as well as a Web browser. His friends and family see from his chat profile that he currently is near Golden Gate Park. Consequently, they can determine his local time. By pulling up a Web browser, furthermore, the businessman can do a localized search to find nearby restaurants and theaters.
Without having to know the address of the hotel he's staying in, the chat program and Web pages can determine his location based on the Internet address through which he is connecting. The following week, when he has returned to his home in Florida, he uses his laptop to log into a chat program, and his chat profile correctly places him in his home city. There is no need to change computer configurations, remember addresses or even be aware, as the user, that you are benefitting from geolocation services.
Possible applications for geolocation by IP address exist for Weblogs, chat programs, user communities, forums, distributed computing environments, security, urban mapping and network robustness. We encourage you to find out what applications and Web sites currently employ geolocation or could be enhanced by adding support.
Although several methods of geographically locating an individual currently exist, each has costs and other drawbacks that make it prohibitive in computing environments. GPS is limited by line-of-sight to the constellation of satellites in Earth's orbit, which severely limits locating systems in cities, due to high buildings, and indoors, due to complete overhead blockage. Several projects have been started to install sensors or to use broadcast television signals (see Resources) to provide for urban and indoor geolocation. Unfortunately, these solutions are expensive because they require new infrastructure and devices, and the services are not yet widely supported.
By contrast, these environments already are witnessing a growing trend of installing wireless access points (AP). Airports, cafes, offices and city neighborhoods all have begun installing wireless APs to provide Internet access to wireless devices. Using this available and symbiotic infrastructure, geolocation by IP address can be implemented immediately.
Geolocation Standards and Services
As discussed below, several RFC proposals have been made by the Internet Engineering Task Force (IETF) that aim to provide geolocation resources and infrastructure. However, these standards have met with little support from users and administrators. To date, there has not been much interest in providing user location tracking and automatic localization services. Several companies now offer pay-per-use services for determining location by IP. These services can be expensive, however, and don't necessarily offer the kind of functionality a programmer may want when designing his or her Web site or application.
Several years ago, CAIDA, the Cooperative Association for Internet Data Analysis, began a geolocation by IP address effort called NetGeo. This system was a publicly accessible database of geographically located IP addresses. Through the use of many complex rules, the NetGeo database slowly was filled and its IP address locations were corrected. The project since has been stopped, and the technology was licensed to new partners. The database still is available, however, and although it is several years old, it provides a good resource for determining rough locations.
To query the NetGeo database, an HTTP request is made with the query IP address, like this:
--
$ http://netgeo.caida.org/perl/netgeo.cgi?target=192.168.0.1
VERSION=1.0
TARGET: 192.168.0.1
NAME: IANA-CBLK1
NUMBER: 192.168.0.0 - 192.168.255.255
CITY: MARINA DEL REY
STATE: CALIFORNIA
COUNTRY: US
LAT: 33.98
LONG: -118.45
LAT_LONG_GRAN: City
LAST_UPDATED: 16-May-2001
NIC: ARIN
LOOKUP_TYPE: Block Allocation
RATING:
DOMAIN_GUESS: iana.org
STATUS: OK
--
As you can see, the NetGeo response includes the city, state, country, latitude and longitude of the IP address in question. Furthermore, the granularity (LAT_LONG_GRAN) also is estimated to give some idea about the accuracy of the location. This accuracy also can be deduced from the LAST_UPDATED field. Obviously, the older the update, the more likely it is that the location has changed. This is true especially for IP addresses assigned to residential customers, as companies holding these addresses are in constant flux.
To make this database useful to an application or Web site, we need to be able to make the request through some programming interface. Several existing packages assist in retrieving information from the NetGeo database. The PEAR system has a PHP package (see Resources), and a Perl module, CAIDA::NetGeo::Client, is available. However, it is a relatively straightforward task to make a request in whatever language you are using for your application or service. For example, a function in PHP for getting and parsing the NetGeo response looks like this:
function getLocationCaidaNetGeo($ip)
{
    // Build the query URL for the address we want to locate.
    $NetGeoURL = "http://netgeo.caida.org/perl/netgeo.cgi?target=" . urlencode($ip);

    // Open the URL and read the whole response into a string.
    $NetGeoFP = fopen($NetGeoURL, "r") or die("Could not contact NetGeo");
    $NetGeoHTML = stream_get_contents($NetGeoFP);
    fclose($NetGeoFP);

    // Pull the latitude and longitude values out of the response.
    $location = array();
    preg_match("/LAT:(.*)/i", $NetGeoHTML, $temp) or die("Could not find element LAT");
    $location[0] = trim($temp[1]);
    preg_match("/LONG:(.*)/i", $NetGeoHTML, $temp) or die("Could not find element LONG");
    $location[1] = trim($temp[1]);

    return $location;
}
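The NetGeo response carries more than latitude and longitude, and the granularity and update date are useful for judging how far to trust a result. As a rough sketch in Python, assuming the response arrives as plain "KEY: value" lines like the sample shown earlier, all of the fields can be collected into a dictionary. The function names here are illustrative and are not part of any NetGeo client package, and the date format is assumed from the single 16-May-2001 example.

```python
from datetime import date, datetime


def parse_netgeo(text):
    """Turn a NetGeo 'KEY: value' response into a dictionary."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only; lines without a colon are skipped.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip().upper()] = value.strip()
    return record


def coordinates(record):
    """Return (latitude, longitude) as floats, or None if either is unusable."""
    try:
        return float(record["LAT"]), float(record["LONG"])
    except (KeyError, ValueError):
        return None


def age_in_days(record, today):
    """Days since LAST_UPDATED, assuming the 16-May-2001 date style."""
    updated = datetime.strptime(record["LAST_UPDATED"], "%d-%b-%Y").date()
    return (today - updated).days


# Abbreviated copy of the sample response from this article.
SAMPLE = """\
TARGET: 192.168.0.1
CITY: MARINA DEL REY
LAT: 33.98
LONG: -118.45
LAT_LONG_GRAN: City
LAST_UPDATED: 16-May-2001
STATUS: OK
"""

record = parse_netgeo(SAMPLE)
print(coordinates(record), record["LAT_LONG_GRAN"])
print(age_in_days(record, date(2002, 5, 16)))
```

An application might discard or down-weight records whose age exceeds some threshold, particularly for address blocks that belong to residential providers.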
Using DNS to Your Advantage
As previously mentioned, the NetGeo database slowly is becoming more inaccurate as IP address blocks change hands in company close-outs and absorptions. Several other tools are available for determining location, however. A description of the NetGeo infrastructure itself (see Resources) presents some of the methods it employed for mapping IP addresses and can be a source of guidance for future projects.
One of the most useful geolocation resources is DNS LOC information, but it is difficult to enforce across the Internet infrastructure. RFC 1876 is the standard that outlines "A Means for Expressing Location Information in the Domain Name System." Specifically, this is done by publishing the location of a server in a LOC resource record in its DNS zone. Several popular servers have employed this standard but not enough to be directly useful as of yet.
To check the LOC DNS information of a server, you need to get the LOC type of the host:
--
$ host -t LOC yahoo.com
yahoo.com LOC 37 23 30.900 N 121 59 19.000 W 7.00m 100m 100m 2m
--
This parses out to 37 degrees 23' 30.900'' North Latitude by 121 degrees 59' 19.000'' West Longitude at 7 meters in altitude, with an approximate size of 100 meters at 100 meters horizontal precision and 2 meters vertical precision. There are several benefits to servers that offer their geographic location in this way. First, if you are connecting from a server that shows its DNS LOC information, determining your geolocation is simple, and applications may use this information without further work, although some verification may be useful. Second, if you are connecting on your second or third bounce through a server that has DNS LOC information, it may be possible to make an estimate of your location based on traffic and ping times. However, it should be obvious that these estimates greatly degrade accuracy.
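Most mapping tools expect decimal degrees rather than degrees, minutes and seconds. The following Python sketch converts the output of host -t LOC, assuming the answer uses the full form shown above (degrees, minutes, seconds and hemisphere for both coordinates, followed by the altitude). RFC 1876 also allows minutes and seconds to be omitted, which this simple pattern does not handle. The function names are illustrative.

```python
import re

# Matches the full form: D M S hemisphere for latitude and longitude,
# followed by the altitude in meters.
LOC_PATTERN = re.compile(
    r"(\d+) (\d+) ([\d.]+) ([NS]) (\d+) (\d+) ([\d.]+) ([EW]) (-?[\d.]+)m"
)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def parse_loc(answer):
    """Return (latitude, longitude, altitude_m), or None if no LOC data is found."""
    match = LOC_PATTERN.search(answer)
    if match is None:
        return None
    g = match.groups()
    return dms_to_decimal(*g[0:4]), dms_to_decimal(*g[4:8]), float(g[8])


answer = "yahoo.com LOC 37 23 30.900 N 121 59 19.000 W 7.00m 100m 100m 2m"
lat, lon, alt = parse_loc(answer)
print(round(lat, 4), round(lon, 4), alt)
```

The size and precision fields are ignored here. An application that cares about accuracy could capture them as well and use the horizontal precision as an error radius.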
It also is possible to publish DNS LOC information for your own Web site by adding a LOC record to its DNS zone (see Resources). If more servers come to use LOC information, geolocation accuracy will be much easier to attain.
Sidebar: host
host is a DNS lookup utility that allows users to find out various pieces of information about a host. The simplest use is doing hostname to IP address lookups and the reverse. For a reverse lookup, the address is given in dotted-decimal IPv4 notation, and the canonical name associated with that address is returned. The type flag, -t, can be used to request a specific record type for the host from the name server.
Where There's a Name, There's a Way
Many users hopping onto the Internet probably aren't coming from a major server. In fact, most users don't have a static IP address. Dial-up, cable modems and cell phone connections are assigned a dynamic IP address that may change multiple times in one day or not at all for several weeks. Therefore, it becomes difficult to tie these dynamic addresses to a single location.
Fortunately, these service providers typically use an internal naming scheme for assigning IP addresses and associating names with them. In many cases, the canonical name of an IP address ends in a country-code top-level domain (ccTLD): CN is China, FR is France, RO is Romania and so on. Furthermore, the name even may contain the city or region in which the IP address is located. Often, however, this information is shortened to an abbreviation that requires a heuristic to interpret. For example, in your service or application, a user may appear to be coming from d14-69-1-64.try.wideopenwest.com. A whois on this address reveals it is a WideOpenWest account from Michigan. Using some logic, it is possible to deduce that this user is connecting through a server located in Troy, MI, hence the .try. in the canonical name.
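The heuristic described above can be sketched in Python as a pair of lookup tables. Both tables here are illustrative assumptions: the country list holds only the three codes mentioned in this article, and the provider table contains the single abbreviation deduced above. A real service would need much larger tables, and taking the last two labels as the provider's domain is a simplification that fails for suffixes such as co.uk.

```python
# Illustrative tables only; a real deployment would need far larger ones.
CCTLD_COUNTRIES = {"cn": "China", "fr": "France", "ro": "Romania"}

# Hypothetical provider-specific abbreviations, keyed by provider domain.
PROVIDER_HINTS = {"wideopenwest.com": {"try": "Troy, MI"}}


def guess_location(hostname):
    """Guess a country and place from the labels of a canonical hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    guess = {"country": CCTLD_COUNTRIES.get(labels[-1]), "place": None}

    # Assume the provider's domain is the last two labels of the name.
    domain = ".".join(labels[-2:])
    hints = PROVIDER_HINTS.get(domain, {})

    # Look for a known abbreviation among the remaining labels.
    for label in labels[:-2]:
        if label in hints:
            guess["place"] = hints[label]
            break
    return guess


print(guess_location("d14-69-1-64.try.wideopenwest.com"))
print(guess_location("host.example.fr"))
```

Because the abbreviations are chosen by each provider, a guess from this kind of table is best treated as a hint to be confirmed against another source, such as whois data.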
Some projects have been started to decipher these addresses (see Resources), and you also can get all of the country codes and associated cities and regions of a country from the IANA Root-Zone Whois Information or the US National Geospatial-Intelligence Agency, which hosts the GEOnet Names Server (GNS). The GNS has freely available data files on almost all of the world's countries, regions, states and cities, including their sizes, geographic locations and abbreviations, as well as other information.
Information such as that presented on the GNS also can be used to provide users with utilities and services specific to their geographical locations. For example, it is possible to determine a user's local currency, time zone and language. Time zone is especially useful for members of a community or chat group to determine when another friend may be available and on-line.
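When only a longitude is known, a user's time zone can be approximated from it. The Python sketch below uses the rule of thumb that every 15 degrees of longitude corresponds to one hour of offset from UTC. This is an assumption for illustration only: actual time zones follow political boundaries and daylight saving rules, so a production service would map the location to a named time zone instead. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def nominal_utc_offset(longitude):
    """Approximate whole-hour UTC offset: 15 degrees of longitude per hour."""
    return round(longitude / 15)


def approximate_local_time(longitude, now_utc):
    """Shift a timezone-aware UTC datetime to the nominal local time."""
    offset = timedelta(hours=nominal_utc_offset(longitude))
    return now_utc.astimezone(timezone(offset))


# Longitude taken from the NetGeo sample response in this article.
noon_utc = datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc)
print(nominal_utc_offset(-118.45))
print(approximate_local_time(-118.45, noon_utc).isoformat())
```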
Where Are You Located?
Now that we've explained some of the techniques that can be used in geolocating Internet users by their IP addresses, we offer you a chance to try it out. Point your Web browser of choice here, and see how accurate or inaccurate the current results are. Please leave comments below about the accuracy of your results as well as any ideas you may have.