How to build an efficient GeoIP SQL table
This here is a very handy little script I threw together to generate a geoip.sql table for quickly determining which country a IP is from. I already hear you saying “Just convert the IP to an INT and use BETWEEN, how hard can it be”. And you are right, that works. And it may even be your easiest solution, but it just isn’t fast. And if you are planning on hammering the table with thousands of queries you are going to end up looking for something fast.
A while back I found a very interesting posting at www.jcole.us that described how to use Spacial Indexes together with MySQL’s GIS to speed up the queries. The posting has been online for a while and both it and the replies are worth reading.
All I did was make a small bash script to download the current “lite” version of GeoIP CSV file from maxmind.com, use the information from the posting to throw/transform it into a local database table and dump out a .sql file that can be easily imported into any other database. The script isn’t failproof though, it expects your user to be able to use mysql and have permission to create databases/tables and “load data local infile”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | #!/bin/bash #=============================================================================== # # FILE: generate_geoip_sql.sh # # USAGE: ./generate_geoip_sql.sh # # DESCRIPTION: Generates a optimized sql dump with geoipdata from maxmind.com # # USAGE: import sql into a database # # REQUIREMENTS: mysql database rights to create databases and LOAD DATA LOCAL INFILE # # EXAMPLE: # SELECT country_code,country_name # FROM geo_ip # WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('1.2.3.4'), 0))) # # SELECT country_code,country_name # FROM geo_ip # WHERE INET_ATON('1.2.3.4') # BETWEEN ip_from AND ip_to; # # AUTHOR: Ryan Schulze (rs), ryan@dopefish.de # #=============================================================================== set -o nounset # Treat unset variables as an error CSVFile="GeoIPCountryWhois.csv" if [[ -e ${CSVFile} ]] then echo -en "\n${CSVFile} already exists, download a newer version? [Y/n]: " read answer if [[ "$(echo $answer|tr "N" "n")" != "n" ]] then rm ${CSVFile} fi fi if [[ ! -e ${CSVFile} ]] then tmpfile=$(mktemp) wget -O ${tmpfile} http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip unzip ${tmpfile} && rm ${tmpfile} fi echo -e "\ngenerating geoip.sql" mysql -e "create database if not exists temp" mysql -D temp -e "drop table if exists geo_ip" echo ' CREATE TABLE geo_ip ( id INT UNSIGNED NOT NULL auto_increment, ip_poly POLYGON NOT NULL, ip_from INT UNSIGNED NOT NULL, ip_to INT UNSIGNED NOT NULL, country_code CHAR(2) NOT NULL, country_name CHAR(50) NOT NULL, PRIMARY KEY (id), SPATIAL INDEX (ip_poly) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; LOAD DATA LOCAL INFILE "GeoIPCountryWhois.csv" INTO TABLE geo_ip FIELDS TERMINATED BY "," ENCLOSED BY "\"" LINES TERMINATED BY "\n" ( @ip_from_string, @ip_to_string, @ip_from, @ip_to, @country_code, @country_string ) SET id := NULL, ip_from := @ip_from, ip_to := @ip_to, ip_poly := GEOMFROMWKB(POLYGON(LINESTRING( /* clockwise, 4 points and back to 0 */ POINT(@ip_from, -1), /* 0, top left */ POINT(@ip_to, -1), /* 1, top right */ POINT(@ip_to, 1), /* 2, bottom right */ POINT(@ip_from, 1), /* 3, bottom left */ POINT(@ip_from, -1) /* 0, back to start */ ))), country_code := @country_code, country_name := @country_string ; ' | mysql --local-infile=1 -D temp mysqldump --opt temp geo_ip > geoip.sql echo -e "\ndone generating geoip.sql" |