GIS Format Conversion Tutorials
Practical tutorials and real-world examples for spatial data conversion
Getting Started with Spatial Data Conversion
Converting between spatial data formats is a fundamental skill for anyone working with geographic information systems. This tutorial series will guide you through practical examples and common scenarios you'll encounter when working with WKT, WKB, and GeoJSON formats.
What You'll Learn
- How to identify different spatial data formats
- When to use each format for optimal results
- Step-by-step conversion procedures
- Common pitfalls and how to avoid them
- Integration with popular tools and platforms
Tutorial 1: Converting GPS Points from CSV to Multiple Formats
Scenario
You have a CSV file with GPS coordinates from a field survey and need to convert them to different formats for various applications:
- WKT for database storage
- WKB for efficient querying
- GeoJSON for web visualization
Sample Data
Your CSV file contains the following data:
point_id,latitude,longitude,description
1,40.7128,-74.0060,"New York City Hall"
2,34.0522,-118.2437,"Los Angeles City Hall"
3,41.8781,-87.6298,"Chicago City Hall"
Step 1: Create WKT Points
For each coordinate pair, create a WKT POINT geometry. Note that WKT lists longitude (x) before latitude (y); the # annotations below are for readability and are not part of WKT syntax:
POINT(-74.0060 40.7128) # New York City Hall
POINT(-118.2437 34.0522) # Los Angeles City Hall
POINT(-87.6298 41.8781) # Chicago City Hall
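Generating these strings by hand doesn't scale, so here is a minimal stdlib-only sketch that derives them from the CSV above (the file contents are inlined for illustration; in practice you would open the file directly):

```python
import csv
import io

# Sample rows from the CSV above (in practice: open('survey.csv'))
csv_text = """point_id,latitude,longitude,description
1,40.7128,-74.0060,"New York City Hall"
2,34.0522,-118.2437,"Los Angeles City Hall"
3,41.8781,-87.6298,"Chicago City Hall"
"""

def rows_to_wkt(fileobj):
    """Yield one WKT POINT per CSV row, longitude first."""
    for row in csv.DictReader(fileobj):
        yield f"POINT({row['longitude']} {row['latitude']})"

wkt_points = list(rows_to_wkt(io.StringIO(csv_text)))
print(wkt_points[0])  # POINT(-74.0060 40.7128)
```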
Step 2: Convert to GeoJSON
Transform the points into a GeoJSON FeatureCollection:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-74.0060, 40.7128]
      },
      "properties": {
        "point_id": 1,
        "description": "New York City Hall"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-118.2437, 34.0522]
      },
      "properties": {
        "point_id": 2,
        "description": "Los Angeles City Hall"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-87.6298, 41.8781]
      },
      "properties": {
        "point_id": 3,
        "description": "Chicago City Hall"
      }
    }
  ]
}
Step 3: Database Integration
PostGIS stores geometries internally in a compact WKB-based binary format, so you get efficient storage automatically; you can still write them with readable WKT:
CREATE TABLE city_halls (
  id SERIAL PRIMARY KEY,
  point_id INTEGER,
  description TEXT,
  geom GEOMETRY(POINT, 4326)
);

INSERT INTO city_halls (point_id, description, geom) VALUES
  (1, 'New York City Hall', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326)),
  (2, 'Los Angeles City Hall', ST_GeomFromText('POINT(-118.2437 34.0522)', 4326)),
  (3, 'Chicago City Hall', ST_GeomFromText('POINT(-87.6298 41.8781)', 4326));
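Once stored, the same geometries can be read back in any of the three formats. A sketch of the relevant PostGIS output functions against the table above:

```sql
SELECT
  description,
  ST_AsText(geom)    AS wkt,      -- human-readable, good for debugging
  ST_AsBinary(geom)  AS wkb,      -- compact binary (bytea) for transfer
  ST_AsGeoJSON(geom) AS geojson   -- ready for web mapping clients
FROM city_halls;
```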
Key Takeaways
- Always use longitude, latitude order for WKT and database storage
- GeoJSON uses [longitude, latitude] array format
- Include SRID (Spatial Reference ID) when working with databases
- Validate coordinates are within valid ranges (-180 to 180 for longitude, -90 to 90 for latitude)
Tutorial 2: Working with Complex Polygons
Scenario
You need to define a complex polygon area (like a park with a lake inside) and convert it between formats for different applications.
Creating the Polygon with Hole
A polygon with a hole requires careful definition of exterior and interior rings:
WKT Format
POLYGON(
  (10 10, 20 10, 20 20, 10 20, 10 10),
  (12 12, 12 18, 18 18, 18 12, 12 12)
)
Note: The first ring is the exterior boundary (counterclockwise), the second ring is the hole (clockwise, i.e. listed in the opposite direction).
GeoJSON Format
{
  "type": "Polygon",
  "coordinates": [
    [
      [10, 10], [20, 10], [20, 20], [10, 20], [10, 10]
    ],
    [
      [12, 12], [12, 18], [18, 18], [18, 12], [12, 12]
    ]
  ]
}
Validation Steps
Always validate complex polygons:
- Ring Closure: Ensure first and last coordinates are identical
- Ring Orientation: Exterior rings counterclockwise, holes clockwise (mandatory in GeoJSON per RFC 7946; a common convention elsewhere)
- Self-Intersection: Rings should not cross themselves
- Containment: Holes must be entirely within the exterior ring
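The first two checks need no GIS library at all. A stdlib sketch using the shoelace formula for orientation, with rings as lists of (x, y) tuples (function names are illustrative):

```python
def is_closed(ring):
    """Check 1: first and last coordinates must be identical."""
    return ring[0] == ring[-1]

def signed_area(ring):
    """Shoelace formula: a positive result means counterclockwise."""
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2.0

# Rings from the park example: exterior counterclockwise, hole clockwise
exterior = [(10, 10), (20, 10), (20, 20), (10, 20), (10, 10)]
hole = [(12, 12), (12, 18), (18, 18), (18, 12), (12, 12)]

print(is_closed(exterior), is_closed(hole))      # True True
print(signed_area(exterior), signed_area(hole))  # 100.0 -36.0
```

Self-intersection and containment (checks 3 and 4) are harder to do by hand; a library such as Shapely (`Polygon(...).is_valid`) is the practical choice there.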
Common Errors and Solutions
Unclosed Rings
Incorrect (ring never returns to its starting point):
POLYGON((10 10, 20 10, 20 20, 10 20))
Correct:
POLYGON((10 10, 20 10, 20 20, 10 20, 10 10))
Wrong Ring Orientation
Incorrect (exterior ring is clockwise):
POLYGON((10 10, 10 20, 20 20, 20 10, 10 10))
Correct (counterclockwise):
POLYGON((10 10, 20 10, 20 20, 10 20, 10 10))
Tutorial 3: Building a Web Mapping Application
Scenario
Create a web application that displays spatial data from a database on an interactive map using Leaflet.js.
Backend: Serving GeoJSON from PostGIS
Create a simple API endpoint that converts PostGIS data to GeoJSON:
SQL Query
SELECT
  id,
  name,
  category,
  ST_AsGeoJSON(geom) as geometry
FROM points_of_interest
WHERE ST_DWithin(
  geom::geography,
  ST_GeomFromText('POINT(-74.0060 40.7128)', 4326)::geography,
  1000  -- 1 km radius; the geography cast makes ST_DWithin measure in meters
        -- (on plain 4326 geometry the distance would be in degrees)
);
Node.js API Endpoint
app.get('/api/points', async (req, res) => {
  const { lat, lng, radius = 1000 } = req.query;
  const query = `
    SELECT json_build_object(
      'type', 'FeatureCollection',
      -- COALESCE keeps "features" a valid empty array when nothing matches
      'features', COALESCE(json_agg(
        json_build_object(
          'type', 'Feature',
          'geometry', ST_AsGeoJSON(geom)::json,
          'properties', json_build_object(
            'id', id,
            'name', name,
            'category', category
          )
        )
      ), '[]'::json)
    ) as geojson
    FROM points_of_interest
    WHERE ST_DWithin(geom::geography, ST_GeomFromText($1, 4326)::geography, $2)
  `;
  const result = await db.query(query, [`POINT(${lng} ${lat})`, radius]);
  res.json(result.rows[0].geojson);
});
Frontend: Displaying Data with Leaflet
// Initialize the map
const map = L.map('map').setView([40.7128, -74.0060], 12);

// Add tile layer
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '© OpenStreetMap contributors'
}).addTo(map);

// Fetch and display GeoJSON data
async function loadPoints(lat, lng, radius) {
  try {
    const response = await fetch(
      `/api/points?lat=${lat}&lng=${lng}&radius=${radius}`
    );
    const geojson = await response.json();
    L.geoJSON(geojson, {
      pointToLayer: function(feature, latlng) {
        return L.circleMarker(latlng, {
          radius: 8,
          fillColor: getColorByCategory(feature.properties.category),
          color: '#000',
          weight: 1,
          opacity: 1,
          fillOpacity: 0.8
        });
      },
      onEachFeature: function(feature, layer) {
        layer.bindPopup(`
          <h3>${feature.properties.name}</h3>
          <p>Category: ${feature.properties.category}</p>
        `);
      }
    }).addTo(map);
  } catch (error) {
    console.error('Error loading points:', error);
  }
}

// Helper function for category colors
function getColorByCategory(category) {
  const colors = {
    'restaurant': '#e74c3c',
    'park': '#27ae60',
    'school': '#3498db',
    'hospital': '#f39c12'
  };
  return colors[category] || '#95a5a6';
}
Performance Optimization Tips
- Spatial Indexing: Create spatial indexes on geometry columns
- Limit Results: Add LIMIT clauses to prevent large data transfers
- Caching: Cache frequently requested areas
- Clustering: Use marker clustering for dense point data
- Tile-based Loading: Load data based on map viewport
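The last tip can be sketched as a small helper that turns the map viewport into a bounding-box query. The function name and the /api/points-in-bbox endpoint are hypothetical, not part of Leaflet; the bounds argument follows Leaflet's LatLngBounds interface:

```javascript
// Build a bbox query URL from a Leaflet-style bounds object
// (an object exposing getWest/getSouth/getEast/getNorth).
function buildBboxUrl(bounds, limit = 500) {
  const bbox = [
    bounds.getWest(), bounds.getSouth(),
    bounds.getEast(), bounds.getNorth()
  ].join(',');
  return `/api/points-in-bbox?bbox=${bbox}&limit=${limit}`;
}

// In the app this would be wired to map movement, e.g.:
//   map.on('moveend', () => fetch(buildBboxUrl(map.getBounds())));
```

Combining this with a server-side `geom && ST_MakeEnvelope(...)` filter and a LIMIT clause covers three of the five tips in one request path.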
Tutorial 4: Batch Processing with Python
Scenario
Process a large collection of spatial data files and convert them to a unified format for analysis.
Setting Up the Environment
# Install required packages
pip install geopandas shapely fiona
Python Script for Batch Conversion
import json
from pathlib import Path

import geopandas as gpd
import pandas as pd
from shapely.geometry import mapping
from shapely.wkt import loads


class SpatialDataConverter:
    def __init__(self, input_dir, output_dir):
        self.input_dir = Path(input_dir)
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(exist_ok=True)

    def convert_wkt_to_geojson(self, wkt_file):
        """Convert a file of WKT geometries (one per line) to GeoJSON."""
        with open(wkt_file, 'r') as f:
            wkt_data = f.read().strip().split('\n')
        features = []
        for i, wkt in enumerate(wkt_data):
            if wkt.strip():
                try:
                    geom = loads(wkt)
                    feature = {
                        "type": "Feature",
                        # shapely's mapping() yields the GeoJSON geometry dict
                        "geometry": mapping(geom),
                        "properties": {"id": i, "source": wkt_file.name}
                    }
                    features.append(feature)
                except Exception as e:
                    print(f"Error processing WKT: {wkt[:50]}... - {e}")
        geojson = {"type": "FeatureCollection", "features": features}
        output_file = self.output_dir / f"{wkt_file.stem}.geojson"
        with open(output_file, 'w') as f:
            json.dump(geojson, f, indent=2)
        return output_file

    def convert_csv_with_coordinates(self, csv_file, lat_col='lat', lon_col='lon'):
        """Convert a CSV with coordinate columns to multiple formats."""
        df = pd.read_csv(csv_file)
        # Create GeoDataFrame
        gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
            crs='EPSG:4326'
        )
        base_name = csv_file.stem
        outputs = {}
        # GeoJSON
        geojson_file = self.output_dir / f"{base_name}.geojson"
        gdf.to_file(geojson_file, driver='GeoJSON')
        outputs['geojson'] = geojson_file
        # WKT (one geometry per line)
        wkt_file = self.output_dir / f"{base_name}.wkt"
        with open(wkt_file, 'w') as f:
            for geom in gdf.geometry:
                f.write(f"{geom.wkt}\n")
        outputs['wkt'] = wkt_file
        # Shapefile
        shp_file = self.output_dir / f"{base_name}.shp"
        gdf.to_file(shp_file)
        outputs['shapefile'] = shp_file
        return outputs

    def process_directory(self):
        """Process all supported files in the input directory."""
        results = []
        for file_path in self.input_dir.rglob('*'):
            if not file_path.is_file():
                continue
            try:
                if file_path.suffix.lower() == '.wkt':
                    output = self.convert_wkt_to_geojson(file_path)
                    results.append(('WKT->GeoJSON', file_path, output))
                elif file_path.suffix.lower() == '.csv':
                    outputs = self.convert_csv_with_coordinates(file_path)
                    for format_type, output_path in outputs.items():
                        results.append((f'CSV->{format_type}', file_path, output_path))
            except Exception as e:
                print(f"Error processing {file_path}: {e}")
        return results


# Usage example
if __name__ == "__main__":
    converter = SpatialDataConverter('input_data', 'output_data')
    results = converter.process_directory()
    print("Conversion Summary:")
    for conversion_type, input_file, output_file in results:
        print(f"{conversion_type}: {input_file.name} -> {output_file.name}")
Advanced Processing Features
Coordinate System Transformation
# Transform from one CRS to another
gdf_transformed = gdf.to_crs('EPSG:3857') # Web Mercator
# Transform with custom parameters
gdf_utm = gdf.to_crs('+proj=utm +zone=33 +datum=WGS84')
Data Validation and Cleaning
# Check for invalid geometries
invalid_geoms = ~gdf.geometry.is_valid
if invalid_geoms.any():
    print(f"Found {invalid_geoms.sum()} invalid geometries")
    # Fix invalid geometries with a zero-width buffer
    gdf.loc[invalid_geoms, 'geometry'] = gdf.loc[invalid_geoms, 'geometry'].buffer(0)

# Remove duplicate points
gdf_unique = gdf.drop_duplicates(subset=['geometry'])

# Filter by bounding box (minx, miny, maxx, maxy)
bbox = [-74.1, 40.6, -73.9, 40.8]  # NYC area
gdf_filtered = gdf.cx[bbox[0]:bbox[2], bbox[1]:bbox[3]]
Tutorial 5: Performance Optimization for Large Datasets
Scenario
Handle millions of spatial records efficiently while maintaining good performance for queries and conversions.
Database Optimization
Spatial Indexing
-- Create spatial index
CREATE INDEX idx_places_geom ON places USING GIST (geom);
-- Create partial index for frequently queried types
CREATE INDEX idx_places_restaurants
ON places USING GIST (geom)
WHERE category = 'restaurant';
-- Analyze table for query planner
ANALYZE places;
Efficient Queries
-- Use bounding box pre-filter before expensive operations
SELECT * FROM places
WHERE geom && ST_MakeEnvelope(-74.1, 40.6, -73.9, 40.8, 4326)
  AND ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography,
    1000  -- meters, thanks to the geography cast
  );
-- Use simplified geometries for large-scale views
SELECT
id,
name,
ST_AsGeoJSON(ST_Simplify(geom, 0.001)) as simplified_geom
FROM administrative_boundaries
WHERE zoom_level <= 8;
Streaming Large Datasets
Python Generator for Memory Efficiency
import json

def stream_geojson_features(query, connection, chunk_size=1000):
    """Stream GeoJSON features to avoid loading all data into memory.

    Assumes a dict-style cursor (e.g. psycopg2.extras.RealDictCursor),
    since rows are accessed by column name below.
    """
    cursor = connection.cursor()
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            feature = {
                "type": "Feature",
                "geometry": json.loads(row['geom_geojson']),
                "properties": {k: v for k, v in row.items() if k != 'geom_geojson'}
            }
            yield feature

# Usage
def export_large_dataset(output_file, query, connection):
    with open(output_file, 'w') as f:
        f.write('{"type": "FeatureCollection", "features": [')
        first_feature = True
        for feature in stream_geojson_features(query, connection):
            if not first_feature:
                f.write(',')
            json.dump(feature, f, separators=(',', ':'))
            first_feature = False
        f.write(']}')
Parallel Processing
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import geopandas as gpd
import pandas as pd

def process_chunk(chunk_data):
    """Process a chunk of spatial data."""
    chunk_gdf = gpd.GeoDataFrame(chunk_data)
    # Perform operations on the chunk. Note: area on EPSG:4326 data is in
    # square degrees; reproject to a projected CRS for real-world units.
    chunk_gdf['area'] = chunk_gdf.geometry.area
    chunk_gdf['centroid'] = chunk_gdf.geometry.centroid
    return chunk_gdf

def parallel_spatial_processing(input_file, chunk_size=10000):
    """Process a large spatial dataset in parallel."""
    # Read data in chunks
    gdf = gpd.read_file(input_file)
    chunks = [gdf.iloc[i:i+chunk_size] for i in range(0, len(gdf), chunk_size)]
    # Process chunks in parallel
    with ProcessPoolExecutor(max_workers=mp.cpu_count()) as executor:
        processed_chunks = list(executor.map(process_chunk, chunks))
    # Combine results
    return pd.concat(processed_chunks, ignore_index=True)
Caching Strategies
Redis for Spatial Queries
import redis
import json
import hashlib
class SpatialQueryCache:
    def __init__(self, redis_host='localhost', redis_port=6379):
        self.redis_client = redis.Redis(host=redis_host, port=redis_port)
        self.cache_ttl = 3600  # 1 hour

    def get_cache_key(self, query, params):
        """Generate a cache key from the query and parameters."""
        query_string = f"{query}:{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(query_string.encode()).hexdigest()

    def get_cached_result(self, query, params):
        """Return the cached query result, or None on a cache miss."""
        cache_key = self.get_cache_key(query, params)
        cached_data = self.redis_client.get(cache_key)
        if cached_data:
            return json.loads(cached_data)
        return None

    def cache_result(self, query, params, result):
        """Cache a query result with the configured TTL."""
        cache_key = self.get_cache_key(query, params)
        self.redis_client.setex(
            cache_key,
            self.cache_ttl,
            json.dumps(result, separators=(',', ':'))
        )
Performance Monitoring
import time
import psutil
import logging
class PerformanceMonitor:
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def __enter__(self):
        self.start_time = time.time()
        self.start_memory = psutil.virtual_memory().used
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        duration = time.time() - self.start_time
        memory_used = psutil.virtual_memory().used - self.start_memory
        self.logger.info(f"Operation completed in {duration:.2f}s")
        self.logger.info(f"Memory used: {memory_used / (1024*1024):.2f} MB")

# Usage
with PerformanceMonitor():
    large_gdf = gpd.read_file('large_dataset.geojson')
    result = large_gdf.to_crs('EPSG:3857')
    result.to_file('output.geojson')
Common Issues and Troubleshooting
Coordinate System Problems
Issue: Points Appearing in Wrong Locations
Symptoms: Mapped points appear in ocean or wrong continent
Causes:
- Latitude/longitude order confusion
- Wrong coordinate reference system
- Missing or incorrect SRID
Solutions:
- Check coordinate order: WKT is POINT(longitude latitude); GeoJSON is [longitude, latitude]
- Verify coordinates are in the expected range: longitude -180 to 180, latitude -90 to 90
- Transform coordinates to the expected CRS if needed:
SELECT ST_Transform(geom, 4326) FROM table_name;
Performance Issues
Issue: Slow Spatial Queries
Symptoms: Database queries taking several seconds or minutes
Solutions:
- Add spatial indexes:
CREATE INDEX idx_geom ON table_name USING GIST (geom_column);
- Use bounding box pre-filters:
geom && ST_MakeEnvelope(...)
- Simplify geometries for visualization
- Consider spatial partitioning for very large datasets
Data Quality Issues
Issue: Invalid Geometries
Common problems:
- Self-intersecting polygons
- Unclosed polygon rings
- Duplicate consecutive vertices
- Wrong ring orientation
Detection and fixes:
-- Check for invalid geometries
SELECT id, ST_IsValidReason(geom)
FROM table_name
WHERE NOT ST_IsValid(geom);
-- Fix invalid geometries
UPDATE table_name
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);
Format-Specific Issues
GeoJSON: Large File Sizes
Solutions:
- Reduce coordinate precision to 6 decimal places
- Use TopoJSON for shared boundaries
- Implement tiled delivery for web maps
- Compress files with gzip
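The first and last suggestions combine naturally; a stdlib-only sketch that rounds every float in a GeoJSON structure and writes a minified, gzip-compressed file (the helper name and the 6-decimal default, roughly 11 cm at the equator, are illustrative choices):

```python
import gzip
import json

def round_coords(obj, precision=6):
    """Recursively round all floats (coordinates and numeric properties)."""
    if isinstance(obj, float):
        return round(obj, precision)
    if isinstance(obj, list):
        return [round_coords(item, precision) for item in obj]
    if isinstance(obj, dict):
        return {k: round_coords(v, precision) for k, v in obj.items()}
    return obj

geojson = {"type": "Point", "coordinates": [-74.00600123456, 40.71280987654]}
compact = round_coords(geojson)

# Minified separators plus gzip typically shrink real files dramatically
with gzip.open('point.geojson.gz', 'wt') as f:
    json.dump(compact, f, separators=(',', ':'))
```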
WKB: Binary Data Handling
Common issues:
- Endianness problems between systems
- Base64 encoding for text-based protocols
- Version compatibility between libraries
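The endianness issue is visible in the very first byte of any WKB stream: 0 means big-endian (XDR), 1 means little-endian (NDR). A stdlib sketch that builds and parses a 2D WKB point in either byte order (the function names are illustrative; real projects would use a library such as Shapely):

```python
import struct

def point_to_wkb(x, y, little_endian=True):
    """Encode a 2D point as WKB: order flag, uint32 type (1 = Point), two doubles."""
    prefix = '<' if little_endian else '>'
    return struct.pack(prefix + 'BIdd', 1 if little_endian else 0, 1, x, y)

def point_from_wkb(wkb):
    """Decode a 2D WKB point, honoring the byte-order flag in byte 0."""
    prefix = '<' if wkb[0] == 1 else '>'
    geom_type, x, y = struct.unpack_from(prefix + 'Idd', wkb, offset=1)
    assert geom_type == 1, "not a Point"
    return x, y

wkb_le = point_to_wkb(-74.0060, 40.7128)                       # NDR
wkb_be = point_to_wkb(-74.0060, 40.7128, little_endian=False)  # XDR
print(wkb_le.hex())  # starts with "01" (NDR) then "01000000" (Point type)
```

A correct reader always inspects the flag byte rather than assuming the producer's byte order, which is exactly what cross-system WKB bugs come down to.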
Best Practices Summary
Data Management
- Always validate geometries after conversion
- Maintain coordinate system information
- Use appropriate precision for your use case
- Document data sources and transformations
- Implement proper error handling
Performance
- Create spatial indexes on geometry columns
- Use bounding box filters before expensive operations
- Consider data partitioning for large datasets
- Cache frequently accessed conversions
- Monitor query performance and optimize bottlenecks
Web Applications
- Use GeoJSON for client-side mapping
- Implement progressive loading for large datasets
- Optimize coordinate precision for web display
- Consider tile-based approaches for complex data
- Implement proper error handling for failed conversions
Database Integration
- Store geometries in WKB format for efficiency
- Use WKT for human-readable queries and debugging
- Implement proper spatial reference system handling
- Regular maintenance of spatial indexes
- Monitor database performance metrics
Conclusion
Mastering spatial data format conversion is essential for modern GIS workflows. These tutorials provide practical, real-world examples that you can adapt to your specific needs. Remember that the key to successful spatial data management is:
- Understanding your data: Know the characteristics and requirements of your spatial data
- Choosing the right format: Select formats that optimize for your specific use case
- Implementing proper validation: Always verify data integrity after conversions
- Optimizing for performance: Consider scalability from the beginning
- Planning for maintenance: Implement monitoring and error handling
As you work with these formats more, you'll develop intuition for when to use each format and how to troubleshoot common issues. The examples in these tutorials provide a solid foundation for building robust spatial data processing workflows.