GIS Format Conversion Tutorials
Practical tutorials and real-world examples for spatial data conversion
Getting Started with Spatial Data Conversion
Converting between spatial data formats is a fundamental skill for anyone working with geographic information systems. This tutorial series will guide you through practical examples and common scenarios you'll encounter when working with WKT, WKB, and GeoJSON formats.
What You'll Learn
- How to identify different spatial data formats
- When to use each format for optimal results
- Step-by-step conversion procedures
- Common pitfalls and how to avoid them
- Integration with popular tools and platforms
Tutorial 1: Converting GPS Points from CSV to Multiple Formats
Scenario
You have a CSV file with GPS coordinates from a field survey and need to convert them to different formats for various applications:
- WKT for database storage
- WKB for efficient querying
- GeoJSON for web visualization
Sample Data
Your CSV file contains the following data:
point_id,latitude,longitude,description
1,40.7128,-74.0060,"New York City Hall"
2,34.0522,-118.2437,"Los Angeles City Hall"
3,41.8781,-87.6298,"Chicago City Hall"
Step 1: Create WKT Points
For each coordinate pair, create a WKT POINT geometry. Note that WKT lists longitude (x) before latitude (y); the # annotations below are for readability and are not part of WKT syntax:
POINT(-74.0060 40.7128) # New York City Hall
POINT(-118.2437 34.0522) # Los Angeles City Hall
POINT(-87.6298 41.8781) # Chicago City Hall
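Generating these strings by hand doesn't scale, so here is a minimal stdlib-only sketch that derives them from the CSV above (the file contents are inlined for illustration; in practice you would open the file directly):

```python
import csv
import io

# Sample rows from the CSV above (in practice: open('survey.csv'))
csv_text = """point_id,latitude,longitude,description
1,40.7128,-74.0060,"New York City Hall"
2,34.0522,-118.2437,"Los Angeles City Hall"
3,41.8781,-87.6298,"Chicago City Hall"
"""

def rows_to_wkt(fileobj):
    """Yield one WKT POINT per CSV row, longitude first."""
    for row in csv.DictReader(fileobj):
        yield f"POINT({row['longitude']} {row['latitude']})"

wkt_points = list(rows_to_wkt(io.StringIO(csv_text)))
print(wkt_points[0])  # POINT(-74.0060 40.7128)
```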
Step 2: Convert to GeoJSON
Transform the points into a GeoJSON FeatureCollection:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-74.0060, 40.7128]
      },
      "properties": {
        "point_id": 1,
        "description": "New York City Hall"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-118.2437, 34.0522]
      },
      "properties": {
        "point_id": 2,
        "description": "Los Angeles City Hall"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-87.6298, 41.8781]
      },
      "properties": {
        "point_id": 3,
        "description": "Chicago City Hall"
      }
    }
  ]
}
Step 3: Database Integration
PostGIS stores geometries internally in a compact WKB-based binary format, so you get efficient storage automatically; you can still write them with readable WKT:
CREATE TABLE city_halls (
  id SERIAL PRIMARY KEY,
  point_id INTEGER,
  description TEXT,
  geom GEOMETRY(POINT, 4326)
);

INSERT INTO city_halls (point_id, description, geom) VALUES
  (1, 'New York City Hall', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326)),
  (2, 'Los Angeles City Hall', ST_GeomFromText('POINT(-118.2437 34.0522)', 4326)),
  (3, 'Chicago City Hall', ST_GeomFromText('POINT(-87.6298 41.8781)', 4326));
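Once stored, the same geometries can be read back in any of the three formats. A sketch of the relevant PostGIS output functions against the table above:

```sql
SELECT
  description,
  ST_AsText(geom)    AS wkt,      -- human-readable, good for debugging
  ST_AsBinary(geom)  AS wkb,      -- compact binary (bytea) for transfer
  ST_AsGeoJSON(geom) AS geojson   -- ready for web mapping clients
FROM city_halls;
```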
Key Takeaways
- Always use longitude, latitude order for WKT and database storage
- GeoJSON uses [longitude, latitude] array format
- Include SRID (Spatial Reference ID) when working with databases
- Validate coordinates are within valid ranges (-180 to 180 for longitude, -90 to 90 for latitude)
Tutorial 2: Working with Complex Polygons
Scenario
You need to define a complex polygon area (like a park with a lake inside) and convert it between formats for different applications.
Creating the Polygon with Hole
A polygon with a hole requires careful definition of exterior and interior rings:
WKT Format
POLYGON(
  (10 10, 20 10, 20 20, 10 20, 10 10),
  (12 12, 12 18, 18 18, 18 12, 12 12)
)
Note: The first ring is the exterior boundary (counterclockwise), the second ring is the hole (clockwise, i.e. listed in the opposite direction).
GeoJSON Format
{
  "type": "Polygon",
  "coordinates": [
    [
      [10, 10], [20, 10], [20, 20], [10, 20], [10, 10]
    ],
    [
      [12, 12], [12, 18], [18, 18], [18, 12], [12, 12]
    ]
  ]
}
Validation Steps
Always validate complex polygons:
- Ring Closure: Ensure first and last coordinates are identical
- Ring Orientation: Exterior rings counterclockwise, holes clockwise (mandatory in GeoJSON per RFC 7946; a common convention elsewhere)
- Self-Intersection: Rings should not cross themselves
- Containment: Holes must be entirely within the exterior ring
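The first two checks need no GIS library at all. A stdlib sketch using the shoelace formula for orientation, with rings as lists of (x, y) tuples (function names are illustrative):

```python
def is_closed(ring):
    """Check 1: first and last coordinates must be identical."""
    return ring[0] == ring[-1]

def signed_area(ring):
    """Shoelace formula: a positive result means counterclockwise."""
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2.0

# Rings from the park example: exterior counterclockwise, hole clockwise
exterior = [(10, 10), (20, 10), (20, 20), (10, 20), (10, 10)]
hole = [(12, 12), (12, 18), (18, 18), (18, 12), (12, 12)]

print(is_closed(exterior), is_closed(hole))      # True True
print(signed_area(exterior), signed_area(hole))  # 100.0 -36.0
```

Self-intersection and containment (checks 3 and 4) are harder to do by hand; a library such as Shapely (`Polygon(...).is_valid`) is the practical choice there.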
Common Errors and Solutions
Unclosed Rings
Incorrect (ring never returns to its starting point):
POLYGON((10 10, 20 10, 20 20, 10 20))
Correct:
POLYGON((10 10, 20 10, 20 20, 10 20, 10 10))
Wrong Ring Orientation
Incorrect (exterior ring is clockwise):
POLYGON((10 10, 10 20, 20 20, 20 10, 10 10))
Correct (counterclockwise):
POLYGON((10 10, 20 10, 20 20, 10 20, 10 10))
Tutorial 3: Building a Web Mapping Application
Scenario
Create a web application that displays spatial data from a database on an interactive map using Leaflet.js.
Backend: Serving GeoJSON from PostGIS
Create a simple API endpoint that converts PostGIS data to GeoJSON:
SQL Query
SELECT
  id,
  name,
  category,
  ST_AsGeoJSON(geom) as geometry
FROM points_of_interest
WHERE ST_DWithin(
  geom::geography,
  ST_GeomFromText('POINT(-74.0060 40.7128)', 4326)::geography,
  1000  -- 1 km radius; the geography cast makes ST_DWithin measure in meters
        -- (on plain 4326 geometry the distance would be in degrees)
);
Node.js API Endpoint
app.get('/api/points', async (req, res) => {
  const { lat, lng, radius = 1000 } = req.query;
  const query = `
    SELECT json_build_object(
      'type', 'FeatureCollection',
      -- COALESCE keeps "features" a valid empty array when nothing matches
      'features', COALESCE(json_agg(
        json_build_object(
          'type', 'Feature',
          'geometry', ST_AsGeoJSON(geom)::json,
          'properties', json_build_object(
            'id', id,
            'name', name,
            'category', category
          )
        )
      ), '[]'::json)
    ) as geojson
    FROM points_of_interest
    WHERE ST_DWithin(geom::geography, ST_GeomFromText($1, 4326)::geography, $2)
  `;
  const result = await db.query(query, [`POINT(${lng} ${lat})`, radius]);
  res.json(result.rows[0].geojson);
});
Frontend: Displaying Data with Leaflet
// Initialize the map
const map = L.map('map').setView([40.7128, -74.0060], 12);

// Add tile layer
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '© OpenStreetMap contributors'
}).addTo(map);

// Fetch and display GeoJSON data
async function loadPoints(lat, lng, radius) {
  try {
    const response = await fetch(
      `/api/points?lat=${lat}&lng=${lng}&radius=${radius}`
    );
    const geojson = await response.json();
    L.geoJSON(geojson, {
      pointToLayer: function(feature, latlng) {
        return L.circleMarker(latlng, {
          radius: 8,
          fillColor: getColorByCategory(feature.properties.category),
          color: '#000',
          weight: 1,
          opacity: 1,
          fillOpacity: 0.8
        });
      },
      onEachFeature: function(feature, layer) {
        layer.bindPopup(`
          <h3>${feature.properties.name}</h3>
          <p>Category: ${feature.properties.category}</p>
        `);
      }
    }).addTo(map);
  } catch (error) {
    console.error('Error loading points:', error);
  }
}

// Helper function for category colors
function getColorByCategory(category) {
  const colors = {
    'restaurant': '#e74c3c',
    'park': '#27ae60',
    'school': '#3498db',
    'hospital': '#f39c12'
  };
  return colors[category] || '#95a5a6';
}
Performance Optimization Tips
- Spatial Indexing: Create spatial indexes on geometry columns
- Limit Results: Add LIMIT clauses to prevent large data transfers
- Caching: Cache frequently requested areas
- Clustering: Use marker clustering for dense point data
- Tile-based Loading: Load data based on map viewport
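The last tip can be sketched as a small helper that turns the map viewport into a bounding-box query. The function name and the /api/points-in-bbox endpoint are hypothetical, not part of Leaflet; the bounds argument follows Leaflet's LatLngBounds interface:

```javascript
// Build a bbox query URL from a Leaflet-style bounds object
// (an object exposing getWest/getSouth/getEast/getNorth).
function buildBboxUrl(bounds, limit = 500) {
  const bbox = [
    bounds.getWest(), bounds.getSouth(),
    bounds.getEast(), bounds.getNorth()
  ].join(',');
  return `/api/points-in-bbox?bbox=${bbox}&limit=${limit}`;
}

// In the app this would be wired to map movement, e.g.:
//   map.on('moveend', () => fetch(buildBboxUrl(map.getBounds())));
```

Combining this with a server-side `geom && ST_MakeEnvelope(...)` filter and a LIMIT clause covers three of the five tips in one request path.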
Tutorial 4: Batch Processing with Python
Scenario
Process a large collection of spatial data files and convert them to a unified format for analysis.
Setting Up the Environment
# Install required packages
pip install geopandas shapely fiona
Python Script for Batch Conversion
import json
from pathlib import Path

import geopandas as gpd
import pandas as pd
from shapely.geometry import mapping
from shapely.wkt import loads


class SpatialDataConverter:
    def __init__(self, input_dir, output_dir):
        self.input_dir = Path(input_dir)
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(exist_ok=True)

    def convert_wkt_to_geojson(self, wkt_file):
        """Convert a file of WKT geometries (one per line) to GeoJSON."""
        with open(wkt_file, 'r') as f:
            wkt_data = f.read().strip().split('\n')
        features = []
        for i, wkt in enumerate(wkt_data):
            if wkt.strip():
                try:
                    geom = loads(wkt)
                    feature = {
                        "type": "Feature",
                        # shapely's mapping() yields the GeoJSON geometry dict
                        "geometry": mapping(geom),
                        "properties": {"id": i, "source": wkt_file.name}
                    }
                    features.append(feature)
                except Exception as e:
                    print(f"Error processing WKT: {wkt[:50]}... - {e}")
        geojson = {"type": "FeatureCollection", "features": features}
        output_file = self.output_dir / f"{wkt_file.stem}.geojson"
        with open(output_file, 'w') as f:
            json.dump(geojson, f, indent=2)
        return output_file

    def convert_csv_with_coordinates(self, csv_file, lat_col='lat', lon_col='lon'):
        """Convert a CSV with coordinate columns to multiple formats."""
        df = pd.read_csv(csv_file)
        # Create GeoDataFrame
        gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
            crs='EPSG:4326'
        )
        base_name = csv_file.stem
        outputs = {}
        # GeoJSON
        geojson_file = self.output_dir / f"{base_name}.geojson"
        gdf.to_file(geojson_file, driver='GeoJSON')
        outputs['geojson'] = geojson_file
        # WKT (one geometry per line)
        wkt_file = self.output_dir / f"{base_name}.wkt"
        with open(wkt_file, 'w') as f:
            for geom in gdf.geometry:
                f.write(f"{geom.wkt}\n")
        outputs['wkt'] = wkt_file
        # Shapefile
        shp_file = self.output_dir / f"{base_name}.shp"
        gdf.to_file(shp_file)
        outputs['shapefile'] = shp_file
        return outputs

    def process_directory(self):
        """Process all supported files in the input directory."""
        results = []
        for file_path in self.input_dir.rglob('*'):
            if not file_path.is_file():
                continue
            try:
                if file_path.suffix.lower() == '.wkt':
                    output = self.convert_wkt_to_geojson(file_path)
                    results.append(('WKT->GeoJSON', file_path, output))
                elif file_path.suffix.lower() == '.csv':
                    outputs = self.convert_csv_with_coordinates(file_path)
                    for format_type, output_path in outputs.items():
                        results.append((f'CSV->{format_type}', file_path, output_path))
            except Exception as e:
                print(f"Error processing {file_path}: {e}")
        return results


# Usage example
if __name__ == "__main__":
    converter = SpatialDataConverter('input_data', 'output_data')
    results = converter.process_directory()
    print("Conversion Summary:")
    for conversion_type, input_file, output_file in results:
        print(f"{conversion_type}: {input_file.name} -> {output_file.name}")
Advanced Processing Features
Coordinate System Transformation
# Transform from one CRS to another
gdf_transformed = gdf.to_crs('EPSG:3857') # Web Mercator
# Transform with custom parameters
gdf_utm = gdf.to_crs('+proj=utm +zone=33 +datum=WGS84')
Data Validation and Cleaning
# Check for invalid geometries
invalid_geoms = ~gdf.geometry.is_valid
if invalid_geoms.any():
    print(f"Found {invalid_geoms.sum()} invalid geometries")
    # Fix invalid geometries with a zero-width buffer
    gdf.loc[invalid_geoms, 'geometry'] = gdf.loc[invalid_geoms, 'geometry'].buffer(0)

# Remove duplicate points
gdf_unique = gdf.drop_duplicates(subset=['geometry'])

# Filter by bounding box (minx, miny, maxx, maxy)
bbox = [-74.1, 40.6, -73.9, 40.8]  # NYC area
gdf_filtered = gdf.cx[bbox[0]:bbox[2], bbox[1]:bbox[3]]
Tutorial 5: Performance Optimization for Large Datasets
Scenario
Handle millions of spatial records efficiently while maintaining good performance for queries and conversions.
Database Optimization
Spatial Indexing
-- Create spatial index
CREATE INDEX idx_places_geom ON places USING GIST (geom);
-- Create partial index for frequently queried types
CREATE INDEX idx_places_restaurants
ON places USING GIST (geom)
WHERE category = 'restaurant';
-- Analyze table for query planner
ANALYZE places;
Efficient Queries
-- Use bounding box pre-filter before expensive operations
SELECT * FROM places
WHERE geom && ST_MakeEnvelope(-74.1, 40.6, -73.9, 40.8, 4326)
  AND ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography,
    1000  -- meters, thanks to the geography cast
  );
-- Use simplified geometries for large-scale views
SELECT
id,
name,
ST_AsGeoJSON(ST_Simplify(geom, 0.001)) as simplified_geom
FROM administrative_boundaries
WHERE zoom_level <= 8;
Streaming Large Datasets
Python Generator for Memory Efficiency
import json

def stream_geojson_features(query, connection, chunk_size=1000):
    """Stream GeoJSON features to avoid loading all data into memory.

    Assumes a dict-style cursor (e.g. psycopg2.extras.RealDictCursor),
    since rows are accessed by column name below.
    """
    cursor = connection.cursor()
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            feature = {
                "type": "Feature",
                "geometry": json.loads(row['geom_geojson']),
                "properties": {k: v for k, v in row.items() if k != 'geom_geojson'}
            }
            yield feature

# Usage
def export_large_dataset(output_file, query, connection):
    with open(output_file, 'w') as f:
        f.write('{"type": "FeatureCollection", "features": [')
        first_feature = True
        for feature in stream_geojson_features(query, connection):
            if not first_feature:
                f.write(',')
            json.dump(feature, f, separators=(',', ':'))
            first_feature = False
        f.write(']}')
Parallel Processing
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import geopandas as gpd
import pandas as pd

def process_chunk(chunk_data):
    """Process a chunk of spatial data."""
    chunk_gdf = gpd.GeoDataFrame(chunk_data)
    # Perform operations on the chunk. Note: area on EPSG:4326 data is in
    # square degrees; reproject to a projected CRS for real-world units.
    chunk_gdf['area'] = chunk_gdf.geometry.area
    chunk_gdf['centroid'] = chunk_gdf.geometry.centroid
    return chunk_gdf

def parallel_spatial_processing(input_file, chunk_size=10000):
    """Process a large spatial dataset in parallel."""
    # Read data in chunks
    gdf = gpd.read_file(input_file)
    chunks = [gdf.iloc[i:i+chunk_size] for i in range(0, len(gdf), chunk_size)]
    # Process chunks in parallel
    with ProcessPoolExecutor(max_workers=mp.cpu_count()) as executor:
        processed_chunks = list(executor.map(process_chunk, chunks))
    # Combine results
    return pd.concat(processed_chunks, ignore_index=True)
Caching Strategies
Redis for Spatial Queries
import redis
import json
import hashlib
class SpatialQueryCache:
    def __init__(self, redis_host='localhost', redis_port=6379):
        self.redis_client = redis.Redis(host=redis_host, port=redis_port)
        self.cache_ttl = 3600  # 1 hour

    def get_cache_key(self, query, params):
        """Generate a cache key from the query and parameters."""
        query_string = f"{query}:{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(query_string.encode()).hexdigest()

    def get_cached_result(self, query, params):
        """Return the cached query result, or None on a cache miss."""
        cache_key = self.get_cache_key(query, params)
        cached_data = self.redis_client.get(cache_key)
        if cached_data:
            return json.loads(cached_data)
        return None

    def cache_result(self, query, params, result):
        """Cache a query result with the configured TTL."""
        cache_key = self.get_cache_key(query, params)
        self.redis_client.setex(
            cache_key,
            self.cache_ttl,
            json.dumps(result, separators=(',', ':'))
        )
Performance Monitoring
import time
import psutil
import logging
class PerformanceMonitor:
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def __enter__(self):
        self.start_time = time.time()
        self.start_memory = psutil.virtual_memory().used
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        duration = time.time() - self.start_time
        memory_used = psutil.virtual_memory().used - self.start_memory
        self.logger.info(f"Operation completed in {duration:.2f}s")
        self.logger.info(f"Memory used: {memory_used / (1024*1024):.2f} MB")

# Usage
with PerformanceMonitor():
    large_gdf = gpd.read_file('large_dataset.geojson')
    result = large_gdf.to_crs('EPSG:3857')
    result.to_file('output.geojson')
Common Issues and Troubleshooting
Coordinate System Problems
Issue: Points Appearing in Wrong Locations
Symptoms: Mapped points appear in ocean or wrong continent
Causes:
- Latitude/longitude order confusion
- Wrong coordinate reference system
- Missing or incorrect SRID
Solutions:
- Check coordinate order: WKT is POINT(longitude latitude); GeoJSON is [longitude, latitude]
- Verify coordinates are in the expected range: longitude -180 to 180, latitude -90 to 90
- Transform coordinates to the expected CRS if needed:
SELECT ST_Transform(geom, 4326) FROM table_name;
Performance Issues
Issue: Slow Spatial Queries
Symptoms: Database queries taking several seconds or minutes
Solutions:
- Add spatial indexes:
CREATE INDEX idx_geom ON table_name USING GIST (geom_column);
- Use bounding box pre-filters:
geom && ST_MakeEnvelope(...)
- Simplify geometries for visualization
- Consider spatial partitioning for very large datasets
Data Quality Issues
Issue: Invalid Geometries
Common problems:
- Self-intersecting polygons
- Unclosed polygon rings
- Duplicate consecutive vertices
- Wrong ring orientation
Detection and fixes:
-- Check for invalid geometries
SELECT id, ST_IsValidReason(geom)
FROM table_name
WHERE NOT ST_IsValid(geom);
-- Fix invalid geometries
UPDATE table_name
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);
Format-Specific Issues
GeoJSON: Large File Sizes
Solutions:
- Reduce coordinate precision to 6 decimal places
- Use TopoJSON for shared boundaries
- Implement tiled delivery for web maps
- Compress files with gzip
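The first and last suggestions combine naturally; a stdlib-only sketch that rounds every float in a GeoJSON structure and writes a minified, gzip-compressed file (the helper name and the 6-decimal default, roughly 11 cm at the equator, are illustrative choices):

```python
import gzip
import json

def round_coords(obj, precision=6):
    """Recursively round all floats (coordinates and numeric properties)."""
    if isinstance(obj, float):
        return round(obj, precision)
    if isinstance(obj, list):
        return [round_coords(item, precision) for item in obj]
    if isinstance(obj, dict):
        return {k: round_coords(v, precision) for k, v in obj.items()}
    return obj

geojson = {"type": "Point", "coordinates": [-74.00600123456, 40.71280987654]}
compact = round_coords(geojson)

# Minified separators plus gzip typically shrink real files dramatically
with gzip.open('point.geojson.gz', 'wt') as f:
    json.dump(compact, f, separators=(',', ':'))
```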
WKB: Binary Data Handling
Common issues:
- Endianness problems between systems
- Base64 encoding for text-based protocols
- Version compatibility between libraries
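The endianness issue is visible in the very first byte of any WKB stream: 0 means big-endian (XDR), 1 means little-endian (NDR). A stdlib sketch that builds and parses a 2D WKB point in either byte order (the function names are illustrative; real projects would use a library such as Shapely):

```python
import struct

def point_to_wkb(x, y, little_endian=True):
    """Encode a 2D point as WKB: order flag, uint32 type (1 = Point), two doubles."""
    prefix = '<' if little_endian else '>'
    return struct.pack(prefix + 'BIdd', 1 if little_endian else 0, 1, x, y)

def point_from_wkb(wkb):
    """Decode a 2D WKB point, honoring the byte-order flag in byte 0."""
    prefix = '<' if wkb[0] == 1 else '>'
    geom_type, x, y = struct.unpack_from(prefix + 'Idd', wkb, offset=1)
    assert geom_type == 1, "not a Point"
    return x, y

wkb_le = point_to_wkb(-74.0060, 40.7128)                       # NDR
wkb_be = point_to_wkb(-74.0060, 40.7128, little_endian=False)  # XDR
print(wkb_le.hex())  # starts with "01" (NDR) then "01000000" (Point type)
```

A correct reader always inspects the flag byte rather than assuming the producer's byte order, which is exactly what cross-system WKB bugs come down to.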
Best Practices Summary
Data Management
- Always validate geometries after conversion
- Maintain coordinate system information
- Use appropriate precision for your use case
- Document data sources and transformations
- Implement proper error handling
Performance
- Create spatial indexes on geometry columns
- Use bounding box filters before expensive operations
- Consider data partitioning for large datasets
- Cache frequently accessed conversions
- Monitor query performance and optimize bottlenecks
Web Applications
- Use GeoJSON for client-side mapping
- Implement progressive loading for large datasets
- Optimize coordinate precision for web display
- Consider tile-based approaches for complex data
- Implement proper error handling for failed conversions
Database Integration
- Store geometries in WKB format for efficiency
- Use WKT for human-readable queries and debugging
- Implement proper spatial reference system handling
- Regular maintenance of spatial indexes
- Monitor database performance metrics
Conclusion
Mastering spatial data format conversion is essential for modern GIS workflows. These tutorials provide practical, real-world examples that you can adapt to your specific needs. Remember that the key to successful spatial data management is:
- Understanding your data: Know the characteristics and requirements of your spatial data
- Choosing the right format: Select formats that optimize for your specific use case
- Implementing proper validation: Always verify data integrity after conversions
- Optimizing for performance: Consider scalability from the beginning
- Planning for maintenance: Implement monitoring and error handling
As you work with these formats more, you'll develop intuition for when to use each format and how to troubleshoot common issues. The examples in these tutorials provide a solid foundation for building robust spatial data processing workflows.