GeoArrow

Overview

GeoArrow is a specification for storing geospatial data in Apache Arrow memory layout. It ensures geospatial tools can interoperate and leverage the growing Apache Arrow ecosystem.

GeoArrow enables each row in an Arrow table to represent a feature as defined by the OGC Simple Feature Access standard (i.e. Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection).

Besides the geometry column, a table can carry any number of standard Arrow columns that hold the non-spatial attributes of each feature.

GeoArrow targets geospatial tabular data in which one or more columns contain feature geometries and the remaining columns hold feature attributes. The GeoArrow specification defines how such vector features (geometries) are stored in Arrow (and Arrow-compatible) data structures.
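To make "stored in Arrow data structures" more concrete, the following is a minimal pure-Python sketch of how a column of LineStrings could be laid out, assuming the variable-size list layout that Arrow uses for nested data (one shared child sequence plus an offsets array). The names `coords`, `offsets`, and `linestring_at` are illustrative only and do not come from the specification.

```python
# Sketch of an offsets-plus-values layout for a LineString column.
# All names below are illustrative assumptions, not spec terminology.

# Two features: a 3-vertex line and a 2-vertex line.
linestrings = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    [(5.0, 5.0), (6.0, 7.0)],
]

# Flatten every vertex into one contiguous coordinate sequence.
coords = [xy for line in linestrings for xy in line]

# Record where each feature starts and ends, as Arrow list arrays do.
offsets = [0]
for line in linestrings:
    offsets.append(offsets[-1] + len(line))


def linestring_at(i):
    """Return the vertices of feature i by slicing the shared sequence."""
    return coords[offsets[i]:offsets[i + 1]]
```

Because every feature is a slice of the same coordinate sequence, a reader can reach row `i` without parsing the rows before it, which is the main appeal of a columnar layout over a serialized one.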

Note that GeoArrow is not a separate format from Apache Arrow; rather, the GeoArrow specification describes additional conventions for the metadata and layout of geospatial data. A valid GeoArrow file is therefore always a valid Arrow file. These conventions are expressed as Arrow extension type definitions, which ensure that type-level metadata (e.g., CRS) is propagated through Arrow implementations.
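As a rough illustration of how an extension type travels with a field, the sketch below builds the field-level metadata for a geometry column. It relies on the two metadata keys Arrow reserves for extension types, `ARROW:extension:name` and `ARROW:extension:metadata`. The helper `geoarrow_field_metadata` is hypothetical, and the `"crs"` key with the value `"OGC:CRS84"` is an assumed example rather than a requirement.

```python
import json

# Arrow carries extension types in field-level metadata under these two keys.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


def geoarrow_field_metadata(extension_name, crs=None):
    """Build the metadata dict for a geometry field (hypothetical helper)."""
    # The extension metadata is serialized as a JSON string;
    # the "crs" key is included only when a CRS is supplied.
    ext_meta = {} if crs is None else {"crs": crs}
    return {
        EXT_NAME_KEY: extension_name,
        EXT_META_KEY: json.dumps(ext_meta),
    }


# "OGC:CRS84" is an assumed example CRS identifier.
meta = geoarrow_field_metadata("geoarrow.wkb", crs="OGC:CRS84")
```

An Arrow library that does not recognize the extension name can still read the column as its underlying storage type, which is why the file remains a valid Arrow file.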

Geometry Types

Geometry type            | Read | Write | Description
geoarrow.point           |      |       |
geoarrow.multipoint      |      |       |
geoarrow.linestring      |      |       |
geoarrow.multilinestring |      |       |
geoarrow.polygon         |      |       |
geoarrow.multipolygon    |      |       |
geoarrow.wkb             |      |       | WKB also supported
geoarrow.wkt             |      |       | WKT also supported
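For the serialized encodings, each value in the column is a complete geometry in well-known binary or well-known text. The sketch below encodes and decodes a 2D point as little-endian WKB using only the standard library. The function names and the list standing in for a binary column are illustrative assumptions, and the code handles points only.

```python
import struct

# WKB layout for a 2D point: 1 byte for byte order (1 = little endian),
# a uint32 geometry type (1 = Point), then x and y as float64.
POINT_FORMAT = "<BIdd"


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB."""
    return struct.pack(POINT_FORMAT, 1, 1, x, y)


def wkb_to_point(buf):
    """Decode a little-endian WKB point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack(POINT_FORMAT, buf)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("expected a little-endian WKB Point")
    return (x, y)


# A plain list stands in for an Arrow binary column with one WKB value per row.
wkb_column = [point_to_wkb(1.0, 2.0), point_to_wkb(-3.5, 4.25)]
```

Unlike the native encodings, a WKB value has to be parsed before its coordinates can be read, but one column can hold any mix of geometry types.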

Relationship with GeoParquet

The GeoParquet specification is closely related to GeoArrow. Notable differences:

  • GeoParquet is a file-level metadata specification
  • GeoArrow is a field-level metadata and memory layout specification