Skip to main content

MeshArrow: Storing Meshes in Arrow Tables

arrow-logo

This page describes "MeshArrow", a set of conventions for storing binary GPU geometries (point clouds and meshes) in Apache Arrow tables.

Design Goals

To be able to store 3D meshes in arrow tables in a well-defined way.

A key realization is that each row can have all its vertices in a POSITIONS column in a List<FixedSizeList<3, double>>.

Relationship to GeoArrow

Mesharrow

  • was inspired by GeoArrow, in terms of being a set of column type and metadata conventions for Apache Arrow.

  • In contrast to GeoArrow which focuses on storing simple 2D geospatial features, MeshArrow focuses on storing GPU renderable geometries (meshes).

  • is independent and can be used separately or together with GeoArrow.

  • GeoArrow allows for simple 2D geometries (points, lines and polygons) to be stored in binary columnar form in Apache Arrow (and Parquet) tables.

  • However, the stored geometries are not GPU renderable (in particular, for polygon fills)

  • GeoArrow objects must often be transformed (e.g. through triangulation) into a form that can be rendered by GPUs, and the resulting format is often no longer stored in Arrow.

Mesharrow allows GeoArrow derived, renderable geometries to be stored in Apache Arrow format, even appended as an additional "MeshArrow" column to the source GeoArrow table.

Relationship to GeoParquet

All the conventions defined by "MeshArrow" can be applied to Parquet tables, so feel free to store mesharrow data in Parquet and think of it as "MeshParquet". In contrast to GeoParquet/GeoArrow, MeshArrow is not a standard. Mesharrow is being defined as a set of conventions for the vis.gl frameworks (loaders.gl, luma.gl, deck.gl etc). If defacto standards similar to "mesharrow" were to emerge, expect vis.gl to adopt and favor those over MeshArrow.

Design Goals

A key realization is that each row can have all its vertices in a POSITIONS column in a List<FixedSizeList<3, double>>.

One mesh

A mesh typically is separated into of a number of primitives with different "materials"

  • "One-mesh-primitive-per-row" approach.
  • "One-mesh-per-row" approach.
  • One-mesh-per-record-batch?
    • Record Batches - The Apache Arrow record batch structure can certainly be used to wrap each mesh in its own record batch. An advantage is that each RecordBatch has its own Data object so there is no need to concatenate all the array buffers from the different meshes being combined into a table.

Feature Ids

Tracking feature ids in geometries is very important

Id typeDescription
row indexprimary feature id for a one-mesh-per-row-table is naturally the row index.
"global" idAn additional column can be added to the table to contain an "arbitrary" global feature id for the row.
Vertex basedAn additional List<Uint32> can contain a per vertex feature id. Not that shader interpolation should be disabled.
Texture basedIf the geometry has uvs a featureTexture can be added, see e.g. EXT_mesh_features.

The EXT_mesh_features glTF extension gives some details around advanced feature id specification.

Meshes that do not fit well into MeshArrow

  • Heavily indexed meshes, where mesh primitives share vertices via indices.

Materials

Full Handling of materials is outside the scope of this MeshArrow proposal. A material involved a bunch of settings that can be JSON encoded, however it also involves a number of a textures that typically need to be parsed into a browser specific object.

Binary data required to create images can be stored in a separate Apache Arrow table. However, textures typically need to be parsed into browser specific objects.


Background

Vertexes

This MeshArrow proposal currently only supports interleaved 3 component, 32 bit floating point vertexes.

type VertexArrowType = arrow.FixedSizeList<3, arrow.Float>;

Attributes

A binary geometry suitable for GPU rendering typically has a number of "attributes", which are just binary columns, suitable as columns in an Apache Arrow table.

  • positions
  • normals
  • uvs
  • colors
  • ...

Topologies

A topology describes how vertexes are interpreted (how primitives are formed) during rendering. Topology is stored in mesharrow metadata for a column. All columns with attribute layout will need to be la

Column Conventions

Row Modes

Mesharrow identifies several modes in which geometries can be stored in tables

  • One position vertex per Arrow Table row.
  • One mesh (list of position vertexes) per Arrow Table row.
Row ModePOSITIONS column typeTopology
vertexFixedSizeList<3, Float*>One vertex per table row
primitiveList<FixedSizeList<3, Float*>>One primitive per arrow table row
meshList<List<FixedSizeList<3, Float*>>>Multiple primitives per arrow table row

Indexes

An index column can be defined, it will be a list of indexes into the POSITIONS column

  • Indexes List<Uint32>

Note: Indexes are not always well supported.

Mesharrow Metadata Conventions

Table / Schema Metadata Conventions include:

{
"geometryColumns": ["..."],
"attributeColumns": ["..."],
"rowMode": "vertex/primitive/mesh"
}

Field Metadata

  • Whether a column represents positions, normals, etc well known attributes
{
"semantic?": "POSITION"
}
  • Whether a column is quantized
{
"quantization?": {
"scale": 1,
"offset": 0
}
}
  • Whether a transform should be applied (e.g. UVs)

Considerations

Expanding/duplicating rows to match geometry

For a table where each row has its own variable length geometry, the GPU either needs other columns to match the length, or it needs to use indirection through row ids.

Row Ids

As each GPU shader can see only the current triangle and its vertices, it is useful to be able to know which row in the arrow table a specific vertex belongs to.

a simple process is to generate a

List<FixedSizeList<Float>> => List<Uint32>

This creates a columnar array where each vertex index references the current row index, and additional looups can be done.

Draw

Instanced Draws

Instanced draws are very useful in data visualization since many visualizations draw a copy of the same geometry (a circle, hexagon, line etc) for each row in the input table. For instanced draws, the geometry is often separated from the data table.

MultiDraw Support

GPU APIs often support MultiDraw operations (typically as an optional extension) https://developer.chrome.com/blog/new-in-webgpu-131#experimental_support_for_multi-draw_indirect

As for multidraw, my understanding is that it is an optimized way to draw all the mesh primitives that share materials etc in a single lower overhead call.

If we store one mesh per row, we could just filter out all the Arrow Table rows that were using the same material, and get the start and end indices from the underlying arrow.Data offsets, and populate the multidraw parameter buffer.


Library Functionality

A mesharrow library can provide a number of transformations to make

getRowIndexAttributeColumn

Create a column

getRowIndexAttributeColumn(table: arrow.Table, columnName: string) => arrow.Vector<List<Uint32>>

getVertexAttribute

Map a normal column to a vertex attribute by duplicating

getVertexAttribute(table: arrow.Table, columnName: string) => arrow.Vector<List<...>>

"GeoArrow" functions

Triangulation

Mesharrow is designed to hold triangulated geometries. Sometimes these geometries need to be generated (triangulated) from other, more complex geometries.

An important example is GeoArrow polygons (these can be very complex with an outer hull and multiple complex holes). A computationally expensive triangulation process needs to be performed that converts the "abstract" polygon hull geometry into a simple list of triangles that a GPU can render.

It is useful to offer support for GeoArrow to MeshArrow triangulation, preserving the full GeoArrow table.

triangulateGeoArrowToMeshArrow(table: arrow.Table, colum) => arrow.Table