record-batch-writer
RecordBatchWriter
This documentation reflects Arrow JS v4.0. Needs to be updated for the new Arrow API in v9.0 +.
The RecordBatchWriter
"serializes" Arrow Tables (or streams of RecordBatches) to the Arrow File, Stream, or JSON representations for inter-process communication (see also: Arrow IPC format docs).
The RecordBatchWriter is conceptually a "transform" stream that transforms Tables or RecordBatches into binary Uint8Array
chunks that represent the Arrow IPC messages (Schema
, DictionaryBatch
, RecordBatch
, and in the case of the File format, Footer
messages).
These binary chunks are buffered inside the RecordBatchWriter
instance until they are consumed, typically by piping the RecordBatchWriter instance to a Writable Stream (like a file or socket), enumerating the chunks via async-iteration, or by calling toUint8Array()
to create a single contiguous buffer of the concatenated results once the desired Tables or RecordBatches have been written.
RecordBatchWriter conforms to the AsyncIterableIterator
protocol in all environments, and supports two additional stream primitives based on the environment (nodejs or browsers) available at runtime.
- In nodejs, the
RecordBatchWriter
can be converted to aReadableStream
, piped to aWritableStream
, and has a static method that returns aTransformStream
suitable in chainedpipe
calls. - browser environments that support the DOM/WhatWG Streams Standard, corresponding methods exist to convert
RecordBatchWriters
to the DOMReadableStream
,WritableStream
, andTransformStream
variants.
Note: The Arrow JSON representation is not suitable as an IPC mechanism in real-world scenarios. It is used inside the Arrow project as a human-readable debugging tool and for validating interoperability between each language's separate implementation of the Arrow library.
Member Fields
closed: Promise (readonly)
A Promise which resolves when this RecordBatchWriter
is closed.
Static Methods
RecordBatchWriter.throughNode(options?: Object): DuplexStream
Creates a Node.js TransformStream
that transforms an input ReadableStream
of Tables or RecordBatches into a stream of Uint8Array
Arrow Message chunks.
options.autoDestroy
: boolean - (default:true
) Indicates whether the RecordBatchWriter should close after writing the first logical stream of RecordBatches (batches which all share the same Schema), or should continue and reset each time it encounters a new Schema.options.*
- Any Node Duplex stream options can be supplied
Returns: A Node.js Duplex stream
Example:
const fs = require('fs');
const { PassThrough, finished } = require('stream');
const { Table, RecordBatchWriter } = require('apache-arrow');
const table = Table.new({
i32: Int32Vector.from([1, 2, 3]),
f32: Float32Vector.from([1.0, 1.5, 2.0]),
});
const source = new PassThrough({ objectMode: true });
const result = source
.pipe(RecordBatchWriter.throughNode())
.pipe(fs.createWriteStream('table.arrow'));
source.write(table);
source.end();
finished(result, () => console.log('done writing table.arrow'));
RecordBatchWriter.throughDOM(writableStrategy? : Object, readableStrategy? : Object) : Object
Creates a DOM/WhatWG ReadableStream
/WritableStream
pair that together transforms an input ReadableStream
of Tables or RecordBatches into a stream of Uint8Array
Arrow Message chunks.
options.autoDestroy
: boolean - (default:true
) Indicates whether the RecordBatchWriter should close after writing the first logical stream of RecordBatches (batches which all share the same Schema), or should continue and reset each time it encounters a new Schema.writableStrategy.*
= - Any options forQueuingStrategy<RecordBatch>
readableStrategy.highWaterMark
? : NumberreadableStrategy.size
?: Number
Returns: an object with the following fields:
writable
:WritableStream<Table | RecordBatch>
readable
:ReadableStream<Uint8Array>
Methods
constructor(options? : Object)
options.autoDestroy
: boolean -
toString(sync: Boolean): string | Promise<string>
toUint8Array(sync: Boolean): Uint8Array | Promise<Uint8Array>
writeAll(input: Table | Iterable<RecordBatch>): this
writeAll(input: AsyncIterable<RecordBatch>
): Promise<this>
writeAll(input: PromiseLike<AsyncIterable<RecordBatch>>): Promise<this>
writeAll(input: PromiseLike<Table | Iterable<RecordBatch>>): Promise<this>
[Symbol.asyncIterator](): AsyncByteQueue<Uint8Array>
Returns An async iterator that produces Uint8Arrays.
toDOMStream(options?: Object): ReadableStream<Uint8Array>
Returns a new DOM/WhatWG stream that can be used to read the Uint8Array chunks produced by the RecordBatchWriter
options
- passed through to the DOM ReadableStream constructor, any DOM ReadableStream options.
toNodeStream(options?: Object): Readable
options
- passed through to the Node ReadableStream constructor, any Node ReadableStream options.
close() : void
Close the RecordBatchWriter. After close is called, no more chunks can be written.
abort(reason?: any) : void
finish() : this
reset(sink?: WritableSink<ArrayBufferViewInput>, schema?: Schema | null): this
Change the sink
write(payload?: Table | RecordBatch | Iterable<Table> | Iterable<RecordBatch> | null): void
Writes a RecordBatch
or all the RecordBatches from a Table
.
Remarks
- Just like the
RecordBatchReader
, aRecordBatchWriter
is a factory base class that returns an instance of the subclass appropriate to the situation:RecordBatchStreamWriter
,RecordBatchFileWriter
,RecordBatchJSONWriter