The RecordBatchReader is the IPC reader for reading chunks from a stream or file.
The JavaScript API supports streaming multiple Arrow tables over a single socket.
To read all batches from all tables in a data source:
const readers = RecordBatchReader.readAll(fetch(path, {credentials: 'omit'}));
for await (const reader of readers) {
  for await (const batch of reader) {
    console.log(batch.length);
  }
}
If you only have one table (the normal case), there will be only one RecordBatchReader, so the outer loop executes only once. You can also create just one reader via:
const reader = await RecordBatchReader.from(fetch(path, {credentials: 'omit'}));
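The two-level loop above can be tried without a network source. The following is a minimal sketch using plain async generators as stand-ins: `mockReadAll` and `countRows` are hypothetical names, not part of the Arrow API, and plain objects with a `length` field play the role of record batches. It only illustrates that the outer loop runs once per table and the inner loop once per batch.

```javascript
// Hypothetical stand-in for RecordBatchReader.readAll(): an async iterable of
// "readers", each of which is itself an async iterable of "batches".
async function* mockReadAll(tables) {
  for (const batches of tables) {
    // Each inner generator plays the role of one RecordBatchReader.
    yield (async function* () {
      for (const batch of batches) yield batch;
    })();
  }
}

// Same loop shape as the usage example: outer loop per table, inner loop per batch.
async function countRows(readers) {
  const rowsPerTable = [];
  for await (const reader of readers) {
    let rows = 0;
    for await (const batch of reader) rows += batch.length;
    rowsPerTable.push(rows);
  }
  return rowsPerTable;
}

// Two mock tables: the first has two batches, the second has one.
const twoTables = [[{ length: 3 }, { length: 2 }], [{ length: 4 }]];
countRows(mockReadAll(twoTables)).then((rows) => console.log(rows)); // logs [ 5, 4 ]
```

With a single mock table, the result has a single entry, which mirrors the "outer loop executes once" case.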
readAll() returns an AsyncIterable<RecordBatchReader>.
It reads all batches from all tables in the data source.
data
The RecordBatchReader.from method will also detect which physical representation it's working with (Streaming or File), and will return either a RecordBatchFileReader or RecordBatchStreamReader accordingly.
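To give a feel for how such detection can work, here is a rough sketch that relies on one fact about the Arrow IPC File format: a file begins with the ASCII magic string "ARROW1". The helper `guessArrowFormat` is hypothetical and is not the library's implementation; it simply treats anything that does not start with the magic as a stream. The sample stream bytes are an assumption for illustration only.

```javascript
// The Arrow IPC File format starts with the ASCII magic string "ARROW1".
const ARROW_MAGIC = new TextEncoder().encode("ARROW1");

// Hypothetical helper (not part of the Arrow API): guess the physical
// representation from the leading bytes of a buffer.
function guessArrowFormat(bytes) {
  if (bytes.length < ARROW_MAGIC.length) return "stream";
  for (let i = 0; i < ARROW_MAGIC.length; i++) {
    if (bytes[i] !== ARROW_MAGIC[i]) return "stream";
  }
  return "file";
}

// A file-like buffer: the magic followed by two padding bytes.
const fileLike = new Uint8Array([...ARROW_MAGIC, 0, 0]);
// A stream-like buffer (assumed sample bytes): it does not start with the magic.
const streamLike = new Uint8Array([0xff, 0xff, 0xff, 0xff, 0x10, 0, 0, 0]);

console.log(guessArrowFormat(fileLike));   // logs: file
console.log(guessArrowFormat(streamLike)); // logs: stream
```

In practice you would let RecordBatchReader.from make this decision; the sketch only shows why peeking at the first few bytes is enough to tell the two representations apart.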
Remarks:
application/octet-stream
You can also turn the RecordBatchReader into a stream. In Node, you can use either toNodeStream() or call pipe(writable).
In the browser (assuming you're using the UMD or "browser" fields in webpack), you can call toDOMStream() or pipeTo(writable)/pipeThrough(transform).
You can also create a transform stream directly, instead of using RecordBatchReader.from(), via throughNode() in Node and throughDOM() in the browser.
By default, the transform streams will only read one table from the source readable stream and then close, but you can change this behavior by passing { autoDestroy: false } to the transform creation methods.
Reading multiple tables from one stream (readAll()) is technically an extension in the JavaScript Arrow API compared to the Arrow C++ API. The authors found it useful to be able to send multiple tables over the same physical socket, so they built the ability to keep the underlying socket open and read more than one table from a stream.
A message can be a Schema, DictionaryBatch, RecordBatch, or Tensor (which we don't support yet). The Streaming format is just a sequence of messages with Schema first, then n DictionaryBatches, then m RecordBatches.
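The ordering described above can be written down as a small check. This is a sketch only: `isValidStreamOrder` is a hypothetical helper, not part of the Arrow API, and it encodes just the simplified ordering stated here (Schema, then n DictionaryBatches, then m RecordBatches), using message type names as plain strings.

```javascript
// Hypothetical validator (not part of the Arrow API): checks that a list of
// message type names follows the order Schema, DictionaryBatch*, RecordBatch*.
function isValidStreamOrder(kinds) {
  // The first message must be the Schema.
  if (kinds.length === 0 || kinds[0] !== "Schema") return false;
  let i = 1;
  // Then n DictionaryBatches (n may be zero).
  while (i < kinds.length && kinds[i] === "DictionaryBatch") i++;
  // Then m RecordBatches (m may be zero).
  while (i < kinds.length && kinds[i] === "RecordBatch") i++;
  // Anything left over breaks the expected order.
  return i === kinds.length;
}

console.log(isValidStreamOrder(["Schema", "DictionaryBatch", "RecordBatch", "RecordBatch"])); // true
console.log(isValidStreamOrder(["Schema", "RecordBatch"])); // true (n = 0)
console.log(isValidStreamOrder(["RecordBatch", "Schema"])); // false
```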