Logical table as sequence of chunked arrays
The JavaScript Table class is not part of the Apache Arrow specification as such, but is rather a tool that lets you work with multiple record batches and array pieces as a single logical dataset.
As a relevant example, we may receive multiple small record batches in a socket stream, then need to concatenate them into contiguous memory for use in NumPy or pandas. The Table object makes this efficient without requiring additional memory copying.
A Table’s columns are instances of Column, which is a container for one or more arrays of the same type.
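To make the "one or more arrays" idea concrete, here is a minimal sketch of a chunked container that presents several typed arrays as one logical sequence. The class name ChunkedColumnSketch is hypothetical and this is not how the library's Column or Chunked classes are implemented; it only illustrates how a row index can be mapped to a chunk without copying the chunks into contiguous memory.

```javascript
// Hypothetical illustration only; not the Arrow Column/Chunked implementation.
class ChunkedColumnSketch {
  constructor(chunks) {
    this.chunks = chunks;
    // offsets[i] is the logical index of the first element of chunk i.
    this.offsets = [0];
    for (const chunk of chunks) {
      this.offsets.push(this.offsets[this.offsets.length - 1] + chunk.length);
    }
    this.length = this.offsets[this.offsets.length - 1];
  }

  get(index) {
    if (index < 0 || index >= this.length) return undefined;
    // Binary search for the last chunk whose starting offset is <= index.
    let lo = 0;
    let hi = this.chunks.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.offsets[mid] <= index) lo = mid;
      else hi = mid - 1;
    }
    return this.chunks[lo][index - this.offsets[lo]];
  }
}

// Two separately allocated chunks read as one five-element column.
const column = new ChunkedColumnSketch([
  Int32Array.of(1, 2, 3),
  Int32Array.of(4, 5),
]);
```

Here column.get(3) reads the first element of the second chunk; the chunks themselves are never concatenated.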
Table.new() accepts an Object of Columns or Vectors, where the keys will be used as the field names for the Schema:
const i32s = Int32Vector.from([1, 2, 3]);
const f32s = Float32Vector.from([.1, .2, .3]);
const table = Table.new({ i32: i32s, f32: f32s });
assert(table.schema.fields[0].name === 'i32');
It also accepts a list of Vectors with an optional list of names or Fields for the resulting Schema. If the list is omitted or a name is missing, the numeric index of each Vector will be used as the name:
const i32s = Int32Vector.from([1, 2, 3]);
const f32s = Float32Vector.from([.1, .2, .3]);
const table = Table.new([i32s, f32s], ['i32']);
assert(table.schema.fields[0].name === 'i32');
assert(table.schema.fields[1].name === '1');
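The fallback naming rule described above can be sketched in isolation. The helper resolveFieldNames is hypothetical and is not a library function; it only mirrors the documented behavior that a missing name is replaced by the Vector's numeric index, rendered as a string.

```javascript
// Hypothetical helper mirroring the documented naming rule; not part of the Arrow API.
function resolveFieldNames(vectorCount, names = []) {
  const resolved = [];
  for (let i = 0; i < vectorCount; i++) {
    const name = names[i];
    // Fall back to the position of the vector when no name was supplied.
    resolved.push(name === undefined || name === null ? String(i) : String(name));
  }
  return resolved;
}

const partialNames = resolveFieldNames(2, ['i32']); // second name is missing
const noNames = resolveFieldNames(2);               // names list omitted entirely
```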
If the supplied arguments are Column instances, Table.new will infer the Schema from the Columns:
const i32s = Column.new('i32', Int32Vector.from([1, 2, 3]));
const f32s = Column.new('f32', Float32Vector.from([.1, .2, .3]));
const table = Table.new(i32s, f32s);
assert(table.schema.fields[0].name === 'i32');
assert(table.schema.fields[1].name === 'f32');
If the supplied Vector or Column lengths are unequal, Table.new will extend the lengths of the shorter Columns, allocating additional bytes to represent the additional null slots. The memory required to allocate these additional bitmaps can be computed as:
let additionalBytes = 0;
// for...of iterates the vectors themselves (for...in would yield their keys)
for (const vec of shorter_vectors) {
  additionalBytes += (((longestLength - vec.length) + 63) & ~63) >> 3;
}
For example, an additional null bitmap for one million null values would require 125,000 bytes (((1e6 + 63) & ~63) >> 3), or approx. 0.12 MiB.
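As a runnable check of the arithmetic above, the following sketch applies the same expression to a list of column lengths. The function name additionalNullBitmapBytes is hypothetical and the function is not part of the library; it assumes every column shorter than the longest one is padded with nulls up to the longest length.

```javascript
// Hypothetical helper; applies the bitmap-size expression from the text above.
function additionalNullBitmapBytes(lengths) {
  const longestLength = Math.max(...lengths);
  let additionalBytes = 0;
  for (const length of lengths) {
    if (length < longestLength) {
      // Same expression as in the text, applied to the number of padded slots.
      additionalBytes += (((longestLength - length) + 63) & ~63) >> 3;
    }
  }
  return additionalBytes;
}

// One column of a million rows next to an empty column: a million null slots.
const millionNulls = additionalNullBitmapBytes([1e6, 0]);
const millionNullsMiB = millionNulls / (1024 * 1024);
```

With these inputs, millionNulls is 125000 and millionNullsMiB is roughly 0.119.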
Table extends Chunked
Creates an empty table
Promise<RecordBatchReader>
): Promise<Table>
Promise<Table>
Type-safe constructors. Functionally equivalent to calling new Table() with the same arguments; however, if you are using TypeScript, calling the new method instead ensures that the types inferred from the arguments "flow through" into the returned Table type.
The Schema of this table.
The number of rows in this table.
TBD: this does not consider filters
The list of chunks in this table.
The number of columns in this table.
The schema will be inferred from the record batches.
Creates a new Table from a collection of Columns or Vectors, with an optional list of names or Fields.
TBD
Returns a new copy of this table.
Gets a column by index.
Gets a column by name.
Returns the index of the column with the given name.
TBD
Returns a Uint8Array that contains an encoding of all the data in the table.
Note: Passing the returned data back into Table.from() creates a "deep clone" of the table.
TBD - Returns the number of elements.
Returns a new Table with the specified subset of columns, in the specified order.
Returns a new Table that contains two columns (values and counts).
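The shape of that two-column result can be pictured with a plain group-and-count over an array. This is a conceptual sketch only: countBySketch is a hypothetical function, it operates on ordinary arrays rather than Arrow Vectors, and the ordering and types of the library's actual output may differ.

```javascript
// Hypothetical illustration of a values/counts result; not the library's implementation.
function countBySketch(columnValues) {
  const counts = new Map();
  for (const value of columnValues) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  // A Map preserves insertion order, so values appear in first-seen order here.
  return {
    values: Array.from(counts.keys()),
    counts: Array.from(counts.values()),
  };
}

const result = countBySketch(['a', 'b', 'a', 'c', 'a']);
```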