Arquero Test
const TERCdata = FileAttachment("data/TERC_Urzedowy_2024-06-21.csv").text()
display(TERCdata)
Arquero table from CSV
As you can see below, the values in column `WOJ` are correctly parsed as strings.
const nullIfEmpty = s => s.length ? s : null
const TERC = aq.fromCSV(TERCdata, {
  delimiter: ";",
  parse: {
    WOJ: String,
    POW: nullIfEmpty,
    GMI: nullIfEmpty,
    RODZ: nullIfEmpty,
    NAZWA: nullIfEmpty,
    NAZWA_DOD: nullIfEmpty,
    STAN_NA: d => new Date(Date.parse(d))
  }
}).select(aq.not("STAN_NA"))
display(TERC.objects())
display(Inputs.table(TERC))
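To make it clearer what the per-column parsers do to the raw cell text, here is a minimal plain-JavaScript sketch that does not use Arquero at all. The two sample rows are illustrative (typed in by hand in the TERC layout, not read from the file), and `sampleCsv`, `columnParsers`, `emptyToNull` and `parseSample` are hypothetical names used only for this illustration.

```javascript
// Illustrative sample in the TERC column layout (an assumption, not the real file).
const sampleCsv = [
  "WOJ;POW;GMI;RODZ;NAZWA;NAZWA_DOD;STAN_NA",
  "02;;;;DOLNOŚLĄSKIE;województwo;2024-01-01",
  "02;01;01;1;Bolesławiec;gmina miejska;2024-01-01"
].join("\n");

// Same logic as nullIfEmpty above: empty cells become null, others stay strings.
const emptyToNull = s => s.length ? s : null;

// Hypothetical stand-in for the per-column parse map passed to the CSV reader.
const columnParsers = {
  WOJ: String,
  POW: emptyToNull,
  GMI: emptyToNull,
  RODZ: emptyToNull
};

// Split the text into rows and apply the parser registered for each column;
// columns without a parser are kept as plain strings.
function parseSample(text, delimiter = ";") {
  const [header, ...lines] = text.split("\n");
  const names = header.split(delimiter);
  return lines.map(line => {
    const cells = line.split(delimiter);
    return Object.fromEntries(
      names.map((name, i) => [name, (columnParsers[name] ?? String)(cells[i])])
    );
  });
}

const sampleRows = parseSample(sampleCsv);
console.log(sampleRows[0].WOJ, sampleRows[0].POW); // prints: 02 null
```

The point of the sketch is only that `WOJ` never passes through a numeric conversion, so the leading zero in `"02"` survives, while empty `POW`, `GMI` and `RODZ` cells turn into `null`.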
Arrow file from Arquero table
I don't think it follows from the Arquero reference that one needs to use the `types` option when exporting to Arrow, unless you want specific data types, say `UInt32` instead of `Int64`.
The Arrow file is produced by the following data loader:
import * as aq from "arquero"
import { Type } from "apache-arrow"
import { readFile } from "node:fs/promises"
import { fileURLToPath } from "node:url"
const nullIfEmpty = s => s.length ? s : null
const TERCdata = await readFile(
  fileURLToPath(import.meta.resolve("./TERC_Urzedowy_2024-06-21.csv")),
  "utf-8"
)
const TERC = aq.fromCSV(TERCdata, {
  delimiter: ";",
  parse: {
    WOJ: String,
    POW: nullIfEmpty,
    GMI: nullIfEmpty,
    RODZ: nullIfEmpty,
    NAZWA: nullIfEmpty,
    NAZWA_DOD: nullIfEmpty,
    STAN_NA: d => new Date(Date.parse(d))
  }
}).select(aq.not("STAN_NA"))
const bytes = TERC.toArrowBuffer({
  types: {
    WOJ: Type.Utf8,
    POW: Type.Utf8,
    GMI: Type.Utf8,
    RODZ: Type.Utf8,
  }
})
await process.stdout.write(bytes)
const TERCarrow = FileAttachment("data/TERC_Urzedowy_2024-06-21.arrow").arrow()
Inputs.table(TERCarrow, { layout: "auto" })
const TERCfromArrow = aq.fromArrow(TERCarrow)
display(TERCfromArrow.objects())
The values in column `WOJ` were cast to numbers.
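The cast matters because a number cannot carry the leading zero: `"02"` and `2` are no longer the same code. As a possible stopgap (it does not explain or fix the cast itself), the codes could be re-padded after loading. This is a minimal sketch in plain JavaScript, assuming `WOJ` codes are always two digits; `padCode` is a hypothetical helper, not part of Arquero or Arrow.

```javascript
// Hypothetical helper: turn a numeric code back into a zero-padded string.
// Assumes a fixed code width (2 for WOJ); handles numbers, BigInts
// (which 64-bit integer columns may yield), strings, and null.
function padCode(value, width = 2) {
  if (value === null || value === undefined) return null;
  return String(value).padStart(width, "0");
}

console.log(Number("02"));  // prints: 2  (the leading zero is gone)
console.log(padCode(2));    // prints: 02
console.log(padCode(2n));   // prints: 02
console.log(padCode("32")); // prints: 32
```

Re-padding only works because the code width is known in advance; it is a workaround for display and joins, not a substitute for getting the column back as strings.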