Arquero Test
const TERCdata = FileAttachment("data/TERC_Urzedowy_2024-06-21.csv").text()
display(TERCdata)
Arquero table from CSV
As the output below shows, the values in column `WOJ` are correctly parsed as strings.
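// Map empty CSV fields to null; keep non-empty fields as strings.
const nullIfEmpty = s => s.length ? s : null
const TERC = aq.fromCSV(TERCdata, {
  delimiter: ";",
  // Per-column parse functions receive the raw field text.
  parse: {
    WOJ: String,
    POW: nullIfEmpty,
    GMI: nullIfEmpty,
    RODZ: nullIfEmpty,
    NAZWA: nullIfEmpty,
    NAZWA_DOD: nullIfEmpty,
    STAN_NA: d => new Date(Date.parse(d))
  }
}).select(aq.not("STAN_NA")) // drop the STAN_NA column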
display(TERC.objects())
display(Inputs.table(TERC))
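To check the claim about column types without reading the table by eye, the row objects can be summarised by `typeof`. The helper below is a minimal sketch: `columnTypes` and `sampleRows` are hypothetical names introduced here, and the sample rows are made up in the shape of the TERC table rather than read from the CSV.

```javascript
// Summarise the JavaScript types found in each column of an array of row objects.
function columnTypes(rows) {
  const summary = {};
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      // typeof null is "object", so report null separately.
      const type = value === null ? "null" : typeof value;
      if (!summary[name]) summary[name] = new Set();
      summary[name].add(type);
    }
  }
  // Convert each Set to a sorted array so the result is easy to read and compare.
  return Object.fromEntries(
    Object.entries(summary).map(([name, types]) => [name, [...types].sort()])
  );
}

// Hypothetical sample rows (assumption: same column names as the TERC table).
const sampleRows = [
  { WOJ: "02", POW: null, GMI: null, NAZWA: "DOLNOŚLĄSKIE" },
  { WOJ: "02", POW: "01", GMI: null, NAZWA: "bolesławiecki" }
];

const types = columnTypes(sampleRows);
console.log(types);
```

On this page, `columnTypes(TERC.objects())` would presumably give the same kind of summary for the real table.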
Arrow file from Arquero table
I don't think the Arquero reference implies that the `types` option is
required when exporting to Arrow, unless you want specific data types,
such as UInt32 instead of Int64.
The Arrow file is produced by the following data loader:
import * as aq from "arquero"
import { Type } from "apache-arrow"
import {readFile} from "node:fs/promises"
import {fileURLToPath} from "node:url"
const nullIfEmpty = s => s.length ? s : null
const TERCdata = await readFile(
fileURLToPath(
import.meta.resolve(
"./TERC_Urzedowy_2024-06-21.csv"
)), "utf-8")
const TERC = aq.fromCSV(TERCdata, {
delimiter: ";",
parse: {
WOJ: String,
POW: nullIfEmpty,
GMI: nullIfEmpty,
RODZ: nullIfEmpty,
NAZWA: nullIfEmpty,
NAZWA_DOD: nullIfEmpty,
STAN_NA: d => new Date(Date.parse(d))
}
}).select(aq.not("STAN_NA"))
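// Serialize the table to Arrow, requesting Utf8 for the code columns.
const bytes = TERC.toArrowBuffer({
  types: {
    WOJ: Type.Utf8,
    POW: Type.Utf8,
    GMI: Type.Utf8,
    RODZ: Type.Utf8,
  }
})
// A data loader emits its output on stdout; write() does not return a promise.
process.stdout.write(bytes)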
const TERCarrow = FileAttachment("data/TERC_Urzedowy_2024-06-21.arrow").arrow()
Inputs.table(TERCarrow, { layout: "auto" })
const TERCfromArrow = aq.fromArrow(TERCarrow)
display(TERCfromArrow.objects())
In the output above, the values in column `WOJ` appear as numbers, even though the data loader exported them as `Utf8`.
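Until the cause of the cast is found, one possible workaround is to turn the numbers back into zero-padded code strings after loading. This is only a sketch and does not explain the cast itself: `restoreCode`, `restoreCodes`, `CODE_WIDTHS`, and `castRow` are hypothetical names, and the column widths are an assumption about the TERC code format that should be checked against the CSV.

```javascript
// Assumed widths of the code columns (assumption: two digits for WOJ, POW, GMI; one for RODZ).
const CODE_WIDTHS = { WOJ: 2, POW: 2, GMI: 2, RODZ: 1 };

// Turn a number (or bigint, or string) back into a zero-padded code; keep missing values as null.
function restoreCode(value, width) {
  if (value === null || value === undefined) return null;
  return String(value).padStart(width, "0");
}

// Return a copy of a row with every known code column restored.
function restoreCodes(row) {
  const fixed = { ...row };
  for (const [name, width] of Object.entries(CODE_WIDTHS)) {
    if (name in fixed) fixed[name] = restoreCode(fixed[name], width);
  }
  return fixed;
}

// Hypothetical row as it might look after the unwanted cast to numbers.
const castRow = { WOJ: 2, POW: 1, GMI: null, RODZ: null, NAZWA: "bolesławiecki" };
const restored = restoreCodes(castRow);
console.log(restored);
```

Applied to the real data, this would be something like `TERCfromArrow.objects().map(restoreCodes)`. Padding after the fact only works because the codes have a fixed width; keeping the column as a string end to end remains the cleaner fix.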