oxtract

Open Source / Developer Tool

A high-performance Oracle unload utility written in Go that exports query results directly via OCI, targeting ≥500K rows/sec and ≥150 MB/s throughput.

Go · Oracle · Parquet · Docker

oxtract is a command-line tool I wrote in Go to replace slow JDBC-based export utilities. It connects directly to Oracle via OCI using the godror driver, eliminating the Java/JDBC overhead and allowing bulk data extraction at speeds that match or exceed those of traditional unload tools.

Why I Built It

Large-scale data pipelines often need to extract hundreds of millions of rows from Oracle as fast as possible. Most tools use JDBC, which introduces significant latency and GC overhead. oxtract bypasses all of that with a direct OCI connection, prefetch tuning, and a parallel pipeline that keeps both the database fetch and the format workers saturated.

Output Formats

oxtract supports CSV (RFC 4180), TSV, and Parquet. Parquet output carries full type fidelity: NUMBER(p,0) maps to INT64, NUMBER(p,s) to DECIMAL(p,s), unconstrained NUMBER to STRING, and DATE/TIMESTAMP to microsecond-precision timestamps. Compression options include gzip, zstd, and Snappy (the Parquet default).
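The type mapping above can be sketched as a small lookup function. This is an illustration only, not oxtract's actual code: the oracleColumn struct, the parquetType function, the convention that a precision of 0 marks an unconstrained NUMBER, and the STRING fallback for other column types are all assumptions made for the example.

```go
package main

import "fmt"

// oracleColumn is a hypothetical description of one result-set column.
type oracleColumn struct {
	TypeName  string // e.g. "NUMBER", "DATE", "TIMESTAMP", "VARCHAR2"
	Precision int    // assumption: 0 means NUMBER was declared without a precision
	Scale     int
}

// parquetType returns a textual Parquet type following the mapping
// described in the text. Types not mentioned there fall back to STRING,
// which is an assumption of this sketch.
func parquetType(c oracleColumn) string {
	switch c.TypeName {
	case "NUMBER":
		switch {
		case c.Precision == 0:
			return "STRING" // unconstrained NUMBER
		case c.Scale == 0:
			return "INT64" // NUMBER(p,0)
		default:
			return fmt.Sprintf("DECIMAL(%d,%d)", c.Precision, c.Scale)
		}
	case "DATE", "TIMESTAMP":
		return "TIMESTAMP(MICROS)"
	default:
		return "STRING"
	}
}

func main() {
	cols := []oracleColumn{
		{"NUMBER", 10, 0},
		{"NUMBER", 12, 2},
		{"NUMBER", 0, 0},
		{"DATE", 0, 0},
	}
	for _, c := range cols {
		fmt.Printf("%s(%d,%d) -> %s\n", c.TypeName, c.Precision, c.Scale, parquetType(c))
	}
}
```

Keeping the mapping in one function makes it easy to see that every column resolves to exactly one Parquet type before any rows are written.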

Pipeline Architecture

Rows are fetched in configurable array batches and passed through a parallel format worker pool. Output is written in original row order regardless of parallelism. File splitting by row count (--split-rows) or byte size (--split-bytes) produces numbered files, named through a {seq} placeholder in the output filename.
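One common way to get ordered output from a parallel worker pool is to queue a per-batch result slot in fetch order and let the writer drain those slots in sequence. The sketch below shows that technique with the standard library only; it is not taken from oxtract's source, and the names job, formatBatch, writeOrdered, and renderToString are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// job pairs one fetched batch with the channel that will carry its formatted bytes.
type job struct {
	rows [][]string
	out  chan []byte
}

// formatBatch renders a batch as tab-separated lines (a stand-in for a real formatter).
func formatBatch(rows [][]string) []byte {
	var buf bytes.Buffer
	for _, r := range rows {
		buf.WriteString(strings.Join(r, "\t"))
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

// writeOrdered formats batches on `workers` goroutines and writes the
// results to w in the same order the batches were supplied.
func writeOrdered(w io.Writer, batches [][][]string, workers int) error {
	jobs := make(chan job)
	pending := make(chan chan []byte, workers) // FIFO of per-batch result slots

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				j.out <- formatBatch(j.rows) // buffered slot, never blocks
			}
		}()
	}

	// Producer: register the result slot in order, then hand off the work.
	go func() {
		for _, b := range batches {
			out := make(chan []byte, 1)
			pending <- out
			jobs <- job{rows: b, out: out}
		}
		close(jobs)
		close(pending)
	}()

	// Writer: consume slots strictly in registration order.
	var err error
	for out := range pending {
		data := <-out
		if err == nil {
			_, err = w.Write(data) // after an error, keep draining so goroutines exit
		}
	}
	wg.Wait()
	return err
}

// renderToString is a convenience wrapper used by the example.
func renderToString(batches [][][]string, workers int) string {
	var sb strings.Builder
	_ = writeOrdered(&sb, batches, workers)
	return sb.String()
}

func main() {
	batches := [][][]string{
		{{"1", "a"}, {"2", "b"}},
		{{"3", "c"}},
		{{"4", "d"}, {"5", "e"}},
	}
	fmt.Print(renderToString(batches, 4))
}
```

The bounded pending queue also limits how far formatting can run ahead of the writer, which caps memory use when the output sink is slower than the workers.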

Incremental Extraction

A watermark-based incremental mode persists the last seen value of a chosen column to a state file. On subsequent runs, only rows newer than the saved watermark are fetched. Composite keys are supported for cases where the leading watermark column is not unique.
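A minimal sketch of the watermark idea follows, assuming a JSON state file and bind variables named :wm and :tie. The file layout, field names, and the functions loadState, saveState, and predicate are hypothetical and chosen for illustration; oxtract's real state format may differ. The composite case is written as an explicit OR condition, which is one way to express "strictly after the last seen (watermark, key) pair".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// watermarkState is a hypothetical on-disk layout for the state file.
type watermarkState struct {
	Column string `json:"column"`
	Value  string `json:"value"`             // last seen watermark, stored as text
	TieKey string `json:"tie_key,omitempty"` // last seen tie-breaker value, if any
}

// loadState reads the state file; a missing file means "first run"
// and yields an empty state without an error.
func loadState(path string) (watermarkState, error) {
	var st watermarkState
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return st, nil
	}
	if err != nil {
		return st, err
	}
	err = json.Unmarshal(data, &st)
	return st, err
}

// saveState writes to a temporary file and renames it into place so a
// crash mid-write does not leave a truncated state file behind.
func saveState(path string, st watermarkState) error {
	data, err := json.MarshalIndent(st, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// predicate builds the WHERE fragment for an incremental run. Column
// names come from operator configuration; the values are bound at query time.
func predicate(wmCol, tieCol string) string {
	if tieCol == "" {
		return fmt.Sprintf("%s > :wm", wmCol)
	}
	return fmt.Sprintf("(%s > :wm OR (%s = :wm AND %s > :tie))", wmCol, wmCol, tieCol)
}

func main() {
	const path = "oxtract_state.json" // hypothetical file name
	st, err := loadState(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("resuming after %q\n", st.Value)
	fmt.Println("WHERE", predicate("UPDATED_AT", "ID"))

	// After a successful run, persist the last row's values.
	st = watermarkState{Column: "UPDATED_AT", Value: "2024-05-01T12:00:00Z", TieKey: "42"}
	if err := saveState(path, st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Saving the state only after the output has been written successfully means a failed run is retried from the previous watermark rather than silently skipping rows.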

Observability

Progress is reported on stderr (rows/sec, bytes written). JSON-structured log output is available for ingestion by Loki or Elasticsearch. An optional Prometheus metrics endpoint (--metrics-addr) exposes counters and gauges for the lifetime of the run.
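As a rough illustration of JSON-structured progress reporting on stderr, the sketch below uses Go's standard log/slog package. Whether oxtract uses slog is not stated here, so treat the logger choice, the rowsPerSec helper, and the field names (rows, bytes_written, rows_per_sec) as assumptions.

```go
package main

import (
	"log/slog"
	"os"
	"time"
)

// rowsPerSec returns the average throughput, or 0 when no time has elapsed.
func rowsPerSec(rows int64, elapsedSec float64) float64 {
	if elapsedSec <= 0 {
		return 0
	}
	return float64(rows) / elapsedSec
}

func main() {
	// One JSON object per line on stderr, suitable for log shippers.
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))

	start := time.Now()
	var rows, bytesWritten int64
	for batch := 0; batch < 3; batch++ {
		time.Sleep(10 * time.Millisecond) // stand-in for fetch and format work
		rows += 50_000
		bytesWritten += 8 << 20
		logger.Info("progress",
			"rows", rows,
			"bytes_written", bytesWritten,
			"rows_per_sec", rowsPerSec(rows, time.Since(start).Seconds()),
		)
	}
}
```

Writing progress to stderr keeps stdout free for the data stream, so the two can be redirected independently.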

Distribution

A multi-stage Dockerfile builds and packages the binary with Oracle Instant Client baked in, producing a minimal debian-slim runtime image. A PowerShell packaging script creates a Windows zip with the binary and a launcher batch file. Prebuilt images are published to GHCR on each tagged release.

Technologies

Go 1.22+, godror (CGO/OCI), Apache Parquet (parquet-go), gzip/zstd, Prometheus client, Docker, Oracle Instant Client