csv-peek — A 700-Line Rust CLI That Pretty-Prints CSV in the Terminal With a Hand-Rolled RFC 4180 Parser, Type Inference, and One Dependency
May 8, 2026


Originally published by Dev.to

cat customers.csv and the columns slide off the right edge of your terminal. You don't really want to open a spreadsheet just to spot-check eight rows. csv-peek customers.csv prints an aligned table in 1 ms — numbers right-aligned in cyan, dates in yellow, booleans in magenta, the embedded comma in "hello, world" not breaking alignment because the parser actually understands quoting. Single dependency (clap), sub-1 MB stripped binary, 51 tests.

[Screenshot: csv-peek rendering an 8-row sample CSV in a terminal: id (cyan, right-aligned), name, email, signup_date (yellow), active (magenta), balance (cyan, right-aligned), note. Box-drawing borders enclose the table.]

📦 GitHub: https://github.com/sen-ltd/csv-peek

$ cat customers.csv
id,name,email,signup_date,active,balance,note
1,Alice Suzuki,[email protected],2024-01-15,true,1280.50,"first 10 customers"
2,Bob Tanaka,[email protected],2024-02-03,true,42.00,
…

$ csv-peek customers.csv
┌────┬──────────────┬───────────────────┬─────────────┬────────┬─────────┬────────────────────┐
│ id │ name         │ email             │ signup_date │ active │ balance │ note               │
├────┼──────────────┼───────────────────┼─────────────┼────────┼─────────┼────────────────────┤
│  1 │ Alice Suzuki │ [email protected] │ 2024-01-15  │ true   │ 1280.50 │ first 10 customers │
│  2 │ Bob Tanaka   │ [email protected]   │ 2024-02-03  │ true   │   42.00 │                    │
…

Why hand-roll the parser

Rust has a mature csv crate. For real production work, that's the answer. Two reasons to hand-roll anyway:

  1. The dependency tree stays at one entry. cargo build --release finishes in 9 seconds, the stripped Alpine binary is ~600 KB, and the CI matrix is one cell wide.
  2. The state machine is the article. RFC 4180 is short — four rules — and walking through what "", \r\n, and "foo\nbar" actually mean in a CSV teaches more than a call to csv::Reader::from_path() ever will.

The whole parser is 200 lines in src/csv.rs. I'll quote the core machine and the bits that I find most often misimplemented.

The four-state machine

enum State {
    Start,         // between fields
    Unquoted,      // inside an unquoted field
    Quoted,        // inside a "..." field
    QuotedQuote,   // saw a `"` inside a quoted field
}

QuotedQuote is the interesting one. Once you've seen a " inside a quoted field, you can't tell yet whether it was the closing quote or the first half of an escaped pair (""). You need the next byte to decide:

State::QuotedQuote => match b {
    b'"' => {
        // It was `""` — emit a literal `"` and stay in Quoted.
        field.push(b'"');
        state = State::Quoted;
    }
    c if c == self.delim => {
        // Closing quote followed by delimiter — field ends here.
        fields.push(...);
        state = State::Start;
    }
    b'\r' | b'\n' => {
        // Closing quote followed by newline — record ends here.
        return Ok(Some(...));
    }
    other => {
        // Malformed input like `"foo"bar`. Strict parsers Err here;
        // we'd rather render *something*, so we recover by appending
        // the byte and falling back to Unquoted.
        field.push(other);
        state = State::Unquoted;
    }
}

The recovery branch matters because csv-peek's job is to let you see what's there, even when "what's there" is broken. "foo"bar\n parses to foobar. A strict parser would refuse the line and the user would never see what was wrong with it.
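
To see the recovery in action, here is a stripped-down, runnable sketch of the same four-state machine over a single record. The names and signature (`parse_line`, a char-based loop) are my assumptions for illustration, not csv-peek's actual API:

```rust
#[derive(PartialEq)]
enum State { Start, Unquoted, Quoted, QuotedQuote }

/// Parse one record (no trailing newline). Sketch only.
fn parse_line(line: &str, delim: char) -> Vec<String> {
    let (mut fields, mut field) = (Vec::new(), String::new());
    let mut state = State::Start;
    for c in line.chars() {
        match state {
            State::Start => match c {
                '"' => state = State::Quoted,
                d if d == delim => fields.push(std::mem::take(&mut field)),
                _ => { field.push(c); state = State::Unquoted; }
            },
            State::Unquoted => {
                if c == delim {
                    fields.push(std::mem::take(&mut field));
                    state = State::Start;
                } else {
                    field.push(c);
                }
            }
            State::Quoted => {
                if c == '"' { state = State::QuotedQuote } else { field.push(c) }
            }
            State::QuotedQuote => match c {
                // `""` inside quotes: emit a literal quote, stay Quoted.
                '"' => { field.push('"'); state = State::Quoted; }
                d if d == delim => {
                    fields.push(std::mem::take(&mut field));
                    state = State::Start;
                }
                // Malformed input like `"foo"bar`: recover into Unquoted.
                other => { field.push(other); state = State::Unquoted; }
            },
        }
    }
    fields.push(field);
    fields
}

fn main() {
    assert_eq!(parse_line(r#""foo"bar"#, ','), vec!["foobar"]);
    assert_eq!(parse_line(r#"a,"hello, world""#, ','), vec!["a", "hello, world"]);
    assert_eq!(parse_line(r#""a""b""#, ','), vec![r#"a"b"#]);
    println!("recovery ok");
}
```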

CRLF / LF / CR — sometimes mixed in the same file

CSV inherits the OS-line-ending mess: classic Mac (\r), Unix (\n), Windows (\r\n), and Excel-on-Mac will sometimes mix them in a single file. The parser needs one-byte lookahead:

b'\r' => {
    fields.push(...);
    if let Some(nb) = self.next_byte()? {
        if nb != b'\n' {
            self.pos -= 1;   // lookahead byte wasn't `\n`; push it back
        }
    }
    return Ok(Some(...));
}

Implementing peek over a streaming BufReader was the part of the parser I rewrote three times. The version that shipped is a manual circular-ish buffer (Vec<u8> + pos) inside Reader<R: Read>, refilled in 64 KB chunks. The lookahead is just pos -= 1.
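
The shape of that buffer is easy to sketch. A minimal version with hypothetical names (`ByteReader`, `next_byte`, `unread`) rather than the shipped `Reader`:

```rust
use std::io::{Cursor, Read};

// Minimal byte reader with one-byte pushback over any `Read`,
// refilled in chunks. Sketch only, not csv-peek's internals.
struct ByteReader<R: Read> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,
}

impl<R: Read> ByteReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, buf: Vec::new(), pos: 0 }
    }

    fn next_byte(&mut self) -> std::io::Result<Option<u8>> {
        if self.pos == self.buf.len() {
            // Refill (64 KB chunks in the article's version).
            let mut chunk = vec![0u8; 64 * 1024];
            let n = self.inner.read(&mut chunk)?;
            if n == 0 { return Ok(None); } // EOF
            chunk.truncate(n);
            self.buf = chunk;
            self.pos = 0;
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        Ok(Some(b))
    }

    /// Push the last byte back; valid only immediately after a
    /// successful `next_byte` (which guarantees `pos > 0`).
    fn unread(&mut self) {
        debug_assert!(self.pos > 0);
        self.pos -= 1;
    }
}

fn main() -> std::io::Result<()> {
    // Classic-Mac line ending: a `\r` NOT followed by `\n`,
    // so the lookahead byte must be pushed back.
    let mut r = ByteReader::new(Cursor::new(b"a\rb".to_vec()));
    assert_eq!(r.next_byte()?, Some(b'a'));
    assert_eq!(r.next_byte()?, Some(b'\r'));
    if r.next_byte()? != Some(b'\n') { r.unread(); }
    assert_eq!(r.next_byte()?, Some(b'b'));
    println!("pushback ok");
    Ok(())
}
```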

Per-column type inference

Each column gets one of Empty / Bool / Int / Float / Date / Text by widening across the body sample. Empty is absorbed by anything, Int and Float merge to Float, and any other mixture collapses to Text — once any value forces Text, the column is Text for good.

fn widen(a: ColType, b: ColType) -> ColType {
    if matches!(a, Empty) { return b; }
    if matches!(b, Empty) { return a; }
    if (a == Int && b == Float) || (a == Float && b == Int) {
        return Float;
    }
    if a == b { a } else { Text }
}

So:

  • [1, 2, 3] → Int
  • [1, 2.5, 3] → Float (Int promotes to Float)
  • ["", 1, "", 2] → Int (Empty is absorbed)
  • [1, "alice", 3] → Text (mixed → bail)

Each value gets classified by classify(v):

fn classify(v: &str) -> ColType {
    let s = v.trim();
    if s.is_empty()                    { return Empty; }
    if matches_bool(s)                 { return Bool; }
    if s.parse::<i64>().is_ok()        { return Int; }
    if s.parse::<f64>().is_ok() && !s.eq_ignore_ascii_case("nan") {
        return Float;
    }
    if matches_date(s)                 { return Date; }
    Text
}

The NaN exclusion is deliberate: str::parse::<f64>() accepts "NaN" in any case, but a column containing the literal string NaN mixed with numbers is almost certainly not numeric data, and aligning it like a number makes the table look broken.
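
The behaviour is easy to confirm in isolation — std's float parser accepts NaN and infinity spellings case-insensitively, which is exactly why classify has to exclude them by hand:

```rust
fn main() {
    // str::parse::<f64>() accepts "NaN" in any case, so a purely
    // parse-based numeric check would call a NaN column numeric.
    assert!("NaN".parse::<f64>().unwrap().is_nan());
    assert!("nan".parse::<f64>().unwrap().is_nan());
    assert!("1280.50".parse::<f64>().is_ok());
    assert!("hello".parse::<f64>().is_err());
    println!("nan ok");
}
```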

Date detection is a cheap shape check, not a calendar:

fn matches_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 10 { return false; }
    let dash_or_slash = |b| b == b'-' || b == b'/';
    bytes[0..4].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[4])
        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[7])
        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
        && (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}

2024-13-99 is "a date" by this function. That's fine — type inference runs in microseconds, doesn't have to be right, and the wrong answer for a date-shaped string is still better than rendering it as text.
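
Dropping the function as quoted into a scratch file shows both sides of that trade-off (the assertions are mine):

```rust
// matches_date as quoted above: a shape check, not a calendar.
fn matches_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 10 { return false; }
    let dash_or_slash = |b| b == b'-' || b == b'/';
    bytes[0..4].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[4])
        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[7])
        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
        && (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}

fn main() {
    assert!(matches_date("2024-01-15"));        // the common case
    assert!(matches_date("2024/01/15"));        // slashes allowed
    assert!(matches_date("2024-01-15T09:30"));  // datetime tail allowed
    assert!(matches_date("2024-13-99"));        // shape-valid, calendar-invalid
    assert!(!matches_date("01-15-2024"));       // wrong shape: dash at index 2
    println!("date shape ok");
}
```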

ANSI palette without per-cell branches

Palette accessors return &'static str. When color is disabled, every accessor returns "". The hot loop never branches on a flag — it just write!s zero-length escape codes that the terminal sees as nothing:

pub struct Palette { enabled: bool }

impl Palette {
    pub fn cyan(&self) -> &'static str {
        if self.enabled { "\x1b[36m" } else { "" }
    }
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

// Render path
write!(w, " {}{}{} {}", palette.cyan(), value, palette.reset(), border)?;

When color is off, the render loop just writes empty strings; the only branch lives inside the accessors, not in the per-cell formatting code, and rustc is free to fold the zero-length writes away. The pattern came from #137 hexview (the first Rust entry in this portfolio) and has been the house style ever since.
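
The effect is easy to demonstrate with the struct as quoted (constructing it with a struct literal works here because everything lives in one module; the real crate presumably has a constructor):

```rust
pub struct Palette { enabled: bool }

impl Palette {
    pub fn cyan(&self) -> &'static str {
        if self.enabled { "\x1b[36m" } else { "" }
    }
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

fn main() {
    let on = Palette { enabled: true };
    let off = Palette { enabled: false };
    // Same render expression either way; only the bytes differ.
    assert_eq!(format!("{}{}{}", on.cyan(), 42, on.reset()), "\x1b[36m42\x1b[0m");
    assert_eq!(format!("{}{}{}", off.cyan(), 42, off.reset()), "42");
    println!("palette ok");
}
```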

TTY detection without the atty crate

atty is deprecated in favour of std::io::IsTerminal:

use std::io::IsTerminal;

let stdout_is_tty = io::stdout().is_terminal();
let no_color = args.no_color
    || std::env::var_os("NO_COLOR").is_some()
    || !stdout_is_tty;

That's the entire color-detection logic. Pipe to less and color disappears automatically; set NO_COLOR=1 and color disappears manually; pass --no-color and color disappears explicitly. All three converge into one boolean and the Palette fans out from there.

Sniffing the delimiter

When --delim isn't passed, the first line gets scanned for , \t ; |, and whichever appears most often wins:

pub fn guess_delimiter(sample: &str) -> u8 {
    let candidates = [b',', b'\t', b';', b'|'];
    let line = sample.lines().next().unwrap_or("");
    let mut best = (b',', 0usize);
    for &c in &candidates {
        let count = line.bytes().filter(|&b| b == c).count();
        if count > best.1 {
            best = (c, count);
        }
    }
    best.0
}

Ties go to the earliest candidate in the list, so , wins any tie it takes part in. A single-column file (zero occurrences of any candidate) also defaults to , and just renders as one column. Tab-separated and semicolon-separated files Just Work.

51 tests

$ cargo test
running 34 tests
test csv::tests::quoted_field_with_comma ... ok
test csv::tests::escaped_quote_inside_quoted_field ... ok
test csv::tests::quoted_field_with_newline ... ok
…
test result: ok. 34 passed; 0 failed
running 17 tests
test renders_basic_csv ... ok
test no_color_env_disables_escape_sequences ... ok
test handles_unicode_data ... ok
…
test result: ok. 17 passed; 0 failed

The unit half of the test suite covers the parser edge cases (every quoted-field scenario, the recovery path, CRLF/LF/CR mixing), the type-inference widening rules, and the padding / truncate helpers including unicode width.

The integration half uses assert_cmd + predicates to drive the actual binary against fixture CSV files. Every CLI flag has at least one test: NO_COLOR=1, piping detection, ASCII vs Unicode borders, --no-header synthesizing column names, --delim overriding the sniffer, and so on.

#[test]
fn handles_quoted_fields_with_commas() {
    let f = fixture("name,bio\nAlice,\"hello, world\"\n");
    cli().arg(f.path()).arg("--ascii").assert()
        .success()
        .stdout(predicate::str::contains("hello, world"));
}

Try it

git clone https://github.com/sen-ltd/csv-peek
cd csv-peek
cargo build --release
./target/release/csv-peek sample.csv

Or via Docker (no Rust toolchain needed):

docker build -t csv-peek .
docker run --rm -v "$(pwd)":/data -t csv-peek customers.csv

The -t flag is necessary so the container sees a TTY and IsTerminal returns true; without it, csv-peek auto-disables color (the same path that fires when you pipe to less).

Source: https://github.com/sen-ltd/csv-peek — MIT, ~700 lines, single dependency, 51 tests, sub-1 MB stripped binary.

🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.
