CSV Preview and Statistics Tool - In-Browser CSV Profiler with Type Inference and Histograms

Free in-browser CSV profiler for fast exploration of CSV files. Drop a CSV (up to 100 MB), and the tool detects the delimiter, parses the file with an RFC 4180-compliant streaming parser, infers column types, and computes per-column statistics: count, missing rate, unique value count, min / max / mean / median / standard deviation, 25th / 75th percentiles, and a histogram or top-value bar chart for any column. Every byte stays in your browser; files are never uploaded to a server.

IMPORTANT DISCLAIMER:

  • This tool is provided "AS IS" without any warranties of any kind.
  • The author accepts no responsibility for data loss, parsing errors, type-inference mistakes, or any issues arising from use.
  • Very large files (close to the 100 MB limit) may noticeably increase browser memory use and reduce responsiveness.
  • Statistics depend on automatic type inference, which is heuristic and can mis-classify ambiguous columns.
  • Always keep backups of your original data and verify results against an authoritative source.
  • By using this tool, you accept full responsibility for any outcomes.

This tool runs entirely on client-side JavaScript: no data is transmitted to a server, and all processing happens locally in your browser. Once loaded, it continues to work without an internet connection. For more details, please refer to our Web Tools Disclaimer.

Features

  • RFC 4180-Compliant Streaming Parser: Correctly handles quoted fields containing the delimiter, embedded newlines, and escaped double-quotes (""). The parser works on 1 MB chunks via File.slice() + FileReader, so the full file is never loaded into memory at once.
  • Drag & Drop Up To 100 MB: Drop a CSV / TSV file or pick one through the file dialog; pasted text is also supported for quick experiments.
  • Automatic Delimiter Detection: Sniffs comma, tab, semicolon, and pipe delimiters from the first lines using a count + variance score.
  • Encoding Awareness: Detects UTF-8 / UTF-16 byte-order marks and uses TextDecoder with streaming mode so multi-byte characters that straddle chunk boundaries are decoded correctly.
  • Per-Column Type Inference: Each column is classified as integer, number, boolean, date, or string based on whether all non-missing values match the corresponding pattern.
  • Rich Statistics: Count, missing count and rate, unique value count (capped for memory safety), min, max, mean, median, standard deviation, and 25th / 75th percentiles for numeric columns.
  • Histograms & Top-Value Charts: Click any column's Chart button to render a Canvas-based histogram (numeric) or a top-value bar chart (categorical), drawn at device-pixel-ratio for crisp output on high-DPI screens.
  • Paginated Data Preview: First 1,000 parsed rows are kept in memory and displayed with 50 rows per page, with column-type badges in the header.
  • Multiple Export Formats: Export the preview rows as JSON (with type-cast values), TSV, or Excel-compatible CSV (UTF-8 BOM + CRLF) so the data opens cleanly in Microsoft Excel, including for non-ASCII characters.
  • Fully Client-Side: The page never issues a network request after loading. All parsing, stats, charting, and exporting run in the browser's main thread with periodic requestAnimationFrame yields so the UI stays responsive.
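
The delimiter sniffing mentioned in the feature list could be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function name `sniffDelimiter`, the 10-line sample size, and the exact score (mean count per line divided by one plus the variance) are assumptions. For brevity it counts raw occurrences and ignores quoting.

```javascript
// Hypothetical sketch: score each candidate delimiter by how often it appears
// per line (count) and how consistent that number is across lines (variance).
function sniffDelimiter(text, maxLines = 10) {
  const candidates = [",", "\t", ";", "|"];
  const lines = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);
  const n = lines.length || 1; // avoid dividing by zero on empty input

  let best = ","; // assumed fallback when nothing scores above zero
  let bestScore = 0;
  for (const d of candidates) {
    const counts = lines.map((line) => line.split(d).length - 1);
    const mean = counts.reduce((a, b) => a + b, 0) / n;
    const variance = counts.reduce((a, c) => a + (c - mean) ** 2, 0) / n;
    const score = mean / (1 + variance); // frequent and steady wins
    if (score > bestScore) {
      best = d;
      bestScore = score;
    }
  }
  return best;
}
```

A delimiter that appears the same number of times on every sampled line scores highest, which is why a stray semicolon inside one quoted field does not outrank a comma that appears once per line.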

How to Use

  1. Drop a CSV / TSV file onto the drop zone, or click Choose File to open the file picker. For pasted snippets, click Paste Text Instead and use the Analyze Pasted Text button.
  2. (Optional) Adjust the parsing options:
    • Delimiter: leave on Auto-detect or pick comma, semicolon, tab, or pipe explicitly.
    • First row as header: uncheck if your file has no header row; columns are then named column1, column2, …
    • Infer column types: turn off if you only need row counts and missing counts (parsing is faster on huge files).
    • Trim whitespace: strips leading / trailing whitespace from each value before classifying it.
  3. Watch the progress bar while the streaming parser walks the file. When parsing completes, the File Summary, Column Statistics, and Data Preview panels are filled in.
  4. Click any column's Chart button to inspect its distribution. Numeric columns render a 30-bin histogram; categorical / boolean / date columns render a top-N bar chart. The legend on the right shows the full statistics for that column.
  5. Use the pagination controls below the preview table to walk through the first 1,000 parsed rows fifty at a time.
  6. Use the Export buttons to download the preview rows as JSON (with inferred types applied), TSV, or Excel-compatible CSV (UTF-8 BOM + CRLF).
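
As an illustration of the Excel-compatible CSV export in step 6, the sketch below builds the output text with a UTF-8 BOM and CRLF record terminators. The helper names `escapeCsvField` and `toExcelCsv` are hypothetical, and the tool's real export code may differ.

```javascript
// Hypothetical helper: quote a field only when it contains a comma, a quote,
// or a line break, doubling any embedded double-quotes.
function escapeCsvField(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build Excel-friendly CSV text: BOM (U+FEFF) first, CRLF after every record.
function toExcelCsv(header, rows) {
  const records = [header, ...rows].map((record) =>
    record.map(escapeCsvField).join(",")
  );
  return "\uFEFF" + records.join("\r\n") + "\r\n";
}
```

In a browser, the returned string could be wrapped in `new Blob([csv], { type: "text/csv;charset=utf-8" })` for download; the leading U+FEFF is then written as the bytes EF BB BF.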

Type Inference Rules

  • boolean: every non-missing cell matches true / false / yes / no (case-insensitive).
  • integer: every non-missing cell matches ^-?(0|[1-9]\d*)$; leading-zero strings (e.g. ZIP codes) stay as strings.
  • number: every non-missing cell matches an integer or decimal / scientific notation (1.5, -0.42, 3e8).
  • date: every non-missing cell matches YYYY-MM-DD or YYYY/MM/DD with optional time and timezone, and is accepted by Date.parse().
  • string: any column that fails the rules above. Top-N value frequencies are still computed.
  • The values (empty), NA, N/A, NaN, null, NULL, and None are treated as missing and excluded from statistics.
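
The rules above can be sketched as a single classification function. Only the integer pattern is quoted from the rules; the `NUMBER` and `DATE` patterns, the name `inferType`, and the choice to treat an all-missing column as string are assumptions made for illustration. Cells are assumed to be already trimmed.

```javascript
// Missing-value tokens from the rules above ("" stands for an empty cell).
const MISSING = new Set(["", "NA", "N/A", "NaN", "null", "NULL", "None"]);
const BOOLEAN = /^(true|false|yes|no)$/i;
const INTEGER = /^-?(0|[1-9]\d*)$/; // pattern quoted in the rules
// Assumed pattern: decimal / scientific notation, still rejecting leading zeros.
const NUMBER = /^-?(0|[1-9]\d*)(\.\d+)?(e[+-]?\d+)?$/i;
// Assumed pattern: YYYY-MM-DD or YYYY/MM/DD with optional time and timezone.
const DATE =
  /^\d{4}[-\/]\d{2}[-\/]\d{2}([T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$/;

function inferType(cells) {
  const present = cells.filter((cell) => !MISSING.has(cell));
  if (present.length === 0) return "string"; // assumption: nothing to classify
  if (present.every((c) => BOOLEAN.test(c))) return "boolean";
  if (present.every((c) => INTEGER.test(c))) return "integer";
  if (present.every((c) => NUMBER.test(c))) return "number";
  if (present.every((c) => DATE.test(c) && !Number.isNaN(Date.parse(c)))) {
    return "date";
  }
  return "string";
}
```

Because every non-missing cell has to match, one stray value such as a leading-zero ZIP code is enough to keep the whole column a string.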

Statistics Computed Per Column

  • All types: total count, missing count and rate, unique value count (tracked up to 100,000 distinct values per column for memory safety), top values with their frequencies.
  • integer / number: min, max, arithmetic mean, sample standard deviation (N-1 denominator), median, 25th and 75th percentiles, and a 30-bin histogram.
  • boolean: true / false counts.
  • date: earliest and latest dates (parsed via Date.parse).
  • string: Top-20 value bar chart with raw frequencies.
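
For numeric columns, the summary listed above might be computed along these lines. This is a sketch under stated assumptions: the names `quantile` and `numericStats` are hypothetical, percentiles use linear interpolation between closest ranks, and the histogram uses equal-width bins between min and max. The tool may use different conventions.

```javascript
// Assumed percentile method: linear interpolation between closest ranks.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Hypothetical summary for a non-empty array of finite numbers.
function numericStats(values, binCount = 30) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const min = sorted[0];
  const max = sorted[n - 1];
  const mean = sorted.reduce((a, b) => a + b, 0) / n;

  // Sample standard deviation: squared deviations divided by N-1.
  const squares = sorted.reduce((a, v) => a + (v - mean) ** 2, 0);
  const stdDev = n > 1 ? Math.sqrt(squares / (n - 1)) : 0;

  // Equal-width bins; the maximum value is clamped into the last bin.
  const width = (max - min) / binCount || 1;
  const histogram = new Array(binCount).fill(0);
  for (const v of sorted) {
    const bin = Math.min(binCount - 1, Math.floor((v - min) / width));
    histogram[bin] += 1;
  }

  return {
    count: n, min, max, mean, stdDev,
    p25: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    p75: quantile(sorted, 0.75),
    histogram,
  };
}
```

The sort is the step that requires holding every numeric value of the column in memory at once.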

RFC 4180 Compliance Notes

  • Fields containing the delimiter must be enclosed in double-quotes, e.g. "San Francisco, CA".
  • Fields containing newlines are preserved within double-quotes; the streaming parser tracks state across chunk boundaries to keep these intact.
  • Double-quote characters within a quoted field must be doubled (""); the parser un-doubles them on read.
  • Both CRLF (\r\n) and LF (\n) line endings are accepted as record terminators.
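
To show how parser state can survive chunk boundaries, here is a minimal state-machine sketch. The factory name `createCsvParser` and its `push` / `finish` methods are hypothetical and are not the tool's API; the sketch handles quoted delimiters, embedded newlines, doubled quotes, and CRLF / LF terminators, even when a chunk ends in the middle of any of them.

```javascript
// Hypothetical streaming parser: feed text chunks with push(), then call finish().
function createCsvParser(delimiter = ",") {
  let field = "";
  let row = [];
  let inQuotes = false;     // inside a quoted field
  let quotePending = false; // saw '"' inside quotes; the next char decides its meaning
  let skipLF = false;       // previous char was CR; swallow one following LF
  let started = false;      // current record has received any content

  const endField = () => { row.push(field); field = ""; };
  const endRow = (out) => { endField(); out.push(row); row = []; started = false; };

  function step(ch, out) {
    if (quotePending) {
      quotePending = false;
      if (ch === '"') { field += '"'; return; } // "" becomes one literal quote
      inQuotes = false;                          // it was the closing quote
    }
    if (inQuotes) {
      if (ch === '"') quotePending = true;
      else field += ch; // delimiters and newlines are kept verbatim
      return;
    }
    if (skipLF) {
      skipLF = false;
      if (ch === "\n") return; // second half of CRLF
    }
    if (ch === '"') { inQuotes = true; started = true; }
    else if (ch === delimiter) { endField(); started = true; }
    else if (ch === "\r") { endRow(out); skipLF = true; }
    else if (ch === "\n") { endRow(out); }
    else { field += ch; started = true; }
  }

  return {
    // Returns the records completed by this chunk.
    push(chunk) {
      const out = [];
      for (const ch of chunk) step(ch, out);
      return out;
    },
    // Emits the last record when the input has no trailing newline.
    finish() {
      const out = [];
      if (started) endRow(out);
      return out;
    },
  };
}
```

All state lives in the closure rather than in the loop, so a chunk may end between the two characters of a `""` pair or between `\r` and `\n` without changing the result.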

Common Use Cases

  • Data Engineering / Analytics: Quickly profile a fresh CSV export to spot missing-value spikes or unexpected categorical cardinality before loading it into a warehouse.
  • Spreadsheet Audits: Inspect a colleague's spreadsheet without opening Excel, including columns Excel would mangle (long IDs, leading-zero ZIP codes, mixed-locale numbers).
  • Open-Data Exploration: Drag a government / research open-data CSV onto the page to learn the schema, ranges, and distribution at a glance.
  • Pre-Cleaning for ML Pipelines: Identify columns that need imputation or transformation before training, all without writing a Python script.
  • Format Conversion: Convert a sample of rows from CSV to JSON, TSV, or Excel-compatible CSV for downstream tooling that prefers a different format.

Limits & Important Notes

  • File size is capped at 100 MB. Larger files should be sampled with a command-line tool (e.g. head / shuf) before profiling here.
  • Only the first 1,000 parsed rows are kept for the preview table and exports — full statistics are still computed across all rows in the file.
  • Numeric percentiles and median require sorting the column's numeric values; for very large numeric columns this is the dominant memory cost.
  • Unique value tracking caps at 100,000 distinct values per column. After the cap is reached, the column reports "> 100,000" instead of an exact count.
  • Type inference is purely structural — a column of strings that all happen to look like integers will be classified as integer. If that's wrong, untick Infer column types.
  • The Excel-compatible CSV export prepends a UTF-8 BOM and uses CRLF line endings so Microsoft Excel auto-detects encoding correctly. Plain CSV / TSV exports omit the BOM.
  • Once loaded, this tool works fully offline. Refreshing the page clears all state — no data is persisted to localStorage or cookies.
  • Encoding (Shift_JIS / CP932): UTF-8 and UTF-16 files with a BOM are detected automatically. Shift_JIS files never carry a BOM, and auto-detection for them is not implemented, so the tool reads them as UTF-8 and may display garbled text. In that case, convert the file to UTF-8 (e.g. with a text editor or iconv) before loading it here.
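
The BOM check and streaming decode described in the encoding note could look roughly like this. The helper names `detectEncoding` and `decodeChunks` are hypothetical; `TextDecoder` and its `stream` option are standard APIs, but how the tool wires them together is an assumption.

```javascript
// Hypothetical helper: choose an encoding label from the first bytes of the file.
function detectEncoding(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  return "utf-8"; // no BOM: fall back to UTF-8, as the note above describes
}

// Decode a sequence of Uint8Array chunks with one shared decoder, so bytes of
// a character split across two chunks are buffered until the rest arrives.
function decodeChunks(chunks) {
  const first = chunks.length > 0 ? chunks[0] : new Uint8Array();
  const decoder = new TextDecoder(detectEncoding(first));
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}
```

Creating a fresh decoder per chunk would instead turn the split bytes into U+FFFD replacement characters, which is the failure the streaming mode avoids.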

Written by Hidekazu Konishi