CSV Analyzer

A data-oriented skill that takes a CSV file (path or inline content), parses it, and produces a comprehensive analysis including column types, descriptive statistics, missing value counts, correlation highlights, and detected outliers.

Trust Score: 79 (Medium)
by hermeshub-core | data | intermediate | v1.7.0 | updated Feb 10, 2026
Total Runs: 8.7k
Success Rate: 86.3%
Installs: 2.5k

Tags

#csv #data-analysis #statistics #outliers #patterns

Required Tools

file_read, json_parse

Inputs

| Name | Type | Description | Required |
|------|------|-------------|----------|
| csv_file | file | Path to the CSV file to analyze. | no |
| csv_content | text | Inline CSV content as a string. Used if csv_file is not provided. | no |
| delimiter | text | Column delimiter. Defaults to comma (","). | no |
| include_correlations | boolean | Whether to compute pairwise correlations for numeric columns. Defaults to true. | no |
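
The precedence between `csv_file` and `csv_content` described above can be sketched roughly as follows. The helper name `load_csv` and the choice to raise `ValueError` when neither input is given are assumptions for illustration, not part of the skill's documented behavior.

```python
import io

import pandas as pd


def load_csv(csv_file=None, csv_content=None, delimiter=","):
    """Hypothetical helper: prefer the file path, fall back to inline content."""
    if csv_file is not None:
        source = csv_file
    elif csv_content is not None:
        # Wrap the string so pandas can read it like a file.
        source = io.StringIO(csv_content)
    else:
        # Assumption: the skill rejects a call with no input at all.
        raise ValueError("Provide either csv_file or csv_content")
    return pd.read_csv(source, sep=delimiter)


# Inline content with a non-default delimiter.
df = load_csv(csv_content="a;b\n1;2\n3;4\n", delimiter=";")
print(df.shape)
```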

Outputs

| Name | Type | Description | Required |
|------|------|-------------|----------|
| analysis | json | JSON report with fields: row_count, column_count, columns (array of column analyses), outliers, correlations, data_quality_score. | yes |

SKILL.md

---
name: csv-analyzer
description: Analyze CSV files to generate descriptive statistics, detect data types, find outliers, and identify patterns.
---

# CSV Analyzer

Comprehensive CSV analysis with statistics and pattern detection.

## Quick Start

### Basic Analysis

```bash
# Using csvstat (csvkit)
csvstat data.csv

# Using Python
python3 -c "
import pandas as pd
df = pd.read_csv('data.csv')
print(df.describe())
print(df.dtypes)
print(f'Missing values: {df.isnull().sum().sum()}')
"
```

### Advanced Statistics

```bash
# Detect outliers (Z-score method)
python3 -c "
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
numeric_cols = df.select_dtypes(include=[np.number]).columns
z_scores = np.abs((df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std())
outliers = df[(z_scores > 3).any(axis=1)]
print(f'Outliers: {len(outliers)}')
"
```

## Tools Available

- csvstat (csvkit) - comprehensive stats
- csvlook - preview tables
- csvgrep - filter rows
- Python pandas - full analysis
- awk - quick field extraction

## Analysis Output

```json
{
  "row_count": 1000,
  "column_count": 5,
  "columns": [
    {
      "name": "price",
      "type": "numeric",
      "stats": {
        "min": 10.0,
        "max": 1000.0,
        "mean": 250.5,
        "median": 200.0,
        "std": 150.2
      },
      "missing": 5
    }
  ],
  "outliers": [...],
  "correlations": {...},
  "data_quality_score": 92
}
```
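
As a rough illustration, the per-column part of this report could be assembled with pandas along these lines. `build_report` is a hypothetical name, and the `"text"` label for non-numeric columns is an assumption (the example above only shows `"numeric"`). The `outliers`, `correlations`, and `data_quality_score` fields are left out because their exact shape and scoring formula are not specified here.

```python
import json

import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def build_report(df):
    """Hypothetical sketch: row/column counts plus one entry per column."""
    columns = []
    for name in df.columns:
        series = df[name]
        # Treat booleans as non-numeric so they get no min/max/mean stats.
        numeric = is_numeric_dtype(series) and not is_bool_dtype(series)
        entry = {
            "name": str(name),
            "type": "numeric" if numeric else "text",  # "text" is an assumed label
            "missing": int(series.isnull().sum()),
        }
        if numeric:
            # pandas skips missing values in these reductions by default.
            entry["stats"] = {
                "min": float(series.min()),
                "max": float(series.max()),
                "mean": float(series.mean()),
                "median": float(series.median()),
                "std": float(series.std()),
            }
        columns.append(entry)
    return {
        "row_count": int(len(df)),
        "column_count": int(df.shape[1]),
        "columns": columns,
    }


sample = pd.DataFrame({
    "price": [10.0, 200.0, None, 1000.0],
    "sku": ["a", "b", "c", None],
})
print(json.dumps(build_report(sample), indent=2))
```

Note that a numeric column with fewer than two non-missing values yields `NaN` for `std`, which `json.dumps` writes as the non-standard token `NaN`; a real implementation would probably map it to `null`.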

## Common Patterns

| Pattern | Detection Method |
|---------|-----------------|
| Missing values | .isnull().sum() |
| Outliers | Z-score magnitude > 3, or values more than 1.5 × IQR beyond the quartiles |
| Correlations | .corr() |
| Duplicates | .duplicated() |
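
The detection methods in the table can be combined in a few lines. This is a minimal sketch on made-up data. It uses the IQR rule for outliers, since the Quick Start already covers the Z-score variant. The helper name `iqr_outlier_mask` and the multiplier `k=1.5` (the conventional Tukey fence) are illustrative choices.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data: row 5 is extreme in both columns; rows 4 and 6 repeat earlier rows.
df = pd.DataFrame({
    "price": [10, 12, 11, 13, 12, 300, 10],
    "qty": [1, 2, 1, 3, 2, 40, 1],
})

numeric = df.select_dtypes(include="number")

missing_per_column = df.isnull().sum()                            # missing values
outlier_rows = df[numeric.apply(iqr_outlier_mask).any(axis=1)]    # IQR outliers
correlations = numeric.corr()                                     # Pearson by default
duplicate_rows = df[df.duplicated()]                              # repeats of an earlier row

print(outlier_rows)
print(duplicate_rows)
```

On small samples the two outlier rules can disagree: with a sample standard deviation, a Z-score in a column of n values cannot exceed (n - 1) / sqrt(n), so `Z-score > 3` never fires for fewer than 11 rows, while the IQR rule still flags the extreme row here.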

## Error Handling
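
- Empty files: Return an error with a suggested fix (pandas raises `pandas.errors.EmptyDataError`)
- Malformed CSV: Use `on_bad_lines='skip'` (pandas 1.3+; the older `error_bad_lines=False` was removed in pandas 2.0)
- Encoding issues: Try `encoding='utf-8-sig'` (handles a byte-order mark), then `latin-1`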
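
One way these three cases might be handled together is sketched below. `read_csv_tolerant` is a hypothetical helper, and the error message is only an example of a "suggested fix". `on_bad_lines="skip"` is the pandas 1.3+ spelling of the older `error_bad_lines=False`.

```python
import pandas as pd


def read_csv_tolerant(path, delimiter=","):
    """Hypothetical loader: encoding fallback, bad-line skipping, clear empty-file error."""
    options = {"sep": delimiter, "on_bad_lines": "skip"}  # skip rows with too many fields
    try:
        try:
            # utf-8-sig also strips a leading byte-order mark if present.
            return pd.read_csv(path, encoding="utf-8-sig", **options)
        except UnicodeDecodeError:
            # latin-1 maps every byte to a character, so decoding cannot fail,
            # though the text may be wrong if the file uses another encoding.
            return pd.read_csv(path, encoding="latin-1", **options)
    except pd.errors.EmptyDataError:
        raise ValueError(
            f"{path} contains no data; check the file path or re-export the CSV"
        ) from None
```

Skipping malformed rows silently changes the row count, so it may be worth comparing the loaded row count against the file's line count before trusting the statistics.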