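Installation
Manual download and install
Download the ZIP and extract it into the skills directory to install. If downloading directly inside the desktop client's WebView fails, the site falls back to a notice page with the original link; follow the instructions on that page.
Download ZIP (shub-pandas-skill-v1.0.0.zip)
Trigger command
/pandas-skill
Cross-platform installation guide
This skill declares compatibility with 1 platform; extract the ZIP into the corresponding directory and it will be recognized.
unzip shub-pandas-skill-v1.0.0.zip -d ~/.claude/skills/
If the directory does not exist, create it with mkdir -p. After enabling the Skill, restart the corresponding Agent so the configuration takes effect.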
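Usage guide
Pandas data analysis
This skill covers Pandas data analysis: DataFrame cleaning, grouping, pivoting, and time series; for large datasets, pay attention to chunking and dtype downcasting. You do not need to hand-assemble scattered English instructions into the context before every task, and there is less trial and error caused by drifting from the client's default behavior. The specific commands, hooks, and JSON parameters remain authoritative in the SKILL.md inside the ZIP. The structure below matches the site's MCP CLI topic articles: when to use, prerequisites, workflow, quick reference, and troubleshooting.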
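When to use
- DataFrame cleaning, grouping, pivoting, and time series.
- Chunking and dtype downcasting for large datasets.
- You have obtained this skill's ZIP and plan to mount it in Claude Code / OpenClaw as SKILL.md describes.
- You want to use this site overview to decide quickly whether to enable the skill, before turning to the English SKILL for parameters and limits.
- You need to align with your team on the same trigger method, directory conventions, or callback format.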
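Prerequisites
- General: you can run Claude Code or the client the documentation requires, and you have a readable and writable project workspace (or the sandbox directory specified by SKILL.md).
- Authoritative details: API keys / OAuth, hook paths, and environment variables are as specified in the SKILL.md inside the ZIP.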
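Typical workflow
- Get the skill ZIP from ClawHub or the site's distribution, and verify the version and checksum (if provided).
- Read the installation section of SKILL.md: target directory and client type (Claude Code / OpenClaw / script).
- Complete a first call with the minimal example from the documentation (a single-file edit, a single query, or a single delegation).
- Confirm the working directory, permission boundaries, and output path before handling multi-file or long-running tasks.
- If you need callbacks / webhooks / notifications, configure the endpoint as SKILL.md describes and verify it in a test environment first.
Relationship to the ZIP / SKILL.md
Like the site's MCP CLI oss articles, this overview summarizes when to use the skill, how to connect it, and how to troubleshoot; command templates, hook names, JSON fields, and the version matrix always defer to the SKILL.md inside the ZIP and to ClawHub upstream.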
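Command examples (from the bundled SKILL.md)
The terminal and script snippets for this skill are extracted automatically from the upstream SKILL.md (or the ingested text); paths, environment variables, and parameters follow the current ZIP and the official documentation.
ClawHub slug: pandas-skill (for the install command, follow SKILL.md / the claw CLI).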
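Trigger command as registered on this site (full semantics are in the ZIP):
# When using this skill, you can reference or run its commands in the conversation; full parameters and examples are in the SKILL.md inside the download package.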
/pandas-skill
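Best practices
- Read SKILL.md before guessing parameters; the site article does not replace the schema or the required-field descriptions.
- When delegating a task, state the acceptance criteria (commands, file paths, test commands) to reduce back-and-forth.
- For long tasks, use the callbacks or on-disk logging the documentation recommends instead of high-frequency polling; this saves tokens and machine load.
- When several skills are enabled at once, watch the hook load order and duplicate tool calls (follow the conflict notes in SKILL.md).
Debugging and troubleshooting
- Turn on stderr and client logs; in PTY/tmux setups, also check the last few dozen lines of pane output.
- For parameter errors, compare against the JSON/CLI examples in SKILL.md (quoting, escaping, working directory).
- For network failures, check the proxy, firewall, and MCP transport (stdio / HTTP / SSE).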
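Quick reference
| Action | Description |
|------|------|
| Get the skill package | ClawHub / site ZIP; verify the version |
| Authoritative steps | Read the SKILL.md inside the ZIP first |
| First trial run | Use the minimal example from SKILL.md |
| Acceptance | Check against paths, test commands, or callback payloads |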
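Common failures
- No output or immediate exit → wrong working directory, missing dependencies, or Claude Code not logged in; run through the self-check list in SKILL.md.
- Permission denied → check the sandbox path, --permission-mode, and the tool allowlist.
- Behavior differs from the summary → the English SKILL and the upstream repository take precedence; the site article is only a structured guide.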
# Pandas Data Processing Skill
English | [简体中文](SKILL_CN.md)
This skill provides comprehensive pandas data processing capabilities through executable scripts and reference documentation. Use this skill whenever tasks involve data manipulation, cleaning, analysis, or transformation of tabular data.
## When to Use This Skill
Activate this skill when the user requests:
- Data cleaning operations (handling missing values, duplicates, outliers)
- Data analysis and statistical summaries
- Format conversions (CSV ↔ Excel ↔ JSON ↔ Parquet)
- Data transformation (filtering, sorting, aggregation, pivoting)
- Merging or combining multiple datasets
- Generating data quality reports
- Any pandas DataFrame operations
## Core Capabilities
### 1. Data Cleaning (`scripts/data_cleaner.py`)
Handles common data cleaning tasks with a single command:
**Usage:**
```bash
python scripts/data_cleaner.py input.csv output.csv [options]
```
**Available Options:**
- `--remove-duplicates`: Remove duplicate rows
- `--handle-missing [strategy]`: Handle missing values
  - Strategies: `drop`, `fill`, `forward`, `backward`, `mean`, `median`
- `--fill-value [value]`: Custom fill value for missing data
- `--remove-outliers`: Remove outliers using IQR or Z-score method
- `--outlier-method [method]`: Choose `iqr` or `zscore` (default: iqr)
- `--standardize-columns`: Standardize column names (lowercase, underscores)
**Example:**
```bash
python scripts/data_cleaner.py data.csv cleaned_data.csv \
    --remove-duplicates \
    --handle-missing mean \
    --remove-outliers \
    --standardize-columns
```
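For readers who want to see what these flags roughly correspond to in pandas, here is a minimal sketch. The internals of `data_cleaner.py` are not shown in this document, so the helper name `clean_frame`, the order of the steps, and the 1.5 × IQR fence are all assumptions made for illustration.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper approximating the four flags in the example above."""
    out = df.copy()
    # --standardize-columns: lowercase names, spaces replaced by underscores
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # --remove-duplicates: drop fully identical rows
    out = out.drop_duplicates()
    # --handle-missing mean: fill numeric gaps with each column's mean
    numeric = out.select_dtypes(include="number").columns
    out[numeric] = out[numeric].fillna(out[numeric].mean())
    # --remove-outliers (IQR): keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1 = out[numeric].quantile(0.25)
    q3 = out[numeric].quantile(0.75)
    iqr = q3 - q1
    inside = ((out[numeric] >= q1 - 1.5 * iqr) & (out[numeric] <= q3 + 1.5 * iqr)).all(axis=1)
    return out[inside].reset_index(drop=True)


raw = pd.DataFrame({
    "Order ID": [1, 2, 2, 3, 4, 5],
    "Unit Price": [10.0, 12.0, 12.0, None, 11.0, 500.0],
})
cleaned = clean_frame(raw)
print(cleaned)
```

Note that filling with the mean before removing outliers lets an extreme value influence the fill value; reversing the two steps is an equally reasonable design choice.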
### 2. Data Analysis (`scripts/data_analyzer.py`)
Generates comprehensive data analysis reports:
**Usage:**
```bash
python scripts/data_analyzer.py input.csv [options]
```
**Available Options:**
- `--output, -o [file]`: Save report to file
- `--format [format]`: Output format (`json` or `text`, default: json)
**Report Includes:**
- Basic information (rows, columns, memory usage)
- Data type distribution
- Missing values analysis
- Numeric column statistics (mean, std, min, max, quartiles, skewness, kurtosis)
- Categorical column statistics (unique values, value counts)
- Correlation analysis
- Outlier detection
**Example:**
```bash
python scripts/data_analyzer.py sales_data.csv -o report.json --format json
```
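The report fields listed above map onto standard pandas calls. The sketch below computes a subset of them; the function name `summarize` and the dictionary keys are hypothetical and are not the schema that `data_analyzer.py` actually writes.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper covering a subset of the report fields listed above."""
    numeric = df.select_dtypes(include="number")
    return {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
        "missing": df.isna().sum().to_dict(),
        "numeric_stats": numeric.describe().to_dict(),  # mean, std, min, max, quartiles
        "skewness": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),
        "correlation": numeric.corr().to_dict(),
    }


sample = pd.DataFrame({
    "city": ["a", "b", "b", None],
    "price": [1.0, 2.0, None, 4.0],
    "qty": [1, 2, 3, 4],
})
report = summarize(sample)
print(report["missing"])
```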
### 3. Data Transformation (`scripts/data_transformer.py`)
Performs various data transformation operations through subcommands:
#### Convert Format
```bash
python scripts/data_transformer.py convert input.csv output.xlsx
```
Supports: CSV, Excel (.xlsx/.xls), JSON, Parquet, HTML
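A format conversion of this kind can be sketched as a lookup from file suffix to a pandas reader and writer. The tables `READERS` and `WRITERS` and the function `convert` are hypothetical names; the real script may dispatch differently, and this sketch omits `.xls` and HTML for brevity.

```python
from pathlib import Path

import pandas as pd

# Hypothetical dispatch tables keyed by file suffix (assumption, not the script's code).
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,       # needs openpyxl at call time
    ".parquet": pd.read_parquet,  # needs pyarrow at call time
}
WRITERS = {
    ".csv": lambda df, path: df.to_csv(path, index=False),
    ".json": lambda df, path: df.to_json(path, orient="records"),
    ".xlsx": lambda df, path: df.to_excel(path, index=False),
    ".parquet": lambda df, path: df.to_parquet(path, index=False),
}


def convert(src: str, dst: str) -> None:
    """Read src with the reader for its suffix and write dst with the matching writer."""
    src_ext = Path(src).suffix.lower()
    dst_ext = Path(dst).suffix.lower()
    if src_ext not in READERS or dst_ext not in WRITERS:
        raise ValueError(f"unsupported conversion: {src_ext} -> {dst_ext}")
    WRITERS[dst_ext](READERS[src_ext](src), dst)
```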
#### Merge Files
```bash
python scripts/data_transformer.py merge file1.csv file2.csv file3.csv \
    --output merged.csv \
    --how outer \
    --on key_column
```
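In pandas terms, merging several files on one key with `--how outer` plausibly amounts to folding `pd.merge` over the frames. The helper `merge_frames` below is a hypothetical illustration under that assumption, not the script's implementation.

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, on, how="outer"):
    """Hypothetical helper: merge a list of DataFrames pairwise on a shared key."""
    return reduce(lambda left, right: pd.merge(left, right, on=on, how=how), frames)


a = pd.DataFrame({"key_column": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"key_column": [2, 3], "y": [0.2, 0.3]})
c = pd.DataFrame({"key_column": [3, 4], "z": ["p", "q"]})
merged = merge_frames([a, b, c], on="key_column", how="outer")
print(merged)
```

With an outer join, keys missing from a file produce `NaN` in that file's columns. If the inputs share non-key column names, pandas appends `_x` / `_y` suffixes, so it can help to rename such columns before merging.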
#### Filter Data
```bash
python scripts/data_transformer.py filter data.csv \
    --query "age > 18 and city == 'Beijing'" \
    --output filtered.csv
```
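The `--query` string uses the same expression syntax as `DataFrame.query`, assuming the script passes it through unchanged (not confirmed here). The sketch below shows the query form next to the equivalent boolean mask on made-up sample data.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Li", "Wang", "Zhao"],
    "age": [25, 17, 30],
    "city": ["Beijing", "Beijing", "Shanghai"],
})

# Expression form, as in the --query argument above
via_query = people.query("age > 18 and city == 'Beijing'")

# Equivalent boolean-mask form
via_mask = people[(people["age"] > 18) & (people["city"] == "Beijing")]

print(via_query)
```

Column names containing spaces must be wrapped in backticks inside a query expression, which is one reason to standardize column names first.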
#### Sort Data
```bash
python scripts/data_transformer.py sort data.csv \
    --by sales quantity \
    --descending \
    --output sorted.csv
```
#### Select Columns
```bash
python scripts/data_transformer.py select data.csv \
    --columns name age city \
    --output selected.csv
```
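The sort and select subcommands above correspond to two short pandas idioms. This is a sketch on invented sample data; how the script itself implements `--by`, `--descending`, and `--columns` is an assumption.

```python
import pandas as pd

orders = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [31, 42, 27],
    "city": ["X", "Y", "Z"],
    "sales": [5, 9, 9],
    "quantity": [1, 2, 3],
})

# sort --by sales quantity --descending: ties on sales are broken by quantity
sorted_orders = orders.sort_values(by=["sales", "quantity"], ascending=False)

# select --columns name age city: keep only the listed columns, in that order
selected = orders[["name", "age", "city"]]

print(sorted_orders["name"].tolist())
```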
## Reference Documentation
The `references/` directory contains detailed documentation:
### `references/common_operations.md`
Comprehensive reference covering:
- Data reading/saving (CSV, Excel, JSON, SQL, Parquet)
- Data exploration (head, info, describe, dtypes)
- Data selection and filtering (loc, iloc, boolean indexing, query)
- Data cleaning (handling missing/duplicate values, type conversion)
- Data transformation (apply, map, sorting, column operations)
- Groupby and aggregation operations
- Pivot tables
- Merging and joining (concat, merge, join)
- Time series operations
- String operations
- Performance optimization tips
**When to use:** When Claude needs to understand pandas syntax or find the right method for a specific operation.
### `references/data_cleaning_best_practices.md`
Best practices guide covering:
- Data quality check checklist
- Missing value handling strategies with decision tree
- Outlier detection methods (IQR, Z-Score, percentile)
- Data type optimization for memory efficiency
- String cleaning techniques
- Date/time standardization
- Complete cleaning pipeline template
- Common problems and solutions
- Data validation methods
**When to use:** When designing a data cleaning workflow or deciding on the best approach for specific data quality issues.
## Workflow Guidelines
### Step 1: Initial Assessment
Always start by analyzing the data:
```bash
python scripts/data_analyzer.py input_file.csv -o analysis_report.json
```
Review the report to understand data quality, types, missing values, and potential issues.
### Step 2: Plan Cleaning Strategy
Based on the analysis report:
- Identify missing value strategy (reference: `data_cleaning_best_practices.md`)
- Determine if duplicates should be removed
- Decide on outlier handling approach
- Plan any necessary type conversions
### Step 3: Execute Cleaning
Run the data cleaner with appropriate options:
```bash
python scripts/data_cleaner.py input.csv cleaned.csv [options]
```
### Step 4: Transform as Needed
Apply any transformations (filtering, sorting, format conversion, merging):
```bash
python scripts/data_transformer.py [subcommand] [options]
```
### Step 5: Validate Results
Re-run analysis on the cleaned data to verify improvements:
```bash
python scripts/data_analyzer.py cleaned.csv -o final_report.json
```
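The validation step can also be done directly in pandas by comparing a few quality counts before and after cleaning. The helpers `quality_snapshot` and `compare` are hypothetical; they do not read the JSON reports, whose schema is not documented here.

```python
import pandas as pd


def quality_snapshot(df: pd.DataFrame) -> dict:
    """Hypothetical helper: three simple data-quality counts."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_cells": int(df.isna().sum().sum()),
    }


def compare(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Pair each count from the raw data with the same count from the cleaned data."""
    b, a = quality_snapshot(before), quality_snapshot(after)
    return {key: {"before": b[key], "after": a[key]} for key in b}


before = pd.DataFrame({"id": [1, 1, 2, 3], "score": [0.5, 0.5, None, 0.9]})
after = before.drop_duplicates().dropna()
delta = compare(before, after)
print(delta)
```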
## Common Patterns
### Pattern 1: Quick Data Quality Report
```bash
python scripts/data_analyzer.py data.csv --format text
```
### Pattern 2: Standard Cleaning Pipeline
```bash
python scripts/data_cleaner.py raw_data.csv clean_data.csv \
    --standardize-columns \
    --remove-duplicates \
    --handle-missing median \
    --remove-outliers
```
### Pattern 3: Excel to CSV with Filtering
```bash
# Convert
python scripts/data_transformer.py convert data.xlsx data.csv
# Filter
python scripts/data_transformer.py filter data.csv \
    --query "status == 'active'" \
    --output filtered.csv
```
### Pattern 4: Merge Multiple CSVs
```bash
python scripts/data_transformer.py merge *.csv \
    --output combined.csv
```
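This document does not state what `merge` does when `--on` is omitted. If the goal is simply to stack files that share the same columns, a row-wise `pd.concat` is one way to do it; the sketch below makes that assumption, and the helper name `stack_csvs` is hypothetical. It also skips the output file, because a `*.csv` glob would otherwise pick up `combined.csv` on a second run.

```python
from pathlib import Path

import pandas as pd


def stack_csvs(folder: str, output_name: str = "combined.csv") -> pd.DataFrame:
    """Hypothetical helper: stack every CSV in a folder row-wise and save the result."""
    paths = sorted(p for p in Path(folder).glob("*.csv") if p.name != output_name)
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    combined = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
    combined.to_csv(Path(folder) / output_name, index=False)
    return combined
```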
## Dependencies
Install pandas together with numpy and openpyxl (used for `.xlsx` files):
```bash
pip install pandas numpy openpyxl
```
Optional for specific formats:
```bash
pip install pyarrow # For Parquet support
pip install xlrd # For older Excel files (.xls)
```
## Tips for Effective Use
1. **Start with analysis:** Always run the analyzer first to understand the data
2. **Incremental cleaning:** Apply cleaning operations step by step, verify each step
3. **Preserve originals:** Never overwrite original data files
4. **Check references:** Consult reference docs for complex operations or best practices
5. **Validate results:** Use the analyzer to verify cleaning effectiveness
6. **Memory efficiency:** For large files, consider using the data type optimization techniques in the reference docs
7. **Combine operations:** Chain multiple transformer commands for complex workflows
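As an illustration of the memory-efficiency tip, the sketch below downcasts numeric columns and converts low-cardinality text columns to `category`. The helper name `shrink` and the 50% uniqueness threshold are assumptions made for this example; the techniques in the reference docs may differ.

```python
import pandas as pd
from pandas.api import types as ptypes


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reduce memory by choosing smaller dtypes."""
    out = df.copy()
    for col in out.columns:
        if ptypes.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif ptypes.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif ptypes.is_object_dtype(out[col]) or ptypes.is_string_dtype(out[col]):
            # Few distinct values relative to row count: store as category codes
            if out[col].nunique() < 0.5 * len(out):
                out[col] = out[col].astype("category")
    return out


big = pd.DataFrame({
    "qty": [1, 2, 3, 4, 5, 6],
    "price": [1.5, 2.25, 3.0, 4.5, 5.75, 6.0],
    "city": ["a", "b", "a", "b", "a", "b"],
})
small = shrink(big)
print(small.dtypes)
```

Downcasting floats to `float32` reduces precision, so it is worth checking that downstream calculations tolerate it.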
## Limitations
- Scripts work with single-machine memory constraints (for very large datasets, consider Dask)
- Time series resampling and rolling operations require custom pandas code
- Complex statistical modeling beyond basic descriptive statistics requires additional libraries
- For advanced visualizations, use matplotlib/seaborn directly
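Since resampling and rolling operations are outside what the scripts offer, here is a minimal sketch of the custom pandas code that limitation refers to. The daily sales series is made-up sample data.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=idx, name="sales")

# Resample: aggregate daily values into consecutive two-day totals
two_day_totals = sales.resample("2D").sum()

# Rolling: mean over a sliding three-day window (first two entries are NaN)
rolling_mean = sales.rolling(window=3).mean()

print(two_day_totals)
print(rolling_mean)
```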
## Troubleshooting
- **Import errors:** Ensure pandas and its dependencies are installed
- **Memory errors:** Process data in chunks or optimize dtypes (see references)
- **Encoding issues:** Pass an explicit `encoding` (for example `encoding='utf-8'`) to `pd.read_csv()` when loading CSVs
- **Date parsing issues:** Use `pd.to_datetime()` with an explicit `format` string
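The memory, encoding, and date-parsing tips can be combined in one loop. The sketch below reads a CSV in chunks with an explicit encoding, parses dates with an explicit format, and keeps only a running aggregate in memory. The function name `monthly_totals` and its parameters are hypothetical.

```python
import pandas as pd


def monthly_totals(path, date_col, value_col, date_format="%Y-%m-%d", chunksize=100_000):
    """Hypothetical helper: sum value_col per month without loading the whole file."""
    totals = {}
    for chunk in pd.read_csv(path, encoding="utf-8", chunksize=chunksize):
        # Explicit format avoids ambiguous or slow date inference
        dates = pd.to_datetime(chunk[date_col], format=date_format)
        months = dates.dt.strftime("%Y-%m")
        for month, value in chunk.groupby(months)[value_col].sum().items():
            totals[month] = totals.get(month, 0) + value
    return totals
```

Aggregating per chunk instead of concatenating the chunks keeps peak memory close to the size of a single chunk.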
For detailed pandas operations and troubleshooting, always refer to `references/common_operations.md` and `references/data_cleaning_best_practices.md`.