安装方式
手动下载安装
下载 ZIP 后解压到技能目录即可安装。若在桌面客户端 WebView中直接下载出现异常,本站会改为提示页 + 原始链接,请按页内说明操作。
下载 ZIP (oss-anthropic-docx-v1.0.0.zip)触发指令
/docx
跨平台安装指引
该技能声明兼容以下 1 个平台,将 ZIP 解压到对应目录即可被识别。
unzip oss-anthropic-docx-v1.0.0.zip -d ~/.claude/skills/
mkdir -p 创建;启用 Skill 后请重启对应 Agent 让配置生效。
使用指南
Word(DOCX)的创建、编辑与分析
.docx 本质是 ZIP + 内嵌 XML。适用于:新建/读取/编辑 Word、目录与页码、信纸抬头、从 docx 抽取或重组内容、插图、查找替换、修订与批注、将内容整理成正式 Word 文档等。不用于 PDF、电子表格、Google Docs 或与写文档无关的纯编程任务。
速查
| 任务 | 做法 |
|------|------|
| 读/分析 | pandoc 或解包看 XML |
| 新建 | docx-js(npm install -g docx),见下文要点 |
| 改已有稿 | 解包 → 改 XML → 再打包 |
老版 .doc 转 .docx
python scripts/office/soffice.py --headless --convert-to docx document.doc
读取内容
pandoc --track-changes=all document.docx -o output.md
python scripts/office/unpack.py document.docx unpacked/
转图片(便于目视检查)
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
接受所有修订(需 LibreOffice)
python scripts/accept_changes.py input.docx output.docx
新建文档(docx-js)
用 JavaScript 生成 .docx,生成后 必须校验;失败则解包修 XML 再打包。
python scripts/office/validate.py doc.docx
页面尺寸(极其重要)
docx-js 默认 A4,不是美标 Letter。 美标请显式设置(单位 DXA,1440 DXA = 1 英寸):
- US Letter:宽
12240,高15840,四边边距常用各1440(1 英寸)。 - 横向:仍传入「竖向」宽高数值,并设
orientation: PageOrientation.LANDSCAPE,由库内部交换。
样式与标题
- 默认正文字体可用 Arial,标题保持黑色易读。
- 覆盖内置标题样式时 id 必须精确:
Heading1、Heading2…,并设outlineLevel,否则目录(TOC)异常。
列表
禁止手写 • 或 Unicode 项目符号字符。应使用 numbering + LevelFormat.BULLET / DECIMAL。同一 reference 连续编号;不同 reference 会重新从 1 开始。
表格
- 表宽用
WidthType.DXA,不要用百分比(Google Docs 等会坏)。 - 同时设置
columnWidths与每个单元格的width,且列宽之和等于表宽。 - 底纹用
ShadingType.CLEAR,不要用易出黑底的SOLID。 - 给单元格加
margins做内边距。 - 不要用表格当分隔线:空表会显示成难看条块;改用段落
border或制表位。
图片
ImageRun 必须 指定 type(png/jpg/…),并填齐 altText 的 title/description/name。
分页
PageBreak 必须放在 Paragraph 的 children 里,不能单独悬空,否则 XML 无效。
目录
TOC 要求标题段 只用 HeadingLevel,不要只靠自定义样式名。
其它要点
- 不要用
\n拼段落,应多个Paragraph。 - 脚注、超链接、制表位、多栏等见上游英文
SKILL.md中的完整代码块。
编辑已有文档(三步)
1. 解包
python scripts/office/unpack.py document.docx unpacked/
会整理 XML、合并相邻 run、把弯引号转成实体等。需要保留原始 run 结构时加 --merge-runs false。
2. 编辑 unpacked/word/ 下 XML
- 修订、批注的作者名默认用 「Claude」,除非用户指定。
- 字符串级替换请直接用编辑工具,不要为小事写 Python 脚本。
- 新增正文中的弯引号请用 XML 实体:
’(右单引)、“/”(双引号)等。 - 批注可借助
comment.py(详见上游英文文档)。
3. 打包
python scripts/office/pack.py unpacked/ output.docx --original document.docx
含校验与自动修复(部分问题可修,结构性错误需人工改)。可 --validate false 跳过校验。
常见坑
- 插入修订时 整段替换
w:r,不要把修订标签插在 run 中间弄碎结构。 - 保留原 run 的
w:rPr(粗体、字号等)到新的修订 run 里。
XML 与修订(摘要)
- 元素顺序、
xml:space="preserve"、RSID 格式等须符合 Word OOXML 约定。 - 插入用
w:ins,删除用w:del;段落内用w:delText等。 - 删整段 时还要处理段落标记上的删除标记,否则接受修订后会留下空段落。
- 批注范围标记
commentRangeStart/End与w:r同级,不要塞在w:r里面。
更完整的 XML 样例(修订嵌套、图片 relationship、多栏等)见上游仓库英文 SKILL.md。
依赖
- pandoc:文本抽取
- docx:
npm install -g docx - LibreOffice:PDF 转换等(
scripts/office/soffice.py) - Poppler:
pdftoppm
许可证以上游 LICENSE.txt 为准。
# DOCX creation, editing, and analysis
## Overview
A .docx file is a ZIP archive containing XML files.
## Quick Reference
| Task | Approach |
|------|----------|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js` - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
### Converting .doc to .docx
Legacy `.doc` files must be converted before editing:
```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```
### Reading Content
```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```
### Converting to Images
```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```
### Accepting Tracked Changes
To produce a clean document with all tracked changes accepted (requires LibreOffice):
```bash
python scripts/accept_changes.py input.docx output.docx
```
---
## Creating New Documents
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
### Setup
```javascript
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
TabStopType, TabStopPosition, Column, SectionType,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```
### Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
```
### Page Size
```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
properties: {
page: {
size: {
width: 12240, // 8.5 inches in DXA
height: 15840 // 11 inches in DXA
},
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
}
},
children: [/* content */]
}]
```
**Common page sizes (DXA units, 1440 DXA = 1 inch):**
| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|---------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |
**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
```javascript
size: {
width: 12240, // Pass SHORT edge as width
height: 15840, // Pass LONG edge as height
orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```
### Styles (Override Built-in Headings)
Use Arial as the default font (universally supported). Keep titles black for readability.
```javascript
const doc = new Document({
styles: {
default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
paragraphStyles: [
// IMPORTANT: Use exact IDs to override built-in styles
{ id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 32, bold: true, font: "Arial" },
paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
{ id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 28, bold: true, font: "Arial" },
paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
]
},
sections: [{
children: [
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
]
}]
});
```
### Lists (NEVER use unicode bullets)
```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] }) // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] }) // BAD
// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
numbering: {
config: [
{ reference: "bullets",
levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
{ reference: "numbers",
levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
]
},
sections: [{
children: [
new Paragraph({ numbering: { reference: "bullets", level: 0 },
children: [new TextRun("Bullet item")] }),
new Paragraph({ numbering: { reference: "numbers", level: 0 },
children: [new TextRun("Numbered item")] }),
]
}]
});
// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```
### Tables
**CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.
```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };
new Table({
width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
rows: [
new TableRow({
children: [
new TableCell({
borders,
width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
children: [new Paragraph({ children: [new TextRun("Cell")] })]
})
]
})
]
})
```
**Table width calculation:**
Always use `WidthType.DXA` — `WidthType.PERCENTAGE` breaks in Google Docs.
```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
```
**Width rules:**
- **Always use `WidthType.DXA`** — never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Cell `width` must match corresponding `columnWidth`
- Cell `margins` are internal padding - they reduce content area, not add to cell width
- For full-width tables: use content width (page width minus left and right margins)
### Images
```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
children: [new ImageRun({
type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
data: fs.readFileSync("image.png"),
transformation: { width: 200, height: 150 },
altText: { title: "Title", description: "Desc", name: "Name" } // All three required
})]
})
```
### Page Breaks
```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })
// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```
### Hyperlinks
```javascript
// External link
new Paragraph({
children: [new ExternalHyperlink({
children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
link: "https://example.com",
})]
})
// Internal link (bookmark + reference)
// 1. Create bookmark at destination
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [
new Bookmark({ id: "chapter1", children: [new TextRun("Chapter 1")] }),
]})
// 2. Link to it
new Paragraph({ children: [new InternalHyperlink({
children: [new TextRun({ text: "See Chapter 1", style: "Hyperlink" })],
anchor: "chapter1",
})]})
```
### Footnotes
```javascript
const doc = new Document({
footnotes: {
1: { children: [new Paragraph("Source: Annual Report 2024")] },
2: { children: [new Paragraph("See appendix for methodology")] },
},
sections: [{
children: [new Paragraph({
children: [
new TextRun("Revenue grew 15%"),
new FootnoteReferenceRun(1),
new TextRun(" using adjusted metrics"),
new FootnoteReferenceRun(2),
],
})]
}]
});
```
### Tab Stops
```javascript
// Right-align text on same line (e.g., date opposite a title)
new Paragraph({
children: [
new TextRun("Company Name"),
new TextRun("\tJanuary 2025"),
],
tabStops: [{ type: TabStopType.RIGHT, position: TabStopPosition.MAX }],
})
// Dot leader (e.g., TOC-style)
new Paragraph({
children: [
new TextRun("Introduction"),
new TextRun({ children: [
new PositionalTab({
alignment: PositionalTabAlignment.RIGHT,
relativeTo: PositionalTabRelativeTo.MARGIN,
leader: PositionalTabLeader.DOT,
}),
"3",
]}),
],
})
```
### Multi-Column Layouts
```javascript
// Equal-width columns
sections: [{
properties: {
column: {
count: 2, // number of columns
space: 720, // gap between columns in DXA (720 = 0.5 inch)
equalWidth: true,
separate: true, // vertical line between columns
},
},
children: [/* content flows naturally across columns */]
}]
// Custom-width columns (equalWidth must be false)
sections: [{
properties: {
column: {
equalWidth: false,
children: [
new Column({ width: 5400, space: 720 }),
new Column({ width: 3240 }),
],
},
},
children: [/* content */]
}]
```
Force a column break with a new section using `type: SectionType.NEXT_COLUMN`.
### Table of Contents
```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```
### Headers/Footers
```javascript
sections: [{
properties: {
page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
},
headers: {
default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
},
footers: {
default: new Footer({ children: [new Paragraph({
children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
})] })
},
children: [/* content */]
}]
```
### Critical Rules for docx-js
- **Set page size explicitly** - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- **Landscape: pass portrait dimensions** - docx-js swaps width/height internally; pass short edge as `width`, long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- **Never use `\n`** - use separate Paragraph elements
- **Never use unicode bullets** - use `LevelFormat.BULLET` with numbering config
- **PageBreak must be in Paragraph** - standalone creates invalid XML
- **ImageRun requires `type`** - always specify png/jpg/etc
- **Always set table `width` with DXA** - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- **Tables need dual widths** - `columnWidths` array AND cell `width`, both must match
- **Table width = sum of columnWidths** - for DXA, ensure they add up exactly
- **Always add cell margins** - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- **Use `ShadingType.CLEAR`** - never SOLID for table shading
- **Never use tables as dividers/rules** - cells have minimum height and render as empty boxes (including in headers/footers); use `border: { bottom: { style: BorderStyle.SINGLE, size: 6, color: "2E75B6", space: 1 } }` on a Paragraph instead. For two-column footers, use tab stops (see Tab Stops section), not tables
- **TOC requires HeadingLevel only** - no custom styles on heading paragraphs
- **Override built-in styles** - use exact IDs: "Heading1", "Heading2", etc.
- **Include `outlineLevel`** - required for TOC (0 for H1, 1 for H2, etc.)
---
## Editing Existing Documents
**Follow all 3 steps in order.**
### Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
```
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`“` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.
### Step 2: Edit XML
Edit files in `unpacked/word/`. See XML Reference below for patterns.
**Use "Claude" as the author** for tracked changes and comments, unless the user explicitly requests use of a different name.
**Use the Edit tool directly for string replacement. Do not write Python scripts.** Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.
**CRITICAL: Use smart quotes for new content.** When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
```xml
<!-- Use these entities for professional typography -->
<w:t>Here’s a quote: “Hello”</w:t>
```
| Entity | Character |
|--------|-----------|
| `‘` | ‘ (left single) |
| `’` | ’ (right single / apostrophe) |
| `“` | “ (left double) |
| `”` | ” (right double) |
**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
```bash
python scripts/comment.py unpacked/ 0 "Comment text with & and ’"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
```
Then add markers to document.xml (see Comments in XML Reference).
### Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.
**Auto-repair will fix:**
- `durableId` >= 0x7FFFFFFF (regenerates valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace
**Auto-repair won't fix:**
- Malformed XML, invalid element nesting, missing relationships, schema violations
### Common Pitfalls
- **Replace entire `<w:r>` elements**: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- **Preserve `<w:rPr>` formatting**: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.
---
## XML Reference
### Schema Compliance
- **Element order in `<w:pPr>`**: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- **Whitespace**: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- **RSIDs**: Must be 8-digit hex (e.g., `00AB1234`)
### Tracked Changes
**Insertion:**
```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```
**Deletion:**
```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```
**Inside `<w:del>`**: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.
**Minimal edits** - only mark what changes:
```xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
<w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
<w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
```
**Deleting entire paragraphs/list items** - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:
```xml
<w:p>
<w:pPr>
<w:numPr>...</w:numPr> <!-- list numbering if present -->
<w:rPr>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
</w:rPr>
</w:pPr>
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
</w:del>
</w:p>
```
Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.
**Rejecting another author's insertion** - nest deletion inside their insertion:
```xml
<w:ins w:author="Jane" w:id="5">
<w:del w:author="Claude" w:id="10">
<w:r><w:delText>their inserted text</w:delText></w:r>
</w:del>
</w:ins>
```
**Restoring another author's deletion** - add insertion after (don't modify their deletion):
```xml
<w:del w:author="Jane" w:id="5">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
<w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```
### Comments
After running `comment.py` (see Step 2), add markers to document.xml. For replies, use `--parent` flag and nest markers inside the parent's.
**CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.**
```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```
### Images
1. Add image file to `word/media/`
2. Add relationship to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```
3. Add content type to `[Content_Types].xml`:
```xml
<Default Extension="png" ContentType="image/png"/>
```
4. Reference in document.xml:
```xml
<w:drawing>
<wp:inline>
<wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
<a:graphic>
<a:graphicData uri=".../picture">
<pic:pic>
<pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
```
---
## Dependencies
- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images