safe-docx vs python-docx
python-docx is the standard Python library for creating .docx files and making straightforward edits to existing ones.
When choosing a library to edit existing Word documents, the comparison that matters is behavioral: given the same document and the same operation, what does each library actually produce? Feature checklists answer what a library claims; a conformance suite answers what it does. The results below come from docx-platform-tests, an open test suite in the web-platform-tests tradition where every scenario asserts behavior derivable from a cited clause of ECMA-376 (the Office Open XML standard) — not from either library’s internals.
Each scenario runs unchanged against both libraries through a small adapter. An adapter may decline an operation its library cannot perform. The suite reports a declined operation as Unsupported rather than Fail, because its contribution rules forbid implementing missing library capabilities inside the adapter. An Unsupported cell therefore measures the library, not the adapter author.
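To make the Pass / Fail / Unsupported distinction concrete, here is a minimal sketch of how a runner could classify one operation. It is an illustration only and not the suite's actual adapter protocol: `Outcome`, `Unsupported`, `run_scenario`, and `accept_deletions_not_available` are all hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    UNSUPPORTED = "Unsupported"


class Unsupported(Exception):
    """Hypothetical signal: the library has no way to perform the operation."""


def run_scenario(adapter_op, check):
    """Run one operation through an adapter and classify the result.

    adapter_op: zero-argument callable that performs the operation.
    check: callable returning True when the output matches the expectation.
    """
    try:
        output = adapter_op()
    except Unsupported:
        # The adapter declined; this is recorded, not counted as a failure.
        return Outcome.UNSUPPORTED
    return Outcome.PASS if check(output) else Outcome.FAIL


def accept_deletions_not_available():
    # Hypothetical adapter method: it declines instead of reimplementing
    # the missing capability inside the adapter.
    raise Unsupported("no tracked-changes (revision) API")
```

In this sketch the adapter stays thin: it either forwards the call to its library or declines, and only the runner decides which cell of the matrix the result lands in.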
One framing note: this matrix measures a single axis — conformance on spec-anchored editing operations against existing documents. python-docx is a generation-first library and is widely used for building documents from scratch in Python; nothing below speaks to that use, where it remains a sound choice.
Results
| Scenario | safe-docx v0.10.0+git.8a748ffd31c1 | python-docx v1.2.0 |
|---|---|---|
| acceptDeletionsRemovesDelContent: ECMA-376 edition 5, Part 1 § 17.13.5.14 (del (Deleted Run Content)) | Pass | Unsupported: python-docx has no tracked-changes (revision) API |
| acceptInsertionsUnwrapsInsWrappers: ECMA-376 edition 5, Part 1 § 17.13.5.18 (ins (Inserted Run Content)) | Pass | Unsupported: python-docx has no tracked-changes (revision) API |
| replaceFirstOccurrencePreservesOffsets: ECMA-376 edition 5, Part 1 § 17.3.3.31 (t (Text)) | Pass | Pass |
Suite run of 2026-06-11. Full matrix across all implementations: cross-implementation conformance.
Tracked changes
Tracked changes (called revisions in the standard) are how Word records edits for later review: inserted content is wrapped in w:ins elements and deleted content in w:del elements (ECMA-376 edition 5, Part 1 § 17.13.5). Accepting an insertion unwraps the w:ins and keeps the content; accepting a deletion removes the w:del and its content.
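The accept semantics described above can be sketched directly on the XML with Python's standard library. This is a simplified illustration, not safe-docx's implementation: `accept_all` and the sample paragraph are hypothetical, and the sketch covers only run-level w:ins and w:del wrappers (it ignores tracked paragraph marks, moves, and formatting changes).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS = f"{{{W}}}ins"
DEL = f"{{{W}}}del"

# Hypothetical sample: "100" was deleted and "250" inserted by a reviewer.
PARAGRAPH_XML = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t xml:space="preserve">The fee is </w:t></w:r>'
    '<w:del w:id="1" w:author="A"><w:r><w:delText>100</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="A"><w:r><w:t>250</w:t></w:r></w:ins>'
    '<w:r><w:t xml:space="preserve"> dollars.</w:t></w:r>'
    '</w:p>'
)


def accept_all(element):
    """Accept run-level revisions in place: drop w:del, unwrap w:ins."""
    kept = []
    for child in list(element):
        if child.tag == DEL:
            continue  # accepting a deletion removes the wrapper and its content
        accept_all(child)  # resolve nested wrappers first
        if child.tag == INS:
            kept.extend(list(child))  # accepting an insertion keeps the content
        else:
            kept.append(child)
    element[:] = kept


def visible_text(element):
    """Concatenate every w:t descendant in document order."""
    return "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))


paragraph = ET.fromstring(PARAGRAPH_XML)
accept_all(paragraph)
print(visible_text(paragraph))
```

After the call, the paragraph holds three plain runs and no revision wrappers; rejecting would be the mirror image (drop w:ins, unwrap w:del and turn w:delText back into w:t).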
python-docx reports Unsupported on both accept scenarios because it has no revision API: its object model does not surface runs nested inside w:ins or w:del wrappers — a paragraph’s .runs yields only direct run children, so revision-wrapped text is invisible to it. A redline workflow (programmatically accepting or rejecting reviewer edits) is not expressible with the library’s public API.
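The difference between "direct run children" and "all runs" is easy to see on a small fragment. The sketch below uses only the standard library and does not call python-docx; it mimics the direct-children view described above and contrasts it with a descendant walk. The sample paragraph and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical paragraph in which a reviewer inserted "30" as a tracked change.
xml = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t xml:space="preserve">Payment due in </w:t></w:r>'
    '<w:ins w:id="1" w:author="A"><w:r><w:t>30</w:t></w:r></w:ins>'
    '<w:r><w:t xml:space="preserve"> days.</w:t></w:r>'
    '</w:p>'
)
paragraph = ET.fromstring(xml)


def run_text(run):
    """Text of one run, taken from its w:t children."""
    return "".join(t.text or "" for t in run.findall("w:t", NS))


# Direct children only: the run wrapped in w:ins is skipped.
direct_runs = paragraph.findall("w:r", NS)
# Every w:r descendant, including runs nested inside revision wrappers.
all_runs = list(paragraph.iter(f"{{{W}}}r"))

direct_text = "".join(run_text(r) for r in direct_runs)
full_text = "".join(run_text(r) for r in all_runs)
```

Here `direct_text` omits the inserted "30" while `full_text` includes it, which is why an API built on the direct-children view cannot see, let alone accept or reject, revision-wrapped content.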
safe-docx passes both scenarios: accept and reject are first-class operations, and the same engine is differentially tested against a formal Lean model and a LibreOffice oracle in its own repository.
Find and replace
Both libraries pass the find-replace scenario, which asserts that the replacement text lands at the exact character offset of the match. The scenario’s fixture keeps the matched text inside a single run (a run is WordprocessingML’s unit of identically formatted text), because python-docx’s replace surface is per-run. A match that spans run boundaries requires reassembling runs, and the suite’s glue-not-algorithms rule leaves that work to the library rather than the adapter. Such matches are routine in real documents, since Word splits runs freely as text is edited.
safe-docx’s replace operates on the paragraph’s concatenated text and splices runs as needed, so run boundaries do not constrain the match.
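The idea of matching on concatenated text and then splicing runs can be sketched on plain strings. This is an assumption-laden illustration, not safe-docx's algorithm: `replace_first` is a hypothetical helper, runs are modeled as a list of text strings, and the choice to put the replacement in the run where the match starts is one possible policy.

```python
def replace_first(runs, needle, replacement):
    """Replace the first occurrence of needle in the concatenated run texts.

    Returns a new list with the same number of runs. The replacement goes
    into the run where the match starts; matched characters that fall in
    later runs are removed from those runs.
    """
    text = "".join(runs)
    start = text.find(needle) if needle else -1
    if start == -1:
        return list(runs)
    end = start + len(needle)

    result = []
    offset = 0
    for run in runs:
        run_start, run_end = offset, offset + len(run)
        offset = run_end
        # Clamp the match boundaries to this run's local coordinates.
        local_start = max(0, min(len(run), start - run_start))
        local_end = max(0, min(len(run), end - run_start))
        inserted = replacement if run_start <= start < run_end else ""
        result.append(run[:local_start] + inserted + run[local_end:])
    return result


# "Purchaser" is split across three runs, as Word often leaves it.
runs = ["The Pur", "chase", "r shall pay."]
print(replace_first(runs, "Purchaser", "Buyer"))
```

Keeping one output string per input run means each run's formatting can stay attached to whatever text remains in it; a real implementation would also have to decide what to do with runs left empty by the splice.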
Where to dig deeper
- docx-platform-tests — the suite these results come from: scenario DSL, adapter protocol, raw results JSON (Apache-2.0).
- python-docx documentation — the compared library, on its own terms.
- safe-docx evals — per-primitive scenario pages with fixtures and expected output.
- safe-docx vs python-docx: where each tool sits in the stack — the product-level comparison.