Cross-implementation conformance
A test that only one library can run proves little about the spec. The scenarios below assert behavior derivable from cited ECMA-376 clauses — not from any library's internals — and run unchanged against every registered implementation, in the tradition of wpt.fyi for the web platform.
The suite lives in open-agreements/docx-platform-tests: each scenario is an XML input plus assertions, each implementation participates through a small adapter CLI, and an adapter may decline an operation it cannot perform (reported as Unsupported, the analog of wpt's NOTRUN). A gap in the matrix is information about the library, not a failure of the suite.
| Scenario | safe-docx v0.10.0+git.8a748ffd31c1 | python-docx v1.2.0 |
|---|---|---|
| acceptDeletionsRemovesDelContent: Accepting deletions removes w:del wrappers and their content. ECMA-376 edition 5, Part 1 § 17.13.5.14 (del (Deleted Run Content)) | Pass | Unsupported: python-docx has no tracked-changes (revision) API |
| acceptInsertionsUnwrapsInsWrappers: Accepting insertions unwraps w:ins and keeps run content. ECMA-376 edition 5, Part 1 § 17.13.5.18 (ins (Inserted Run Content)) | Pass | Unsupported: python-docx has no tracked-changes (revision) API |
| replaceFirstOccurrencePreservesOffsets: Replacing the first occurrence places the new text at the matched offset. ECMA-376 edition 5, Part 1 § 17.3.3.31 (t (Text)) | Pass | Pass |
Results from the suite run of 2026-06-11 (DSL 1.0, adapter protocol v1). The snapshot is refreshed by a maintainer running scripts/refresh-cross-impl-results.mjs; the build never fetches at CI time, so pages stay deterministic.
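To make the first two rows of the matrix concrete, here is a minimal sketch of what "accepting" run-level revisions means at the XML level. It is not the suite's scenario DSL and not safe-docx's implementation: the helper names `accept_all_revisions` and `visible_text` and the sample paragraph are invented for illustration, and the sketch only handles w:ins and w:del elements that are direct children of a single w:p (no moves, paragraph-mark revisions, or nested wrappers).

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def accept_all_revisions(paragraph: ET.Element) -> ET.Element:
    """Hypothetical helper: accept run-level revisions in one w:p (simplified)."""
    accepted = ET.Element(paragraph.tag, paragraph.attrib)
    for child in paragraph:
        if child.tag == f"{{{W}}}del":
            # Accepting a deletion drops the wrapper and everything inside it.
            continue
        if child.tag == f"{{{W}}}ins":
            # Accepting an insertion removes the wrapper but keeps its runs.
            accepted.extend(list(child))
        else:
            accepted.append(child)
    return accepted


def visible_text(paragraph: ET.Element) -> str:
    """Concatenate the w:t text nodes of a paragraph (w:delText is ignored)."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))


# Illustrative input: one kept run, one tracked deletion, one tracked insertion.
SAMPLE = f"""<w:p xmlns:w="{W}">
<w:r><w:t xml:space="preserve">Hello </w:t></w:r>
<w:del w:id="1" w:author="a"><w:r><w:delText>old</w:delText></w:r></w:del>
<w:ins w:id="2" w:author="a"><w:r><w:t>new</w:t></w:r></w:ins>
</w:p>"""

paragraph = accept_all_revisions(ET.fromstring(SAMPLE))
print(visible_text(paragraph))  # Hello new
```

After acceptance the paragraph contains two plain runs and no revision wrappers, which is the kind of property a scenario can assert without knowing anything about the library that produced the output.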
Reading the matrix
- Pass — the adapter's output satisfied every assertion in the scenario.
- Unsupported — the adapter declined the operation with a stated reason. The suite's contribution rules forbid implementing missing library capabilities inside an adapter, so this measures the library, not the adapter author.
- Fail / Error — the output violated an assertion, or the adapter crashed; per-assertion detail is in the suite's published results/latest.json.
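The outcomes above can be read as a small decision procedure. The sketch below is one plausible way to express it, under stated assumptions: `AdapterResult` and `classify` are hypothetical names, the fields are not the suite's actual adapter protocol, and the split of "Fail" (an assertion was violated) from "Error" (the adapter crashed) follows the order in which the two are described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    ERROR = "Error"
    UNSUPPORTED = "Unsupported"


@dataclass
class AdapterResult:
    """Hypothetical summary of one adapter run; not the suite's real schema."""
    crashed: bool = False
    declined_reason: Optional[str] = None
    failed_assertions: List[str] = field(default_factory=list)


def classify(result: AdapterResult) -> Status:
    """Map one adapter run to a matrix cell."""
    if result.crashed:
        return Status.ERROR
    if result.declined_reason is not None:
        # The adapter declined with a stated reason: a gap in the library.
        return Status.UNSUPPORTED
    if result.failed_assertions:
        return Status.FAIL
    return Status.PASS


declined = AdapterResult(declined_reason="no tracked-changes (revision) API")
print(classify(declined).value)         # Unsupported
print(classify(AdapterResult()).value)  # Pass
```

Checking the declined case before the assertions reflects the idea that an adapter which declines produces no output to assert against, so the cell records a capability gap rather than a violation.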
Per-implementation comparisons
Each compared library gets its own page: the matrix slice for that pair, plus what every cell means for a reader's document workflow.
Where to dig deeper
- docx-platform-tests — scenario DSL, adapter protocol, and how to register another implementation (Apache-2.0).
- Suite-owned results matrix — the neutral view published from the suite's own CI, with the raw results/latest.json.
- safe-docx evals — the per-primitive scenario pages this matrix complements.
- safe-docx vs python-docx — where each tool sits in the stack.