When reviewing tracked insertions in a paragraph, the before state and after state need to describe the same paragraph on both sides of the edit. Locating those states can be challenging because OOXML stores paragraph text across run elements (i.e., <w:r> containers that hold pieces of visible text) and wraps inserted paragraph text in tracked-change elements.
extractRevisions returns paragraph-level revision records by comparing the tracked document against accepted and rejected clones. The comparison produces before_text, after_text, and revision entries so the inserted paragraph text can be reported with its surrounding paragraph context.[1]
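The accept/reject comparison can be illustrated on a toy model. The following is a minimal sketch, not the library's implementation: it assumes a paragraph has already been reduced to an ordered list of runs, and the names SketchRun, SketchRevision, SketchChange, rejectedText, acceptedText, and sketchChange are hypothetical.

```typescript
// Hypothetical, simplified stand-in for a paragraph: an ordered list of runs.
// A run with `insertedBy` set plays the role of a run wrapped in <w:ins>.
interface SketchRun {
  text: string;
  insertedBy?: string;
}

interface SketchRevision {
  type: 'INSERTION';
  text: string;
  author: string;
}

interface SketchChange {
  before_text: string;
  after_text: string;
  revisions: SketchRevision[];
}

// Rejecting every insertion drops the inserted runs.
function rejectedText(runs: SketchRun[]): string {
  return runs
    .filter((run) => run.insertedBy === undefined)
    .map((run) => run.text)
    .join('');
}

// Accepting every insertion keeps all runs in document order.
function acceptedText(runs: SketchRun[]): string {
  return runs.map((run) => run.text).join('');
}

// Builds one change record for a paragraph, or null when nothing is tracked.
function sketchChange(runs: SketchRun[]): SketchChange | null {
  const revisions: SketchRevision[] = [];
  for (const run of runs) {
    if (run.insertedBy !== undefined) {
      revisions.push({ type: 'INSERTION', text: run.text, author: run.insertedBy });
    }
  }
  if (revisions.length === 0) {
    return null;
  }
  return {
    before_text: rejectedText(runs),
    after_text: acceptedText(runs),
    revisions,
  };
}

// Toy input mirroring "Original" followed by an insertion of " added" by Alice.
const sketchResult = sketchChange([
  { text: 'Original' },
  { text: ' added', insertedBy: 'Alice' },
]);
```

Deriving both texts from the same run list is what keeps the before and after states tied to the same paragraph.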
The test scenario below covers the baseline success case for extractRevisions: extracting an insertion together with its before and after text.
The scenario
Given a document with an insertion by Alice,
When extractRevisions is called,
Then
- one change is returned.
- before_text is the original text.
- after_text includes the inserted text.
- the revision is an INSERTION by Alice.
The test fixture
The fixture builds a WordprocessingML paragraph that has visible paragraph text before a tracked insertion and inserted paragraph text inside a <w:ins> wrapper. The scenario then calls extractRevisions with no comments so the returned change record reflects only the tracked insertion.[2]
Below is the test fixture code.
test('should extract insertions with before/after text', async ({ given, when, then, and }: AllureBddContext) => {
  let doc: Document;
  let result: ReturnType<typeof extractRevisions>;

  await given('a document with an insertion by Alice', async () => {
    doc = makeDoc(
      '<w:p>' +
        '<w:r><w:t>Original</w:t></w:r>' +
        '<w:ins w:author="Alice" w:date="2024-01-01T00:00:00Z">' +
        '<w:r><w:t> added</w:t></w:r>' +
        '</w:ins>' +
        '</w:p>',
    );
  });

  await when('extractRevisions is called', async () => {
    result = extractRevisions(doc, []);
  });

  await then('one change is returned', async () => {
    expect(result.total_changes).toBe(1);
  });

  await and('before_text is the original text', async () => {
    expect(result.changes[0]!.before_text).toBe('Original');
  });

  await and('after_text includes the inserted text', async () => {
    expect(result.changes[0]!.after_text).toBe('Original added');
  });

  await and('the revision is an INSERTION by Alice', async () => {
    expect(result.changes[0]!.revisions).toHaveLength(1);
    expect(result.changes[0]!.revisions[0]!.type).toBe('INSERTION');
    expect(result.changes[0]!.revisions[0]!.text).toBe(' added');
    expect(result.changes[0]!.revisions[0]!.author).toBe('Alice');
  });
});
The expected result shape
The scenario asserts the count of changed paragraphs and selected fields from the first returned change. Fields that the scenario does not assert, including para_id, comments, and has_more, are not part of this expected result shape.
Below is the result that extractRevisions is expected to return for this scenario.
{
  // simplified - see packages/docx-core/test-primitives/extract_revisions.test.ts lines 70-86 for asserted fields
  total_changes: 1,
  changes: [
    {
      before_text: 'Original',
      after_text: 'Original added',
      revisions: [
        {
          type: 'INSERTION',
          text: ' added',
          author: 'Alice',
        },
      ],
    },
  ],
}
Below is a description of the expected fields:
- total_changes is expected to be 1, because the fixture paragraph contains one tracked insertion that produces one changed paragraph record.
- changes[0].before_text is expected to be 'Original', because rejecting the insertion leaves only the paragraph text that existed before the edit.
- changes[0].after_text is expected to be 'Original added', because accepting the insertion keeps the original paragraph text and the inserted paragraph text.
- changes[0].revisions is expected to contain one entry, because the fixture has one content-level <w:ins> wrapper.
- changes[0].revisions[0].type is expected to be 'INSERTION', because <w:ins> maps to the insertion revision type.
- changes[0].revisions[0].text is expected to be ' added', because the inserted run element contains that paragraph text.
- changes[0].revisions[0].author is expected to be 'Alice', because the <w:ins> wrapper carries w:author="Alice".
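Because the scenario asserts only selected fields, a check against the simplified shape has to ignore anything else the record carries. The sketch below shows one way to express that with a recursive subset comparison. The helper matchesSubset is hypothetical, and the values given for para_id, comments, and has_more are invented purely for illustration; the source does not state what extractRevisions returns for them.

```typescript
// Hypothetical helper: returns true when every field present in `expected`
// has an equal value in `actual`. Extra fields on `actual` are ignored.
// Arrays must have the same length and match element by element.
function matchesSubset(actual: unknown, expected: unknown): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual) || actual.length !== expected.length) {
      return false;
    }
    return expected.every((item, index) => matchesSubset(actual[index], item));
  }
  if (expected !== null && typeof expected === 'object') {
    if (actual === null || typeof actual !== 'object') {
      return false;
    }
    const actualRecord = actual as Record<string, unknown>;
    const expectedRecord = expected as Record<string, unknown>;
    return Object.keys(expectedRecord).every((key) =>
      matchesSubset(actualRecord[key], expectedRecord[key]),
    );
  }
  return actual === expected;
}

// The asserted fields from the scenario.
const assertedShape = {
  total_changes: 1,
  changes: [
    {
      before_text: 'Original',
      after_text: 'Original added',
      revisions: [{ type: 'INSERTION', text: ' added', author: 'Alice' }],
    },
  ],
};

// A fuller record; the para_id, comments, and has_more values are made up.
const fullerRecord = {
  total_changes: 1,
  has_more: false,
  changes: [
    {
      para_id: 'p1',
      before_text: 'Original',
      after_text: 'Original added',
      comments: [],
      revisions: [{ type: 'INSERTION', text: ' added', author: 'Alice' }],
    },
  ],
};

const shapeMatches = matchesSubset(fullerRecord, assertedShape);
```

The comparison is deliberately one-directional: unasserted fields may appear in the record without affecting the outcome, while any asserted field that differs makes the check fail.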
A non-obvious detail
The insertion wrapper is a tracked-change element described by the OOXML tracked-change schema, and its placement matters for paragraph-level extraction. This scenario keeps the inserted run element under <w:ins>, so the extractor can collect the insertion entry while still computing before and after paragraph text from accepted and rejected document clones.[3]
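To make the role of the wrapper concrete, the sketch below collects insertion entries from a paragraph's XML string. It is an illustration under stated assumptions, not the extractor's actual code: the function collectInsertions and the SketchInsertion type are hypothetical, it uses regular expressions where a real extractor would presumably walk a parsed DOM, and it assumes that <w:ins> elements are not nested and that the text contains no XML entities.

```typescript
// Hypothetical entry shape mirroring the fields asserted in the scenario.
interface SketchInsertion {
  type: 'INSERTION';
  text: string;
  author: string;
}

// Collects one entry per <w:ins> wrapper, concatenating the <w:t> text of
// every run found inside that wrapper. Runs outside <w:ins> are ignored.
function collectInsertions(paragraphXml: string): SketchInsertion[] {
  const entries: SketchInsertion[] = [];
  const insPattern = /<w:ins\b[^>]*\bw:author="([^"]*)"[^>]*>([\s\S]*?)<\/w:ins>/g;
  let insMatch: RegExpExecArray | null;
  while ((insMatch = insPattern.exec(paragraphXml)) !== null) {
    const author = insMatch[1] ?? '';
    const inner = insMatch[2] ?? '';
    // A fresh pattern per wrapper so the search always starts at index 0.
    const textPattern = /<w:t\b[^>]*>([^<]*)<\/w:t>/g;
    let text = '';
    let textMatch: RegExpExecArray | null;
    while ((textMatch = textPattern.exec(inner)) !== null) {
      text += textMatch[1] ?? '';
    }
    entries.push({ type: 'INSERTION', text, author });
  }
  return entries;
}

// The paragraph used by the fixture.
const fixtureParagraph =
  '<w:p>' +
  '<w:r><w:t>Original</w:t></w:r>' +
  '<w:ins w:author="Alice" w:date="2024-01-01T00:00:00Z">' +
  '<w:r><w:t> added</w:t></w:r>' +
  '</w:ins>' +
  '</w:p>';

const fixtureInsertions = collectInsertions(fixtureParagraph);
```

Only the run under <w:ins> contributes an entry, so the untracked "Original" run affects the before and after text but never the revision list.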