
safe-docx · Extract Revisions

Insertion revision with before and after text

When reviewing a tracked insertion, the before state and the after state must describe the same paragraph on either side of the edit. Locating those states is not straightforward, because OOXML splits paragraph text across run elements (i.e., <w:r> containers that each hold a piece of visible text) and wraps inserted text in tracked-change elements.

extractRevisions returns paragraph-level revision records by comparing the tracked document against accepted and rejected clones. The comparison produces before_text, after_text, and revision entries so the inserted paragraph text can be reported with its surrounding paragraph context.[1]
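The accept/reject comparison described above can be illustrated on a simplified model. The sketch below is not the library's implementation: it assumes a paragraph has already been flattened into plain and inserted text segments, and the names Segment, rejectedText, acceptedText, and summarizeParagraph are hypothetical. Only the field names before_text, after_text, and revisions come from the scenario in this article.

```typescript
// Hypothetical simplified model: a paragraph flattened into text segments.
type Segment =
  | { kind: 'plain'; text: string }
  | { kind: 'ins'; text: string; author: string };

interface RevisionEntry {
  type: 'INSERTION';
  text: string;
  author: string;
}

interface ParagraphChange {
  before_text: string;
  after_text: string;
  revisions: RevisionEntry[];
}

// Rejecting every insertion leaves only the text that was never tracked.
function rejectedText(segments: Segment[]): string {
  return segments
    .filter((s) => s.kind !== 'ins')
    .map((s) => s.text)
    .join('');
}

// Accepting every insertion keeps all segments in document order.
function acceptedText(segments: Segment[]): string {
  return segments.map((s) => s.text).join('');
}

// Returns null when the paragraph carries no tracked insertion.
function summarizeParagraph(segments: Segment[]): ParagraphChange | null {
  const revisions: RevisionEntry[] = [];
  for (const s of segments) {
    if (s.kind === 'ins') {
      revisions.push({ type: 'INSERTION', text: s.text, author: s.author });
    }
  }
  if (revisions.length === 0) return null;
  return {
    before_text: rejectedText(segments),
    after_text: acceptedText(segments),
    revisions,
  };
}
```

In this model, a paragraph made of the plain segment 'Original' followed by Alice's inserted segment ' added' yields 'Original' as before_text and 'Original added' as after_text. A real extractor also has to handle deletions, formatting changes, and nested markup, all of which this sketch leaves out.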

The test scenario below covers the baseline success case for extractRevisions: extracting an insertion together with its before and after text.

The scenario

Given a document with an insertion by Alice,
When extractRevisions is called,
Then

  • one change is returned.
  • before_text is the original text.
  • after_text includes the inserted text.
  • the revision is an INSERTION by Alice.

The test fixture

The fixture builds a WordprocessingML paragraph with two runs: an untracked run holding the original text, followed by a run inside a <w:ins> wrapper holding the inserted text. The scenario then calls extractRevisions with an empty comments list, so the returned change record reflects only the tracked insertion.[2]

Below is the test fixture code.

test('should extract insertions with before/after text', async ({ given, when, then, and }: AllureBddContext) => {
  let doc: Document;
  let result: ReturnType<typeof extractRevisions>;

  await given('a document with an insertion by Alice', async () => {
    doc = makeDoc(
      '<w:p>' +
        '<w:r><w:t>Original</w:t></w:r>' +
        '<w:ins w:author="Alice" w:date="2024-01-01T00:00:00Z">' +
          '<w:r><w:t> added</w:t></w:r>' +
        '</w:ins>' +
      '</w:p>',
    );
  });

  await when('extractRevisions is called', async () => {
    result = extractRevisions(doc, []);
  });

  await then('one change is returned', async () => {
    expect(result.total_changes).toBe(1);
  });

  await and('before_text is the original text', async () => {
    expect(result.changes[0]!.before_text).toBe('Original');
  });

  await and('after_text includes the inserted text', async () => {
    expect(result.changes[0]!.after_text).toBe('Original added');
  });

  await and('the revision is an INSERTION by Alice', async () => {
    expect(result.changes[0]!.revisions).toHaveLength(1);
    expect(result.changes[0]!.revisions[0]!.type).toBe('INSERTION');
    expect(result.changes[0]!.revisions[0]!.text).toBe(' added');
    expect(result.changes[0]!.revisions[0]!.author).toBe('Alice');
  });
});
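The test passes a bare <w:p> fragment to makeDoc, whose implementation is not shown in this article. One plausible reading is that it wraps the fragment in a document element that binds the w: prefix to the WordprocessingML namespace before parsing. The sketch below shows only that wrapping step, under that assumption. wrapInDocumentXml is a hypothetical name, and it returns a string rather than the Document the test works with.

```typescript
// WordprocessingML main namespace that the w: prefix must be bound to.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper, not the real makeDoc: wraps body-level XML
// in a minimal document.xml skeleton so the w: prefix resolves.
function wrapInDocumentXml(bodyXml: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    `<w:document xmlns:w="${W_NS}">` +
    `<w:body>${bodyXml}</w:body>` +
    '</w:document>'
  );
}

// The same paragraph fragment the fixture uses.
const paragraphXml =
  '<w:p>' +
    '<w:r><w:t>Original</w:t></w:r>' +
    '<w:ins w:author="Alice" w:date="2024-01-01T00:00:00Z">' +
      '<w:r><w:t> added</w:t></w:r>' +
    '</w:ins>' +
  '</w:p>';

const documentXml = wrapInDocumentXml(paragraphXml);
```

Without the namespace declaration on the root element, an XML parser would reject or misread the w: prefixed elements, which is why a fixture helper of this kind is needed at all.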

The expected result shape

The scenario asserts the count of changed paragraphs and selected fields from the first returned change. Fields that the scenario does not assert, including para_id, comments, and has_more, are not part of this expected result shape.

Below is the result that extractRevisions is expected to return for this scenario.

{
  // simplified - see packages/docx-core/test-primitives/extract_revisions.test.ts lines 70-86 for asserted fields
  total_changes: 1,
  changes: [
    {
      before_text: 'Original',
      after_text: 'Original added',
      revisions: [
        {
          type: 'INSERTION',
          text: ' added',
          author: 'Alice',
        },
      ],
    },
  ],
}

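The expected fields are as follows:

  • total_changes is the number of changed paragraphs; here it is 1.
  • before_text is the paragraph text with the insertion rejected.
  • after_text is the paragraph text with the insertion accepted.
  • revisions lists the tracked changes found in the paragraph. Each entry records the revision type, the affected text, and the author.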

A non-obvious detail

The insertion wrapper is a tracked-change element described by the OOXML tracked-change schema, and its placement matters for paragraph-level extraction. This scenario keeps the inserted run element under <w:ins>, so the extractor can collect the insertion entry while still computing before and after paragraph text from accepted and rejected document clones.[3]
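To make the placement point concrete, the sketch below collects insertion entries by looking only at runs nested inside <w:ins>. It is a deliberately naive, regex-based illustration over a raw XML string, not the extractor's actual approach: collectInsertions and InsertionEntry are hypothetical names, nested tracked changes and XML entities are not handled, and a real implementation would walk a parsed DOM instead.

```typescript
interface InsertionEntry {
  type: 'INSERTION';
  text: string;
  author: string;
}

// Hypothetical helper: scans one paragraph's XML for <w:ins> wrappers.
function collectInsertions(paragraphXml: string): InsertionEntry[] {
  const entries: InsertionEntry[] = [];
  // Group 1 captures the wrapper's attributes, group 2 its inner XML.
  const insPattern = /<w:ins\b([^>]*)>([\s\S]*?)<\/w:ins>/g;
  let ins: RegExpExecArray | null;
  while ((ins = insPattern.exec(paragraphXml)) !== null) {
    const attrs = ins[1] ?? '';
    const inner = ins[2] ?? '';
    const authorMatch = /w:author="([^"]*)"/.exec(attrs);
    const author = authorMatch ? authorMatch[1] ?? '' : '';

    // Concatenate the text of every <w:t> inside this wrapper.
    const textPattern = /<w:t\b[^>]*>([^<]*)<\/w:t>/g;
    let text = '';
    let t: RegExpExecArray | null;
    while ((t = textPattern.exec(inner)) !== null) {
      text += t[1] ?? '';
    }
    entries.push({ type: 'INSERTION', text, author });
  }
  return entries;
}

const fixtureParagraph =
  '<w:p>' +
    '<w:r><w:t>Original</w:t></w:r>' +
    '<w:ins w:author="Alice" w:date="2024-01-01T00:00:00Z">' +
      '<w:r><w:t> added</w:t></w:r>' +
    '</w:ins>' +
  '</w:p>';

const insertions = collectInsertions(fixtureParagraph);
```

For the fixture paragraph, only the run under <w:ins> contributes an entry. The 'Original' run sits outside the wrapper and is ignored, which mirrors why the wrapper's position determines what counts as inserted text.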