Comment association for changed paragraphs — safe-docx Extract Revisions

When reviewing tracked changes in a DOCX document, comments anchored to a paragraph need to stay attached to the revision record for that paragraph. The paragraph anchor, a bookmark identifier associated with the paragraph, is the join point between the comment data and the changed paragraph.

extractRevisions uses that paragraph anchor to group comments by anchoredParagraphId, then attaches matching comments to the returned paragraph revision record.^[1] Because the same paragraph can contain tracked-change markup such as <w:ins>, the returned change needs both the revision details and the associated comments to describe the review state.^[2]
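A minimal sketch of that grouping step follows. It assumes simplified shapes: SketchComment and SketchParagraphChange are hypothetical stand-ins for the real Comment and revision record types, and groupCommentsByAnchor and attachComments are illustrative names, not the safe-docx API.

```typescript
// Hypothetical, simplified shapes; the real safe-docx types carry more fields.
interface SketchComment {
  id: number;
  author: string;
  text: string;
  anchoredParagraphId: string;
}

interface SketchParagraphChange {
  paragraphId: string; // bookmark identifier of the changed paragraph
  comments: SketchComment[];
}

// Build a lookup from paragraph anchor to the comments anchored there.
function groupCommentsByAnchor(comments: SketchComment[]): Map<string, SketchComment[]> {
  const byAnchor = new Map<string, SketchComment[]>();
  for (const comment of comments) {
    const bucket = byAnchor.get(comment.anchoredParagraphId);
    if (bucket) {
      bucket.push(comment);
    } else {
      byAnchor.set(comment.anchoredParagraphId, [comment]);
    }
  }
  return byAnchor;
}

// Attach each group to the change whose paragraph anchor matches.
function attachComments(
  paragraphIds: string[],
  comments: SketchComment[],
): SketchParagraphChange[] {
  const byAnchor = groupCommentsByAnchor(comments);
  return paragraphIds.map((paragraphId) => ({
    paragraphId,
    comments: byAnchor.get(paragraphId) ?? [],
  }));
}
```

For example, attachComments(['_bk_1'], comments) returns one change whose comments array holds every comment anchored to _bk_1; comments anchored elsewhere are simply never looked up. Grouping into a Map first avoids rescanning the whole comment list for each changed paragraph.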

Below is the baseline success scenario for extractRevisions: a comment anchored to a paragraph that contains a tracked insertion is returned with that paragraph's change record.

The scenario

Given a document with an insertion and a comment on the same paragraph,
When extractRevisions is called with a matching comment,
Then one change is returned,
And the comment is associated with the change.

The test fixture

The fixture builds a paragraph with tracked insertion markup, resolves the paragraph bookmark used by the mock comment, and passes that comment into extractRevisions. The scenario then checks the returned change count and the comment fields on the first change record.^[3]

Below is the test fixture code.

test('should associate comments with changed paragraphs', async ({ given, when, then, and }: AllureBddContext) => {
  let doc: Document;
  let result: ReturnType<typeof extractRevisions>;

  await given('a document with an insertion and a comment on the same paragraph', async () => {
    doc = makeDoc(
      '<w:p>' +
        '<w:r><w:t>Text</w:t></w:r>' +
        '<w:ins w:author="Author"><w:r><w:t> added</w:t></w:r></w:ins>' +
      '</w:p>',
    );
  });

  await when('extractRevisions is called with a matching comment', async () => {
    // Get the paragraph's bookmark ID for the mock comment
    const paras = doc.getElementsByTagNameNS(W_NS, 'p');
    const firstP = paras[0]!;
    const bookmarkStarts = firstP.getElementsByTagNameNS(W_NS, 'bookmarkStart');
    let paraId = '';
    for (let i = 0; i < bookmarkStarts.length; i++) {
      const name = bookmarkStarts[i]!.getAttributeNS(W_NS, 'name') ?? bookmarkStarts[i]!.getAttribute('w:name') ?? '';
      if (name.startsWith('_bk_')) {
        paraId = name;
        break;
      }
    }
    // Fall back to bookmarkStart elements emitted as preceding siblings of the paragraph
    if (!paraId) {
      let prev = firstP.previousSibling;
      while (prev) {
        if (prev.nodeType === 1 && (prev as Element).localName === 'bookmarkStart') {
          const name = (prev as Element).getAttributeNS(W_NS, 'name') ?? (prev as Element).getAttribute('w:name') ?? '';
          if (name.startsWith('_bk_')) { paraId = name; break; }
        }
        prev = prev.previousSibling;
      }
    }

    const comments: Comment[] = [{
      id: 1,
      author: 'Reviewer',
      date: '2024-01-01T00:00:00Z',
      initials: 'R',
      text: 'Nice addition!',
      paragraphId: 'COMMENT_PARA_ID',
      anchoredParagraphId: paraId,
      replies: [],
    }];

    result = extractRevisions(doc, comments);
  });

  await then('one change is returned', async () => {
    expect(result.total_changes).toBe(1);
  });

  await and('the comment is associated with the change', async () => {
    expect(result.changes[0]!.comments).toHaveLength(1);
    expect(result.changes[0]!.comments[0]!.author).toBe('Reviewer');
    expect(result.changes[0]!.comments[0]!.text).toBe('Nice addition!');
  });
});
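The when step searches two places for the _bk_ bookmark: bookmarkStart elements inside the paragraph, then bookmarkStart elements among its preceding siblings. The sketch below isolates that search as a helper. It assumes a hypothetical NodeLike shape instead of a DOM Element so it can run without an XML parser; NodeLike, bookmarkName, findNestedBookmark, and findParagraphBookmark are illustrative names, not part of safe-docx.

```typescript
// Hypothetical minimal node shape standing in for a DOM Element, so the
// search can run without an XML parser.
interface NodeLike {
  localName: string;
  name?: string; // the w:name attribute of a bookmarkStart
  children: NodeLike[];
  previousSibling: NodeLike | null;
}

// Prefix taken from the fixture's bookmark filter.
const BOOKMARK_PREFIX = '_bk_';

// Returns the bookmark name if node is a bookmarkStart with the expected prefix.
function bookmarkName(node: NodeLike): string | undefined {
  if (node.localName !== 'bookmarkStart') return undefined;
  const name = node.name ?? '';
  return name.startsWith(BOOKMARK_PREFIX) ? name : undefined;
}

// Pre-order (document-order) search of the paragraph's descendants.
function findNestedBookmark(node: NodeLike): string | undefined {
  for (const child of node.children) {
    const name = bookmarkName(child) ?? findNestedBookmark(child);
    if (name !== undefined) return name;
  }
  return undefined;
}

// Mirrors the two-step lookup in the when step: nested first, then preceding siblings.
function findParagraphBookmark(paragraph: NodeLike): string {
  const nested = findNestedBookmark(paragraph);
  if (nested !== undefined) return nested;
  for (let prev = paragraph.previousSibling; prev !== null; prev = prev.previousSibling) {
    const name = bookmarkName(prev);
    if (name !== undefined) return name;
  }
  return ''; // same empty-string fallback as the fixture
}
```

In the test itself the same logic runs against the Element returned by getElementsByTagNameNS, reading w:name through getAttributeNS or getAttribute as the fixture does.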

The expected result shape

The scenario checks only selected fields of the extractRevisions return value, so the expected result is expressed as assertions on those fields rather than as a full object.

Below are the assertions that define the expected result for this scenario.

expect(result.total_changes).toBe(1);
expect(result.changes[0]!.comments).toHaveLength(1);
expect(result.changes[0]!.comments[0]!.author).toBe('Reviewer');
expect(result.changes[0]!.comments[0]!.text).toBe('Nice addition!');

Below is a description of the expected fields:

The total_changes field is expected to be 1, because the fixture contains one paragraph with tracked-change markup.
The changes[0].comments array is expected to contain one item, because the mock comment uses the changed paragraph's bookmark identifier as its anchoredParagraphId.
The changes[0].comments[0].author field is expected to be "Reviewer", because commentToRevisionComment copies the comment author into the returned revision comment.
The changes[0].comments[0].text field is expected to be "Nice addition!", because commentToRevisionComment copies the comment text into the returned revision comment.
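The last two checks depend on commentToRevisionComment copying author and text unchanged. A sketch of such a mapping follows. The input and output shapes are assumptions (the scenario confirms only author and text, and replies are left out here), and sketchCommentToRevisionComment is a hypothetical stand-in, not the safe-docx implementation.

```typescript
// Assumed input shape, mirroring the fields of the fixture's mock comment.
interface SourceComment {
  id: number;
  author: string;
  date: string;
  initials: string;
  text: string;
  paragraphId: string;
  anchoredParagraphId: string;
}

// Assumed output shape; the scenario only checks author and text.
interface RevisionCommentSketch {
  id: number;
  author: string;
  date: string;
  text: string;
}

// Hypothetical stand-in for commentToRevisionComment: copies fields verbatim.
function sketchCommentToRevisionComment(comment: SourceComment): RevisionCommentSketch {
  return {
    id: comment.id,
    author: comment.author,
    date: comment.date,
    text: comment.text,
  };
}
```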

A non-obvious detail

Comment association happens only after the changed paragraphs have been identified, because the paragraph bookmark is the shared identifier between the changed paragraph and the comment payload. Each comment is matched by comparing its anchoredParagraphId with a changed paragraph's bookmark identifier, so a comment anchored to a different paragraph never matches and is not attached to the returned change record.
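The two-step ordering can be sketched as follows: first select the changed paragraphs, then attach only comments whose anchor equals one of their bookmarks. ParagraphSketch, AnchoredComment, ChangeSketch, and extractChangesSketch are hypothetical names, and a linear filter is used instead of a lookup map purely for brevity.

```typescript
// Hypothetical paragraph model: a bookmark id plus whether it has tracked changes.
interface ParagraphSketch {
  bookmarkId: string;
  hasTrackedChange: boolean;
}

interface AnchoredComment {
  text: string;
  anchoredParagraphId: string;
}

interface ChangeSketch {
  bookmarkId: string;
  comments: AnchoredComment[];
}

function extractChangesSketch(
  paragraphs: ParagraphSketch[],
  comments: AnchoredComment[],
): ChangeSketch[] {
  // Step 1: find the changed paragraphs.
  const changed = paragraphs.filter((p) => p.hasTrackedChange);
  // Step 2: attach only comments whose anchor equals a changed paragraph's bookmark.
  return changed.map((p) => ({
    bookmarkId: p.bookmarkId,
    comments: comments.filter((c) => c.anchoredParagraphId === p.bookmarkId),
  }));
}
```

With one changed paragraph (_bk_1) and one unchanged paragraph (_bk_2), a comment anchored to _bk_2 produces no change record and is dropped, while the comment on _bk_1 is attached.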