DOCX Comparison

11 requirements · 31 scenarios

Correlation Status Enumeration

JR-docx-comparison-001
The system SHALL provide a CorrelationStatus enum with the following values: Nil, Normal, Unknown, Inserted, Deleted, Equal, Group, MovedSource, MovedDestination, FormatChanged.
6 test scenarios
  • Status assigned during comparison JR-docx-comparison-001.1
  • Status for unmatched atoms JR-docx-comparison-001.2
  • Status for deleted content JR-docx-comparison-001.3
  • Status for moved source content JR-docx-comparison-001.4
  • Status for moved destination content JR-docx-comparison-001.5
  • Status for format-changed content JR-docx-comparison-001.6
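
As an illustration, the enumeration could be declared as below. This is a minimal sketch: the member names come from the requirement, while the numeric values (declaration order) and the `isChange` helper are assumptions made for the example.

```typescript
// Member names are taken from JR-docx-comparison-001.
// Numeric values simply follow declaration order (an assumption).
enum CorrelationStatus {
  Nil,
  Normal,
  Unknown,
  Inserted,
  Deleted,
  Equal,
  Group,
  MovedSource,
  MovedDestination,
  FormatChanged,
}

// Hypothetical helper: true for statuses that represent a tracked change.
function isChange(status: CorrelationStatus): boolean {
  return (
    status === CorrelationStatus.Inserted ||
    status === CorrelationStatus.Deleted ||
    status === CorrelationStatus.MovedSource ||
    status === CorrelationStatus.MovedDestination ||
    status === CorrelationStatus.FormatChanged
  );
}
```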

Legal Numbering Continuation Pattern Detection

JR-docx-comparison-007
The system SHALL detect "continuation patterns" in legal numbering, where a paragraph at ilvl > 0 continues a flat sequence rather than creating a nested hierarchy. When a continuation pattern is detected, the system SHALL use the properties of the effective level (level 0) instead of those of the declared level.

A continuation pattern exists when:
1. The paragraph is the first at this level in the current sequence, AND
2. The level's start value equals the parent level's counter + 1
3 test scenarios
  • Orphan list item renders with parent format JR-docx-comparison-007.1
  • Proper nested list renders hierarchically JR-docx-comparison-007.2
  • Continuation pattern inherits formatting JR-docx-comparison-007.3
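
The two conditions can be sketched as a small predicate. The `LevelDefinition` and `NumberingState` shapes and both function names are hypothetical; the requirement specifies only the conditions and the fallback to level 0.

```typescript
// Hypothetical shapes: only ilvl, the start value, and the per-level
// counters are mentioned in the requirement.
interface LevelDefinition {
  ilvl: number;  // declared level of the paragraph
  start: number; // start value configured for that level
}

interface NumberingState {
  counters: number[];      // current counter per level, indexed by ilvl
  seenLevels: Set<number>; // levels already used in the current sequence
}

function isContinuationPattern(level: LevelDefinition, state: NumberingState): boolean {
  if (level.ilvl <= 0) return false;
  // Condition 1: first paragraph at this level in the current sequence.
  const firstAtLevel = !state.seenLevels.has(level.ilvl);
  // Condition 2: start value equals the parent level's counter + 1.
  const parentCounter = state.counters[level.ilvl - 1] || 0;
  return firstAtLevel && level.start === parentCounter + 1;
}

// Level whose properties are used for rendering: level 0 for a
// continuation pattern, the declared level otherwise.
function effectiveLevel(level: LevelDefinition, state: NumberingState): number {
  return isContinuationPattern(level, state) ? 0 : level.ilvl;
}
```

For example, after items 1 to 3 at level 0, a first level-1 paragraph whose start value is 4 would be treated as item 4 of the flat sequence, while one whose start value is 1 would remain a nested item.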

Footnote Sequential Numbering

JR-docx-comparison-008
The system SHALL calculate footnote display numbers sequentially based on document order, NOT using raw XML w:id attribute values. The w:id is a reference identifier linking footnoteReference to footnote definitions; display numbers are determined by the order footnotes appear in the document flow.
3 test scenarios
  • First footnote displays as 1 JR-docx-comparison-008.1
  • Sequential numbering ignores XML IDs JR-docx-comparison-008.2
  • Reserved footnote IDs excluded from numbering JR-docx-comparison-008.3
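
A minimal sketch of the numbering rule follows. It assumes the caller has already collected the w:id values of the footnote references in document order and supplies the set of reserved IDs; the function name and both parameters are hypothetical.

```typescript
// Maps each footnote w:id to its display number, based purely on the
// order in which references occur in the document flow.
function assignFootnoteNumbers(
  referenceIdsInDocumentOrder: string[],
  reservedIds: Set<string>, // supplied by the caller (assumption)
): Map<string, number> {
  const displayNumbers = new Map<string, number>();
  let next = 1;
  for (const id of referenceIdsInDocumentOrder) {
    // Reserved IDs never receive a display number; an ID referenced
    // more than once keeps the number it was given first.
    if (reservedIds.has(id) || displayNumbers.has(id)) continue;
    displayNumbers.set(id, next);
    next += 1;
  }
  return displayNumbers;
}
```

With references appearing in the order 7, 2, 5, the display numbers are 1, 2, 3 regardless of the raw ID values.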

Move Detection Algorithm

JR-docx-comparison-010
The system SHALL provide a detectMovesInAtomList() function that identifies relocated content after LCS comparison. The algorithm:
1. Groups consecutive atoms by correlationStatus into blocks (Deleted blocks, Inserted blocks)
2. Extracts text from each block by joining content element values
3. Filters blocks by minimum word count (configurable, default: 3)
4. Calculates Jaccard word similarity between deleted and inserted blocks
5. Converts matching pairs (above threshold) to MovedSource and MovedDestination
3 test scenarios
  • Move detected between similar blocks JR-docx-comparison-010.1
  • Short blocks ignored JR-docx-comparison-010.2
  • Below threshold treated as separate changes JR-docx-comparison-010.3
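
The five steps can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the `Atom` shape (one word of text per atom), joining with spaces, greedy best-match pairing, and treating a score equal to the threshold as a match are all choices made for the example.

```typescript
enum CorrelationStatus { Nil, Normal, Unknown, Inserted, Deleted, Equal, Group, MovedSource, MovedDestination, FormatChanged }

// Hypothetical atom shape: a piece of text plus its status.
interface Atom { text: string; correlationStatus: CorrelationStatus; }
interface Block { status: CorrelationStatus; atoms: Atom[]; }
interface MoveSettings { moveSimilarityThreshold: number; moveMinimumWordCount: number; caseInsensitive: boolean; }

const DEFAULTS: MoveSettings = { moveSimilarityThreshold: 0.8, moveMinimumWordCount: 3, caseInsensitive: false };

// Step 1: group consecutive atoms that share a status.
function groupIntoBlocks(atoms: Atom[]): Block[] {
  const blocks: Block[] = [];
  for (const atom of atoms) {
    const last = blocks[blocks.length - 1];
    if (last !== undefined && last.status === atom.correlationStatus) last.atoms.push(atom);
    else blocks.push({ status: atom.correlationStatus, atoms: [atom] });
  }
  return blocks;
}

// Step 2: join the block's text, then split it into words.
function wordsOf(block: Block, caseInsensitive: boolean): string[] {
  const text = block.atoms.map((a) => a.text).join(" ");
  const normalized = caseInsensitive ? text.toLowerCase() : text;
  return normalized.split(/\s+/).filter((w) => w.length > 0);
}

// Step 4: Jaccard similarity = |A ∩ B| / |A ∪ B| over distinct words.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((w) => { if (setB.has(w)) shared += 1; });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

function detectMovesInAtomList(atoms: Atom[], settings: MoveSettings = DEFAULTS): void {
  const blocks = groupIntoBlocks(atoms);
  // Step 3: keep only blocks of the given status that have enough words.
  const eligible = (status: CorrelationStatus): Block[] =>
    blocks.filter((b) => b.status === status &&
      wordsOf(b, settings.caseInsensitive).length >= settings.moveMinimumWordCount);
  const deleted = eligible(CorrelationStatus.Deleted);
  const inserted = eligible(CorrelationStatus.Inserted);
  const claimed = new Set<Block>();

  for (const del of deleted) {
    let best: Block | undefined;
    let bestScore = -1;
    for (const ins of inserted) {
      if (claimed.has(ins)) continue;
      const score = jaccard(wordsOf(del, settings.caseInsensitive), wordsOf(ins, settings.caseInsensitive));
      if (score >= settings.moveSimilarityThreshold && score > bestScore) { best = ins; bestScore = score; }
    }
    if (best === undefined) continue;
    // Step 5: convert the matched pair to moved source / destination.
    claimed.add(best);
    del.atoms.forEach((a) => { a.correlationStatus = CorrelationStatus.MovedSource; });
    best.atoms.forEach((a) => { a.correlationStatus = CorrelationStatus.MovedDestination; });
  }
}
```

Blocks that find no partner keep their Deleted or Inserted status, which matches the "below threshold treated as separate changes" scenario.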

Move Detection Settings

JR-docx-comparison-012
The system SHALL provide configurable settings for move detection:
- detectMoves: Enable/disable move detection (default: true)
- moveSimilarityThreshold: Jaccard threshold for move matching (default: 0.8)
- moveMinimumWordCount: Minimum words for move consideration (default: 3)
- caseInsensitive: Case-insensitive similarity matching (default: false)
2 test scenarios
  • Move detection disabled JR-docx-comparison-012.1
  • Custom threshold applied JR-docx-comparison-012.2
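
One way to model these settings is an interface with a defaults object. The setting names and default values are those listed above; the `MoveDetectionSettings` type name and the `resolveMoveSettings` helper are hypothetical.

```typescript
interface MoveDetectionSettings {
  detectMoves: boolean;            // enable/disable move detection
  moveSimilarityThreshold: number; // Jaccard threshold for a match
  moveMinimumWordCount: number;    // minimum words per block
  caseInsensitive: boolean;        // compare words case-insensitively
}

// Defaults as stated in JR-docx-comparison-012.
const DEFAULT_MOVE_SETTINGS: MoveDetectionSettings = {
  detectMoves: true,
  moveSimilarityThreshold: 0.8,
  moveMinimumWordCount: 3,
  caseInsensitive: false,
};

// Hypothetical helper: caller overrides win over the defaults.
function resolveMoveSettings(overrides: Partial<MoveDetectionSettings> = {}): MoveDetectionSettings {
  return { ...DEFAULT_MOVE_SETTINGS, ...overrides };
}
```

Calling `resolveMoveSettings({ moveSimilarityThreshold: 0.6 })` would lower the threshold while leaving the other three settings at their defaults.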

OpenXML Move Markup Generation

JR-docx-comparison-013
The system SHALL generate native Word move tracking markup when moves are detected:

For moved source (content moved FROM):
- w:moveFromRangeStart with w:id, w:name, w:author, w:date
- w:moveFrom containing the moved content
- w:moveFromRangeEnd with matching w:id

For moved destination (content moved TO):
- w:moveToRangeStart with w:id, w:name, w:author, w:date
- w:moveTo containing the moved content
- w:moveToRangeEnd with matching w:id
3 test scenarios
  • Move source markup structure JR-docx-comparison-013.1
  • Move destination markup structure JR-docx-comparison-013.2
  • Range IDs properly paired JR-docx-comparison-013.3
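
The markup structure can be illustrated with a small string builder. This is a sketch only: `buildMoveMarkup`, `MoveRevisionInfo`, and `escapeXml` are hypothetical names, the moved runs are assumed to be already serialized, and the w:id, w:author, and w:date attributes on the inner w:moveFrom / w:moveTo element follow the usual tracked-change attribute pattern rather than anything listed in the requirement.

```typescript
type MoveKind = "moveFrom" | "moveTo";

interface MoveRevisionInfo {
  rangeId: number;    // shared by RangeStart and RangeEnd
  revisionId: number; // id of the inner w:moveFrom / w:moveTo element
  name: string;       // move name; source and destination use the same one
  author: string;
  date: string;       // e.g. an ISO 8601 timestamp
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// contentXml is the already-serialized run markup of the moved content.
function buildMoveMarkup(kind: MoveKind, contentXml: string, info: MoveRevisionInfo): string {
  const who = `w:author="${escapeXml(info.author)}" w:date="${escapeXml(info.date)}"`;
  return (
    `<w:${kind}RangeStart w:id="${info.rangeId}" w:name="${escapeXml(info.name)}" ${who}/>` +
    `<w:${kind} w:id="${info.revisionId}" ${who}>${contentXml}</w:${kind}>` +
    `<w:${kind}RangeEnd w:id="${info.rangeId}"/>`
  );
}
```

Generating the source with kind "moveFrom" and the destination with kind "moveTo", both with the same `name`, yields the two paired ranges described above.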

Format Change Info Interface

JR-docx-comparison-014
The system SHALL provide a FormatChangeInfo interface with:
- oldRunProperties: The w:rPr element from the original document (may be null)
- newRunProperties: The w:rPr element from the modified document (may be null)
- changedProperties: Array of friendly property names that differ (e.g., "bold", "italic")
2 test scenarios
  • Bold added JR-docx-comparison-014.1
  • Multiple properties changed JR-docx-comparison-014.2
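
In TypeScript the interface might look like the sketch below. The three field names come from the requirement; `XmlElement` is a hypothetical stand-in for whatever XML node type the system actually uses.

```typescript
// Hypothetical minimal XML node type, used only for this example.
interface XmlElement {
  name: string;
  attributes: Record<string, string>;
  children: XmlElement[];
}

interface FormatChangeInfo {
  oldRunProperties: XmlElement | null; // w:rPr from the original document
  newRunProperties: XmlElement | null; // w:rPr from the modified document
  changedProperties: string[];         // friendly names, e.g. "bold"
}

// Example value for the "Bold added" scenario: the original run had no
// w:rPr at all, the modified run has w:b.
const boldAdded: FormatChangeInfo = {
  oldRunProperties: null,
  newRunProperties: {
    name: "w:rPr",
    attributes: {},
    children: [{ name: "w:b", attributes: {}, children: [] }],
  },
  changedProperties: ["bold"],
};
```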

Format Change Detection Algorithm

JR-docx-comparison-015
The system SHALL provide a detectFormatChangesInAtomList() function that identifies formatting differences in Equal atoms after LCS comparison. The algorithm:
1. Iterates through atoms with correlationStatus === Equal
2. Skips atoms without comparisonUnitAtomBefore reference
3. Extracts w:rPr from ancestor w:r element for both original and modified atoms
4. Normalizes w:rPr elements (removes existing w:rPrChange, sorts children)
5. Compares normalized properties for equality
6. Converts non-equal atoms to FormatChanged status with formatChange info
3 test scenarios
  • Text becomes bold JR-docx-comparison-015.1
  • No format change JR-docx-comparison-015.2
  • Format detection with text change JR-docx-comparison-015.3
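
The six steps could be sketched as below. Several simplifications are assumptions of the example: atoms carry their run properties directly instead of locating the ancestor w:r, `XmlElement` is a stand-in node type, the friendly-name table is illustrative and incomplete, and properties are keyed by element name, which has the same order-insensitive effect as sorting the children.

```typescript
enum CorrelationStatus { Nil, Normal, Unknown, Inserted, Deleted, Equal, Group, MovedSource, MovedDestination, FormatChanged }

interface XmlElement { name: string; attributes: Record<string, string>; children: XmlElement[]; }
interface FormatChangeInfo { oldRunProperties: XmlElement | null; newRunProperties: XmlElement | null; changedProperties: string[]; }

// Hypothetical atom shape for this sketch.
interface Atom {
  correlationStatus: CorrelationStatus;
  runProperties: XmlElement | null; // w:rPr of the containing run
  comparisonUnitAtomBefore?: Atom;  // matching atom in the original
  formatChange?: FormatChangeInfo;
}

// Illustrative subset of friendly names (assumption).
const FRIENDLY_NAMES: Record<string, string> = { "w:b": "bold", "w:i": "italic", "w:u": "underline" };

function serialize(el: XmlElement): string {
  const attrs = Object.keys(el.attributes).sort().map((k) => `${k}="${el.attributes[k]}"`).join(" ");
  return `<${el.name} ${attrs}>${el.children.map(serialize).join("")}</${el.name}>`;
}

// Step 4: drop any existing w:rPrChange and index the rest by name.
function normalize(rPr: XmlElement | null): Record<string, string> {
  const byName: Record<string, string> = {};
  if (rPr === null) return byName;
  rPr.children
    .filter((c) => c.name !== "w:rPrChange")
    .forEach((c) => { byName[c.name] = serialize(c); });
  return byName;
}

// Step 5: names of properties whose serialized form differs.
function changedProperties(oldRPr: XmlElement | null, newRPr: XmlElement | null): string[] {
  const before = normalize(oldRPr);
  const after = normalize(newRPr);
  const names: Record<string, boolean> = {};
  Object.keys(before).concat(Object.keys(after)).forEach((n) => { names[n] = true; });
  return Object.keys(names)
    .filter((n) => before[n] !== after[n])
    .sort()
    .map((n) => FRIENDLY_NAMES[n] || n);
}

function detectFormatChangesInAtomList(atoms: Atom[]): void {
  for (const atom of atoms) {
    if (atom.correlationStatus !== CorrelationStatus.Equal) continue; // step 1
    const before = atom.comparisonUnitAtomBefore;
    if (before === undefined) continue;                               // step 2
    const changed = changedProperties(before.runProperties, atom.runProperties); // steps 3 to 5
    if (changed.length === 0) continue;
    atom.correlationStatus = CorrelationStatus.FormatChanged;         // step 6
    atom.formatChange = {
      oldRunProperties: before.runProperties,
      newRunProperties: atom.runProperties,
      changedProperties: changed,
    };
  }
}
```

Because only Equal atoms are inspected, inserted or deleted text is never reclassified as a format change.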

Format Change Detection Settings

JR-docx-comparison-019
The system SHALL provide configurable settings for format change detection:
- detectFormatChanges: Enable/disable format change detection (default: true)
2 test scenarios
  • Format detection disabled JR-docx-comparison-019.1
  • Format detection enabled by default JR-docx-comparison-019.2

OpenXML Format Change Markup Generation

JR-docx-comparison-020
The system SHALL generate native Word format change tracking markup (w:rPrChange) when format changes are detected.

For format-changed content:
- The current w:rPr contains the NEW properties
- w:rPrChange is added as a child of w:rPr containing the OLD properties
- w:rPrChange includes w:id, w:author, and w:date attributes
3 test scenarios
  • Format change markup structure JR-docx-comparison-020.1
  • Bold added markup JR-docx-comparison-020.2
  • Bold removed markup JR-docx-comparison-020.3
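
The resulting structure can be shown with a small string builder. This is a sketch with hypothetical names (`buildRunPropertiesWithChange`, `RevisionInfo`); it assumes the new and old property elements are already serialized, and it wraps the old properties in a nested w:rPr inside w:rPrChange, which is how WordprocessingML represents the previous formatting.

```typescript
interface RevisionInfo {
  id: number;
  author: string;
  date: string; // e.g. an ISO 8601 timestamp
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// newPropsXml / oldPropsXml: serialized children of the new and old w:rPr,
// for example "<w:b/>" or the empty string.
function buildRunPropertiesWithChange(newPropsXml: string, oldPropsXml: string, rev: RevisionInfo): string {
  const change =
    `<w:rPrChange w:id="${rev.id}" w:author="${escapeXml(rev.author)}" w:date="${escapeXml(rev.date)}">` +
    `<w:rPr>${oldPropsXml}</w:rPr>` +
    `</w:rPrChange>`;
  // The current w:rPr holds the NEW properties; w:rPrChange is its last child.
  return `<w:rPr>${newPropsXml}${change}</w:rPr>`;
}
```

For "bold added", `newPropsXml` is `<w:b/>` and `oldPropsXml` is empty; for "bold removed", the two are swapped.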

Format Change Revision Reporting

JR-docx-comparison-021
The system SHALL include format changes in GetRevisions() output with type FormatChanged, extracting revision information from w:rPrChange elements.
1 test scenario
  • Get format change revisions JR-docx-comparison-021.1
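
Extracting these revisions amounts to walking the document tree and reporting every w:rPrChange element. The sketch below is an assumption-laden illustration: `XmlElement`, `Revision`, and `collectFormatChangeRevisions` are hypothetical, and the real GetRevisions() output may carry different fields.

```typescript
// Hypothetical minimal XML node type, used only for this example.
interface XmlElement {
  name: string;
  attributes: Record<string, string>;
  children: XmlElement[];
}

// Hypothetical revision record; only the type value comes from the requirement.
interface Revision {
  type: "FormatChanged";
  id: string;
  author: string;
  date: string;
}

function collectFormatChangeRevisions(root: XmlElement): Revision[] {
  const found: Revision[] = [];
  const visit = (el: XmlElement): void => {
    if (el.name === "w:rPrChange") {
      found.push({
        type: "FormatChanged",
        id: el.attributes["w:id"] || "",
        author: el.attributes["w:author"] || "",
        date: el.attributes["w:date"] || "",
      });
    }
    el.children.forEach(visit);
  };
  visit(root);
  return found;
}
```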