Document Change Extraction

3 requirements · 10 scenarios

Change Type Classification

JR-document-change-extraction-001
The system SHALL classify document changes into the following types:
- PARTIAL_INSERTION: Text inserted within an existing paragraph
- PARTIAL_DELETION: Text deleted from an existing paragraph
- MOVING_FROM: Text moved from this location
- MOVING_TO: Text moved to this location
- FORMAT_CHANGE: Only formatting changed (bold, italic, underline, etc.)
- NEW_PARAGRAPH: Entire paragraph is new (before_text is None)
- DELETED_PARAGRAPH: Entire paragraph is deleted (after_text is None)

The system SHALL represent change types as a set, since a single paragraph can have multiple change types.
5 test scenarios
  • Format-only change detected JR-document-change-extraction-001.1
  • Partial content change detected JR-document-change-extraction-001.2
  • New paragraph detected JR-document-change-extraction-001.3
  • Deleted paragraph detected JR-document-change-extraction-001.4
  • Move operation detected JR-document-change-extraction-001.5
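
The classification in JR-document-change-extraction-001 could be sketched as below. This is a minimal illustration, not the system's implementation: the `classify` function and its boolean keyword flags are hypothetical, and the choice to report NEW_PARAGRAPH or DELETED_PARAGRAPH on its own (without combining it with other types) is an assumption the requirement does not state.

```python
from enum import Enum, auto
from typing import Optional, Set


class ChangeType(Enum):
    PARTIAL_INSERTION = auto()
    PARTIAL_DELETION = auto()
    MOVING_FROM = auto()
    MOVING_TO = auto()
    FORMAT_CHANGE = auto()
    NEW_PARAGRAPH = auto()
    DELETED_PARAGRAPH = auto()


def classify(
    before_text: Optional[str],
    after_text: Optional[str],
    *,
    inserted: bool = False,      # hypothetical flag: paragraph contains inserted text
    deleted: bool = False,       # hypothetical flag: paragraph contains deleted text
    formatted: bool = False,     # hypothetical flag: formatting differs
    moved_from: bool = False,    # hypothetical flag: text was moved away from here
    moved_to: bool = False,      # hypothetical flag: text was moved to here
) -> Set[ChangeType]:
    """Return the set of change types that apply to one paragraph."""
    if before_text is None and after_text is None:
        raise ValueError("a changed paragraph needs before_text or after_text")
    # Whole-paragraph changes are signalled by a missing side.
    # Assumption: they are reported alone in this sketch.
    if before_text is None:
        return {ChangeType.NEW_PARAGRAPH}
    if after_text is None:
        return {ChangeType.DELETED_PARAGRAPH}

    # Otherwise collect every partial change type that applies.
    types: Set[ChangeType] = set()
    if inserted:
        types.add(ChangeType.PARTIAL_INSERTION)
    if deleted:
        types.add(ChangeType.PARTIAL_DELETION)
    if formatted:
        types.add(ChangeType.FORMAT_CHANGE)
    if moved_from:
        types.add(ChangeType.MOVING_FROM)
    if moved_to:
        types.add(ChangeType.MOVING_TO)
    return types
```

Returning a set rather than a single value means a paragraph with both an insertion and a formatting change yields two members, as the requirement calls for.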

Structured Change Data Model

JR-document-change-extraction-002
The system SHALL provide a structured data model for changed paragraphs containing:
- para_id: Unique identifier for the paragraph (e.g., "edit-1" or bookmark ID)
- before_text: Plain text content before changes (None if new paragraph)
- after_text: Plain text content after changes (None if deleted paragraph)
- change_types: Set of ChangeType values affecting this paragraph
- html_snippet: Pre-rendered HTML with revision styling
- page_number: Optional page number where paragraph appears
3 test scenarios
  • Extract structured data from document comparison JR-document-change-extraction-002.1
  • HTML snippet generation JR-document-change-extraction-002.2
  • Multiple change types in single paragraph JR-document-change-extraction-002.3
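
One possible shape for the data model in JR-document-change-extraction-002 is a dataclass. The field names come from the requirement; the class name `ChangedParagraph`, the default values, and the `is_new` / `is_deleted` helper properties are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set

# Compact stand-in for the change types listed in requirement 001.
ChangeType = Enum(
    "ChangeType",
    "PARTIAL_INSERTION PARTIAL_DELETION MOVING_FROM MOVING_TO "
    "FORMAT_CHANGE NEW_PARAGRAPH DELETED_PARAGRAPH",
)


@dataclass
class ChangedParagraph:
    para_id: str                        # e.g. "edit-1" or a bookmark ID
    before_text: Optional[str]          # None if the paragraph is new
    after_text: Optional[str]           # None if the paragraph was deleted
    change_types: Set[ChangeType] = field(default_factory=set)
    html_snippet: str = ""              # pre-rendered HTML with revision styling
    page_number: Optional[int] = None   # optional page where the paragraph appears

    @property
    def is_new(self) -> bool:
        # Hypothetical helper: a new paragraph has no "before" side.
        return self.before_text is None

    @property
    def is_deleted(self) -> bool:
        # Hypothetical helper: a deleted paragraph has no "after" side.
        return self.after_text is None
```

Using `default_factory=set` gives each instance its own set, so change types added to one paragraph never leak into another.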

Document Comparison Support

JR-document-change-extraction-003
The system SHALL support comparing two document versions to extract changes.
2 test scenarios
  • Compare original vs revised documents JR-document-change-extraction-003.1
  • Before/after text extraction JR-document-change-extraction-003.2
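
As a rough sketch of before/after extraction for JR-document-change-extraction-003, the standard library's `difflib.SequenceMatcher` can align two lists of paragraph strings. This is an assumption about one workable approach, not the system's comparison method: the `extract_changes` function is hypothetical, it treats each document as a plain list of paragraph texts, and it borrows the "edit-N" identifier pattern from the `para_id` example above.

```python
import difflib
from typing import List, Optional, Tuple

Change = Tuple[str, Optional[str], Optional[str]]  # (para_id, before_text, after_text)


def extract_changes(original: List[str], revised: List[str]) -> List[Change]:
    """Pair up changed paragraphs between two versions of a document."""
    matcher = difflib.SequenceMatcher(a=original, b=revised, autojunk=False)
    changes: List[Change] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged paragraphs are not reported
        old = original[i1:i2]
        new = revised[j1:j2]
        # Pair paragraphs by position within the changed block; any leftover
        # on one side becomes a whole-paragraph deletion or insertion.
        for k in range(max(len(old), len(new))):
            before = old[k] if k < len(old) else None
            after = new[k] if k < len(new) else None
            changes.append((f"edit-{len(changes) + 1}", before, after))
    return changes
```

A `before_text` of None marks a new paragraph and an `after_text` of None marks a deleted one, matching the conventions in the data model. Detecting moves and format-only changes would need more information than plain text provides, so this sketch does not attempt them.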