Document Change Extraction
3 requirements · 10 scenarios
Change Type Classification
JR-document-change-extraction-001
The system SHALL classify document changes into the following types:
- PARTIAL_INSERTION: Some text inserted within existing paragraph
- PARTIAL_DELETION: Some text deleted from existing paragraph
- MOVING_FROM: Text moved from this location
- MOVING_TO: Text moved to this location
- FORMAT_CHANGE: Only formatting changed (bold, italic, underline, etc.)
- NEW_PARAGRAPH: Entire paragraph is new (before_text is None)
- DELETED_PARAGRAPH: Entire paragraph deleted (after_text is None)
The system SHALL represent change types as a set, since a single paragraph can have multiple change types.
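As a minimal sketch of how these types might be declared, the snippet below uses a Python `Enum`. The member names come from the requirement; the use of `Enum` with `auto()` values and the variable name `change_types` are assumptions made for illustration, not a confirmed part of the system.

```python
from enum import Enum, auto


class ChangeType(Enum):
    """Change types named in the requirement (enum representation is assumed)."""
    PARTIAL_INSERTION = auto()   # some text inserted within an existing paragraph
    PARTIAL_DELETION = auto()    # some text deleted from an existing paragraph
    MOVING_FROM = auto()         # text moved from this location
    MOVING_TO = auto()           # text moved to this location
    FORMAT_CHANGE = auto()       # only formatting changed
    NEW_PARAGRAPH = auto()       # entire paragraph is new
    DELETED_PARAGRAPH = auto()   # entire paragraph deleted


# A paragraph with both inserted and deleted text carries two change types,
# which is why the types are held in a set rather than a single value.
change_types = {ChangeType.PARTIAL_INSERTION, ChangeType.PARTIAL_DELETION}
```

A set also makes membership checks such as `ChangeType.FORMAT_CHANGE in change_types` straightforward and ignores accidental duplicates.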
5 test scenarios
- Format-only change detected JR-document-change-extraction-001.1
- Partial content change detected JR-document-change-extraction-001.2
- New paragraph detected JR-document-change-extraction-001.3
- Deleted paragraph detected JR-document-change-extraction-001.4
- Move operation detected JR-document-change-extraction-001.5
Structured Change Data Model
JR-document-change-extraction-002
The system SHALL provide a structured data model for changed paragraphs containing:
- para_id: Unique identifier for the paragraph (e.g., "edit-1" or bookmark ID)
- before_text: Plain text content before changes (None if new paragraph)
- after_text: Plain text content after changes (None if deleted paragraph)
- change_types: Set of ChangeType values affecting this paragraph
- html_snippet: Pre-rendered HTML with revision styling
- page_number: Optional page number where paragraph appears
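The fields above could be modeled with a dataclass, as in the following sketch. The field names are taken from the requirement; the class name `ChangedParagraph`, the default values, and the `<del>`/`<ins>` markup in the sample snippet are hypothetical choices for illustration. `ChangeType` is declared inline only so the example is self-contained.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional, Set


class ChangeType(Enum):
    # Declared here only to keep the example self-contained.
    PARTIAL_INSERTION = auto()
    PARTIAL_DELETION = auto()
    MOVING_FROM = auto()
    MOVING_TO = auto()
    FORMAT_CHANGE = auto()
    NEW_PARAGRAPH = auto()
    DELETED_PARAGRAPH = auto()


@dataclass
class ChangedParagraph:
    """Hypothetical container for one changed paragraph."""
    para_id: str                       # e.g. "edit-1" or a bookmark ID
    before_text: Optional[str]         # None if the paragraph is new
    after_text: Optional[str]          # None if the paragraph was deleted
    change_types: Set[ChangeType] = field(default_factory=set)
    html_snippet: str = ""             # pre-rendered HTML with revision styling
    page_number: Optional[int] = None  # optional page where the paragraph appears


# Sample values are invented; the <del>/<ins> markup is an assumed styling.
example = ChangedParagraph(
    para_id="edit-1",
    before_text="The term is 12 months.",
    after_text="The term is 24 months.",
    change_types={ChangeType.PARTIAL_INSERTION, ChangeType.PARTIAL_DELETION},
    html_snippet="<p>The term is <del>12</del><ins>24</ins> months.</p>",
)
```

Using `default_factory=set` gives each instance its own set, so change types added to one paragraph do not leak into another.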
3 test scenarios
- Extract structured data from document comparison JR-document-change-extraction-002.1
- HTML snippet generation JR-document-change-extraction-002.2
- Multiple change types in single paragraph JR-document-change-extraction-002.3
Document Comparison Support
JR-document-change-extraction-003
The system SHALL support comparing two document versions to extract changes.
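One way to picture the comparison step is a paragraph-level diff. The sketch below uses the standard library's `difflib.SequenceMatcher` on two lists of plain-text paragraphs; the function name `compare_paragraphs` and the positional pairing of replaced paragraphs are assumptions for illustration. It does not detect moves or format-only changes, and the actual system may compare documents differently.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def compare_paragraphs(
    original: List[str], revised: List[str]
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (before_text, after_text) pairs for paragraphs that differ."""
    changes: List[Tuple[Optional[str], Optional[str]]] = []
    matcher = SequenceMatcher(a=original, b=revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged paragraphs are not reported
        before = original[i1:i2]
        after = revised[j1:j2]
        # Pair paragraphs by position; the shorter side is padded with None,
        # matching the None convention for new and deleted paragraphs.
        for k in range(max(len(before), len(after))):
            changes.append((
                before[k] if k < len(before) else None,
                after[k] if k < len(after) else None,
            ))
    return changes


original = ["Intro.", "The term is 12 months.", "Old clause.", "End."]
revised = ["Intro.", "The term is 24 months.", "End.", "New appendix."]
changes = compare_paragraphs(original, revised)
```

Here `changes` holds one pair for the edited paragraph, one pair with `after_text` set to `None` for the removed clause, and one pair with `before_text` set to `None` for the added appendix.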
2 test scenarios
- Compare original vs revised documents JR-document-change-extraction-003.1
- Before/after text extraction JR-document-change-extraction-003.2