When detecting moved document content, a deleted passage and an inserted passage should only become one tracked move when their wording is similar enough. The similarity threshold prevents unrelated passages from being linked, so downstream operations do not target the wrong text.
detectMovesInAtomList applies that threshold while processing comparison unit atoms (the smallest text-bearing comparison items used by the move detector). The function groups deleted and inserted atoms into blocks, compares each deleted block with the available inserted blocks using word-set similarity, and changes both blocks to move statuses only when the best match reaches the configured threshold.[1]
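The threshold-gated matching described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the Block shape, the linkMoves helper, the whitespace tokenizer, and the "MovedSource" and "MovedDestination" status names are all hypothetical stand-ins for whatever the real move detector uses.

```typescript
// Hypothetical sketch: every name and shape here is illustrative, not the real code.
type Status = "Deleted" | "Inserted" | "MovedSource" | "MovedDestination";

interface Block {
  text: string;
  status: Status;
}

// Build the set of distinct words, lowercasing when case-insensitive matching is requested.
function wordSet(text: string, caseInsensitive: boolean): Set<string> {
  const normalized = caseInsensitive ? text.toLowerCase() : text;
  return new Set(normalized.split(/\s+/).filter((w) => w.length > 0));
}

// Jaccard similarity: shared words divided by all distinct words (0 when both sets are empty).
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// For each deleted block, pick the most similar unused inserted block and
// change both statuses in place only if the score reaches the threshold.
function linkMoves(
  deleted: Block[],
  inserted: Block[],
  threshold: number,
  caseInsensitive: boolean = true,
): void {
  const used = new Set<Block>();
  for (const del of deleted) {
    let best: Block | undefined;
    let bestScore = -1;
    for (const ins of inserted) {
      if (used.has(ins)) continue;
      const score = jaccard(
        wordSet(del.text, caseInsensitive),
        wordSet(ins.text, caseInsensitive),
      );
      if (score > bestScore) {
        best = ins;
        bestScore = score;
      }
    }
    if (best !== undefined && bestScore >= threshold) {
      del.status = "MovedSource";
      best.status = "MovedDestination";
      used.add(best);
    }
  }
}
```

In this sketch a block pair that falls short of the threshold is simply left alone, which mirrors the behavior the scenario below checks: no match means no status change.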
The test scenario below covers the baseline threshold-rejection case of detectMovesInAtomList: low-similarity deleted and inserted atoms remain deleted and inserted.
The scenario
Given deleted and inserted atoms with low similarity,
When moves are detected,
Then atoms are not marked as moved due to low similarity.
The test fixture
The fixture creates one deleted atom and one inserted atom, then calls detectMovesInAtomList with detectMoves enabled and a moveSimilarityThreshold of 0.8.[2]
Below is the test fixture code.
test('respects similarity threshold', async ({ given, when, then }: AllureBddContext) => {
  let atoms: ComparisonUnitAtom[];

  await given('deleted and inserted atoms with low similarity', () => {
    atoms = [
      createTestAtom('the quick brown fox jumps', CorrelationStatus.Deleted),
      createTestAtom('a slow red cat sleeps', CorrelationStatus.Inserted), // Completely different
    ];
  });

  await when('moves are detected', () => {
    detectMovesInAtomList(atoms, {
      detectMoves: true,
      moveSimilarityThreshold: 0.8,
      moveMinimumWordCount: 1,
      caseInsensitiveMove: true,
    });
  });

  await then('atoms are not marked as moved due to low similarity', () => {
    // Should NOT be marked as moved due to low similarity
    const atom0 = atoms[0];
    const atom1 = atoms[1];
    assertDefined(atom0, 'atoms[0]');
    assertDefined(atom1, 'atoms[1]');
    expect(atom0.correlationStatus).toBe(CorrelationStatus.Deleted);
    expect(atom1.correlationStatus).toBe(CorrelationStatus.Inserted);
  });
});
The expected outcome
The scenario asserts the atom statuses after detectMovesInAtomList mutates the atom list in place, so the expected outcome is the unchanged deleted and inserted status pair rather than a return value.
Below is the result that detectMovesInAtomList is expected to produce for this scenario.
expect(atom0.correlationStatus).toBe(CorrelationStatus.Deleted);
expect(atom1.correlationStatus).toBe(CorrelationStatus.Inserted);
Below is a description of the expected fields:
atom0.correlationStatus is expected to be CorrelationStatus.Deleted, because the deleted atom does not meet the configured similarity threshold against the inserted atom.
atom1.correlationStatus is expected to be CorrelationStatus.Inserted, because the inserted atom is not selected as a move destination for that low-similarity deleted atom.
A non-obvious detail
The minimum word count setting is low enough for both atoms to enter move matching, so the unchanged statuses come from the similarity check rather than the length filter. findBestMatch only returns an inserted block when its Jaccard word similarity is greater than or equal to moveSimilarityThreshold; without that match, detectMovesInAtomList leaves the statuses unchanged.
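To make the distinction between the two filters concrete, the arithmetic for the fixture strings can be worked through in a small sketch. The words and jaccardWords helpers are hypothetical, the tokenizer is assumed to be a plain lowercase whitespace split, and the length filter is assumed to be a simple "at least moveMinimumWordCount words" check; the real tokenizer and filter may differ in detail.

```typescript
// Illustrative helper (an assumption, not the project's tokenizer):
// lowercase the text, then split on whitespace.
function words(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Jaccard similarity over distinct words: shared / (all distinct words).
function jaccardWords(a: string, b: string): number {
  const setA = new Set(words(a));
  const setB = new Set(words(b));
  let shared = 0;
  setA.forEach((w) => {
    if (setB.has(w)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Values taken from the fixture above.
const deletedText = "the quick brown fox jumps";
const insertedText = "a slow red cat sleeps";
const moveMinimumWordCount = 1;
const moveSimilarityThreshold = 0.8;

// Both passages have 5 words, so an assumed ">= minimum" length filter lets them through.
const passesLengthFilter =
  words(deletedText).length >= moveMinimumWordCount &&
  words(insertedText).length >= moveMinimumWordCount;

// No words are shared: 0 shared / 10 distinct = 0.
const similarity = jaccardWords(deletedText, insertedText);
const passesThreshold = similarity >= moveSimilarityThreshold;

console.log({ passesLengthFilter, similarity, passesThreshold });
// prints { passesLengthFilter: true, similarity: 0, passesThreshold: false }
```

Under these assumptions the pair clears the length filter but scores 0 against a threshold of 0.8, so the similarity check alone accounts for the unchanged statuses.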