
Patching a CharacterData state drift in xmldom

Steven Obiajulu

April 19, 2026 · 5 min read

We build document tooling on top of xmldom/xmldom, so we spend time reading the parts of it that touch parsing and serialization. Recently one of those reads surfaced a small state-drift issue in CharacterData: data and nodeValue were separate backing fields on the same node, so direct assignment to one did not propagate to the other, and XMLSerializer could emit content that the caller believed they had already updated.

We filed issue #989 with a minimal repro and followed up with PR #990, which a maintainer reviewed and merged.

The drift

A CharacterData node — Text, Comment, CDATASection, or ProcessingInstruction — conceptually holds one string. Before the fix, it held that string twice, as two independent own-properties:

CharacterData instance (before)
├── data:      "hello"   ← own property
├── nodeValue: "hello"   ← own property, independent of `data`
└── length:    5

Assigning one did not update the other:

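const { DOMParser, XMLSerializer } = require("@xmldom/xmldom");

// Parse a document whose root element holds a single text node
const doc = new DOMParser().parseFromString("<root>hello</root>", "text/xml");
const text = doc.documentElement.firstChild;
text.nodeValue = "Changed";

text.data; // still "hello"
new XMLSerializer().serializeToString(doc); // emits the old value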

The internal serialization path for text, comment, CDATA, and processing-instruction nodes reads node.data, so an application that updated nodeValue directly — which the DOM spec permits — could silently serialize stale content. The reverse held too: assigning data left nodeValue stale.
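
The same drift can be modeled without xmldom at all. The sketch below is a minimal illustration, not xmldom's code: `DriftingText` and `serializeText` are hypothetical stand-ins for a pre-fix node and for a serializer that reads only `data`.

```javascript
// Hypothetical stand-in for a pre-fix CharacterData node:
// two independent own properties that start out holding the same string.
function DriftingText(value) {
  this.data = value;
  this.nodeValue = value;
  this.length = value.length;
}

// Hypothetical stand-in for a serializer that reads only node.data.
function serializeText(node) {
  return node.data;
}

const driftingNode = new DriftingText("hello");
driftingNode.nodeValue = "Changed"; // plain property write, data is untouched

const staleOutput = serializeText(driftingNode);
console.log(driftingNode.nodeValue); // "Changed"
console.log(staleOutput);            // "hello"
```

Nothing links the two properties, so the write to `nodeValue` never reaches the field the serializer reads.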

The full repro is in issue #989.

The fix (PR #990)

PR #990 collapses the two fields into a single backing field and moves data and nodeValue onto the prototype as accessors. After the change, each instance carries one canonical field:

CharacterData instance (after)
├── _data:  "hello"     // the one canonical backing field
└── length: 5
                        // data and nodeValue live on the prototype as accessors

From a caller's perspective, direct assignment now behaves the way the DOM spec leads them to expect:

text.nodeValue = "Changed";
text.data;                                 // "Changed"
new XMLSerializer().serializeToString(doc); // emits the new value

Under the hood, the accessors look like this. Inline comments are added here for readability; the actual diff is commentless in the usual style.

Object.defineProperty(CharacterData.prototype, "data", {
  get() {
    return this._data;            // read returns the canonical value
  },
  set(value) {
    this._data = value;           // write updates the canonical value
    this.length = value.length;   // length stays in sync
  },
});

Object.defineProperty(CharacterData.prototype, "nodeValue", {
  get() { return this.data; },    // delegate to data (and therefore _data)
  set(value) { this.data = value; },
});

Full diff: PR #990.

A small side effect

Collapsing two own-properties into one prototype-backed accessor also removes one property slot from every CharacterData instance. In a Node v22.19.0 microbenchmark with 1,000,000 instances, retained heap was about 8 bytes lower per instance — consistent with removing one 64-bit tagged slot on an engine where pointer compression is disabled by default. The correctness fix is the main contribution; the heap delta is a small additional benefit. Benchmark script and numbers are in the appendix.
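
The change in per-instance layout can be seen by listing own properties. In this sketch `BeforeShape` and `AfterShape` are hypothetical stand-ins for the two layouts, not xmldom's constructors; the accessor bodies follow the ones shown earlier.

```javascript
// Hypothetical stand-in for the pre-fix layout: three own properties.
function BeforeShape(value) {
  this.data = value;
  this.nodeValue = value;
  this.length = value.length;
}

// Hypothetical stand-in for the post-fix layout: two own properties.
function AfterShape(value) {
  this._data = value;
  this.length = value.length;
}

// Accessors are defined once on the prototype, not on each instance.
Object.defineProperty(AfterShape.prototype, "data", {
  get() { return this._data; },
  set(value) { this._data = value; this.length = value.length; },
});
Object.defineProperty(AfterShape.prototype, "nodeValue", {
  get() { return this.data; },
  set(value) { this.data = value; },
});

const beforeProps = Object.getOwnPropertyNames(new BeforeShape("hello"));
const afterProps = Object.getOwnPropertyNames(new AfterShape("hello"));

console.log(beforeProps); // [ 'data', 'nodeValue', 'length' ]
console.log(afterProps);  // [ '_data', 'length' ]
```

How many bytes each own property costs is an engine detail; the listing only shows that each instance carries one fewer.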

Why we contribute upstream

We build on open-source document tooling, so the quality of what we ship depends on the quality of the parsers and serializers underneath it. When we find an issue in one of those libraries, we try to report it precisely and, where we can, contribute the fix back.


Appendix: benchmark

Run with node --expose-gc bench.js.

function OldNode(value) {
  this.data = value;
  this.nodeValue = value;
  this.length = value.length;
}

function NewNode(value) {
  this._data = value;
  this.length = value.length;
}
Object.defineProperty(NewNode.prototype, "data", {
  get() { return this._data; },
  set(v) { this._data = v; this.length = v.length; },
});
Object.defineProperty(NewNode.prototype, "nodeValue", {
  get() { return this.data; },
  set(v) { this.data = v; },
});

function measure(Ctor, values) {
  // Collect twice so garbage from earlier runs does not skew the baseline
  global.gc(); global.gc();
  const before = process.memoryUsage().heapUsed;
  const arr = new Array(values.length);
  for (let i = 0; i < values.length; i++) arr[i] = new Ctor(values[i]);
  global.gc(); global.gc();
  const after = process.memoryUsage().heapUsed;
  // Return arr so the caller keeps the instances alive
  return { arr, bytes: after - before };
}

const N = 1_000_000;
const shared = Array(N).fill("hello");
const unique = Array.from({ length: N }, (_, i) => "hello-" + i);

for (const values of [shared, unique]) {
  const a = measure(OldNode, values);
  const b = measure(NewNode, values);
  console.log((a.bytes / 1024 / 1024).toFixed(2), "MiB →",
              (b.bytes / 1024 / 1024).toFixed(2), "MiB");
}

Scenario         Old heap    New heap    Savings    Per instance
Shared string    53.42 MiB   45.79 MiB   7.63 MiB   ~8 bytes
Unique strings   53.41 MiB   45.78 MiB   7.63 MiB   ~8 bytes

The shared-string and unique-string cases landing on the same delta indicates the change removed object overhead, not duplicated string contents. The absolute number (~8 bytes per instance = one 64-bit tagged slot) is specific to Node 22 with pointer compression disabled; under pointer compression this refactor would save closer to 4 bytes per instance. The delta is also sensitive to construction pattern — the same conceptual refactor produces a different number if you construct via Object.create + Object.assign instead of a constructor, because V8's hidden-class layout differs.
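
The per-instance figures follow from the heap totals by simple arithmetic. The sketch below uses the rounded values reported in the appendix, so the results are approximate; `bytesPerInstance` is a helper name introduced here for illustration.

```javascript
const MIB = 1024 * 1024;
const INSTANCES = 1_000_000;

// Convert a heap figure in MiB into bytes per instance.
function bytesPerInstance(mib, count) {
  return (mib * MIB) / count;
}

const sharedDelta = bytesPerInstance(53.42 - 45.79, INSTANCES);
const uniqueDelta = bytesPerInstance(53.41 - 45.78, INSTANCES);

console.log(sharedDelta.toFixed(1)); // 8.0
console.log(uniqueDelta.toFixed(1)); // 8.0
```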

About Steven Obiajulu

Steven Obiajulu is a former Ropes & Gray attorney with expertise in law and technology. A Harvard Law '18 and MIT '13 graduate, he combines a technical engineering background with legal practice to build accessible AI solutions for transactional lawyers.

New York, NY • UseJunior • Former Ropes & Gray attorney (6 years) • Harvard Law '18, MIT '13
Last updated: April 19, 2026