I missed the editing window, but here's a simple example of ElementTree in Pytho...

easrng · on April 3, 2022

You aren't implementing innerText, you're implementing textContent. Consider the HTML a<b>c</b>d<div>e<div>f</div></div>. Using textContent results in "acdef", while innerText results in "acd\ne\nf". Implementing textContent is easy:

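  function getText(node, returnArray){
    // Text nodes (nodeType 3) contribute their data; other nodes recurse into a nested array.
    const a = Array.prototype.map.call(node.childNodes, e=>e.nodeType==3?e.data:getText(e, true))
    // Flatten and join once, at the top-level call only.
    return returnArray?a:a.flat(Infinity).join("")
  }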

Which can be golfed down to

  let getText=(n,f=(n,a)=>[].map.call(n.childNodes,e=>e.nodeType==3?e.data:f(e,1)))=>f(n,f).flat(1e333).join("")

innerText is harder: you have to consider visibility and whether each element is block or inline. You can't properly calculate innerText without parsing CSS.
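
To make the block-versus-inline point concrete, here is a rough sketch, not the real innerText algorithm. It assumes a hard-coded `BLOCK_TAGS` set in place of computed CSS, ignores visibility and whitespace collapsing entirely, and uses hypothetical `text` and `el` helpers that build plain-object stand-ins for DOM nodes so it runs outside a browser:

```javascript
// Hypothetical stand-in for computed CSS: tags treated as block-level.
const BLOCK_TAGS = new Set(["DIV", "P", "LI", "SECTION"]);

// Hypothetical helpers that build plain objects shaped like DOM nodes.
const text = (data) => ({ nodeType: 3, data, childNodes: [] });
const el = (tagName, childNodes = []) => ({ nodeType: 1, tagName, childNodes });

// Rough approximation only: puts a line break around each block element,
// then collapses runs of breaks and trims the ends.
function roughInnerText(node) {
  const parts = [];
  const walk = (n) => {
    for (const child of n.childNodes) {
      if (child.nodeType === 3) {
        parts.push(child.data);
      } else if (child.nodeType === 1) {
        const isBlock = BLOCK_TAGS.has(child.tagName);
        if (isBlock) parts.push("\n");
        walk(child);
        if (isBlock) parts.push("\n");
      }
    }
  };
  walk(node);
  return parts.join("").replace(/\n+/g, "\n").replace(/^\n|\n$/g, "");
}

// a<b>c</b>d<div>e<div>f</div></div>
const sample = el("BODY", [
  text("a"),
  el("B", [text("c")]),
  text("d"),
  el("DIV", [text("e"), el("DIV", [text("f")])]),
]);

console.log(JSON.stringify(roughInnerText(sample))); // "acd\ne\nf"
```

Even this toy version needs a per-element block/inline decision that textContent never has to make; a real implementation would have to take that decision from the computed style instead of a tag list.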

kragen · on April 3, 2022

Very nice! And thank you for the correction about innerText.

Still, I think that

    let getText = n => n.text + n.getchildren().map(getText).join('') + n.tail

is nicer than

    let getText=(n,f=(n,a)=>[].map.call(n.childNodes,e=>e.nodeType==3?e.data:f(e,1)))=>f(n,f).flat(1e333).join("")

and not just because it's fewer keystrokes.
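
For comparison, a runnable sketch of the shorter one-liner. The `{tag, text, children, tail}` object shape is an assumption modelled on ElementTree, not a real JavaScript library, and the hypothetical `elem` helper only builds such objects. In ElementTree itself `text` and `tail` are `None` when absent, so the sketch defaults missing values to the empty string:

```javascript
// Hypothetical ElementTree-like node: leading text, child elements, and
// the tail text that follows the element's closing tag.
const elem = (tag, text = null, children = [], tail = null) =>
  ({ tag, text, children, tail });

// Missing text or tail is null here, so default both to "".
const getText = (n) =>
  (n.text ?? "") + n.children.map(getText).join("") + (n.tail ?? "");

// a<b>c</b>d<div>e<div>f</div></div>, wrapped in a root element.
const root = elem("root", "a", [
  elem("b", "c", [], "d"),
  elem("div", "e", [elem("div", "f")]),
]);

console.log(getText(root)); // acdef
```

Note that adding `n.tail` at every level also includes the tail of the node you start from, which may or may not be what you want.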

easrng · on April 4, 2022

Most of the extra complexity comes from returning arrays while recursing and only joining at the end, which probably needs fewer allocations. The rest is because the DOM has separate text nodes and the library you're using doesn't seem to. (Oh, and NodeLists don't have .map, so we use the .map from the array prototype.) Here's a non-optimized version:

  let getText = n =>[].map.call(n.childNodes,e => e.nodeType == 3 ? e.data : getText(e)).join("")
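
A quick check that the two versions agree, as a sketch. The names `getTextArrays` and `getTextJoin` are renames for this comparison only, and the hypothetical `text` and `el` helpers build plain-object stand-ins for DOM nodes (with arrays as `childNodes`) so it runs without a browser:

```javascript
// Hypothetical helpers that build plain objects shaped like DOM nodes.
const text = (data) => ({ nodeType: 3, data, childNodes: [] });
const el = (tagName, childNodes = []) => ({ nodeType: 1, tagName, childNodes });

// Array-returning version: nested arrays while recursing, one join at the end.
function getTextArrays(node, returnArray) {
  const a = Array.prototype.map.call(node.childNodes, (e) =>
    e.nodeType == 3 ? e.data : getTextArrays(e, true));
  return returnArray ? a : a.flat(Infinity).join("");
}

// Non-optimized version: builds a joined string at every level of recursion.
const getTextJoin = (n) =>
  [].map.call(n.childNodes, (e) =>
    e.nodeType == 3 ? e.data : getTextJoin(e)).join("");

// a<b>c</b>d<div>e<div>f</div></div>
const sample = el("BODY", [
  text("a"),
  el("B", [text("c")]),
  text("d"),
  el("DIV", [text("e"), el("DIV", [text("f")])]),
]);

console.log(getTextArrays(sample)); // acdef
console.log(getTextJoin(sample));   // acdef
```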

kragen · on April 4, 2022

Right, I'm saying I think the statically typed ElementTree approach is superior to the dynamically typed approach used by the W3C DOM.