
I missed the editing window, but here's a simple example of ElementTree in Python; consider what it would take to do this in JS with the DOM if you had to implement .innerText yourself:

    >>> from xml.etree import ElementTree
    >>> def text(e):
    ...     return (e.text or '') + ''.join(
    ...         text(kid) for kid in e.getchildren()) + (e.tail or '')
    ... 
    >>> text(ElementTree.fromstring('<p>My <b>Malamute</b> is <i>really</i> hairy.</p>'))
    'My Malamute is really hairy.'
Or if you had to implement .querySelectorAll yourself:

    >>> def find(e, clase):
    ...  if e.get('class') == clase: yield e
    ...  for kid in e.getchildren():
    ...   for nieto in find(kid, clase): yield nieto
    ... 
    >>> list(find(ElementTree.fromstring('<p>My <b class="dog">Malamute</b> is <i>really</i> hairy.</p>'), 'dog'))
    [<Element 'b' at 0x7f14717c7a10>]
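In Python the .getchildren() call is unnecessary (and was removed in Python 3.9) because ElementTree elements implement Python's iteration protocol, so "for kid in e" works directly. I kept the explicit call because I thought bare iteration might obscure the point a little.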

On the other hand, without the dopey None case I objected to above, you wouldn't need the "or ''".
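
For reference, here is a sketch of the same two functions written against the iteration protocol instead of .getchildren(), so that it runs on current Python versions. The names text and find come from the snippets above; the parameter name cls and the sample document are only illustrative.

```python
from xml.etree import ElementTree

def text(e):
    # e.text is the text before e's first child; e.tail is the text
    # that follows e's closing tag. Either may be None.
    return (e.text or '') + ''.join(text(kid) for kid in e) + (e.tail or '')

def find(e, cls):
    # Depth-first search for elements whose class attribute equals cls.
    if e.get('class') == cls:
        yield e
    for kid in e:
        yield from find(kid, cls)

doc = ElementTree.fromstring(
    '<p>My <b class="dog">Malamute</b> is <i>really</i> hairy.</p>')
print(text(doc))                            # My Malamute is really hairy.
print([el.tag for el in find(doc, 'dog')])  # ['b']
```

Note that find only does an exact match on the whole class attribute, so it is much cruder than a real .querySelectorAll.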



You aren't implementing innerText, you're implementing textContent. Consider the HTML a<b>c</b>d<div>e<div>f</div></div>. Using textContent results in "acdef", while innerText results in "acd\ne\nf". Implementing textContent is easy:

  function getText(node, returnArray){
    const a = Array.prototype.map.call(node.childNodes, e=>e.nodeType==3?e.data:getText(e, true))
    return returnArray?a:a.flat(Infinity).join("")
  }
Which can be golfed down to

  let getText=(n,f=(n,a)=>[].map.call(n.childNodes,e=>e.nodeType==3?e.data:f(e,1)))=>f(n,f).flat(1e333).join("")
innerText is harder: you have to consider visibility and whether each element is block or inline, so you can't properly calculate innerText without parsing CSS.
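
The textContent result for that example can be checked from Python with ElementTree's itertext(), which walks the text and tail strings in document order. This is only a sketch: the <root> wrapper is an assumption added because XML needs a single root element, and nothing here attempts the innerText behavior, which depends on CSS.

```python
from xml.etree import ElementTree

# Hypothetical <root> wrapper around the fragment from the comment above;
# the fragment alone has no single root element, so it is not valid XML.
root = ElementTree.fromstring(
    '<root>a<b>c</b>d<div>e<div>f</div></div></root>')

# itertext() yields every text and tail string in document order,
# which matches what textContent concatenates.
text_content = ''.join(root.itertext())
print(text_content)  # acdef
```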


Very nice! And thank you for the correction about innerText.

Still, I think that

    let getText = n => n.text + n.getchildren().map(getText).join('') + n.tail
is nicer than

    let getText=(n,f=(n,a)=>[].map.call(n.childNodes,e=>e.nodeType==3?e.data:f(e,1)))=>f(n,f).flat(1e333).join("")
and not just because it's fewer keystrokes.


Most of the extra complexity is there because the recursive calls return arrays and the join happens only once at the end, which probably needs fewer allocations. The rest is because the DOM has separate text nodes and the library you're using doesn't seem to. (Also, NodeLists don't have .map, so we borrow the .map from the Array prototype.) Here's a non-optimized version:

  let getText = n =>[].map.call(n.childNodes,e => e.nodeType == 3 ? e.data : getText(e)).join("")
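
To compare the two styles in one language, here is a rough Python sketch of both variants using the standard library's xml.dom.minidom, which exposes DOM-style childNodes, nodeType and separate text nodes. The function names get_text and get_text_joined_once are made up for illustration, and the claim about fewer allocations is not measured here.

```python
from xml.dom import minidom

def get_text(node):
    # Non-optimized version: every recursive call builds and joins a string.
    return ''.join(
        child.data if child.nodeType == child.TEXT_NODE else get_text(child)
        for child in node.childNodes
    )

def get_text_joined_once(node):
    # Collect every text node's data in one list and join a single time.
    parts = []

    def walk(n):
        for child in n.childNodes:
            if child.nodeType == child.TEXT_NODE:
                parts.append(child.data)
            else:
                walk(child)

    walk(node)
    return ''.join(parts)

doc = minidom.parseString('<p>My <b>Malamute</b> is <i>really</i> hairy.</p>')
p = doc.documentElement
print(get_text(p))              # My Malamute is really hairy.
print(get_text_joined_once(p))  # My Malamute is really hairy.
```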


Right, I'm saying I think the statically-typed ElementTree approach is superior to the dynamically-typed approach used by the W3C DOM.



