XPath is actually pretty useful once it stops being confusing
Mat Brown
tag
The first text node following a `<br>` tag

So, in the simplest case:
`<p>This is the beginning of a line.<br>This is too.</p>`
But we also want to handle nested inline elements:

This is the beginning of a line. This is not.
I'll take the low road

My first instinct was to just write a Ruby method to scan over relevant parts of the DOM and recursively seek out text nodes that fit our criteria. I used some very light CSS selectors, but nothing too fancy:

```ruby
require 'nokogiri'

# Yield the first text node of every line: one per <p>, plus one after every <br>.
def each_new_line(document)
  document.css('p').each { |p| yield first_text_node(p) }
  document.css('br').each { |br| yield first_text_node(br.next) }
end

# Descend through first children until we reach a text node;
# return nil when there is nothing left to descend into.
def first_text_node(node)
  if node.nil? then nil
  elsif node.text? then node
  elsif node.children.any? then first_text_node(node.children.first)
  end
end
```

This is a perfectly reasonable solution, but it's a whopping 11 lines of code. Further, it feels like we're using the wrong tool for the job: why are we using Ruby iterators and conditionals to get at DOM nodes? Can we do better?

Enter XPath

XPath is confusing for a couple of reasons. The first is that there are surprisingly few good references on the Internet (don't even think about looking at W3Schools!). The best doc I've found is the W3C spec itself.

The second is that XPath looks deceptively like CSS. The word "path" is right there in the name, and so I had always assumed, mistakenly, that the `/` in an XPath expression plays the same role as the `>` in a CSS selector:

```ruby
document.xpath('//p/em/a') == document.css('p > em > a')
```

As it turns out, the XPath expression involves a lot of shorthand, which we'll want to explode in order to really understand what's going on. Here's the same expression written out in longhand:

```
/descendant-or-self::node()/child::p/child::em/child::a
```

This XPath expression and the CSS selector above are equivalent, but not for the reason I had always assumed. An XPath expression consists of one or more “location steps” separated by forward slashes. The `/` at the beginning means the context of the first step is the root node of the document.
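To check that the shorthand and longhand forms really select the same nodes, and to see what each location step contributes, here is a small sketch. It assumes Ruby's bundled REXML library rather than Nokogiri, so it runs without installing a gem (Nokogiri's `document.xpath` should accept the same expressions), and the sample document is made up for illustration. It evaluates both forms, then grows the longhand path one location step at a time and counts what each prefix selects.

```ruby
require 'rexml/document'

# Hypothetical sample document: only some <a> elements sit directly
# inside an <em> that itself sits directly inside a <p>.
xml = <<~XML
  <body>
    <p><em><a>one</a></em></p>
    <p><em><span><a>too deep</a></span></em></p>
    <p><strong><a>wrong parent</a></strong></p>
    <div><em><a>not under a p</a></em></div>
    <p><em><a>two</a><a>three</a></em></p>
  </body>
XML

doc = REXML::Document.new(xml)

# Shorthand and fully expanded forms of the same expression.
shorthand = '//p/em/a'
longhand  = '/descendant-or-self::node()/child::p/child::em/child::a'

short_hits = REXML::XPath.match(doc, shorthand).map(&:text)
long_hits  = REXML::XPath.match(doc, longhand).map(&:text)
puts "shorthand: #{short_hits.inspect}"
puts "longhand:  #{long_hits.inspect}"

# Grow the longhand path one location step at a time. Each new step
# starts from whatever the steps before it selected.
steps = ['descendant-or-self::node()', 'child::p', 'child::em', 'child::a']
step_counts = []
path = ''
steps.each do |step|
  path += "/#{step}"
  step_counts << REXML::XPath.match(doc, path).size
  puts "#{path} selects #{step_counts.last} node(s)"
end
```

With this sample, the `child::p` step should find four paragraphs, `child::em` narrows that to the three `<em>` children of those paragraphs, and `child::a` keeps only the three links sitting directly inside them. The first step's count is much larger because `node()` also matches text nodes, including the whitespace between tags.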
Each location step knows which nodes have already been matched, and uses that context to answer three questions:

1. Where do I want to move from the current context? This is called the axis, and it's optional. The default axis is child, meaning "select all of the children of the currently selected nodes." In the example above, descendant-or-self is the axis for the first location step, meaning "all of the currently selected nodes and all of their descendants." Most of the axes defined by the XPath spec are likewise intuitively named.

2. What sort of nodes do I want to select? Am I selecting tags, text nodes, or is it a free-for-all? This is specified by the node test, which is the only required part of the location step. In our example above, `node()` is the most permissive node test: it selects everything. `text()` would only select text nodes; `*` would only select elements (`element()` exists only in XPath 2.0, and Nokogiri uses libxml2's XPath 1.0 engine); and explicitly specified node names like `p` and `em` above, of course, would only select elements with those names.

3. Are there additional filters I want to add? Maybe I only want to select the first child of every node in the current context, or I only want to select