I am trying to match the ancestor-or-self of any element that contains a certain text string.
As a first step, matching the elements that contain the text works: //*[contains(text(),"ABC")].
But I am struggling with the syntax for adding the ancestor axis. I tried //*ancestor-or-self::[contains(text(),"ABC")] and //*[contains(text(),"ABC")]/ancestor-or-self, without success.
What is the correct syntax for this?
The markup and the string I want to match can look like this:
<p><strong>Vertreten durch:</strong><br>Max Mustermann</p>
So I search for the string Vertreten durch in order to select the parent element <p>...</p>.
I created an example XML file and named it test.xml:
<root>
<line1>
<line11>A</line11>
<line12>B</line12>
</line1>
<line2>
<line21>C</line21>
<line22>D</line22>
</line2>
</root>
Using xmlstarlet, you can do:
D:\TEMP>xml sel -t -m "//*[contains(text(),'D')]/ancestor-or-self::*" -v "name()" -n test.xml
root
line2
line22
D:\TEMP>xml sel -t -m "//*[contains(text(),'A')]/ancestor-or-self::*" -v "name()" -n test.xml
root
line1
line11
D:\TEMP>xml sel -C -t -m "//*[contains(text(),'A')]/ancestor-or-self::*" -v "name()" -n test.xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//*[contains(text(),'A')]/ancestor-or-self::*">
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="name()"/>
</xsl:call-template>
<xsl:value-of select="' '"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="value-of-template">
<xsl:param name="select"/>
<xsl:value-of select="$select"/>
<xsl:for-each select="exslt:node-set($select)[position()>1]">
<xsl:value-of select="' '"/>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
D:\TEMP>
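For readers without xmlstarlet, the same ancestor-or-self selection can be approximated with Python's standard library. This is only a minimal sketch: ElementTree supports neither contains() nor the ancestor axes in its XPath subset, so the helper ancestors_or_self (a hypothetical name, not part of any library) walks a parent map by hand. It also assumes that elem.text, the text before the first child element, is close enough to contains(text(), ...) for the leaf elements in test.xml.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml above.
XML = """<root>
<line1>
<line11>A</line11>
<line12>B</line12>
</line1>
<line2>
<line21>C</line21>
<line22>D</line22>
</line2>
</root>"""


def ancestors_or_self(root, needle):
    """Return the tag names, in document order, of every element whose
    leading text contains `needle`, together with all of its ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    selected = set()
    for elem in root.iter():
        # elem.text is only the text before the first child element.
        if elem.text and needle in elem.text:
            node = elem
            while node is not None:
                selected.add(node)
                node = parent_of.get(node)  # None once we pass the root
    return [e.tag for e in root.iter() if e in selected]


root = ET.fromstring(XML)
print(ancestors_or_self(root, "D"))  # ['root', 'line2', 'line22']
print(ancestors_or_self(root, "A"))  # ['root', 'line1', 'line11']
```

The printed tag names correspond to the xmlstarlet output shown above for 'D' and 'A'.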
EDIT: With a minimal example in HTML (I made sure it is also valid XML):
D:\TEMP>type test.html
<html>
<head>
<title>test</title>
</head>
<body>
<p><strong>Vertreten durch:</strong><br />Max Mustermann</p>
</body>
</html>
D:\TEMP>xml sel -t -m //*[contains(text(),'Vertreten')] -c .. -n test.html
<p><strong>Vertreten durch:</strong><br/>Max Mustermann</p>
D:\TEMP>xml sel -C -t -m //*[contains(text(),'Vertreten')] -c .. -n test.html
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//*[contains(text(),'Vertreten')]">
<xsl:copy-of select=".."/>
<xsl:value-of select="' '"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
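Don’t try to process text nodes; use the string value of elements instead. In XPath 1.0, contains(text(), ...) tests only the first text-node child of an element, and in XPath 2.0 it raises an error when an element has more than one text-node child.
If the string value of an element E contains the substring Vertreten durch:, then the string value of every ancestor of E also contains this substring. So I think you simply need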
//*[contains(., 'Vertreten durch:')]
If that doesn’t answer the question, then the question needs to be clearer. An example would help.
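The string-value idea can be checked outside XPath with a short Python sketch. It assumes the test.html sample from the other answer and uses only the standard library; elements_containing is a hypothetical helper name, and "".join(e.itertext()) stands in for the XPath string value of an element.

```python
import xml.etree.ElementTree as ET

# Same content as the test.html sample (well-formed, so it parses as XML).
HTML = """<html>
<head>
<title>test</title>
</head>
<body>
<p><strong>Vertreten durch:</strong><br />Max Mustermann</p>
</body>
</html>"""


def elements_containing(root, needle):
    """Return the tag names, in document order, of every element whose
    string value (all descendant text concatenated) contains `needle`."""
    return [e.tag for e in root.iter() if needle in "".join(e.itertext())]


root = ET.fromstring(HTML)
print(elements_containing(root, "Vertreten durch:"))
# ['html', 'body', 'p', 'strong']
```

The result contains the strong element and all of its ancestors, which is the ancestor-or-self set the question asks for, without naming any axis.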
Can you add your XML (or a smaller version of it), together with the output you expect from the XPath expression you are looking for?
Or take a look at: Difference between ancestor and ancestor-or-self
@Luuk: example added