3

I have a problem trying to update xmlType values in oracle. I need to modify the xml looking similar to the following:

<a>
  <b>Something to change here</b>
  <c>Here is some narrative containing weirdly escaped &lt;tags>\&lt;/tags> </c>
</a>

What I want to achieve is to modify <b/> without modifying <c/>

Unfortunately following modifyXml:

select 
  updatexml(XML_TO_MODIFY, '/a/b/text()', 'NewValue')
from dual;

returns this:

<a>
  <b>NewValue</b>
  <c>Here is some narrative containing weirdly escaped &lt;tags&gt;&lt;/tags&gt; </c>
</a>

as you can see, the '>' had been escaped.

Same happens for xmlQuery (the new non-deprecated version of updateXml):

select /*+ no_xml_query_rewrite */
      xmlquery(
        'copy $d := .
         modify (
           for $i in $d/a
           return replace value of node $i/b with ''nana''
         )
         return $d'
        passing t.xml_data
         returning content
       ) as updated_doc
from (select xmlType('<a>
      <b>Something to change here</b>
      <c>Here is some narrative containing weirdly escaped \&lt;tags>\&lt;/tags> </c>
    </a>') as xml_data from dual) t
;

Also when using xmlTransform I will get the same result. I tried to use the

disable-output-escaping="yes"

But it did the opposite - it unescaped the &lt; :

select XMLTransform(
    xmlType('<a>
      <b>Something to change here</b>
      <c>Here is some narrative containing weirdly escaped \&lt;tags>\&lt;/tags> </c>
    </a>'),
    XMLType(
'<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/a/b">
                <b>
                <xsl:value-of select="text()"/>
                </b>
        </xsl:template>

        <xsl:template match="/a/c">
                <c>
                <xsl:value-of select="text()" disable-output-escaping="yes"/>
                </c>
        </xsl:template>
</xsl:stylesheet>'))
from dual;

returned:

<a>
  <b>NewValue</b>
  <c>Here is some narrative containing weirdly escaped <tags></tags> </c>
</a>

Any suggestions? Two things you need to know:

  1. I cannot modify the initial format - it comes to me in this way and I need to preserve it.
  2. The original message is so big, that changing the message to string and back (to use regexps as workaround) will not do the trick.
  • At the risk of sounding silly: why is it a problem for you that the characters are escaped? No conforming XML processor can complain that `>` is being represented as `>` -- indeed, unless you are actually directly comparing the strings forming the markup, there's no way to tell the difference. Is there any chance this is just not a problem? – Jeroen Mostert Dec 15 '14 at 14:12
  • Unfortunately we need to reduce the changes to minimum, as this xml is obtained from external system, and than used to reconcile with this system. If there are changes, they complain that message is different and say they never sent out anything like that. Of cause what I do is obviously already changing the message, but this is to re-mediate a bug that created the change in the first place (storing such xml's is not an issue, updating is). – Michal Lozinski Dec 15 '14 at 14:26
  • Is your source data actually of `XMLType` or is it a `CLOB`? In the latter case you might be able to get away with using `DBMS_LOB` to change the value without the perf penalty (although it won't be easy). – Jeroen Mostert Dec 15 '14 at 15:10
  • unfortunately it is in XmlType. The underlying selected storage type for us is CLOB if I remember correctly, but I am not sure if that's sufficient for what you suggest, since I'm not sure if XmlType data stored in CLOB is not modified in some way in comparison to standard CLOB - never had to get that deep into it. – Michal Lozinski Dec 16 '14 at 07:08
  • If it's `XMLType` you're probably out of luck because the underlying `CLOB` is managed by the engine and inaccessible for direct modification (as far as I could tell), that's why I asked. – Jeroen Mostert Dec 16 '14 at 08:23

2 Answers2

1

The root of your issue seems to be that your original XML value for node C is not valid XML if it contains the > within the value instead of &gt;, and not inside a CDATA section (also What does <![CDATA[]]> in XML mean?).

The string value of:

Here is some narrative containing weirdly escaped &lt;tags>\&lt;/tags>

in XML format should really be

<c>Here is some narrative containing weirdly escaped &amp;lt;tags&gt;\&amp;lt;/tags&gt;</c>

OR

<c><![CDATA[Here is some narrative containing weirdly escaped &lt;tags>\&lt;/tags>]]></c>

I would either request that the XML be corrected at the source, or implement some method to sanitize the inputs yourself, such as wrapping the <c> node values in <![CDATA[]]>. If you need to save the exact original value, and the messages are large, then the best I can think of is the store duplicate copies, with the original value as string, and store the "sanitized" value as XML data type.

Community
  • 1
  • 1
BateTech
  • 5,047
  • 3
  • 18
  • 31
  • 1
    The XML is well-formed. The XML standard mandates that `&` and ``, except in the sequence `]]>` (to avoid confusion with CDATA). The backslash is the Oracle escape character, so the element contents really is (without escaping of any kind) `Here is some narrative containing weirdly escaped `. – Jeroen Mostert Dec 15 '14 at 14:44
  • 1
    While I agree that this should be in CDATA it is not so, and as Jeroen said, since it's not breaking anything, there is not enough justification to change the input - it's a part of big and complex system, and such change would have to be synchronized and communicated to several downstream systems. Not really plausible. Keeping the original string on the side is only a workaround, and while eligible, it's not a clean solution. What buggs me is that I can't find the way of simply updating this xml without Oracle doing me a favour of trying to help where it's not supposed to. ;) – Michal Lozinski Dec 15 '14 at 14:50
  • I did not realize that `>` was valid in XML, so thanks, I learned something new today! @MichalLozinski is your data in a table that is stored using `XMLType` data type? It appears that `updateXML` function is deprecated, so you could try using some of the suggestions mentioned in this article https://docs.oracle.com/database/121/ADXDB/app_depr_upd.htm#ADXDB6060 (check out sections "Example C-7 Object Relational Equivalent of UPDATEXML Expression" and "Table C-1 Migrating Oracle-Specific XML Updating Queries to XQuery Update") – BateTech Dec 15 '14 at 17:38
  • You are correct, but xmlQuery didn't help - it performs the same way as updatexml - I edited the question to give and example. If there is an option to disable character encoding for xmlUpdate I couldn't find it in the spec =/ – Michal Lozinski Dec 16 '14 at 12:08
0

In the end we managed to do this with the help of java. By:

  1. reading the xml as a clob
  2. modifying it in java
  3. storing it back in the database using java.sql.Connection (for some reason, if we used JdbcTemplate, it complained about casting to Long, which was indication that string was over 4000 bytes (talking about clean errors, all hail Oracle) and using CLOB Type didn't really help. I guess it's a different story though)

When storing the data, oracle does not perform any magic, only updates tend to modify escape characters.

Possibly not an answer for everyone, but a nice workaround if you stumble upon same problem as we did.