I have an XML string that I wish to log. The XML contains some sensitive data that I'd like to mask before it is written to the log file. I currently use std::regex to do this:

std::regex reg("<SensitiveData>(\\d*)</SensitiveData>");
return std::regex_replace(xml, reg, "<SensitiveData>......</SensitiveData>");

Currently the data is replaced by exactly 6 '.' characters. What I really want is to replace the sensitive data with a matching number of dots, i.e. to get the length of the capture group and emit exactly that many dots.

Can this be done?

markv
  • Surely [you must be trolling](http://stackoverflow.com/q/1732348/596781)... – Kerrek SB Oct 14 '13 at 21:43
  • Single tag matching, with no nested tags, is regular. – Platinum Azure Oct 14 '13 at 21:46
  • Replacing your sensitive data with the exact number of dots would reveal information about the sensitive data. I would reconsider your question. – Olaf Dietsche Oct 14 '13 at 21:48
  • Then wouldn't the number of dots indicate sensitive information? To the question, if the lang doesn't support a callback, sit in a find while loop and rewrite the string `[\S\s]*(\\d*)` –  Oct 14 '13 at 21:51
  • Better this `([\S\s]*)(\\d*)` –  Oct 14 '13 at 22:05
  • The sensitive data in question is just a credit card number. For PCI compliance we are allowed to log the first 6 and last 4 digits of the number, i.e. 4111111111111111 --> 411111......1111. These numbers can vary in size from 15 to 19 digits. – markv Oct 14 '13 at 23:28
  • Storing credit card number in XML? Uh, I am out of words... – mvp Oct 15 '13 at 05:25
  • @mvp My understanding of the OP, they're transferring the XML data between internal components, and only storing ("logging") the truncated numbers. – user4815162342 Oct 15 '13 at 06:39
  • yes - the full card number (and other sensitive data) are only stored permanently in encrypted form (using an HSM for key management). This question relates to XML based messages sent over SSL between two applications. – markv Oct 28 '13 at 21:19

1 Answer

std::regex_replace in C++11 does not have the capability you are asking for: the replacement format argument must be a string. Some regular expression APIs allow the replacement to be a function that receives each match and could perform exactly the substitution you need.
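If you want to stay with std::regex, one possible workaround is to walk the matches yourself with std::sregex_iterator and build the output string by hand. The following is only a sketch under assumptions: the function name mask_sensitive is made up for illustration, and the pattern is the one from the question with the two tags wrapped in their own capture groups.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): replaces the digits inside
// every <SensitiveData> element with the same number of dots.
std::string mask_sensitive(const std::string& xml) {
    static const std::regex reg("(<SensitiveData>)(\\d*)(</SensitiveData>)");
    std::string out;
    std::size_t last = 0;  // offset in xml just past the previous match
    for (std::sregex_iterator it(xml.begin(), xml.end(), reg), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(xml, last, pos - last);  // text between matches, copied as-is
        out += m.str(1);                    // opening tag
        out.append(static_cast<std::size_t>(m.length(2)), '.');  // one dot per digit
        out += m.str(3);                    // closing tag
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(xml, last, std::string::npos);  // remainder after the last match
    return out;
}
```

Because the pattern only matches digits, elements with any other content are copied through unchanged, just as with the original regex_replace call.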

But regexps are not the only way to solve the problem, and in C++ it's not exactly hard to look for two fixed strings and replace the characters in between:

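#include <cstring>  // std::strlen
#include <string>

const char* const PREFIX = "<SensitiveData>";
const char* const SUFFIX = "</SensitiveData>";

// Overwrites, in place, everything between each PREFIX/SUFFIX pair with '.'.
void replace_sensitive(std::string& xml) {
    const std::size_t prefix_len = std::strlen(PREFIX);
    const std::size_t suffix_len = std::strlen(SUFFIX);
    std::size_t start = 0;
    while (true) {
        std::size_t pref, suff;
        if ((pref = xml.find(PREFIX, start)) == std::string::npos)
            break;
        if ((suff = xml.find(SUFFIX, pref + prefix_len)) == std::string::npos)
            break;
        // replace stuff between prefix and suffix with '.'
        for (std::size_t i = pref + prefix_len; i < suff; i++)
            xml[i] = '.';
        // continue searching after the closing tag
        start = suff + suffix_len;
    }
}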
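The same loop can be adapted to the rule described in the question's comments (keep the first 6 and last 4 digits, mask the rest) and to skip elements whose contents are not all digits. This is a sketch, not a vetted PCI implementation: the name mask_card_numbers is hypothetical, and masking values of 10 digits or fewer entirely is an assumption of mine rather than something stated in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical variant of replace_sensitive: only all-digit contents are
// touched; the first 6 and last 4 digits are kept, the middle becomes dots.
void mask_card_numbers(std::string& xml) {
    const std::string prefix = "<SensitiveData>";
    const std::string suffix = "</SensitiveData>";
    const std::size_t keep_head = 6, keep_tail = 4;
    std::size_t start = 0;
    while (true) {
        const std::size_t pref = xml.find(prefix, start);
        if (pref == std::string::npos) break;
        const std::size_t begin = pref + prefix.size();
        const std::size_t suff = xml.find(suffix, begin);
        if (suff == std::string::npos) break;

        // Check that the element content consists of digits only.
        bool all_digits = true;
        for (std::size_t i = begin; i < suff; ++i) {
            if (!std::isdigit(static_cast<unsigned char>(xml[i]))) {
                all_digits = false;
                break;
            }
        }
        if (all_digits) {
            // Assumption: values too short to have a hidden middle are masked whole.
            std::size_t from = begin, to = suff;
            if (suff - begin > keep_head + keep_tail) {
                from = begin + keep_head;
                to = suff - keep_tail;
            }
            for (std::size_t i = from; i < to; ++i) xml[i] = '.';
        }
        start = suff + suffix.size();  // continue after the closing tag
    }
}
```

With the example number from the comments, 4111111111111111 becomes 411111......1111, while an element such as one holding free text is left as it was.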
user4815162342
  • What if the tag is re-used in an unrelated section `<SensitiveData> don't overwrite me </SensitiveData>` ? –  Oct 15 '13 at 17:18
  • @sln Good point, the code I posted is in that sense not 100% equivalent to the OP's regex. But I wouldn't be surprised if the contents of the `<SensitiveData>` element in the OP's XML always consist of digits. More importantly, it is a trivial exercise to add a check that `xml[pref:suff]` is all-digit if necessary. – user4815162342 Oct 15 '13 at 18:51