
I have a massive text file that I have to manually parse. There really is no other way but to iterate over it.

I'm grabbing each line in the file and calling .split(" ") on it to get the individual components: some are int arrays, others are char arrays, and some others are actual text strings.

The text strings are causing me a headache because sometimes they have a space in there.

An example line is something like:

String strLine = "Identifier {2 4 \"#0# == \\\"This String\\\"\" 12 21 6}";

When I do the following:

String[] strParts = strLine.split(" ");

The resulting output is a String array that has String values of:

Identifier,{2,4,"#0#,==,\"This,String\"",12,21,6}

I need the output to be:

Identifier,{2,4,"#0# == \"This String\"",12,21,6}

So I'm wondering whether there is a different whitespace character I can substitute for the spaces inside the quotes before calling split().

Anyone know of one?

I also considered diving into RegEx, but I haven't worked with RegEx enough to formulate the split logic properly: "Split on a space unless that space is between the first and last quote."

Thx.
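The regex rule described above can be sketched roughly as follows. This is a sketch under assumptions taken from the example line: quotes are balanced, and quotes nested inside a quoted string are escaped with a backslash (`\"`). The names `RegexQuotedSplit`, `UNIT` and `split` are made up for illustration. A space counts as a split point only if everything after it is a run of unquoted characters and complete `"..."` strings, which holds exactly when the space is outside a quoted section (with one quoted string per line, that is the same as "not between the first and last quote"). A simpler "an even number of quotes follows" lookahead would not work on this input, because it would also count the escaped inner quotes.

```java
import java.util.regex.Pattern;

public class RegexQuotedSplit {
    // One unit of text: any char that is neither a quote nor a backslash,
    // or a backslash plus the char after it (an escape such as \" or \\).
    private static final String UNIT = "(?:[^\"\\\\]|\\\\.)";

    // Split on a space only if the rest of the line is unquoted units followed by
    // zero or more complete "..." strings, i.e. the space is not inside quotes.
    // Escaped quotes (\") are consumed as units, so they never open or close a string.
    private static final Pattern SPLIT =
            Pattern.compile(" (?=" + UNIT + "*(?:\"" + UNIT + "*\"" + UNIT + "*)*$)");

    public static String[] split(String line) {
        return SPLIT.split(line);
    }

    public static void main(String[] args) {
        String strLine = "Identifier {2 4 \"#0# == \\\"This String\\\"\" 12 21 6}";
        // prints: Identifier,{2,4,"#0# == \"This String\"",12,21,6}
        System.out.println(String.join(",", split(strLine)));
    }
}
```

Because the lookahead rescans the remainder of the line at every space, this does more work than a single pass over the characters, which may matter for a massive file with long lines.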

[Update]

I am adding this here because the formatting of code in replies is not optimal.

String strLine = "Identifier {2 4 \"#0# == \\\"This String\\\"\" 12 21 6}";

String delim = "§"; // use the Section Sign as a placeholder delimiter

// assumes the line contains a quoted section (indexOf returns -1 otherwise)
int firstQuote = strLine.indexOf('"');
int lastQuote = strLine.lastIndexOf('"');

StringBuilder sb = new StringBuilder();
// first part: everything before the first quote, unchanged
sb.append(strLine, 0, firstQuote);

// middle part: the quoted text, with its spaces replaced by the delimiter
sb.append(strLine.substring(firstQuote, lastQuote).replace(" ", delim));

// last part: from the last quote to the end, unchanged
sb.append(strLine.substring(lastQuote));

// make array
String[] parts = sb.toString().split(" ");

I'll need to replace those delimiter chars later on, but at least it does what I need it to do for now.

Thanks all for the suggestions, it was a combination of them that ultimately solved this for me.

Bandolier2k
  • Did not really understand what you need. Could you please add what the result for that line should be after the split, or what the line should become before the split? – Jorge Campos Jan 07 '16 at 18:39
  • I think a traditional `for` loop over all the characters, keeping track of whether or not you are inside a quoted string, should also work. (Basically `for(...) { if(currentChar == '"') insideQuotedString = !insideQuotedString; else { if(insideQuotedString) /* add to string literal */ else /* unquoted data */ } }`.) Different approaches can be looked up at https://stackoverflow.com/questions/7804335/split-string-on-spaces-in-java-except-if-between-quotes-i-e-treat-hello-wor or https://stackoverflow.com/questions/7212276/parsing-quoted-text-in-java – Maximilian Gerhardt Jan 07 '16 at 18:44
  • Thanks for the replies. I edited my original post with more detail on the current output and the necessary output. Iterating at the character level occurred to me, but there could be an unknown number of "String within a String" levels as well, so I'd have to track each quote level to deal only with the outermost quotes and leave the rest as-is. – Bandolier2k Jan 07 '16 at 19:23
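The character-loop idea from the comments can be sketched as below. It is a sketch, not a definitive parser, and it assumes (as in the example line) that quotes nested inside a quoted string are escaped with a backslash. Under that assumption, only an unescaped quote toggles the "inside quotes" state, so the nesting depth never has to be tracked: `\"` and deeper escapes such as `\\\"` are copied through as pairs. The class and method names (`QuoteAwareSplitter`, `split`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Splits a line on spaces, except spaces inside a "..." string.
    // Assumption: nested quotes are backslash-escaped, so only unescaped
    // quotes open or close a string. Quotes and backslashes are kept as-is.
    public static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                // Escape sequence: copy both chars; it never toggles quoting.
                current.append(c).append(line.charAt(i + 1));
                i++;
            } else if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ' ' && !inQuotes) {
                // Unquoted space ends the current token; empty tokens are skipped.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        String strLine = "Identifier {2 4 \"#0# == \\\"This String\\\"\" 12 21 6}";
        // prints: Identifier,{2,4,"#0# == \"This String\"",12,21,6}
        System.out.println(String.join(",", split(strLine)));
    }
}
```

One design choice differs from `split(" ")`: consecutive unquoted spaces produce no empty tokens here, which may or may not suit the file format.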

2 Answers


You don't really need to replace the spaces in the string with another whitespace character. Try something fairly unique, like '_!_!'.

Look for your substitution string first to verify that it's not in the file, then do the substitution.

Then do your normal split.

And, finally, replace the substitution string with a normal space in your finished product.
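The answer's four steps can be sketched in Java as follows. This is a minimal sketch assuming one quoted section per line (the text between the first and last quote, as in the question's update) and using the answer's example placeholder `_!_!`; the class name `PlaceholderSplit` is hypothetical. Unlike the answer, which checks the whole file up front, it checks each line just before substituting.

```java
public class PlaceholderSplit {
    // Example placeholder from the answer; any string absent from the data works.
    private static final String PLACEHOLDER = "_!_!";

    // Splits on spaces, keeping the text from the first to the last quote together.
    public static String[] split(String line) {
        int first = line.indexOf('"');
        int last = line.lastIndexOf('"');
        if (first == last) {
            return line.split(" "); // no quotes, or a single one: nothing to protect
        }
        // Step 1: make sure the placeholder cannot be confused with real data.
        if (line.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("placeholder occurs in input: " + PLACEHOLDER);
        }
        // Step 2: substitute the spaces inside the quoted section.
        String protectedLine = line.substring(0, first)
                + line.substring(first, last + 1).replace(" ", PLACEHOLDER)
                + line.substring(last + 1);
        // Step 3: the normal split.
        String[] parts = protectedLine.split(" ");
        // Step 4: put the real spaces back.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace(PLACEHOLDER, " ");
        }
        return parts;
    }

    public static void main(String[] args) {
        String strLine = "Identifier {2 4 \"#0# == \\\"This String\\\"\" 12 21 6}";
        // prints: Identifier,{2,4,"#0# == \"This String\"",12,21,6}
        System.out.println(String.join(",", split(strLine)));
    }
}
```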

marklark
  • I considered this, but I don't have control over what someone puts in certain text fields. I suppose I can just choose something VERY out of the norm, or something from an extended character set. I was also trying to avoid having to deal with this text a second time. – Bandolier2k Jan 07 '16 at 19:21
  • This is a version of what I'm going with for now; it is the same code I added to the question under [Update]. – Bandolier2k Jan 07 '16 at 20:55
  • If you're sure that there is only one quoted string per line, you could split the first substring and add its parts to the "parts" array, find the quoted string and add it to the "parts" array, and then split the rest of the string and add those to the "parts" array; no substitutions are necessary. – marklark Jan 08 '16 at 20:31

Not exactly elegant, but I think it does the job:

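private static String[] mySplit(String src)
{
    int firstIdx = src.indexOf('"');
    int lastIdx = src.lastIndexOf('"');
    // no quotes (or only one): nothing to protect
    if ( firstIdx == lastIdx )
        return src.split(" ");
    // text before the first quote and after the last quote, without the separating spaces
    String head = src.substring(0, firstIdx).trim();
    String tail = src.substring(lastIdx + 1).trim();
    // "".split(" ") would yield one empty element, so handle empty parts explicitly
    String[] firstPart = head.isEmpty() ? new String[0] : head.split(" ");
    String[] lastPart = tail.isEmpty() ? new String[0] : tail.split(" ");

    String[] res = new String[firstPart.length + 1 + lastPart.length];
    System.arraycopy(firstPart, 0, res, 0, firstPart.length);
    // the quoted section, including both the opening and the closing quote
    res[firstPart.length] = src.substring(firstIdx, lastIdx + 1);
    System.arraycopy(lastPart, 0, res, firstPart.length + 1, lastPart.length);
    return res;
}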