I have a string (or an array/ArrayList of strings), something like:

strMain = "S1R2G3M1D1N3";

strMain consists of several letters, each followed by a digit as a suffix.

I also have a string like this:

str1 = "S1,,--R2,,,,D3-N3";

I need to check whether each of S1, R2, D3 and N3 in str1 is part of strMain. I could not figure out how to do this. I guess I need to split str1 so that I get only the "letter followed by a digit" tokens in an array; then I could check whether each of them is present in strMain. Can anyone suggest a regex to split with? Or is there another way to check for presence without splitting (using a regex to search instead)?

Can you tell me the regex for splitting this?

Unihedron
Mahesha Padyana
  • I don't understand what you're asking. This seems to be an [XY Problem](http://xyproblem.info); maybe [this](http://stackoverflow.com/a/24656216/3622940) will help you. – Unihedron Aug 27 '14 at 17:47
  • I believe matching will be easier (better) than trying to split. – Jerry Aug 27 '14 at 17:57

1 Answer

This regex could work: [A-Z][0-9]

Example code:

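import java.util.regex.Matcher;
import java.util.regex.Pattern;

String strMain = "S1R2G3M1D1N3";
String str = "S1,,--R2,,,,D3-N3";
// One capital letter followed by one digit.
Pattern pattern = Pattern.compile( "[A-Z][0-9]" );
Matcher matcher = pattern.matcher( str );
// find() moves to the next match, skipping the commas and dashes in between.
while ( matcher.find() ) {
    // Print only the tokens that also occur in strMain.
    if ( strMain.contains( matcher.group() ) ) {
        System.out.println( matcher.group() );
    }
}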

gives this output:

S1
R2
N3
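D3 is not printed because strMain contains D1, not D3. Note that contains() is a substring test on the whole of strMain. For this data that gives the same result as comparing whole tokens, but if the token shape changes later it may be safer to compare against a set of tokens. A minimal sketch of that idea follows; the class name TokenCheck and its methods are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
class TokenCheck {
    // One capital letter followed by one digit, as in the answer.
    private static final Pattern TOKEN = Pattern.compile("[A-Z][0-9]");

    // Collects every letter+digit token found in the input, in order.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Returns the tokens of str that also occur as whole tokens in strMain.
    static List<String> present(String strMain, String str) {
        Set<String> known = new HashSet<>(tokens(strMain));
        List<String> result = new ArrayList<>();
        for (String token : tokens(str)) {
            if (known.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [S1, R2, N3]
        System.out.println(present("S1R2G3M1D1N3", "S1,,--R2,,,,D3-N3"));
    }
}
```

The same tokens() method is used on both strings, so no splitting regex is needed at all.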

EDIT

In response to your comment...

Sometimes the digit may not be present. What is the expression then? Ex: str="S,,--R2,,,,-N3" shall print "SR2N3". Also, sometimes I may have to include a single dot, double dots, a single quote, or two single quotes. Ex: str="S.,,--R2..,,,D3-N3',N3''" shall print S., R2.., N3', N3''. Here only the letter is required; the digit, single dot, two dots, single quote, and two single quotes are all optional.

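String strMain = "S1R2G3M1D1N3";
String str = "S.,,--R2...o,,,D3-N3',N3''";
// Group 1: capital letter plus optional digit. The optional suffix is not captured.
Pattern pattern = Pattern.compile( "([A-Z][0-9]?)(?:\\.{1,2}|'{1,2})?" );
Matcher matcher = pattern.matcher( str );
while ( matcher.find() ) {
    // Check only the letter/digit part (group 1) against strMain...
    if ( strMain.contains( matcher.group( 1 ) ) ) {
        // ...but print the entire match (group 0), suffix included.
        System.out.println( matcher.group( 0 ) );
    }
}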

gives this output:

S.
R2..
N3'
N3''

[A-Z] is one capital letter.
[0-9] is one digit.
X? is X, one or zero times, so...
[0-9]? is one digit, one or zero times.

Parentheses create a capturing group, meaning we can later grab what was matched between the parentheses...

([A-Z][0-9]?) is going to capture one capital letter and the optional digit.

Then to match the dots and single quotes...

X{Y,Z} means match X, between Y and Z times, so...
X{1,2} means match X, either 1 or 2 times.
X|Y means to match either X or Y. I surround this in parentheses; otherwise the OR would apply to the whole expression.
\\. means to match a period. You can't just use . because that has a special meaning, which is any one character. Therefore you must escape it with \, which itself also has to be escaped in a Java string literal by using another one.
(\\.{1,2}|'{1,2}) means to match one or two periods, OR one or two single quotes, and capture the group.
(?:X) means to not capture the group. I don't care about capturing this group, so putting everything together...
(?:\\.{1,2}|'{1,2})? - match one or two periods, OR one or two single quotes, and do this whole match either one or zero times.
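To see the suffix part on its own, here is a small sketch that tests a few strings against just that piece of the regex. The class name SuffixDemo and the method isValidSuffix are made up for this illustration; matches() is used so that the whole string has to fit the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class SuffixDemo {
    // Optional suffix: one or two periods, OR one or two single quotes.
    private static final Pattern SUFFIX =
            Pattern.compile("(?:\\.{1,2}|'{1,2})?");

    // True if the entire string is an acceptable suffix (possibly empty).
    static boolean isValidSuffix(String s) {
        return SUFFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSuffix(""));    // true: the suffix is optional
        System.out.println(isValidSuffix(".."));  // true: two periods
        System.out.println(isValidSuffix("''"));  // true: two single quotes
        System.out.println(isValidSuffix("...")); // false: at most two periods
        System.out.println(isValidSuffix(".'"));  // false: periods and quotes are not mixed
    }
}
```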

Then later you can call matcher.group(...) to get the captured groups. Group numbers start at 1; 0 means the entire match. So the group(1) call gives me just the letter and optional digit, which I use for checking whether it exists in strMain.
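As a rough illustration of the difference between group(0) and group(1), the sketch below lists both for every match, without the strMain check. The class name GroupDemo and the method describe are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class GroupDemo {
    private static final Pattern TOKEN =
            Pattern.compile("([A-Z][0-9]?)(?:\\.{1,2}|'{1,2})?");

    // Returns one "group(0) -> group(1)" line per match, in order.
    static List<String> describe(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            // group(0) is the entire match; group(1) is the letter plus optional digit.
            result.add(matcher.group(0) + " -> " + matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describe("S.,,--R2...o,,,D3-N3',N3''")) {
            System.out.println(line);
        }
    }
}
```

For the string above this lists S. -> S, R2.. -> R2, D3 -> D3, N3' -> N3 and N3'' -> N3. D3 is matched by the regex too; it is only the contains() check in the answer's code that filters it out.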

Take a look here at the Javadoc: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

tobii
  • Thanks, works fine. Sometimes the digit may not be present. What is the expression then? Ex: str="S,,--R2,,,,-N3" shall print "SR2N3". Also, sometimes I may have to include a single dot, double dots, a single quote, or two single quotes. Ex: str="S.,,--R2..,,,D3-N3',N3''" shall print S., R2.., N3', N3''. Here only the letter is required; the digit, single dot, two dots, single quote, and two single quotes are all optional. – Mahesha Padyana Aug 28 '14 at 02:36
  • @MaheshaPadyana I expanded my answer to help with that, and also added some explanation for the regex used, and hopefully it will help you in the future for writing your own! – tobii Aug 28 '14 at 18:27
  • Works great. Thanks for wonderful explanation of each parameter. – Mahesha Padyana Aug 29 '14 at 01:29
  • @MaheshaPadyana No problem! Be sure to accept an answer if you think it answered your question - both you and the answerer will get reputation points! :) – tobii Aug 30 '14 at 01:27
  • Accepted. I did not know that there is an option to accept!! – Mahesha Padyana Aug 30 '14 at 16:12
  • What if I want to split and put this in an ArrayList like: [S.,,--][R2...o,,,][D3-][N3',][N3'']. That means I want to separate based on alphabets and put into arraylist. No characters should be lost. – Mahesha Padyana Sep 11 '14 at 18:23
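
On the follow-up in the last comment (splitting on letters without losing any characters): one possible approach, not taken from the answer above, is to split at a zero-width position just before each capital letter using a lookahead. Nothing is consumed by the split, so every character ends up in some piece. This is only a sketch; the class name SplitDemo and the method splitBeforeCapitals are made up, and it assumes that "alphabets" means capital letters A-Z as in the rest of the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class SplitDemo {
    // Splits before every capital letter except one at the very start of the input.
    // (?!^) rules out position 0; (?=[A-Z]) requires a capital letter to follow.
    static List<String> splitBeforeCapitals(String input) {
        return new ArrayList<>(Arrays.asList(input.split("(?!^)(?=[A-Z])")));
    }

    public static void main(String[] args) {
        for (String piece : splitBeforeCapitals("S.,,--R2...o,,,D3-N3',N3''")) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

For the string in the comment this should give the five pieces [S.,,--], [R2...o,,,], [D3-], [N3',] and [N3''].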