
I want to extract fully qualified class names from Java / JSP source code with a regular expression.

There are already some threads about this, especially "Regular expression matching fully qualified class names".

Though I'm very close to solving the problem, I can't get rid of false positives.

Here are some examples. The expected value is given in a comment at the end of each line.

Logger l = LoggerFactory.getLogger("test");         // not a FQN, because it starts with an uppercase letter ("LoggerFactory")
if(!com.db.TFSec.isPermitted("test")) return;       // should return "com.db.TFSec"
new java.util.concurrent.BrokenException();         // java.util.concurrent.BrokenException
java.util.Set<Log> ls = new java.util.HashSet<>();  // java.util.Set, java.util.HashSet
java.awt.Component c1d2 = new java.awt.List();      // java.awt.Component, java.awt.List
com.de.tfsecurity.TFUser u;                         // com.de.tfsecurity.TFUser

I've tried these 3 regexes:

// My own attempt. Only one false positive, in line 1: [oggerFactory.getLogger("test")]
([a-z]\\w*\\.\\w+(\\.\\w+)*)[<\\( ;]

// The following two regexes are the "correct" answers from the thread mentioned above, but they give me false positives.
([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*    // false positives: [Logger, LoggerFactory.getLogger, test, if, return, new, c1d2] etc.
([a-z][a-z_0-9]*\\.)*[A-Z_]($[A-Z_]|[\\w_])*          // false positives: the same as in the previous example
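A minimal sketch of a regex that builds the naming convention in, assuming package segments are lower-case and the class name starts with an upper-case letter (a convention, not a rule of the Java spec). The negative lookbehind `(?<![\w$.])` keeps a match from starting in the middle of an identifier or directly after a dot, which is how the first attempt ends up matching `oggerFactory.getLogger`. The class `FqnRegexSketch` and its method `findFqns` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FqnRegexSketch {

    // Assumes the convention "lower.lower.UpperClass":
    //   (?<![\w$.])                            no match starting inside an identifier or after a dot
    //   [a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*  one or more lower-case package segments
    //   \.[A-Z][\w$]*                          followed by a class segment starting upper-case
    private static final Pattern FQN = Pattern.compile(
            "(?<![\\w$.])[a-z][a-z0-9_]*(?:\\.[a-z][a-z0-9_]*)*\\.[A-Z][\\w$]*");

    // Returns every conventional FQN found in the line (empty list if none).
    static List<String> findFqns(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FQN.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Logger logger = LoggerFactory.getLogger(\"test\");",
            "if(!com.db.TFSec.isPermitted(\"test\")) return;",
            "new java.util.concurrent.BrokenException();",
            "java.util.Set<Log> loggers = new java.util.HashSet<>();",
            "java.awt.Component c1d2 = new java.awt.List();",
            "com.de.tfsecurity.TFUser u;"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + findFqns(line));
        }
    }
}
```

On the six example lines this reports the expected values and nothing for the `LoggerFactory` line. Known gaps: nested classes are cut at the outer class (`java.util.Map.Entry` gives `java.util.Map`), field access such as `obj.Field` still matches, and dotted text inside string literals or comments is not skipped.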

Here's my source code:

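import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileUsageScanner {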
    // This is my own attempt. It works most of the time, but gives false positives with LoggerFactory.getLogger, which is not a FQN
    private final Pattern fqnPatternOwnTry = Pattern.compile("([a-z]\\w*\\.\\w+(\\.\\w+)*)[<\\( ;]");
    //  Solutions from https://stackoverflow.com/questions/5205339/regular-expression-matching-fully-qualified-java-classes
    // Lots of false positives like: [Logger, LoggerFactory.getLogger, test, if, return, new, c1d2] etc.
    private final Pattern fqnPatternThr =  Pattern.compile("([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");
    private final Pattern fqnPatternThr2 = Pattern.compile("([a-z][a-z_0-9]*\\.)*[A-Z_]($[A-Z_]|[\\w_])*");


    public static void main(String[] args) throws IOException {
        FileUsageScanner scan = new FileUsageScanner();
        scan.getFQClassname("Logger logger = LoggerFactory.getLogger(\"test\");");  // not a FQN
        scan.getFQClassname("if(!com.db.TFSec.isPermitted(\"test\")) return;");     // com.db.TFSec
        scan.getFQClassname("new java.util.concurrent.BrokenException();");         // java.util.concurrent.BrokenException
        scan.getFQClassname("java.util.Set<Log> loggers = new java.util.HashSet<>();"); // java.util.Set, java.util.HashSet
        scan.getFQClassname("java.awt.Component c1d2 = new java.awt.List();");  // java.awt.Component, java.awt.List
        scan.getFQClassname("com.de.tfsecurity.TFUser u;");                     //com.de.tfsecurity.TFUser
    }

    private List<String> getFQClassname(String line) {
        if (line != null && !line.isEmpty() && line.contains(".")) {
            Matcher matcher = fqnPatternThr2.matcher(line);
            List<String> l = null;
            while (matcher.find()) {
                if (l == null) {
                     l = new ArrayList<String>();
                }
                l.add(matcher.group());
            }
            if (l != null)
                System.out.println("Found FQN in " + line + " -> " + l);
            return l;
        }
        return null;
    }
}    
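A side note on `getFQClassname`: `matcher.group()` returns the whole match. With `fqnPatternOwnTry` that includes the delimiter consumed by the trailing `[<\\( ;]`, so the FQN itself is in `matcher.group(1)` (alternatively, the delimiter could be turned into a lookahead, `(?=[<\\( ;])`, so it is not consumed). The small demo below, using a hypothetical class `GroupDemo`, shows the difference on one of the example lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as fqnPatternOwnTry in the question.
    static final Pattern OWN_TRY = Pattern.compile("([a-z]\\w*\\.\\w+(\\.\\w+)*)[<\\( ;]");

    // Collects either the whole match (group 0) or only capturing group 1.
    static List<String> collect(String line, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = OWN_TRY.matcher(line);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "java.util.Set<Log> loggers = new java.util.HashSet<>();";
        System.out.println(collect(line, 0)); // [java.util.Set<, java.util.HashSet<]
        System.out.println(collect(line, 1)); // [java.util.Set, java.util.HashSet]
    }
}
```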

How can I get rid of the false positives?

Thanks for any comments,

Bernhard

Bernie
  • The following FQNs *are* legal (one thing is convention, another is what Java considers valid): `My.Packages.With.UpperCase`, `a.package.aClass`. You are assuming that there will be two or more lowercase package-levels followed by a camel-case class-name. If so, state it clearly in the question. – tucuxi Jul 20 '14 at 00:01
  • Thanks. `My.Packages.With.UpperCase`: OK, I wasn't aware that this is possible, because I have never seen uppercase package names (I just browsed through a bunch of JAR libraries). And yes, I was assuming a package name like low.lower.UppercaseClassname. Maybe it's just convention, but I could live with that, even if the Java spec says otherwise. – Bernie Jul 20 '14 at 00:22

1 Answer


I would be inclined to use the utility methods of the `Character` class and loop over the characters of each part (separated by dots) of the full name.
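A minimal sketch of that idea, assuming the methods meant are `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`, with the lower-case-package / upper-case-class convention from the comments added as an extra filter. The class `CharacterFqnScanner` and its helpers are hypothetical. Note that `isIdentifier` does not reject keywords (so `a.package.aClass` would pass it), and dotted text inside string literals or comments is scanned like code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharacterFqnScanner {

    // Splits a line into maximal runs of identifier characters and dots,
    // e.g. "if(!com.db.TFSec.isPermitted(" yields "if" and "com.db.TFSec.isPermitted".
    static List<String> dottedTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '.' || Character.isJavaIdentifierPart(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True if part is a syntactically valid Java identifier (keywords are NOT rejected).
    static boolean isIdentifier(String part) {
        if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
            return false;
        }
        for (int i = 1; i < part.length(); i++) {
            if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns the leading "lower.lower.Upper" prefix of a dotted token, or null
    // if the token does not follow that (assumed) naming convention.
    static String conventionalPrefix(String token) {
        String[] parts = token.split("\\.", -1); // -1 keeps empty parts, e.g. in "a..B"
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (!isIdentifier(part)) {
                return null;
            }
            char first = part.charAt(0);
            if (Character.isUpperCase(first)) {
                // The first upper-case part is taken as the class; it needs a package before it.
                return i == 0 ? null : String.join(".", Arrays.copyOfRange(parts, 0, i + 1));
            }
            if (!Character.isLowerCase(first)) {
                return null; // package parts starting with '_' or '$' are rejected
            }
        }
        return null; // no upper-case part, e.g. "logger.info"
    }

    static List<String> findFqns(String line) {
        List<String> result = new ArrayList<>();
        for (String token : dottedTokens(line)) {
            String fqn = conventionalPrefix(token);
            if (fqn != null) {
                result.add(fqn);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findFqns("if(!com.db.TFSec.isPermitted(\"test\")) return;")); // [com.db.TFSec]
        System.out.println(conventionalPrefix("My.Packages.With.UpperCase"));             // null
    }
}
```

Here `My.Packages.With.UpperCase` consists only of valid identifiers, so it is a legal Java name, but the convention filter still rejects it. Dropping the upper/lower-case checks in `conventionalPrefix` would accept it.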

Bohemian