I want to use a regular expression in JavaScript to validate a Java package name.

The simplest way to do that is to match a list of words and dots, so I have implemented this regex:

(^(?:[a-z_]+(?:\d*[a-zA-Z_]*)*)(?:\.[a-z_]+(?:\d*[a-zA-Z_]*)*)*$)

This regex ensures the following:

  1. the package name can contain only letters, digits, dots and underscores.
  2. the package name must always start with a word (or with an underscore), never with a digit or a dot.
  3. the package name can optionally continue with a sequence of groups, each made of a dot followed by a word that can start with a letter or an underscore but not with a digit.
  4. the package name must always end with a non-dot character.
  5. the package name must start with a lowercase letter by convention (Java permits package names that start with an uppercase letter, even if they are discouraged).

For example, this regex matches these package names:

com
com.test
com.test.regex
_com._123
comTest.regEx

And it doesn't match these package names:

123
com.
.com
test.123com
test.123_
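
The two lists above can be checked with a short script. This is only a minimal sketch that reuses the regex exactly as written in the question; the names PACKAGE_RE, shouldMatch and shouldFail are illustrative and not part of the original post.

```javascript
// The regex from the question, unchanged.
const PACKAGE_RE = /(^(?:[a-z_]+(?:\d*[a-zA-Z_]*)*)(?:\.[a-z_]+(?:\d*[a-zA-Z_]*)*)*$)/;

// Names the question lists as valid and as invalid.
const shouldMatch = ["com", "com.test", "com.test.regex", "_com._123", "comTest.regEx"];
const shouldFail = ["123", "com.", ".com", "test.123com", "test.123_"];

// Print each name together with the result of the test.
for (const name of shouldMatch) {
  console.log(name, PACKAGE_RE.test(name));
}
for (const name of shouldFail) {
  console.log(name, PACKAGE_RE.test(name));
}
```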

This is correct, but it's not enough for my purpose, because if any single word is a Java reserved word, I have to invalidate the whole package name.

Let's take this package name as example:

com.test.package

This package name is recognized as valid with my regex, but the word package is a Java-reserved word and it can't be used for a Java package name.

Valid package names can be:

com.test.packageTest
com.test.testpackage

The word package can be a substring of another word, but it can't be a single word between the dots.

How can I modify my regex so that it also rejects a package name when any of its dot-separated words is a Java reserved word?

Thanks

Alessandro C
  • http://stackoverflow.com/questions/5205339/regular-expression-matching-fully-qualified-class-names will help you – Jekin Kalariya Sep 02 '16 at 10:18
  • You're probably going to have to use negative lookaheads, and a ton of them, to cover all of Java's keywords. I would prefer to use a combination of string operations and regex. Keep your current regex, but then split on dot, and check that each piece is not a keyword. – Tim Biegeleisen Sep 02 '16 at 10:19
  • Yeah, it's a practical solution: check whether every term is a reserved word. It's probably too complex to make this check directly in the regex. – Alessandro C Sep 02 '16 at 10:27
  • @Tim Thanks for your help. I have found by myself a working regex, check it if you wish. ;) – Alessandro C Sep 05 '16 at 13:04
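
The approach suggested in the comments above (keep the structural regex, then split on dots and check each piece) could look roughly like the following. This is a hedged sketch, not code from the thread: the names JAVA_KEYWORDS, PACKAGE_RE and isValidPackageName are hypothetical, and the keyword list is the usual set of 50 Java keywords.

```javascript
// Java keywords; a segment equal to any of these makes the name invalid.
const JAVA_KEYWORDS = new Set([
  "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
  "class", "const", "continue", "default", "do", "double", "else", "enum",
  "extends", "final", "finally", "float", "for", "goto", "if", "implements",
  "import", "instanceof", "int", "interface", "long", "native", "new",
  "package", "private", "protected", "public", "return", "short", "static",
  "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
  "transient", "try", "void", "volatile", "while",
]);

// Structural check: the regex from the question, unchanged.
const PACKAGE_RE = /(^(?:[a-z_]+(?:\d*[a-zA-Z_]*)*)(?:\.[a-z_]+(?:\d*[a-zA-Z_]*)*)*$)/;

// Hypothetical helper: valid structure AND no segment that is exactly a keyword.
function isValidPackageName(name) {
  if (!PACKAGE_RE.test(name)) {
    return false;
  }
  return name.split(".").every((segment) => !JAVA_KEYWORDS.has(segment));
}

console.log(isValidPackageName("com.test.package"));     // keyword as a whole segment
console.log(isValidPackageName("com.test.packageTest")); // keyword only as a substring
```

Because the comparison is made on whole segments, a keyword that appears only as part of a longer word does not invalidate the name.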

1 Answer

I think I have found the regex.

I have built this one and it works perfectly:

(?!^abstract$|^abstract\..*|.*\.abstract\..*|.*\.abstract$|^assert$|^assert\..*|.*\.assert\..*|.*\.assert$|^boolean$|^boolean\..*|.*\.boolean\..*|.*\.boolean$|^break$|^break\..*|.*\.break\..*|.*\.break$|^byte$|^byte\..*|.*\.byte\..*|.*\.byte$|^case$|^case\..*|.*\.case\..*|.*\.case$|^catch$|^catch\..*|.*\.catch\..*|.*\.catch$|^char$|^char\..*|.*\.char\..*|.*\.char$|^class$|^class\..*|.*\.class\..*|.*\.class$|^const$|^const\..*|.*\.const\..*|.*\.const$|^continue$|^continue\..*|.*\.continue\..*|.*\.continue$|^default$|^default\..*|.*\.default\..*|.*\.default$|^do$|^do\..*|.*\.do\..*|.*\.do$|^double$|^double\..*|.*\.double\..*|.*\.double$|^else$|^else\..*|.*\.else\..*|.*\.else$|^enum$|^enum\..*|.*\.enum\..*|.*\.enum$|^extends$|^extends\..*|.*\.extends\..*|.*\.extends$|^final$|^final\..*|.*\.final\..*|.*\.final$|^finally$|^finally\..*|.*\.finally\..*|.*\.finally$|^float$|^float\..*|.*\.float\..*|.*\.float$|^for$|^for\..*|.*\.for\..*|.*\.for$|^goto$|^goto\..*|.*\.goto\..*|.*\.goto$|^if$|^if\..*|.*\.if\..*|.*\.if$|^implements$|^implements\..*|.*\.implements\..*|.*\.implements$|^import$|^import\..*|.*\.import\..*|.*\.import$|^instanceof$|^instanceof\..*|.*\.instanceof\..*|.*\.instanceof$|^int$|^int\..*|.*\.int\..*|.*\.int$|^interface$|^interface\..*|.*\.interface\..*|.*\.interface$|^long$|^long\..*|.*\.long\..*|.*\.long$|^native$|^native\..*|.*\.native\..*|.*\.native$|^new$|^new\..*|.*\.new\..*|.*\.new$|^package$|^package\..*|.*\.package\..*|.*\.package$|^private$|^private\..*|.*\.private\..*|.*\.private$|^protected$|^protected\..*|.*\.protected\..*|.*\.protected$|^public$|^public\..*|.*\.public\..*|.*\.public$|^return$|^return\..*|.*\.return\..*|.*\.return$|^short$|^short\..*|.*\.short\..*|.*\.short$|^static$|^static\..*|.*\.static\..*|.*\.static$|^strictfp$|^strictfp\..*|.*\.strictfp\..*|.*\.strictfp$|^super$|^super\..*|.*\.super\..*|.*\.super$|^switch$|^switch\..*|.*\.switch\..*|.*\.switch$|^synchronized$|^synchronized\..*|.*\.synchronized\..*|.*\.synchronized$|^this$|^this\..*|.*\.this\..*|.*\.this$|^throw$|^throw\..*|.*\.throw\..*|.*\.throw$|^throws$|^throws\..*|.*\.throws\..*|.*\.throws$|^transient$|^transient\..*|.*\.transient\..*|.*\.transient$|^try$|^try\..*|.*\.try\..*|.*\.try$|^void$|^void\..*|.*\.void\..*|.*\.void$|^volatile$|^volatile\..*|.*\.volatile\..*|.*\.volatile$|^while$|^while\..*|.*\.while\..*|.*\.while$)(^(?:[a-z_]+(?:\d*[a-zA-Z_]*)*)(?:\.[a-z_]+(?:\d*[a-zA-Z_]*)*)*$)

This regex ensures that:

  1. the package name must not start and/or end with a single reserved word
  2. the package name must not contain a reserved term between the dots

I have tested it with:

while1.package2.void3.transient4

and it works.
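
Since the lookahead above repeats the same four alternatives for every keyword, it may be easier to maintain if it is generated from a list. The sketch below is an assumption on my part rather than the regex as posted: it folds the four cases into a single pattern, "(optional prefix ending in a dot) keyword (dot or end of input)", which is intended to accept and reject the same inputs. The names JAVA_KEYWORDS, keywordGuard, segment and PACKAGE_RE are hypothetical.

```javascript
// Same keyword list as in the long regex above.
const JAVA_KEYWORDS = [
  "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
  "class", "const", "continue", "default", "do", "double", "else", "enum",
  "extends", "final", "finally", "float", "for", "goto", "if", "implements",
  "import", "instanceof", "int", "interface", "long", "native", "new",
  "package", "private", "protected", "public", "return", "short", "static",
  "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
  "transient", "try", "void", "volatile", "while",
];

// Negative lookahead: fail if some segment (at the start or right after a dot)
// is exactly a keyword, i.e. the keyword is followed by a dot or the end of input.
const keywordGuard = `(?!(?:.*\\.)?(?:${JAVA_KEYWORDS.join("|")})(?:\\.|$))`;

// One dot-separated word, taken from the original regex.
const segment = "[a-z_]+(?:\\d*[a-zA-Z_]*)*";

// Guard first, then the structural pattern from the question.
const PACKAGE_RE = new RegExp(`^${keywordGuard}${segment}(?:\\.${segment})*$`);

console.log(PACKAGE_RE.test("while1.package2.void3.transient4")); // keywords only as substrings
console.log(PACKAGE_RE.test("com.test.package"));                 // keyword as a whole segment
```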

Alessandro C