
I am trying to write code that checks whether a domain name is valid according to RFC 1035. RFC 1035 (https://tools.ietf.org/html/rfc1035) gives the following grammar for domain names:

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case.  That is, two names with
the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names.  They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.  There are also some
restrictions on the length.  Labels must be 63 characters or less.

I have written the following Java code to check whether a domain name is valid according to RFC 1035.

//DomainUtils.java
import java.util.regex.Pattern;

class DomainUtils {

   // A single label: starts with a letter, does not end with a hyphen, 1 to 63 characters.
   private static final String DOMAIN_NAME_PATTERN_CHK_1 = "^(?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)$";
   // Two or more such labels separated by dots.
   private static final String DOMAIN_NAME_PATTERN_CHK_2 = "^((?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)\\.)+(?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)$";

   // Compiled once and reused for every call.
   private static final Pattern pDomainNameOnly1 = Pattern.compile(DOMAIN_NAME_PATTERN_CHK_1);
   private static final Pattern pDomainNameOnly2 = Pattern.compile(DOMAIN_NAME_PATTERN_CHK_2);

   public static boolean isValidDomainName(String domainName) {
       // matches() requires the entire input to match the pattern.
       // The grammar also allows a domain consisting of a single space.
       return pDomainNameOnly1.matcher(domainName).matches()
               || pDomainNameOnly2.matcher(domainName).matches()
               || domainName.equals(" ");
   }

}

and

//Main.java
public class Main{
   public static void main(String[] args){
       boolean valid = DomainUtils.isValidDomainName("a123456789a123456789a123456789a123456789a123456789a1234567891234.ARPA"); //check if domain name is valid or not
       System.out.println("Valid domain name : " + valid);
   }

}

Is there a more efficient way (other than what I have written) to check whether a domain name is valid according to RFC 1035? Also, where can I find corner cases to verify that my code handles them correctly? Are there existing libraries I can use for this check?

rishi007bansod
  • It also depends on where you use this check. RFC 1035 has been updated by various later documents that change things. For example while it is technically forbidden to have a purely numeric TLD, in practice nowadays it is not possible. See my longer reply at https://stackoverflow.com/a/53875771/6368697 – Patrick Mevzek Jul 08 '19 at 01:32

1 Answer


Try this:

^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*$

as can be shown in this demo

To construct this expression, we first write the label component: a single character from the set a-zA-Z, optionally followed by a sequence of characters from the set a-zA-Z0-9- that ends in a non-hyphen (a hyphen is permitted inside a label, but not at its beginning or end). This leads to

[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?

this expression is repeated under the following pattern:

A(\.A)*

which means one instance of A, followed by any number (even 0) of sequences consisting of a dot followed by another instance of A.

By substituting the label regex above in the positions of A, we get the final regexp. The anchors ^ and $ ensure that the whole string must match, so no other text is allowed before or after the domain name.
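The substitution can also be done in code instead of by hand. The following is a minimal Java sketch, not a definitive implementation; the class name `DomainRegexBuilder` and the method `matchesGrammar` are hypothetical names chosen for illustration. It builds the final expression from the label expression and relies on `matches()`, which requires the whole input to match, in place of explicit anchors.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
class DomainRegexBuilder {

    // A: one label (starts with a letter, ends with a letter or digit,
    // hyphens allowed only in the interior).
    static final String LABEL = "[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?";

    // A(\.A)*: one label followed by zero or more ".label" groups.
    static final Pattern DOMAIN = Pattern.compile(LABEL + "(\\." + LABEL + ")*");

    static boolean matchesGrammar(String name) {
        // matches() only succeeds if the pattern covers the entire input,
        // so explicit ^ and $ are not needed here.
        return DOMAIN.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesGrammar("example.com"));   // true
        System.out.println(matchesGrammar("-example.com"));  // false
        System.out.println(matchesGrammar("example..com"));  // false
    }
}
```

Keeping the label expression in one constant means a later change, such as adding a length bound, only has to be made in one place.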

To limit each label to at most 63 characters, you can use

[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?

but beware: this regexp compiles to a very large table automaton (an automaton with many states), so you may want to relax the length check if you are short of space.
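If the size of the bounded pattern is a concern, one alternative is to keep the unbounded label expression and check the length in plain code. The sketch below assumes a hypothetical class `DomainLengthCheck` with a hypothetical method `isValid`; the 63-character limit is the one from the RFC text quoted in the question. How much a bounded quantifier actually costs depends on the regex engine, so it is worth measuring before choosing either form.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
class DomainLengthCheck {

    static final int MAX_LABEL_LENGTH = 63;

    // Unbounded label expression; the length is checked separately.
    static final Pattern LABEL =
            Pattern.compile("[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?");

    static boolean isValid(String name) {
        // The limit -1 keeps trailing empty strings, so "a." splits into
        // ["a", ""] and the empty label is rejected by the pattern.
        for (String label : name.split("\\.", -1)) {
            if (label.length() > MAX_LABEL_LENGTH
                    || !LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("example.com")); // true
        System.out.println(isValid("example."));    // false
    }
}
```

Splitting on dots also makes it easy to report which label failed, which a single large regex does not offer.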

Luis Colorado
  • This does not take all cases into account, but it depends on what exactly the OP wants (RFC 1035 is no longer the up-to-date reference on domain name syntax). For example, the third and fourth characters of any label cannot be `--` unless it is an IDN of the form `xn--`, and then only `xn` can be used as the prefix and nothing else. – Patrick Mevzek Jul 08 '19 at 01:34
  • I'm afraid the `xn--` prefix (not only the last two hyphens) is a mechanism used by IANA to allow for expandability. Allowing double dashes has forced a change to the basic DNS syntax. I'm not aware of any prohibition on having `--` in that position for uses other than IDN. Anyway, do you have the actual reference RFC for domain name syntax? I was aware that, originally, a DNS name was permitted to begin with a digit (even when it clashed with the textual representation of an IPv4 address). – Luis Colorado Jul 09 '19 at 08:00
  • See my answer to a similar question, which should give you pointers about syntax, formats, and how they changed over the years: https://stackoverflow.com/a/53875771/6368697 – Patrick Mevzek Sep 17 '19 at 02:58
  • Also, about hyphens, see the ICANN IDN implementation guidelines at https://www.icann.org/en/system/files/files/idn-guidelines-10may18-en.pdf, point 4: "No label containing hyphens in both the third and the fourth positions may be registered unless it is a valid A-label, with reservation for transitional action. Labels with hyphens in both the third and the fourth positions are explicitly reserved to indicate encoding schemes, of which IDNA is only one instantiation." – Patrick Mevzek Sep 17 '19 at 02:58
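
The hyphen rule quoted in the last comment can be sketched as a per-label check. This is an illustration only, assuming a hypothetical class `HyphenRule`; it flags labels with hyphens in both the third and fourth positions and treats the `xn--` prefix as the only accepted case, without verifying that the rest of the label is a valid A-label.

```java
import java.util.Locale;

// Hypothetical helper class; the names are for illustration only.
class HyphenRule {

    // True if the label has '-' in both the third and fourth positions.
    static boolean hasReservedHyphens(String label) {
        return label.length() >= 4
                && label.charAt(2) == '-'
                && label.charAt(3) == '-';
    }

    // Accepts labels without reserved hyphens, or labels that start with
    // "xn--" (compared case-insensitively). It does NOT check that an
    // "xn--" label decodes to a valid IDN.
    static boolean passesHyphenRule(String label) {
        if (!hasReservedHyphens(label)) {
            return true;
        }
        return label.toLowerCase(Locale.ROOT).startsWith("xn--");
    }

    public static void main(String[] args) {
        System.out.println(passesHyphenRule("ab--cd"));  // false
        System.out.println(passesHyphenRule("xn--abc")); // true
        System.out.println(passesHyphenRule("a-b-c"));   // true
    }
}
```

A check like this would run in addition to the RFC 1035 grammar check, not in place of it.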