
Is there an existing, working hosts file grammar on the web?

I checked the list at http://www.antlr.org/grammar/list, but I didn't find one there.

I also checked the Wikipedia entry on the hosts file, which references RFC 952, but I don't think that is the same format used by /windows/system32/drivers/etc/hosts.

Any grammar format is better than none, but I would prefer one in ANTLR format. This is the first time I've used any grammar generators, and I want to keep my learning curve low. I'm already planning to use ANTLR for consuming other files.

Merlyn Morgan-Graham

1 Answer


From a Microsoft page:

The HOSTS file format is the same as the format for host tables in the Version 4.3 Berkeley Software Distribution (BSD) UNIX /etc/hosts file.

And the /etc/hosts file is described here.

An example file:

#
# Table of IP addresses and hostnames
#
172.16.12.2     peanut.nuts.com peanut
127.0.0.1       localhost
172.16.12.1     almond.nuts.com almond loghost
172.16.12.4     walnut.nuts.com walnut
172.16.12.3     pecan.nuts.com pecan
172.16.1.2      filbert.nuts.com filbert
172.16.6.4      salt.plant.nuts.com salt.plant salt

A hosts file appears to be formatted like this:

  • each table entry in /etc/hosts contains an IP address, separated by whitespace from a list of hostnames associated with that address
  • a table entry can end with zero or more aliases
  • comments begin with #
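
For comparison, and purely as a sketch that does not use ANTLR, the three rules above can also be applied line by line in plain Java. The class name `HostsLineParser` and its `Entry` record are hypothetical, the `record` needs Java 16 or later, and the sketch assumes fields are separated only by spaces or tabs and that a line without a host name can simply be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HostsLineParser {

    // One table entry: an IP address, a host name, and zero or more aliases.
    record Entry(String address, String hostName, List<String> aliases) {}

    // Splits hosts-file text into entries, one per non-blank, non-comment line.
    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        for (String line : text.split("\\R")) {
            // Everything from '#' to the end of the line is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            // Fields are separated by one or more spaces or tabs.
            String[] fields = line.split("[ \\t]+");
            if (fields.length < 2) {
                continue; // assumption: an address without a host name is skipped
            }
            List<String> aliases = Arrays.asList(fields).subList(2, fields.length);
            entries.add(new Entry(fields[0], fields[1], List.copyOf(aliases)));
        }
        return entries;
    }

    public static void main(String[] args) {
        String source = "# Table of IP addresses and hostnames\n"
                + "172.16.12.2     peanut.nuts.com peanut\n"
                + "127.0.0.1       localhost\n"
                + "172.16.6.4      salt.plant.nuts.com salt.plant salt  # trailing comment\n";
        for (Entry e : parse(source)) {
            System.out.println(e.address() + " | " + e.hostName() + " | " + e.aliases());
        }
    }
}
```

Running `main` prints one line per entry, e.g. `172.16.6.4 | salt.plant.nuts.com | salt.plant, salt` inside square brackets. An entry without aliases, such as `localhost`, gets an empty list rather than `null`.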

The key terms above (table entry, IP address, hostname and alias) become the rules in the ANTLR grammar, which may look like this:

grammar Hosts;

parse
  :  tableEntry* EOF
  ;

tableEntry
  :  address hostName aliases?
     {
       System.out.println("\n== Entry ==");
       System.out.println("  address  : " + $address.text);
       System.out.println("  hostName : " + $hostName.text);
       System.out.println("  aliases  : " + $aliases.text);
     }
  ;

address
  :  Octet '.' Octet '.' Octet '.' Octet
  ;

hostName
  :  Name
  ;

aliases
  :  Name+
  ;

Name
  :  Letter+ ('.' Letter+)*
  ;

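// Comments run from '#' to the end of the line and are hidden from the parser.
Comment
  :  '#' ~('\r' | '\n')* {$channel=HIDDEN;}
  ;

// Whitespace, including line breaks, is hidden as well, so an entry is not
// delimited by its line: its aliases end where the next address begins.
Space
  :  (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
  ;

// One to three digits; the value itself is not range-checked (999 is accepted).
Octet
  :  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;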

fragment Letter
  :  'a'..'z'
  |  'A'..'Z'
  ;

fragment Digit
  :  '0'..'9'
  ;

which can be tested with the class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = 
        "#                                                   \n" +
        "# Table of IP addresses and Hostnames               \n" +
        "#                                                   \n" +
        "172.16.12.2     peanut.nuts.com peanut              \n" +
        "127.0.0.1       localhost                           \n" +
        "172.16.12.1     almond.nuts.com almond loghost      \n" +
        "172.16.12.4     walnut.nuts.com walnut              \n" +
        "172.16.12.3     pecan.nuts.com pecan                \n" +
        "172.16.1.2      filbert.nuts.com filbert            \n" +
        "172.16.6.4      salt.plant.nuts.com salt.plant salt   ";
    ANTLRStringStream in = new ANTLRStringStream(source);
    HostsLexer lexer = new HostsLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    HostsParser parser = new HostsParser(tokens);
    parser.parse();
  }
}

Generating the lexer and parser, compiling, and running Main produces the following output:

bart@hades:~/Programming/ANTLR/Demos/Hosts$ java -cp antlr-3.3.jar org.antlr.Tool Hosts.g
bart@hades:~/Programming/ANTLR/Demos/Hosts$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/Hosts$ java -cp .:antlr-3.3.jar Main

== Entry ==
  address  : 172.16.12.2
  hostName : peanut.nuts.com
  aliases  : peanut

== Entry ==
  address  : 127.0.0.1
  hostName : localhost
  aliases  : null

== Entry ==
  address  : 172.16.12.1
  hostName : almond.nuts.com
  aliases  : almond loghost

== Entry ==
  address  : 172.16.12.4
  hostName : walnut.nuts.com
  aliases  : walnut

== Entry ==
  address  : 172.16.12.3
  hostName : pecan.nuts.com
  aliases  : pecan

== Entry ==
  address  : 172.16.1.2
  hostName : filbert.nuts.com
  aliases  : filbert

== Entry ==
  address  : 172.16.6.4
  hostName : salt.plant.nuts.com
  aliases  : salt.plant salt
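
The `address` rule only requires each `Octet` to be one to three digits, so an entry such as `999.1.1.1 bogus` would parse as well. A minimal post-parse check could be called on `$address.text` from the `tableEntry` action; the class `AddressCheck` below is a hypothetical plain-Java sketch, not part of the grammar or of the ANTLR runtime.

```java
public class AddressCheck {

    // True if text is four dot-separated decimal numbers, each in 0..255.
    // The grammar's Octet rule only enforces the "one to three digits" part.
    static boolean isValidIPv4(String text) {
        String[] parts = text.split("\\.", -1); // limit -1 keeps trailing empty parts, e.g. "1.2.3."
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return false; // ASCII digits only
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"172.16.12.2", "999.1.1.1", "1.2.3"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidIPv4(s));
        }
    }
}
```

For the samples in `main`, only `172.16.12.2` is reported as valid.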

Note that this is just a quick demo: host names can contain characters other than letters and dots (digits and hyphens, for example), to name just one shortcoming.
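
One way to see which names the `Name` rule misses is a stand-alone check against the usual DNS host-name conventions: RFC 952 syntax as relaxed by RFC 1123 (a label may start with a digit), labels of letters, digits and hyphens that do not start or end with a hyphen, and at most 63 characters per label. The class name `HostNameCheck` and the 253-character overall cap are assumptions for illustration; this says nothing about what the Windows resolver itself accepts.

```java
import java.util.regex.Pattern;

public class HostNameCheck {

    // One label: a letter or digit, optionally followed by up to 61 letters,
    // digits or hyphens and a final letter or digit (so 1..63 characters).
    private static final String LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";

    // A host name: one or more labels joined by single dots.
    private static final Pattern HOST_NAME =
            Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    // Assumption: an overall cap of 253 characters, the usual DNS limit for a
    // name written without a trailing dot.
    static boolean isValidHostName(String name) {
        return name.length() <= 253 && HOST_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"peanut.nuts.com", "web-01.example.org", "3com.com", "-bad.example", "under_score"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidHostName(s));
        }
    }
}
```

Of the samples, `peanut.nuts.com`, `web-01.example.org` and `3com.com` pass while `-bad.example` and `under_score` do not; the grammar's `Name` token, which allows only letters and dots, cannot match the second or third either.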

Bart Kiers
  • I agree that if I had a compatible grammar with all the features, I should do the translation myself as an exercise. I can't tell for sure, but I don't think this grammar is compatible with the Windows hosts file, which is what I am trying to consume. Also, I think hosts has some less-used features, but I can't find anything about them on the net. For example, most pages I've found show only one IP address and one host alias, but I'm pretty sure multiple aliases are supported per entry... – Merlyn Morgan-Graham May 27 '11 at 06:13
  • @Merlyn, I revised my answer. – Bart Kiers May 27 '11 at 08:18
  • This is a great starting point for me. Thanks :) I'll "make this my own," of course, and if I have corrections, I'll let you know. – Merlyn Morgan-Graham May 27 '11 at 18:19