I am trying to generate a temporary hosts file based on a DNS log produced by dnsmasq. I have it mostly working, but I am having a problem with CNAMEs. Here is what I have achieved so far.

There are 3 types of responses in the log that I need to handle. The simplest is easy to extract from the log, e.g.:

Jun 20 14:27:59 dnsmasq[2551]: reply stackoverflow.com is 64.34.119.12

This can be output to "64.34.119.12 stackoverflow.com" using

grep reply /tmp/dnslog | grep -v 'NXDOMAIN\|NODATA' | awk '{print $8 " " $6}'

The other type of log entry concerns CNAMEs. Here is one example:

Jun 20 14:42:11 dnsmasq[2551]: reply www.videolan.org is <CNAME>
Jun 20 14:42:11 dnsmasq[2551]: reply ganesh.videolan.org is 88.191.250.2

This can be output to "88.191.250.2 ganesh.videolan.org www.videolan.org" using

grep reply /tmp/dnslog | grep -v 'NXDOMAIN\|NODATA' | awk '{print $8 "\t" $6}' | awk '/CNAME/ {name=$2; getline; print $0 " " name}'

However, this method does not work for the following type of log entry, where there are multiple CNAMEs:

Jun 20 15:00:42 dnsmasq[2551]: reply en.wikipedia.org is <CNAME>
Jun 20 15:00:42 dnsmasq[2551]: reply wikipedia-lb.wikimedia.org is <CNAME>
Jun 20 15:00:42 dnsmasq[2551]: reply wikipedia-lb.esams.wikimedia.org is 91.198.174.225

The previous command gives the following result

<CNAME> wikipedia-lb.wikimedia.org      en.wikipedia.org

Running the first command alongside the second associates wikipedia-lb.esams.wikimedia.org with 91.198.174.225, but wikipedia-lb.wikimedia.org is still not associated with wikipedia-lb.esams.wikimedia.org. The ideal result would be the following:

91.198.174.225 wikipedia-lb.esams.wikimedia.org wikipedia-lb.wikimedia.org      en.wikipedia.org

To remedy this problem, I believe the file would need to be read backwards. However, would doing so not break the getline part of the awk command, which appends to the next line?

Ideally, I would like to combine both types of log into a command which would then output everything, rather than having to run both scripts separately. Can anyone assist in mending the awk command to do this?

Here is a sample of "grep reply /var/dnslog", and the desired hosts file to be output. There are other issues that are secondary at the moment. These are highlighted in the desired hosts output.

Jun 20 15:28:21 dnsmasq[2551]: reply photos-a.ak.fbcdn.net is <CNAME>
Jun 20 15:28:21 dnsmasq[2551]: reply photos-a.ak.facebook.com.edgesuite.net is <CNAME>
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.25
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.48
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.64
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.9
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.26
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.51
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.8
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.50
Jun 20 15:28:21 dnsmasq[2551]: reply a995.dspmm1.akamai.net is 213.200.108.65
Jun 20 15:28:22 dnsmasq[2551]: reply stackoverflow.com is 64.34.119.12
Jun 20 15:29:41 dnsmasq[2551]: reply www.wikipedia.org is <CNAME>
Jun 20 15:29:41 dnsmasq[2551]: reply wikipedia-lb.wikimedia.org is <CNAME>
Jun 20 15:29:41 dnsmasq[2551]: reply wikipedia-lb.esams.wikimedia.org is 91.198.174.225
Jun 20 15:29:42 dnsmasq[2551]: reply en.wikipedia.org is <CNAME>
Jun 20 15:29:42 dnsmasq[2551]: reply wikipedia-lb.wikimedia.org is <CNAME>
Jun 20 15:29:42 dnsmasq[2551]: reply wikipedia-lb.esams.wikimedia.org is 91.198.174.225
Jun 20 15:29:42 dnsmasq[2551]: reply ja.wikipedia.org is <CNAME>
Jun 20 15:29:42 dnsmasq[2551]: reply wikipedia-lb.wikimedia.org is <CNAME>
Jun 20 15:29:42 dnsmasq[2551]: reply wikipedia-lb.esams.wikimedia.org is 91.198.174.225

Desired hosts file:

213.200.108.26  a995.dspmm1.akamai.net photos-a.ak.facebook.com.edgesuite.net photos-a.ak.fbcdn.net 
##ideally select 1 host at random from multiple of a995.dspmm1.akamai.net, although list may be randomised already so 1st will suffice##
64.34.119.12    stackoverflow.com
91.198.174.225  wikipedia-lb.esams.wikimedia.org wikipedia-lb.wikimedia.org www.wikipedia.org
91.198.174.225  wikipedia-lb.esams.wikimedia.org wikipedia-lb.wikimedia.org en.wikipedia.org
91.198.174.225  wikipedia-lb.esams.wikimedia.org wikipedia-lb.wikimedia.org ja.wikipedia.org 
##Ideally, detect these similarities for wikipedia and convert the 3 lines into this;##
91.198.174.225  wikipedia-lb.esams.wikimedia.org wikipedia-lb.wikimedia.org www.wikipedia.org en.wikipedia.org ja.wikipedia.org

The intention is that the file will be distributed over a low-bandwidth, high-latency link, so it should be as small as possible. I am aware that using this file over a long period of time would cause lots of issues, so I have configured it to be valid only for a short period. If anyone can help with the issues pointed out, it would be greatly appreciated. Also, I have a limited range of UNIX applications available, so if the above can be achieved in awk, that would be preferable. Thank you in advance!

Matthew

2 Answers

Using awk with sort:

..|awk '{if($8 ~ /<CNAME>/){load=load" "$6}else{print $8" "load" "$6;load=""}}' |
  sort -u -k2
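
The question asks for the address first, then the canonical name, then the aliases from nearest to farthest. A minimal sketch of one way to get that order in a single pass is to prepend each CNAME name to the pending chain instead of appending it. The function name dns_chain is made up for illustration. The field positions ($5, $6, $8) assume the syslog-style prefix shown in the question, and the sketch assumes that failed lookups carry NXDOMAIN or NODATA in the eighth field, as the question's grep -v suggests, in which case any pending chain is dropped.

```shell
# dns_chain is a hypothetical helper name; it reads dnsmasq log lines on stdin.
dns_chain() {
    awk '
    $5 == "reply" {
        if ($8 == "<CNAME>") {
            # prepend, so the alias nearest to the address is listed first
            chain = " " $6 chain
        } else {
            # assumption: failed lookups show NXDOMAIN or NODATA in field 8
            if ($8 !~ /NXDOMAIN|NODATA/) print $8 " " $6 chain
            chain = ""
        }
    }'
}
```

It could be called as dns_chain < /tmp/dnslog. A name with several addresses still produces one line per address here; removing those repeats is a separate step.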
Prince John Wesley
  • Thank you very much! The device I am using does not have the sort command but I used "| awk ' !x[$2]++'" to fix it. Is this ok in terms of efficiency, etc? Is it possible as well to merge mostly duplicated lines, such as the wikipedia example given in the sample? Thanks – Matthew Jun 20 '12 at 16:24
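
On the efficiency question in the comment: the filter awk '!x[$2]++' keeps the first line seen for each distinct second field and drops later ones. It needs one array lookup per line and stores one key per distinct name, and it preserves input order, so it looks like a reasonable stand-in for sort -u on a device without sort. In the output of the command above, the second field is the first name in the chain, or the host itself when no CNAME is involved. A small illustration follows; the function name first_per_name and the sample addresses and names are made up for the example.

```shell
# first_per_name is a hypothetical helper name.
# seen[$2]++ evaluates to 0 (false) the first time a key appears, so the
# negation is true only then, and awk prints the line by default.
first_per_name() {
    awk '!seen[$2]++'
}

printf '%s\n' \
    '192.0.2.1 a.example' \
    '192.0.2.2 a.example' \
    '192.0.2.3 b.example' | first_per_name
```

This prints the first and third lines only. It does not merge lines that share an address but differ in one alias; that requires grouping by address instead.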

Save the following as parse.awk and call it using awk -f parse.awk dnsmasq.log.

/reply/ {
    host = $6;
    ip = $8;

    names[length(names)+1] = host;

    if (ip !~ /CNAME/) {
        # assign all names up to now the same IP
        # This will overwrite any previous IP assignment as well
        for (i in names) IPs[names[i]] = ip;
        delete names;
    }
}

END {
    # collate hostnames for a particular IP
    for (host in IPs) hosts[IPs[host]] = hosts[IPs[host]]" "host;
    for (IP in hosts) print IP hosts[IP];
}
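
Some older or minimal awk implementations may not accept length() on an array or delete on a whole array, and for (x in array) visits keys in no particular order. As a hedged alternative, here is a sketch that uses plain counters instead, keeps the first address seen for a name, and merges every alias that leads to the same address onto one line in first-seen order. The function name dns_to_hosts is made up for illustration, the field positions assume the log prefix shown in the question, and treating NXDOMAIN or NODATA in the eighth field as a failed lookup is an assumption taken from the question's grep -v.

```shell
# dns_to_hosts is a hypothetical helper name; it reads dnsmasq log lines on stdin.
dns_to_hosts() {
    awk '
    # append name to the line for ip unless it is already listed there
    function add(ip, name) {
        if (!((ip, name) in member)) {
            member[ip, name] = 1
            line[ip] = line[ip] " " name
        }
    }
    $5 == "reply" {
        # CNAME lines only queue the alias until an address line arrives
        if ($8 == "<CNAME>") { pend[++n] = $6; next }
        # assumption: failed lookups show NXDOMAIN or NODATA in field 8
        if ($8 !~ /NXDOMAIN|NODATA/) {
            if (!($6 in ipof)) ipof[$6] = $8    # first address for a name wins
            ip = ipof[$6]
            if (!(ip in line)) order[++m] = ip  # remember first-seen order
            add(ip, $6)
            # nearest alias first, original query name last
            for (i = n; i >= 1; i--) add(ip, pend[i])
        }
        n = 0
    }
    END { for (i = 1; i <= m; i++) print order[i] line[order[i]] }
    '
}
```

Called as dns_to_hosts < dnsmasq.log on the sample from the question, this should give one line each for 213.200.108.25, 64.34.119.12 and 91.198.174.225, with www, en and ja.wikipedia.org merged onto the last line. It takes the first address listed for a995.dspmm1.akamai.net rather than a random one.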
chthonicdaemon