
I need to resolve a large number (hundreds of thousands) of domains to IP addresses in Java. While using InetAddress.getByName() is feasible for small numbers, it is far too slow for large quantities (probably because it sends only one request at a time to the DNS server and waits for the response before moving on to the next one).

Is there a more efficient way (such as sending them to the DNS server in bulk) that would cut down the time required to resolve a large number of domains?

At fmucar's request I'm adding the code used to try a more multi-threaded approach:

Set<String> ips = Collections.synchronizedSet(new HashSet<String>());
int i = 0;
List<Set<String>> sets = new ArrayList<Set<String>>();
for (String host : domains) {
    if (i++ % 5 == 0) {
        sets.add(new HashSet<String>());
    }
    Set<String> ipset = sets.get(sets.size()-1);
    ipset.add(host);
}
for (Set<String> ipset : sets) {
    Thread t = new Thread(new DomainResolver(ips, ipset));
    t.start();
}

At 250 domains per thread we peaked at around 700 results per minute, which, while better than before (<300), was still not great when hundreds of thousands need resolving. Lowering it to only 5 per thread sped this up greatly, to several thousand per minute. This obviously creates an insane number of threads though, so I'm presently investigating doing the resolving in C to make use of http://www.chiark.greenend.org.uk/~ian/adns/
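The DomainResolver class itself is not shown above. For readers following along, here is a minimal sketch of what such a worker might look like, assuming it simply calls InetAddress.getByName() for each host in its batch and stores the address in the shared set. The field names and the choice to skip unresolvable hosts are assumptions, not the original code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Hypothetical reconstruction: the real DomainResolver is not shown in the question.
final class DomainResolver implements Runnable {
    private final Set<String> ips;   // shared result set (must be thread-safe)
    private final Set<String> hosts; // this worker's batch of host names

    DomainResolver(Set<String> ips, Set<String> hosts) {
        this.ips = ips;
        this.hosts = hosts;
    }

    @Override
    public void run() {
        for (String host : hosts) {
            try {
                // Blocking call: the thread sits idle until the answer arrives
                ips.add(InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                // Assumption: hosts that do not resolve are simply skipped
            }
        }
    }
}
```

Because each thread spends nearly all of its time blocked inside getByName(), smaller batches spread across more threads keep more lookups in flight at once, which matches the numbers above.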

Exupery

2 Answers


According to RFC 1035 (the DNS implementation and specification RFC), a query normally carries only one question at a time, as the question section format quoted below shows:

4.1.2. Question section format

The question section is used to carry the "question" in most queries, i.e., the parameters that define what is being asked. The section contains QDCOUNT (usually 1) entries, each of the following format:

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

QNAME a domain name represented as a sequence of labels, where each label consists of a length octet followed by that number of octets. The domain name terminates with the zero length octet for the null label of the root. Note that this field may be an odd number of octets; no padding is used.

QTYPE a two octet code which specifies the type of the query. The values for this field include all codes valid for a TYPE field, together with some more general codes which can match more than one type of RR.

QCLASS a two octet code that specifies the class of the query. For example, the QCLASS field is IN for the Internet. ....

(Mockapetris, RFC 1035, Domain Implementation and Specification, November 1987, page 28)
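To make the layout concrete, here is a rough sketch of how one question entry could be serialized in Java. The class and method names are hypothetical (they are not from any library); the numeric values 1 for the A record type and 1 for the IN class, and the 63-octet label limit, come from RFC 1035.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library: builds one question entry.
final class DnsQuestion {
    static final int QTYPE_A = 1;   // A record (IPv4 host address)
    static final int QCLASS_IN = 1; // IN, the Internet class

    static byte[] encode(String name, int qtype, int qclass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // QNAME: each label is a length octet followed by that many octets
        for (String label : name.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.US_ASCII);
            if (bytes.length == 0 || bytes.length > 63) {
                throw new IllegalArgumentException("bad label in " + name);
            }
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);           // zero-length root label terminates the name
        out.write(qtype >> 8);  // QTYPE, two octets, high octet first
        out.write(qtype);
        out.write(qclass >> 8); // QCLASS, two octets, high octet first
        out.write(qclass);
        return out.toByteArray();
    }
}
```

Calling DnsQuestion.encode("example.com", DnsQuestion.QTYPE_A, DnsQuestion.QCLASS_IN) yields the length-prefixed labels "example" and "com", the terminating zero octet, and then the two 16-bit fields. Each host name you want resolved needs its own entry like this, normally in its own query.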

However, you might find custom (highly unlikely) resolvers that maintain their own caches and support bulk transfers, as the specification is slightly open-ended. I don't know if any exist, though. Maybe you can write one :) ... For more information about resolvers, look at section 7 of this RFC (or section 5 of RFC 1034).

The easiest solution would be to use threading as suggested before.

EDIT: The moral of the story, I guess, is that DNS servers are not designed to accept bulk requests. This makes sense, as otherwise it might be easy for attackers to request too much information from a single DNS server.

Osama Javed
  • Turns out even though bulk resolution isn't possible, the lookups can be greatly sped up by using a well-configured DNS server. Using Google's DNS we were able to process several thousand per minute. – Exupery May 01 '12 at 19:17

You can use the java.util.concurrent.* classes to create a multithreaded app that runs several queries at once without waiting for each result.

See ExecutorService, Runnable, Callable, Future, Thread ... classes.

It may be a good idea to read a tutorial if those are new to you.

E.g., you can use a `BlockingQueue` and the producer-consumer pattern.

One part of your app creates Callable objects that place their results into the BlockingQueue as they become available, and another part takes results from the BlockingQueue and, say, writes them to a file.
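A rough sketch of that producer-consumer idea, under a few assumptions: the class and method names are made up for illustration, failed lookups are recorded as "UNRESOLVED", and the consumer just collects lines into a list where a real app might write them to a file.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical names throughout; a sketch, not a tuned implementation.
final class QueuedLookups {
    // Returns the IP address, or a marker when the name does not resolve
    static String lookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return "UNRESOLVED";
        }
    }

    static List<String> resolveAll(List<String> hosts, int threads) {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String host : hosts) {
            // Producer: every task queues exactly one "host ip" line
            Callable<Void> producer = () -> {
                results.put(host + " " + lookup(host));
                return null;
            };
            pool.submit(producer);
        }
        pool.shutdown(); // accept no new tasks; queued ones still run

        // Consumer: take() blocks until the next result is available
        List<String> lines = new ArrayList<>();
        try {
            for (int i = 0; i < hosts.size(); i++) {
                lines.add(results.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        }
        return lines;
    }
}
```

Since every task queues exactly one line, the consumer knows how many results to expect and needs no end-of-stream marker. Results arrive in completion order, not input order, which is why each line carries its host name.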

EDIT 1 : Sample:

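ExecutorService threadExecutor = Executors.newFixedThreadPool(50);
for(....){
  // Hand the Runnable straight to the pool; it supplies the threads itself
  Runnable task = new DomainResolver(ips, ipset);
  threadExecutor.execute(task);
}
// Stop accepting new tasks; the ones already queued still run
threadExecutor.shutdown();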

Instead of creating and starting many threads at once, delegate the tasks to an executor service (see the sample above), which runs at most 50 threads at any time. You will need to find the optimum number of threads: too many, and most CPU cycles go to switching between threads; too few, and CPU cycles are wasted waiting for the DNS server to return a result.
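If you would rather get each result back directly than share a collection between tasks, Callable and Future fit well. Below is a minimal sketch, assuming one Callable per host and a caller-chosen pool size; the class name, the method name, and the choice to leave failed lookups out of the map are illustrative assumptions.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; the pool size is the knob to experiment with.
final class PooledResolver {
    static Map<String, String> resolveAll(List<String> hosts, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String host : hosts) {
                // One task per host; a failed lookup surfaces as ExecutionException
                tasks.add(() -> InetAddress.getByName(host).getHostAddress());
            }
            // invokeAll blocks until all tasks finish; futures keep task order
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<String, String> resolved = new LinkedHashMap<>();
            for (int i = 0; i < hosts.size(); i++) {
                try {
                    resolved.put(hosts.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    // Assumption: hosts that fail to resolve are left out
                }
            }
            return resolved;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while resolving", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as PooledResolver.resolveAll(domains, 50) mirrors the 50-thread pool above. Because the threads mostly wait on the network and do little CPU work, it is worth timing several pool sizes on your own hardware.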

fmucar
  • Experimented with one, four, and eight threads and the gains aren't actually that great. A single thread was processing about 275 a minute, eight threads was around 330 per minute...we really need something closer to thousands per minute (which may not be possible) – Exupery May 01 '12 at 16:12
  • If your software is at its peak point of optimization then you may want to upgrade the hardware. But I would say it should probably be better than 330 per minute if it was 275 per minute before. Show us some code: how did you implement it? – fmucar May 01 '12 at 16:25
  • See my edit for the code used. I've re-factored it slightly and it now handles a little over 700 per minute. – Exupery May 01 '12 at 17:22
  • Increasing the number of threads does not mean the output will increase in direct proportion. The optimum number of threads depends on the hardware and is directly related to the number of cores and the response wait time for each operation. You may want to try between 10 and 50 threads, but as I said, it is greatly hardware dependent after some point – fmucar May 02 '12 at 09:02
  • ExecutorService is probably the way to go (I had to bump the max threads up to 500 though), am now able to resolve tens of thousands per minute which makes a million doable in a reasonable amount of time. Thanks! – Exupery May 02 '12 at 13:35