30

Is there a clean way to resolve a DNS query (get an IP by hostname) in Java asynchronously, in a non-blocking way (i.e. a state machine, not 1 query = 1 thread)? I'd like to run tens of thousands of queries simultaneously, but not run tens of thousands of threads.

What I've found so far:

  • The standard InetAddress.getByName() implementation is blocking, and the standard Java libraries appear to lack any non-blocking implementation.
  • The "Resolving DNS in bulk" question discusses a similar problem, but the only solution found there is a multi-threaded approach (i.e. each thread works on only 1 query at any given moment), which is not really scalable.
  • The dnsjava library is also blocking-only.
  • There are ancient non-blocking extensions to dnsjava dating from 2006. They lack modern Java concurrency features such as Future and, alas, offer only a very limited queue-based implementation.
  • The dnsjnio project is also an extension to dnsjava, but it also works in a threaded model (i.e. 1 query = 1 thread).
  • asyncorg seems to be the best available solution I've found so far targeting this issue, but:
    • it's also from 2007 and looks abandoned
    • lacks almost any documentation/javadoc
    • uses lots of non-standard techniques such as Fun class

Any other ideas/implementations I've missed?

Clarification. I have a fairly large volume of logs (several TB per day). Every log line has a host name that can be from pretty much anywhere on the internet, and I need an IP address for that host name for my further statistics calculations. The order of lines doesn't really matter, so, basically, my idea is to start 2 threads. The first iterates over the lines:

  • Read a line, parse it, get the host name
  • Send a query to DNS server to resolve a given host name, don't block for answer
  • Store the line and DNS query socket handle in some buffer in memory
  • Go to the next line

And a second thread that will:

  • Wait for DNS server to answer any query (using epoll / kqueue like technique)
  • Read the answer, find which line it was for in a buffer
  • Write line with resolved IP to the output
  • Proceed to waiting for the next answer

A simple model implementation in Perl using AnyEvent shows that the idea is generally sound: I can easily reach 15-20K queries per second this way, whereas a naive blocking implementation gets about 2-3 queries per second, a difference of roughly 4 orders of magnitude. Now I need to implement the same in Java, and I'd like to skip rolling out my own DNS implementation ;)
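The two-thread scheme above could be sketched with plain java.nio roughly as follows. This is only an illustration of the event-loop mechanics, not a DNS client: the class name `UdpQueryLoop` and its methods are hypothetical, the payload is opaque, and the 2-byte ID prefix merely mirrors the transaction ID that opens a real DNS header. It also assumes all queries are sent before `receiveAll` runs out of pending entries, and it does no retries for lost packets.

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch: one non-blocking UDP socket shared by a sender and a receiver.
final class UdpQueryLoop implements AutoCloseable {
    private final DatagramChannel channel;
    private final Selector selector;
    // In-flight queries: 16-bit ID -> the log line waiting for that answer.
    private final Map<Integer, String> pending = new ConcurrentHashMap<>();

    UdpQueryLoop(SocketAddress server) throws IOException {
        channel = DatagramChannel.open();
        channel.configureBlocking(false);
        channel.connect(server);
        selector = Selector.open();
        channel.register(selector, SelectionKey.OP_READ);
    }

    /** Sender side: remember the line, fire the packet, return immediately. */
    void send(int id, String line, byte[] payload) throws IOException {
        pending.put(id, line);
        ByteBuffer packet = ByteBuffer.allocate(2 + payload.length);
        packet.putShort((short) id).put(payload).flip();
        // Non-blocking write; returns 0 if the socket buffer is full.
        // A real client would retry or time the query out in that case.
        channel.write(packet);
    }

    /** Receiver side: wait for readable events and match replies to lines. */
    void receiveAll(BiConsumer<String, byte[]> onAnswer, long timeoutMillis) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(512);
        while (!pending.isEmpty()) {
            if (selector.select(timeoutMillis) == 0) {
                break; // timed out; whatever is still pending got no answer
            }
            selector.selectedKeys().clear();
            while (true) { // drain every datagram that is already queued
                buf.clear();
                int n = channel.read(buf);
                if (n <= 0) break;
                if (n < 2) continue; // too short to carry an ID
                buf.flip();
                int id = buf.getShort() & 0xFFFF;
                byte[] body = new byte[buf.remaining()];
                buf.get(body);
                String line = pending.remove(id);
                if (line != null) onAnswer.accept(line, body);
            }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        channel.close();
    }
}
```

In this sketch one thread would call `send` for each parsed line while a second thread sits in `receiveAll`; both share a single socket, so the number of threads stays constant regardless of how many queries are in flight.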

GreyCat
  • In what situation do you need "tens of thousands of queries" at the same time? As in, what is the problem that you're really trying to solve? – HonkyTonk Aug 14 '12 at 15:31
  • I've added clarifications on the algorithm I'm trying to implement (in fact, it's a fairly standard parallelization technique that compresses lots of slow queries into a small amount of time by executing them in parallel). – GreyCat Aug 15 '12 at 22:34
  • How about having 1 thread read the data, wrap each host name in an object and put it on a queue, n threads take jobs from the queue and do the blocking DNS lookup, and one thread collect the results and order the output? Non-blocking communication is likely to just hide the fact that a separate thread is doing blocking communication. – nhahtdh Aug 15 '12 at 22:41
  • n in "n threads" would need to be around 15000-20000 to be effective. I don't really want to create 20K threads for this purpose. That's the whole point of doing non-blocking calls. – GreyCat Aug 16 '12 at 16:03
  • Setting up a local DNS server might be an option too. Should be faster even with your Perl solution. At least try addressing several DNS servers, to improve speed, and reduce flooding them with requests - also in your own interest. – Joop Eggen Aug 18 '12 at 00:58
  • We have several layers of caching: an internal cache within an application and a cluster of local DNS servers to execute these queries (as they should be executed by a cluster of parsers, so it would be k * 15000..20000 queries per second and they should be balanced across this cluster) – GreyCat Aug 18 '12 at 07:30
  • This won't become skynet will it? :) – Shark Aug 21 '12 at 08:41
  • For similar purposes we've used a local BerkeleyDB to cache and retrieve the already resolved addresses – Lorand Bendig Aug 21 '12 at 21:22
  • Is there some restriction on simply using java to call your perl script? Perhaps feeding host name data to the script via a local socket and reading the output from another local socket? Just a thought. – Jeremy Feb 21 '13 at 04:43
  • @Jeremy: It is possible, but it's kind of a messy solution. Given that this one executes in a clustered Java environment, this would mean that I have to somehow distribute and maintain a Perl installation and all required modules (such as AnyEvent) on every cluster node. – GreyCat Feb 23 '13 at 22:12

6 Answers

5

It may be that the Apache Directory Services implementation of DNS on top of MINA is what you're looking for. The JavaDocs and other useful guides are on that page, in the left-hand side-bar.

andersoj
5

There is some work on non-blocking DNS in netty, but it's still a work in progress and will probably be released only in 5.0.

Sean Bright
valodzka
  • DnsNameResolver will be included in version 4.1.0, which will be released soon (it is currently in 4.1.0.CR2). If you want to use some DNS extension, you have to build and parse the protocol records yourself, but it shouldn't be a problem. – Karry Feb 13 '16 at 15:43
3

You will, I think, have to implement the DNS client protocol yourself on top of raw UDP using base sockets support, or on top of TCP using NIO channels.
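To give a sense of how much protocol work that involves, here is a minimal sketch of the wire format for a single A-record lookup: encoding a query and pulling the first IPv4 address out of a response. The class `DnsWire` and its method names are hypothetical, and the sketch ignores truncation, retries, IPv6, CNAME chasing and error codes, all of which a real client would need.

```java
import java.io.ByteArrayOutputStream;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper covering only the simplest case of the DNS message format.
final class DnsWire {
    /** Encodes a standard recursive query for the A record of host. */
    static byte[] encodeQuery(int id, String host) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeShort(out, id);     // transaction ID, echoed back in the response
        writeShort(out, 0x0100); // flags: standard query, recursion desired
        writeShort(out, 1);      // QDCOUNT: one question
        writeShort(out, 0);      // ANCOUNT
        writeShort(out, 0);      // NSCOUNT
        writeShort(out, 0);      // ARCOUNT
        for (String label : host.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.US_ASCII);
            out.write(bytes.length);           // each label is length-prefixed
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);            // root label terminates the name
        writeShort(out, 1);      // QTYPE = A
        writeShort(out, 1);      // QCLASS = IN
        return out.toByteArray();
    }

    /** Reads the transaction ID, used to match a response to its query. */
    static int readId(byte[] packet) {
        return ((packet[0] & 0xFF) << 8) | (packet[1] & 0xFF);
    }

    /** Returns the first A record in the answer section, or null if there is none. */
    static InetAddress firstARecord(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        buf.position(4);
        int questions = buf.getShort() & 0xFFFF;
        int answers = buf.getShort() & 0xFFFF;
        buf.position(12); // skip the rest of the fixed 12-byte header
        for (int i = 0; i < questions; i++) {
            skipName(buf);
            buf.position(buf.position() + 4); // QTYPE + QCLASS
        }
        for (int i = 0; i < answers; i++) {
            skipName(buf);
            int type = buf.getShort() & 0xFFFF;
            buf.getShort();                   // class
            buf.getInt();                     // TTL
            int rdLength = buf.getShort() & 0xFFFF;
            if (type == 1 && rdLength == 4) { // A record carrying an IPv4 address
                byte[] addr = new byte[4];
                buf.get(addr);
                try {
                    return InetAddress.getByAddress(addr);
                } catch (UnknownHostException e) {
                    throw new IllegalArgumentException(e);
                }
            }
            buf.position(buf.position() + rdLength); // skip other record types
        }
        return null;
    }

    /** Skips a name made of plain labels, optionally ending in a compression pointer. */
    private static void skipName(ByteBuffer buf) {
        while (true) {
            int len = buf.get() & 0xFF;
            if (len == 0) return;             // root label
            if ((len & 0xC0) == 0xC0) {       // pointer: one more byte, then done
                buf.get();
                return;
            }
            buf.position(buf.position() + len);
        }
    }

    private static void writeShort(ByteArrayOutputStream out, int value) {
        out.write((value >> 8) & 0xFF);
        out.write(value & 0xFF);
    }
}
```

The transaction ID is what lets a single socket carry many outstanding queries: the sender stores each line under its ID, and the receiver looks the ID up again when the matching response arrives.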

Lawrence Dol
2

I don't have an answer to your question (I don't know if there is a DNS library that will operate in the async mode that you want) and this is too long for a comment.

But, you should be able to quickly produce an async one without having to write the full DNS handler yourself. Warning, I haven't done this so I could be all wrong.

Starting with the dnsjava code, you ought to be able to implement your own resolver that provides both a sender and a receiver method. Check out SimpleResolver and look at the send method. You ought to be able to break this method up into two methods: one to send your request, which runs up to the call to either the TCPClient or the UDPClient (you would handle the actual on-the-wire sending at this point, as you described, with your first thread), and one to receive, which would be called by your second thread in response to a socket read and would handle parsing the response. You may have to copy all of the code from SimpleResolver (it has lots of private methods that you'll need, and the licensing allows for it), or you could create your own version and simply load it ahead of the jarred one in your classpath, or you could reflect your way to the methods in question and set them accessible.

You can quickly build the network client side with either netty or mina. I prefer netty for the docs.

If you do go down this path and can/want to open source it, I can set aside some time to help if you get into trouble.

philwb
1

Linux has an asynchronous DNS lookup function: http://www.imperialviolet.org/2005/06/01/asynchronous-dns-lookups-with-glibc.html

If you are on Linux you just need to wrap that up in some JNI.

vladimir e.
0

You have multiple options

Option 1: Java 5 Executors

  1. A Fixed thread pool: Executors.newFixedThreadPool(int)
  2. Future: A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation.
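A minimal sketch of Option 1, assuming a hypothetical helper class `PooledResolver`: each lookup is submitted to a fixed pool and represented by a Future. Note that InetAddress.getByName still blocks, so every in-flight lookup occupies one pool thread; this bounds the thread count but does not make the lookups non-blocking.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: blocking lookups spread over a bounded thread pool.
final class PooledResolver {
    /** Submits one blocking lookup per host and returns the futures in input order. */
    static List<Future<InetAddress>> resolveAll(List<String> hosts, ExecutorService pool) {
        List<Future<InetAddress>> futures = new ArrayList<>();
        for (String host : hosts) {
            // Runs on a pool thread; a failed lookup surfaces as ExecutionException from get().
            Callable<InetAddress> lookup = () -> InetAddress.getByName(host);
            futures.add(pool.submit(lookup));
        }
        return futures;
    }
}
```

A caller would create the pool with Executors.newFixedThreadPool(n), pass the host names to `resolveAll`, and call get() on each Future; throughput is then capped at roughly n concurrent lookups.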

Option 2: JMS with MessageListener

  1. Requires dependency on JMS Provider etc.

Option 3: Actor-based framework

This approach scales well. Look at Akka.

Aravind Yarram
  • Sorry, I'm not asking for a parallelization methodology. I already know which one would be best for such a task: it's a classic job for an event machine with a minimal number of threads. I don't even need a *queue*: the order of the lines I process doesn't matter, as they will be re-sorted in later processing anyway. I'm asking for DNS querying libraries for Java that can be run in non-blocking mode. – GreyCat Aug 16 '12 at 16:05