I have a Thrift API served from a Java application running on Linux. I'm using a .NET client to connect to the API and execute operations.
The first few calls to the service work fine without errors, but then (seemingly at random) a call will "hang." If I force-quit my client and try to reconnect, either the service hangs again or my client fails with the following error:
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at Thrift.Transport.TStreamTransport.Read(Byte[] buf, Int32 off, Int32 len)
(etc.)
When I use JConsole to get a thread dump, the server thread is sitting in accept():
"Thread-1" prio=10 tid=0x00002aaad457a800 nid=0x79c7 runnable [0x00000000434af000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
- locked <0x00000005c0fef470> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:63)
On the server, netstat shows connections to the service port in TIME_WAIT, which eventually disappear several minutes after I force-quit the client (as would be expected).
The code that sets up the Thrift service is as follows:
int port = thriftServicePort;
String host = thriftServiceHost;
InetAddress adr = InetAddress.getByName(host);
InetSocketAddress address = new InetSocketAddress(adr, port);
TServerTransport serverTransport = new TServerSocket(address);
TServer server = new TSimpleServer(new TServer.Args(serverTransport).processor((org.apache.thrift.TProcessor)processor));
server.serve();
Note that we're using the TServerSocket constructor that takes an explicit address (hostname or IP plus port). I suspect that I should switch to the constructor that only specifies a port (ultimately binding to InetAddress.anyLocalAddress()). Alternatively, I suppose I could configure the service to bind to the "wildcard" address ("0.0.0.0").
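For reference, here is a minimal sketch of the difference between the two kinds of binding, using plain java.net sockets rather than Thrift. The class and method names are hypothetical and only for illustration. On the Thrift side, I assume the equivalent of the port-only case is the TServerSocket(int port) constructor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of Thrift: shows which address a listener ends up on.
class BindAddressSketch {
    /** Binds a listening socket to the given address and reports whether it is the wildcard address. */
    static boolean listensOnWildcard(InetSocketAddress bindAddress) {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(bindAddress);
            return server.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Port 0 asks the OS for any free port, so the sketch does not clash with a real service.
        InetSocketAddress explicit = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        InetSocketAddress portOnly = new InetSocketAddress(0); // no address given: wildcard
        System.out.println("explicit address on wildcard? " + listensOnWildcard(explicit));
        System.out.println("port only on wildcard? " + listensOnWildcard(portOnly));
    }
}
```

A listener bound to one explicit address only accepts connections addressed to that interface, whereas the wildcard listener accepts them on every local interface, which may matter for where the SSH tunnel endpoint connects.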
I should mention that the service is not hosted on the open Internet. It is hosted in a private network and I am using SSH tunneling to reach it. Hence, the hostname that the service is bound to does not resolve in my local network (although I can make the initial connection via tunneling). I wonder if this is something similar to the RMI TCP callback problem?
Is there a technical explanation for what's going on (if this is a common issue), or are there additional troubleshooting steps that I can take?
UPDATE
Had the same problem today, but this time jstack shows that the Thrift server is blocking forever reading from the input stream:
"Thread-1" prio=10 tid=0x00002aaad43fc000 nid=0x60b3 runnable [0x0000000041741000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:70)
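Both thread dumps show accept() and process() running on the same thread ("Thread-1") inside TSimpleServer.serve, so my working assumption is that the server handles one connection at a time. Below is a minimal sketch of what that would imply, using plain java.net sockets instead of Thrift (all names are hypothetical): while the serving thread is stuck reading from a silent first client, a second client can still connect through the kernel's accept backlog, but nobody ever answers it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical model of a single-threaded server; not Thrift code.
class SingleThreadedServerSketch {
    /** Returns true if a second client connects but gets no reply within waitMillis. */
    static boolean secondClientStarves(int waitMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // The only serving thread: accepts one connection, then blocks reading from it.
            Thread serving = new Thread(() -> {
                try (Socket first = server.accept()) {
                    first.getInputStream().read(); // blocks while the first client stays silent
                } catch (IOException ignored) {
                    // socket closed during shutdown of the sketch
                }
            });
            serving.setDaemon(true);
            serving.start();

            try (Socket silent = new Socket(loopback, server.getLocalPort());
                 Socket second = new Socket(loopback, server.getLocalPort())) {
                second.getOutputStream().write(1);  // the "request" from the second client
                second.setSoTimeout(waitMillis);    // client-side limit so the sketch terminates
                try {
                    second.getInputStream().read(); // wait for a reply that never comes
                    return false;
                } catch (SocketTimeoutException e) {
                    return true; // connected, but never served
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

From the client's point of view this looks like a call that simply hangs, even though the TCP connection itself was established.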
So we need to set a "client timeout" in the TServerSocket constructor. But why would the missing timeout also cause the application to refuse connections while it is blocked in accept()?
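As a sketch of what such a timeout does at the socket level, here is a plain java.net example (hypothetical names, not Thrift code). I am assuming that Thrift's client timeout ends up as a read timeout on the accepted socket, which is what setSoTimeout provides here: the read from a silent client throws SocketTimeoutException instead of blocking forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration of a server-side read timeout; not Thrift code.
class ReadTimeoutSketch {
    /** Returns true if reading from a silent client gives up after timeoutMillis. */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             // The client connects but never sends anything.
             Socket silentClient = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis); // bound every blocking read on this connection
            try {
                accepted.getInputStream().read();
                return false; // data or end-of-stream arrived, which this sketch does not expect
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up, so the thread is free to accept again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a bound like this, a stalled or half-dead connection (for example one left behind by a dropped SSH tunnel) would cost the serving thread at most the timeout rather than blocking it indefinitely.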