
I am trying to stream a large JSON file (~300 MB) to Solr using JRuby (9.1.8.0 with jruby-openssl 0.9.21). The remote Solr server uses SSL authentication. My script is included below.

When I run the script under JRuby, it runs out of heap space almost immediately with the default 500 MB heap. Under MRI, memory usage never goes over 40 MB. Without SSL, JRuby works fine. I've done similar things in pure Java and never had this kind of problem, so I'm not sure what's happening here.

Thanks for any suggestions...

require 'openssl'
require 'net/http'
require 'json'

PEM_FILE = ENV["CLIENT_CERT"]
SOLR_URL = ENV["SOLR_URL"]

class SolrClient

  DEFAULT_OPTIONS = {
    use_ssl: true,
    verify_mode: OpenSSL::SSL::VERIFY_PEER,
    keep_alive_timeout: 30,
    cert: OpenSSL::X509::Certificate.new(IO.read(PEM_FILE)),
    key:  OpenSSL::PKey::RSA.new(IO.read(PEM_FILE)),
  }


  def initialize(http = nil)
    if http
      @http = http
    else
      @http = Net::HTTP.start('my.solr.url', 443, DEFAULT_OPTIONS)
    end
  end

  def update
    path = 'index_batch.json'
    # File.size reads the length from file metadata instead of iterating over every byte.
    bytes = File.size(path).to_s
    puts "starting request..."
    request = Net::HTTP::Post.new "/solr/archivesspace/update"
    request['Content-Type'] = 'application/json'
    request['Content-Length'] = bytes
    # The block form closes the file once the request has completed.
    File.open(path, 'rb') do |stream|
      request.body_stream = stream
      response = @http.request request
      puts response.body
    end
  end

end


SolrClient.new.update
fitz
  • Are you setting the heap size to 500mb or you think it's 500mb? – pvg Nov 08 '17 at 22:10
  • No, I'm just running the script without any -J-Xmx flags, and it stops immediately when it starts to send the request. If I run the script with -J-Xmx1024m, it runs for a minute or so, then runs out of memory. If I go up to 2048m, it runs a bit longer, and so on. – fitz Nov 08 '17 at 22:22
  • You could try VisualVM or similar tools to see where the memory is going (e.g. https://stackoverflow.com/questions/9154785/how-to-find-memory-leaks-using-visualvm), although unless you're particularly interested in debugging jruby-openssl, you might find it easier to just rewrite this particular thing in Java. – pvg Nov 08 '17 at 22:27
  • Yeah, I took a heap dump in VisualVM. 97.5% of the memory was being used by byte[], so I guess the stream is being read before it's being sent. Also, I just tried this on JRuby 1.7.9 and it ran without any problems, so it looks like an issue with the newer version of jruby-openssl. – fitz Nov 08 '17 at 22:55

1 Answer


I posted an issue on the JRuby GitHub repository and got a response almost immediately:

https://github.com/jruby/jruby/issues/4842

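As a temporary fix, I'm patching Net::HTTP so that, when the socket is an SSLSocket, the body is copied in 16 MB slices instead of in a single call: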

require 'net/http'

class Net::HTTPGenericRequest

  def send_request_with_body_stream(sock, ver, path, f)
    unless content_length() or chunked?
      raise ArgumentError,
          "Content-Length not given and Transfer-Encoding is not `chunked'"
    end
    supply_default_content_type
    write_header sock, ver, path
    wait_for_continue sock, ver if sock.continue_timeout
    if chunked?
      chunker = Chunker.new(sock)
      IO.copy_stream(f, chunker)
      chunker.finish
    else
      # copy_stream can sendfile() to sock.io unless we use SSL.
      # If sock.io is an SSLSocket, copy_stream will hit SSL_write()
      if sock.io.is_a? OpenSSL::SSL::SSLSocket
        # Workaround: copy at most 16 MB per call until the file is exhausted.
        IO.copy_stream(f, sock.io, 16 * 1024 * 1024) until f.eof?
      else
        IO.copy_stream(f, sock.io)
      end
    end
  end
end
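
The key part of the patch is the bounded copy loop. Here is a minimal sketch of that loop on its own, assuming in-memory StringIO objects stand in for the file and the SSL socket. The helper name copy_in_chunks and the 4 KB chunk size are made up for illustration and are not part of Net::HTTP or the patch:

```ruby
require 'stringio'

# Hypothetical helper: copy src to dst with at most chunk_size bytes per
# IO.copy_stream call, the same pattern the patch uses for SSL sockets.
def copy_in_chunks(src, dst, chunk_size)
  total = 0
  until src.eof?
    # The third argument caps how many bytes this single call may copy.
    total += IO.copy_stream(src, dst, chunk_size)
  end
  total
end

src = StringIO.new('x' * 100_000)  # stand-in for the JSON file
dst = StringIO.new                 # stand-in for sock.io
copied = copy_in_chunks(src, dst, 4 * 1024)
puts "copied #{copied} bytes"
```

Whether 16 MB is the best slice size for a real SSL socket probably depends on the setup; the patch just picks a value well below the heap size.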
fitz