
I have been given a task to convert a huge table to a custom XML file. I will be using Java for this job.

If I simply issue a "SELECT * FROM customer", it may return a huge amount of data, eventually causing an OOM error. I wonder, is there a way I can process each record as soon as it becomes available, and remove it from memory afterwards during the SQL retrieval process?

--- edited on 13 Jul 2009

Let me elaborate on my question. I have 1 db server and 1 application server. When I issue a select query in the application, the data travels from the db server to the app server.

I believe (correct me if I am wrong) that ResultSet needs to wait until it has received all records in the query. Even if we set the fetch size to 4, for a 1000-record table we still end up with all 1000 records in the heap memory of the app server. Is that correct? Fetch size only affects the number of round trips to/from the db server.

My question is: how do I start processing those 4 (or however many) records immediately after they arrive at the app server, and then dispose of them to free up memory in the app server?
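For what it's worth, the process-and-discard loop being asked about can be sketched in plain JDBC. Everything here is illustrative: the connection URL, credentials, table, and column are placeholders, and whether rows truly arrive in small batches depends on the driver (Oracle honors `setFetchSize` as a prefetch hint, while MySQL's Connector/J needs a fetch size of `Integer.MIN_VALUE` to stream).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingExport {
    // Pure helper: wrap a value in an XML element, escaping the special characters.
    static String toXmlElement(String name, String value) {
        String escaped = value.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace(">", "&gt;");
        return "<" + name + ">" + escaped + "</" + name + ">";
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@server:1521:orcl", "user", "pass");
             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(100); // rows fetched per round trip; tune to taste
            try (ResultSet rs = stmt.executeQuery("SELECT name FROM customer")) {
                while (rs.next()) {
                    // Process each row as it is delivered, then let it go out of
                    // scope; a forward-only cursor does not retain rows already read.
                    System.out.println(toXmlElement("name", rs.getString(1)));
                }
            }
        }
    }
}
```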

janetsmith
  • Which SQL server are you using (i.e. MS, Oracle, MySql, etc.)? There are different ways of handling this issue depending upon what you're using. – Michael Todd Jul 10 '09 at 04:49
  • Please show the code that you use to retrieve the records. It's not clear if you just iterate over them, or store them locally in some collection and then iterate over that. – Pavel Minaev Jul 10 '09 at 05:07
  • I am going to use Oracle for this task. I haven't started any code yet, because I am still researching the best way of doing it :) – janetsmith Jul 10 '09 at 07:31
  • Oracle has export tools for this kind of operation, btw. Is this homework? Are you using some middle-tier solution (Hibernate, JPA, etc.)? – akarnokd Jul 10 '09 at 07:56

6 Answers


With a little more information I could give a more helpful answer.

If you are using MySQL:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
       java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

If you are using Oracle, from http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html:

java.util.Properties info = new java.util.Properties();
info.put("user", "scott");
info.put("password", "tiger");
info.put("defaultRowPrefetch", "15");
Connection conn = DriverManager.getConnection("jdbc:oracle:oci:@", info);
Clint

I think you could use the same solution as this one: a scrollable ResultSet.
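A hedged sketch of the scrollable approach, with placeholder connection details, table, and page size. One caveat worth checking for your driver: some drivers implement scroll-insensitive result sets by caching rows on the client side, which can defeat the memory-saving purpose; the fetch size then only controls rows per round trip, not the overall footprint.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ScrollablePager {
    // Pure helper: 1-based index of the first row on a given page.
    static int firstRowOfPage(int page, int pageSize) {
        return (page - 1) * pageSize + 1;
    }

    public static void main(String[] args) throws SQLException {
        int pageSize = 100;
        // Hypothetical connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@server:1521:orcl", "user", "pass");
             Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                                   ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery("SELECT name FROM customer")) {
            // absolute() returns false once positioned past the last row.
            for (int page = 1; rs.absolute(firstRowOfPage(page, pageSize)); page++) {
                int read = 0;
                do {
                    System.out.println(rs.getString(1)); // handle one row, then forget it
                } while (++read < pageSize && rs.next());
            }
        }
    }
}
```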

javamonkey79
    Thanks. This is also a possible solution; however, with this method, is it only 1 record per round trip (fetch size = 1)? – janetsmith Jul 13 '09 at 00:30

If you are using JDBC, you can use a ResultSet with a cursor and iterate through it one record at a time. You then need to make sure that you write your XML out to a file one record at a time, rather than using DOM to build the XML.
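The record-at-a-time XML writing suggested here can be done with StAX (`javax.xml.stream`, included in the JDK since Java 6), which never holds more than the current row in memory. A minimal sketch, with an in-memory list of rows standing in for the ResultSet loop; the column and element names are made up.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxRowWriter {
    // Write one <row> element per record; nothing is buffered beyond the current row.
    static String writeRows(List<String[]> rows, String[] columns) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("table");
            for (String[] row : rows) {
                w.writeStartElement("row");
                for (int i = 0; i < columns.length; i++) {
                    w.writeStartElement(columns[i]);
                    w.writeCharacters(row[i]); // escaping of <, >, & is handled by StAX
                    w.writeEndElement();
                }
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"1", "Alice & Co"},
            new String[]{"2", "Bob"});
        System.out.println(writeRows(rows, new String[]{"id", "name"}));
    }
}
```

In the real export, the inner loop would pull values from `rs.getString(i)` instead of the list, and the writer would wrap a `FileOutputStream` instead of a `StringWriter`.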

Dean Povey
    Regarding XML generation, I've thought of that before. I am going to use StAX for xml output (http://blogger.ziesemer.com/2007/06/xml-generation-in-java.html) – janetsmith Jul 10 '09 at 07:34

One rule of thumb I've learnt from experience is that you NEVER bring ALL the data from the database to your application server. One thing you can do is implement a procedure that pages through your data.

You can bring one page of data containing around 1000-5000 records, process it, then fetch the data for the next page.
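One way to page is keyset (seek) paging, where each page starts after the last key seen rather than at an offset, so the database never re-scans skipped rows. In SQL this corresponds to something like `WHERE id > ? ORDER BY id` with a row limit. A sketch of just the bookkeeping, with an in-memory array of ids standing in for the table:

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPaging {
    // Fetch the next page: keys greater than lastKey, at most pageSize of them.
    // Mirrors "SELECT ... WHERE id > ? ORDER BY id" with a row limit.
    static List<Integer> nextPage(int[] sortedKeys, int lastKey, int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (int k : sortedKeys) {
            if (k > lastKey) {
                page.add(k);
                if (page.size() == pageSize) break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 5, 8, 13, 21};
        int last = 0;
        List<Integer> page;
        while (!(page = nextPage(ids, last, 3)).isEmpty()) {
            System.out.println("processing page " + page); // process, then discard
            last = page.get(page.size() - 1);              // remember where we stopped
        }
    }
}
```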

Kirtan
    I second this option. But the example limit is not good in this case. Make it large, say 1000 - 5000. Fetching again wouldn't take much time, because there would be already an execution plan available. Further, it works with all database servers. – Adeel Ansari Jul 10 '09 at 05:01
  • I think even 10,000 to 50,000 or 100,000 might work. Not sure how big the data of a row is. But 50,000 seems alright. Try it out yourself. – Adeel Ansari Jul 10 '09 at 05:04
  • @Vinegar: The limit has been edited. It all depends on how much memory is available, and what is the content of the table. But, I think 1000-5000 is better. – Kirtan Jul 10 '09 at 05:05
  • Maybe, but for a million records, which is quite likely, it would take 200 rounds if you choose a limit of 5000, whereas a limit of 50,000 would make it 20 rounds. Of course it all depends on the size and memory available, but in my experience, with the memory available these days, especially on DB server machines, it wouldn't be a problem. – Adeel Ansari Jul 10 '09 at 05:11
  • @Vinegar: Yes, the memory available for the DB server may not be a problem, but it will also depend on the application server's memory, as the data will be fetched from the database to the app server's memory. – Kirtan Jul 10 '09 at 05:14
  • Thanks, this is generally a good idea. But I have some concerns about concurrency. I've just posted another question about this :) http://stackoverflow.com/questions/1117291/what-will-happen-during-data-retriveing-process-records-has-been-deleted – janetsmith Jul 13 '09 at 00:28

A concept for exporting the entire table. (Note to experts: I'm aware of its shortcomings.)

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
public class FullTableExport {
    public static String toXML(String s) {
        if (s != null) {
            StringBuilder b = new StringBuilder(s.length());
            for (int i = 0, count = s.length(); i < count; i++) {
                char c = s.charAt(i);
                switch (c) {
                case '<':
                    b.append("&lt;");
                    break;
                case '>':
                    b.append("&gt;");
                    break;
                case '\'':
                    b.append("&#39;");
                    break;
                case '"':
                    b.append("&quot;");
                    break;
                case '&':
                    b.append("&amp;");
                    break;
                default:
                    b.append(c);
                }
            }
            return b.toString();
        }
        return "";
    }
    public static void main(String[] args) throws Exception {
        String table = "CUSTOMER";
        int batch = 100;

        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@server:orcl", "user", "pass");
        PreparedStatement pstmt = conn.prepareStatement(
            "SELECT /*+FIRST_ROWS(" + batch + ") */ * FROM " + table);
        ResultSet rs = pstmt.executeQuery();
        rs.setFetchSize(batch);
        ResultSetMetaData rsm = rs.getMetaData();
        File output = new File("result.xml");
        PrintWriter out = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(
            new FileOutputStream(output), "UTF-8")), false);
        out.printf("<?xml version='1.0' encoding='UTF-8'?>%n");
        out.printf("<table name='%s'>%n", toXML(table));
        int j = 1;
        while (rs.next()) {
            out.printf("\t<row id='%d'>%n", j++);
            for (int i = 1; i <= rsm.getColumnCount(); i++) {
                out.printf("\t\t<col name='%s'>%s</col>%n", 
                    toXML(rsm.getColumnName(i)), 
                    toXML(rs.getString(i)));
            }
            out.printf("\t</row>%n");
        }
        out.printf("</table>%n");
        out.flush();
    }
}

Edit: the shortcomings (thanks @J.S.):

  • No external libraries used beyond the ojdbc
  • Nothing is closed
  • A generic Exception is thrown
  • It is a main method
  • Usage of print for XML generation
  • Oracle specific SQL
  • Plain text password
  • Some columns look awkward in string representation
  • UTF-8 is too international
  • XML structure footprint is large
akarnokd
  • When you post sample code and know the shortcomings, you should at least mention them. As you know, example code *will* show up in production systems. – Joachim Sauer Jul 10 '09 at 08:25

At which stage does the OOM error occur: during data retrieval, or while processing the data into the XML file?

If it's data retrieval, get the data in batches. Get the total number of rows first, order the selects by the primary key, and limit the rows selected to chewable sizes.
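On Oracle before 12c, the usual way to limit a keyed, ordered select to a "chewable size" is the nested-ROWNUM pattern. A sketch that only builds the SQL string; the table and order-by column are illustrative, and in real code they must not be concatenated from untrusted input (12c and later also support `OFFSET ... FETCH`).

```java
public class OraclePageSql {
    // Build the classic pre-12c Oracle pagination query for rows (first, last].
    static String pageQuery(String table, String orderBy, int first, int last) {
        return "SELECT * FROM ("
             + "SELECT t.*, ROWNUM rn FROM ("
             + "SELECT * FROM " + table + " ORDER BY " + orderBy
             + ") t WHERE ROWNUM <= " + last
             + ") WHERE rn > " + first;
    }

    public static void main(String[] args) {
        // Page 3 with a page size of 1000 covers rows 2001..3000.
        System.out.println(pageQuery("customer", "id", 2000, 3000));
    }
}
```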

If it's at creating the XML file, send the XML node of each customer to System.out.println; don't hold it in memory. Launch the program via the command line and redirect all output to a file:

java MyConverter > results.txt

As you loop through the records, everything is saved to the file.

n002213f