0

I've written a class method that will take "batches" of data (each row that makes a "value" to be inserted, via SQL, to the database comes from a two-dimensional array labeled "data_values").

However, there will be instances when my program will be getting redundant data, i.e. data that might already be in the database. Because there's a primary key in the database, the program will break if it cannot upload the data because of a duplicate entry.

Is there a way to use a try/catch so that the program will continue uploading data, effectively "skipping" the duplicates? If so, how can I implement it?

Thank you in advance. If I could clarify my question, please let me know.

My current code is here:

public void insertData(ArrayList<String> data_types, String[][] data_values) {

        try{    
            c.setAutoCommit(false); 
            // creates insert statement
            String insertDataScript = "INSERT INTO "+tableName+" VALUES (";
            for(int q = 0; q < data_types.size()-1; q++) {
                insertDataScript += "?, ";
            }
            insertDataScript += "?)";

            PreparedStatement stmt = c.prepareStatement(insertDataScript);
            for (int i = 0; i < data_values.length; i++) {

                for(int j = 1; j < data_types.size()+1; j++) {

                    if(data_types.get(j-1).toLowerCase().equals("double")) {
                        stmt.setDouble(j, Double.valueOf(data_values[i][j-1]));
                    }
                    else if(data_types.get(j-1).toLowerCase().equals("string")) {
                        stmt.setString(j, data_values[i][j-1]);
                    }
                    else {
                        System.out.println("Error");
                    }
                }
                stmt.addBatch();
            }
            stmt.executeBatch();
            c.commit();
            c.setAutoCommit(true);
            stmt.close();
        } 
        catch ( Exception e ) {
            System.err.println( e.getClass().getName() + ": " + e.getMessage() );
            System.exit(0);
        }
    }
daOnlyBG
  • 574
  • 3
  • 15
  • 43
  • 1
    I suggest to put the logic onto the database. According to [this answer](https://stackoverflow.com/a/4330694/7525132), it might be as easy as just adding `OR REPLACE` to your SQL statement - but as I never used sqlite I cannot confirm. – Izruo May 06 '19 at 23:48
  • Do you create the primary key yourself or it is handled by the database? If yes is it auto-incremented? – user2987773 May 06 '19 at 23:39
  • I create the primary key myself, and will update the post accordingly. I don't believe it's auto-incremented, as I haven't explicitly created the table as such. – daOnlyBG May 07 '19 at 01:29

2 Answers2

1

My first suggestion would be to deduplicate the data before inserting it into the db. (Edit: totally missed the "already in the db" part, so this probably won't work unless you want to do a query before every insert. Maybe you can use an INSERT IGNORE?)

If you cannot do this because you do not have control over the primary key or there is no way to ignore duplicates in the insert, then there are ways to catch specific exception types and continue the program instead of calling System.exit. In order to do that you would probably need to have smaller prepared statements and put the try/catch inside the for loop over 'data_values`.

Here is a post talking about catching this type of exception: Catch duplicate key insert exception.

  • 1
    Thanks Lilith. I accepted your answer not only because the `INSERT OR IGNORE INTO` is the correct solution, but because you referenced that exception handling link, which is sort of what I had in mind. – daOnlyBG May 07 '19 at 03:43
1

INSERT OR IGNORE

Simply change (albeit it not really exception handling, but rather exception bypassing)

String insertDataScript = "INSERT INTO "+tableName+" VALUES (";

to

String insertDataScript = "INSERT OR IGNORE INTO "+tableName+" VALUES (";

Consider the following demo (equivalent to suggested and then what you currently have) :-

  • rowid has been used for convenience as it's basically a build in primary key.

  • the only reason why the columns have been specified i.e.(rowid,othercolumn,mydatecolumn) is that rowid is normally hidden. In your case just VALUES (without the preceding columns) will expect values for all columns and thus include the defined primary key column(s).

  • shown/actioned in reverse order as both can then run together

:-

INSERT OR IGNORE INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (10,'x','x'),
        (11,'x','x'),
        (12,'x','x'),
        (13,'x','x'),
        (14,'x','x'),
        (10,'x','x')
;

INSERT INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (20,'x','x'),
        (21,'x','x'),
        (22,'x','x'),
        (23,'x','x'),
        (24,'x','x'),
        (20,'x','x')
;

results in :-

INSERT OR IGNORE INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (10,'x','x'),
        (11,'x','x'),
        (12,'x','x'),
        (13,'x','x'),
        (14,'x','x'),
        (10,'x','x')
> Affected rows: 5
> Time: 0.208s

i.e. 5 of the 6 were added the 6th a duplicate (according to the primary key) was skipped.

INSERT INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (20,'x','x'),
        (21,'x','x'),
        (22,'x','x'),
        (23,'x','x'),
        (24,'x','x'),
        (20,'x','x')
> UNIQUE constraint failed: mytable.rowid
> Time: 0.006s

i.e. none are inserted due to 1 duplicate.

INSERT OR REPLACE (may be useful)

If you wanted the data from the duplicates to be applied then instead of INSERT OR IGNORE, you could use INSERT OR REPLACE.

e.g. the following (run after the above i.e. all are duplicates bit with different data):-

INSERT OR REPLACE INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (10,'xx','x'),
        (11,'x','xx'),
        (12,'aa','x'),
        (13,'x','aa'),
        (14,'x','bb'),
        (10,'cc','x')
;

then you get :-

INSERT OR REPLACE INTO mytable (rowid,othercolumn,mydatecolumn) -- rowid is a PRIMARY KEY as such
    VALUES
        (10,'xx','x'),
        (11,'x','xx'),
        (12,'aa','x'),
        (13,'x','aa'),
        (14,'x','bb'),
        (10,'cc','x')
> Affected rows: 6
> Time: 0.543s

i.e. now all 6 INSERTs are actioned (5 rows updated as the 1st and last update the same row twice).

Community
  • 1
  • 1
MikeT
  • 32,107
  • 13
  • 38
  • 54
  • 1
    Much thanks for your detailed response. I ended up going with the `insert or ignore into` solution. – daOnlyBG May 07 '19 at 03:44