Good morning to the community, I have a query you happen to have to import 14 million records containing the information of clients of a company.
Flat File. Txt weighs 2.8 GB, I have developed a java program that reads the flat file line by line, deal the information and put it in an object that in turn inserted into a table in the PostgreSQL database, the subject is that I have made a calculation that 100000 records inserted in a time of 112 minutes, but the issue is that I insert parts.
public static void main(String[] args) {
// PROCESSING 100,000 records in 112 minutes
// PROCESSING 1,000,000 records in 770 minutes = 18.66 hours
loadData(0L, 0L, 100000L);
}
/**
* Load the number of records Depending on the input parameters.
* @param counterInitial - Initial counter, type long.
* @param loadInitial - Initial load, type long.
* @param loadLimit - Load limit, type long.
*/
private static void loadData(long counterInitial, long loadInitial, long loadLimit){
Session session = HibernateUtil.getSessionFactory().openSession();
try{
FileInputStream fstream = new FileInputStream("C:\\sppadron.txt");
DataInputStream entrada = new DataInputStream(fstream);
BufferedReader buffer = new BufferedReader(new InputStreamReader(entrada));
String strLinea;
while ((strLinea = buffer.readLine()) != null){
if(counterInitial > loadInitial){
if(counterInitial > loadLimit){
break;
}
Sppadron spadron= new Sppadron();
spadron.setSpId(counterInitial);
spadron.setSpNle(strLinea.substring(0, 9).trim());
spadron.setSpLib(strLinea.substring(9, 16).trim());
spadron.setSpDep(strLinea.substring(16, 19).trim());
spadron.setSpPrv(strLinea.substring(19, 22).trim());
spadron.setSpDst(strLinea.substring(22, 25).trim());
spadron.setSpApp(strLinea.substring(25, 66).trim());
spadron.setSpApm(strLinea.substring(66, 107).trim());
spadron.setSpNom(strLinea.substring(107, 143).trim());
String cadenaGriSecDoc = strLinea.substring(143, strLinea.length()).trim();
String[] tokensVal = cadenaGriSecDoc.split("\\s+");
if(tokensVal.length == 5){
spadron.setSpNac(tokensVal[0]);
spadron.setSpSex(tokensVal[1]);
spadron.setSpGri(tokensVal[2]);
spadron.setSpSec(tokensVal[3]);
spadron.setSpDoc(tokensVal[4]);
}else{
spadron.setSpNac(tokensVal[0]);
spadron.setSpSex(tokensVal[1]);
spadron.setSpGri(tokensVal[2]);
spadron.setSpSec(null);
spadron.setSpDoc(tokensVal[3]);
}
try{
session.getTransaction().begin();
session.save(spadron); // Insert
session.getTransaction().commit();
} catch (Exception e) {
session.getTransaction().rollback();
e.printStackTrace();
}
}
counterInitial++;
}
entrada.close();
} catch (Exception e) {
e.printStackTrace();
}finally{
session.close();
}
}
The main issue is if they check my code when I insert the first million records, the parameters would be as follows: loadData (0L, 0L, 1000000L);
The issue is that when you insert the following records in this case would be the next million records would be: loadData (0L, 1000000L, 2000000L);
What will cause it to scroll back the first 100 billion of records, and then when the counter is in the value 1000001 recently will begin insert following records, someone can give me a more optimal suggestion to insert the records, knowing that it is necessary treat information as seen in previous code shown.