
My CSV contains:

6901257 5.010635294 Apartment   Entire home/apt {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}    3   1   Real Bed    strict  TRUE    NYC Beautiful, sunlit brownstone 1-bedroom in the loveliest neighborhood in Brooklyn. Blocks from the promenade and Brooklyn Bridge Park, with their stunning views of Manhattan, and from the great shopping and food. 6/18/2016   t   t       3/26/2012   f   7/18/2016   40.69652363 -73.99161685    Beautiful brownstone 1-bedroom  Brooklyn Heights    2   100 https://a0.muscache.com/im/pictures/6d7cbbf7-c034-459c-bc82-6522c957627c.jpg?aki_policy=small   11201   1   1

When I read this via BufferedReader, I get this:

6901257,5.010635294096256,Apartment,Entire home/apt,"{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}",3,1.0,Real Bed,strict,True,NYC,"Beautiful, sunlit brownstone 1-bedroom in the loveliest neighborhood in Brooklyn. Blocks from the promenade and Brooklyn Bridge Park, with their stunning views of Manhattan, and from the great shopping and food.",2016-06-18,t,t,,2012-03-26,f,2016-07-18,40.696523629970756,-73.99161684624262,Beautiful brownstone 1-bedroom,Brooklyn Heights,2,100.0,https://a0.muscache.com/im/pictures/6d7cbbf7-c034-459c-bc82-6522c957627c.jpg?aki_policy=small,11201,1.0,1.0

I want to split each line by comma, but the problem comes when it reaches this field:

"{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}"

It splits this field by comma as well, which I don't want. Is there a way to overcome this?

        String line;
        fileWriter = new FileWriter("C:\\Users\\nagesingh\\IdeaProjects\\machineLearning\\src\\main\\resources\\train_new.csv");
        while ((line = trainCsv.readLine()) != null) {
            String[] tokens = line.split(",");
            for (int i = 0; i < tokens.length; i++) {
                try {
                    fileWriter.append(Double.valueOf(tokens[i]).toString());
                }catch (Exception e){
                    fileWriter.append("0");
                }
                fileWriter.append(COMMA_DELIMITER);
            }
            fileWriter.append(NEW_LINE_SEPARATOR);
        }
Nagendra Singh

2 Answers


Looking at your data, I strongly believe that all of those attributes should be separate columns in your CSV.

Is there a reason you want it in that format? The only logical deduction I can make is that you want an object. If so, you could instead put all of these attributes into an object after reading from the file.

If you really want to keep your current format, you could make your CSV pipe (|) delimited and split by pipe when reading. That would give you all of this: "{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}" as a single entry in your array.
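A minimal sketch of the pipe-delimited approach, assuming the file has already been re-exported with | as the delimiter. The class name PipeSplitSketch, the helper splitByPipe, and the sample line are hypothetical. One thing to watch: String.split takes a regular expression, and a bare "|" means alternation there, so the pipe has to be quoted or escaped.

```java
import java.util.regex.Pattern;

public class PipeSplitSketch {

    // Hypothetical helper: splits one pipe-delimited line into fields.
    // Pattern.quote makes the pipe a literal; the -1 limit keeps trailing empty fields.
    static String[] splitByPipe(String line) {
        return line.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        // Shortened, made-up sample row; the amenities field still contains commas.
        String line = "6901257|Apartment|{\"Wireless Internet\",\"Air conditioning\",Kitchen}|3|";
        for (String token : splitByPipe(line)) {
            System.out.println(token);
        }
    }
}
```

This only works if no field ever contains a pipe character itself, which would need checking against the real data.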

Damiane

I used the Apache Commons CSV dependency (CSVParser) and got what I was expecting. It was simpler to use than writing tons of code.

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-csv -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.1</version>
</dependency>

        // CSVFormat.EXCEL treats a double-quoted field as one value, so commas inside quotes do not split it
        CSVParser parser = new CSVParser(trainCsv, CSVFormat.EXCEL);
        Iterable<CSVRecord> csvRecords = parser.getRecords();
        for (CSVRecord csvRecord : csvRecords) {
            for (int i = 0; i < csvRecord.size(); i++) {
                try {
                    // Numeric fields are written back as doubles
                    fileWriter.append(Double.valueOf(csvRecord.get(i)).toString());
                } catch (Exception e) {
                    // Anything that does not parse as a number is written as 0
                    fileWriter.append("0");
                }
                fileWriter.append(COMMA_DELIMITER);
            }
            fileWriter.append(NEW_LINE_SEPARATOR);
        }
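To show what quote handling adds over a plain split(","), here is a rough standard-library-only sketch of a quote-aware splitter. It is not how Commons CSV is implemented; the class name QuotedCsvSplitSketch and the helper splitCsvLine are hypothetical, and it assumes one record per line with "" as the escaped quote.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplitSketch {

    // Hypothetical helper: splits one CSV line on commas that are outside double quotes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);   // commas in here belong to the field
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample in the same shape as the question's data.
        String line = "3,\"{\"\"Wireless Internet\"\",Kitchen}\",\"Beautiful, sunlit\",,1.0";
        for (String field : splitCsvLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The sketch does not handle quoted fields that span several lines, which is one reason a library is the more robust choice for real files.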
Nagendra Singh