I have an .rtf file that I am trying to convert to a CSV file. From searching, I gather that I have to convert it to plain text first and then to CSV, but I am stuck on the parsing logic and not sure how to move forward.

I have the data below, which I want to convert to CSV.

Input :

Search Target Redmond40_MAS  Log Written 01/18/2013 9:13:19 Number of attempts 1
Search Target Redmond41_MAS  Log Written 01/19/2013 9:15:16 Number of attempts 0

Output :

Search Target,Log Written,Number of attempts
Redmond40_MAS,01/18/2013 9:13:19,1
Redmond41_MAS,01/19/2013 9:15:16,0

If there were a delimiter I could have done it, but in this case all I know are the "keys", i.e. the header values, and I can't work out how to extract the corresponding contents.

Any suggestions would help.

import java.io.*;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class Rtf2Csv {

    public static void main(String[] args) {
        RTFEditorKit rtf = new RTFEditorKit();
        Document document = rtf.createDefaultDocument();
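        // try-with-resources closes the input stream even if reading fails
        try (FileInputStream fi = new FileInputStream("test.rtf")) {
            rtf.read(fi, document, 0);
        } catch (FileNotFoundException e) {
            System.out.println("File not found");
        } catch (IOException e) {
            System.out.println("I/O error");
        } catch (BadLocationException e) {
            // report the failure instead of silently ignoring it
            e.printStackTrace();
        }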
        String output = "Search Target,Log Written,Number of attempts";
        try {
            String text = document.getText(0, document.getLength());
            text = text.replace('\n', ' ').trim();
            String[] textHeaders = text
                    .split("===================================================================================");

            String[] header = { "Search Target", "Log Written",
                    "Number of attempts"};
            System.out.println(textHeaders.length);
            int headLen = header.length;
            int textLen = textHeaders.length;
            for (int i = 0; i < textLen; i++) {
                String finalString = "";
                String partString = textHeaders[i];
                for (int j = 0; j < headLen; j++) {
                    int len = header[j].length();
                    if (j + 1 < header.length)
                        finalString += partString.substring(
                                partString.indexOf(header[j]) + len,
                                partString.indexOf(header[j + 1])).trim()
                                + ",";
                    else
                        finalString += partString.substring(
                                partString.indexOf(header[j]) + len).trim();
                }
                output += "\n" + finalString;
            }
        } catch (BadLocationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        // try-with-resources flushes and closes the writer, even on failure
        try (FileWriter writer = new FileWriter("output.csv")) {
            writer.append(output);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

This is the code I have written. Is there a better way to do it?

Ankur

4 Answers

I would suggest using Scanner or StringTokenizer. There is an in-depth explanation here:

Scanner vs. StringTokenizer vs. String.Split

Something like this should do it:

StringTokenizer s = new StringTokenizer("Search Target Redmond40_MAS  Log Written 01/18/2013 9:13:19 Number of attempts 1"
);

String out = new String();

while (s.hasMoreTokens()) {
   out =  s.nextToken() + "," + out ;
}
phcoding
  • With StringTokenizer or String.split I would need a delimiter, but in my case I only have the list of "headers". Your suggestion would use the space as the delimiter, and the order would not be the one required. – Ankur Feb 09 '13 at 21:56
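
One way to reconcile the two points above is to let the header names themselves act as the delimiter for String.split. The following is only a minimal sketch: the class name HeaderSplit and the method toCsvRow are made up for illustration, and it assumes every record contains the headers in the order shown in the question and that no value contains a header name or a comma.

```java
import java.util.regex.Pattern;

public class HeaderSplit {
    // Header names taken from the question; here they double as delimiters.
    static final String[] HEADERS = { "Search Target", "Log Written", "Number of attempts" };

    // Turns one "header value header value ..." record into a comma-separated row.
    // Hypothetical helper, not part of any library.
    static String toCsvRow(String record) {
        // Build an alternation of the literal header names, e.g. \QSearch Target\E|...
        StringBuilder regex = new StringBuilder();
        for (String h : HEADERS) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(h));
        }
        String[] parts = record.trim().split(regex.toString());
        // parts[0] is whatever precedes the first header (normally empty), so skip it.
        StringBuilder row = new StringBuilder();
        for (int i = 1; i < parts.length; i++) {
            if (i > 1) {
                row.append(',');
            }
            row.append(parts[i].trim());
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow(
                "Search Target Redmond40_MAS  Log Written 01/18/2013 9:13:19 Number of attempts 1"));
    }
}
```

Note that split drops trailing empty strings, so a record whose last value is missing would yield fewer columns; that case would need separate handling.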

If the columns you are interested in are of a fixed width, you may open the txt file in Excel and place column dividers where desired.

It would be simple to export from Excel as a csv.

rajah9

If you are sure it is fixed width, then just calculate the length of the fields. Otherwise, I would recommend writing a simple parser. You might get lucky with the correct regular expression, but in my experience this can take a lot of trial and error.

It should not be too hard to parse it...
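
For the sample data in the question, the regular-expression route might look roughly like the sketch below. The class name RecordRegex and the method toCsvRows are hypothetical, and the pattern assumes that the three headers always appear in this order and that the attempt count is a run of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordRegex {
    // One capture group per column; the header names come from the question.
    // Assumption: the attempt count is always a run of digits.
    private static final Pattern RECORD = Pattern.compile(
            "Search Target\\s+(.*?)\\s+Log Written\\s+(.*?)\\s+Number of attempts\\s+(\\d+)");

    // Returns one CSV row for every record found in the text.
    static List<String> toCsvRows(String text) {
        List<String> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            rows.add(m.group(1) + "," + m.group(2) + "," + m.group(3));
        }
        return rows;
    }

    public static void main(String[] args) {
        String text =
                "Search Target Redmond40_MAS  Log Written 01/18/2013 9:13:19 Number of attempts 1\n"
              + "Search Target Redmond41_MAS  Log Written 01/19/2013 9:15:16 Number of attempts 0";
        System.out.println("Search Target,Log Written,Number of attempts");
        for (String row : toCsvRows(text)) {
            System.out.println(row);
        }
    }
}
```

Text that does not match the pattern is skipped silently, which is convenient for separator lines but can also hide malformed records.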

RobAu
  • It's giving me the correct answer, but I am not sure it will work for big files. – Ankur Feb 09 '13 at 23:41
  • For big files you should not read the entire document into memory (unless it fits); instead make your code streaming, processing each line and then writing it to your file. For CSV I recommend the SuperCSV lib instead of coding your own. – RobAu Feb 09 '13 at 23:45
  • It's an RTF file, so I would need to convert it to plain text first. How do I read an RTF file line by line? – Ankur Feb 10 '13 at 10:34
  • No idea :) Maybe apache-poi can do it? Or you could code up your own or google for it. The RTF format is always under development, see http://en.wikipedia.org/wiki/Rich_Text_Format. – RobAu Feb 11 '13 at 17:07
  • RTFEditorKit can read a certain amount instead of all, too.. http://docs.oracle.com/javase/6/docs/api/javax/swing/text/rtf/RTFEditorKit.html#read%28java.io.Reader,%20javax.swing.text.Document,%20int%29 – RobAu Feb 11 '13 at 17:10
  • I used the same, but it reads the whole file at once and returns plain text. – Ankur Feb 11 '13 at 18:39
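
The streaming idea from the comments can be sketched as follows, once the plain text is available through a Reader. This does not solve reading the RTF itself incrementally; it only shows processing one line at a time and writing each row immediately. The names StreamingConvert, toRow, quote and convert are made up for illustration, and the sketch assumes each record sits on a single line. The quoting is hand-rolled only to keep the example self-contained; a CSV library such as the one recommended above would replace it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingConvert {
    // Header names taken from the question.
    static final String[] HEADERS = { "Search Target", "Log Written", "Number of attempts" };

    // Quotes a field if it contains a comma, a double quote or a newline.
    static String quote(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Extracts the values between consecutive headers; returns null when a
    // header is missing, so separator or blank lines can be skipped.
    static String toRow(String line) {
        StringBuilder row = new StringBuilder();
        int pos = 0;
        for (int j = 0; j < HEADERS.length; j++) {
            int from = line.indexOf(HEADERS[j], pos);
            if (from < 0) {
                return null;
            }
            from += HEADERS[j].length();
            int to = (j + 1 < HEADERS.length) ? line.indexOf(HEADERS[j + 1], from) : line.length();
            if (to < 0) {
                return null;
            }
            if (j > 0) {
                row.append(',');
            }
            row.append(quote(line.substring(from, to).trim()));
            pos = to;
        }
        return row.toString();
    }

    // Reads one line at a time and writes each row as soon as it is parsed,
    // so only the current line is held in memory.
    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        out.write(String.join(",", HEADERS));
        out.write("\n");
        String line;
        while ((line = reader.readLine()) != null) {
            String row = toRow(line);
            if (row != null) {
                out.write(row);
                out.write("\n");
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String text =
                "Search Target Redmond40_MAS  Log Written 01/18/2013 9:13:19 Number of attempts 1\n"
              + "==========\n"
              + "Search Target Redmond41_MAS  Log Written 01/19/2013 9:15:16 Number of attempts 0\n";
        StringWriter out = new StringWriter();
        convert(new StringReader(text), out);
        System.out.print(out);
    }
}
```

In a real run the StringReader and StringWriter would be replaced by a file-backed Reader and a FileWriter or BufferedWriter.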

If you want to read it in line by line you can use something like this:

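import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public int countLines(File inFile) throws FileNotFoundException
{
   int count = 0;
   // try-with-resources closes the scanner and the underlying file when done
   try (Scanner fileScanner = new Scanner(inFile))
   {
       while (fileScanner.hasNextLine()) // true while unread lines remain
       {
           fileScanner.nextLine(); // advance past the current line
           count++;
       }
   }

   return count;
}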

Does this answer your question?

phcoding