1

After verifing this post in stackoverflow I am using indexOf() method to parse values from file. Below is the format of my file-

10/05/2005 10:02;AM;a@xyz.com;student=student1 std=X marks=87 rollnumber=102
10/05/2005 10:05;AM;b@xyz.com;student=student2 std=IX rollnumber=26
10/05/2005 10:15;PM;c@xyz.com;student=student3 std=VII marks=87 attandance=5 rollnumber=12
10/05/2005 10:32;AM;d@xyz.com;student=student4 std=V marks=87 rollnumber=69

Note:The domain name i.e. xyz.com in email is not going to be changed in anywhere.
Below is the code snippet i am using currently-

            FileInputStream fis = new FileInputStream(file);
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    fis));

            String line = "";
            while ((line = br.readLine()) != null) {

                int index = -1;

                if ((index = line.indexOf("xyz.com")) != -1) {
                    int inStudent = line.indexOf("student=", index);
                    int spaceExistsinStudent = -1;
                    int studentIndex = -1;

                    if ((spaceExistsinStudent = line.indexOf("student=\"", inStudent)) != -1)
                        studentIndex = line.indexOf(" ", inStudent);
                    else
                        studentIndex = line.indexOf("\" ", spaceExistsinStudent);

                    int inSTD = line.indexOf("std=", studentIndex);
                    int spaceExistsinSTD = -1;
                    int stdIndex = -1;

                    if ((spaceExistsinSTD = line.indexOf("std=\"", inSTD)) != -1)
                        stdIndex = line.indexOf(" ", inSTD);
                    else
                        stdIndex = line.indexOf("\" ", spaceExistsinSTD);

                    String studentName = line.substring(inStudent + 9, studentIndex);
                    String stdName = line.substring(inSTD + 4, stdIndex);

There is no need to paste the entire code.
Well, using the above implementation, i am able to work, but is this effective solution as performace is considered? Any better way for achiveing the same....
Thank you in advance.

Community
  • 1
  • 1
Ravi Joshi
  • 5,124
  • 15
  • 58
  • 123
  • 1
    Even if the domain name isn't going to change in your approximation of the future, you should write your code to not embed it in a fixed string in the middle of your logic. At least make it a member variable, so it can be changed without tons of wasted effort in the future. Or better yet, make it a parameter to the method, and fake the configuration subsystem until later (when you do need it). – Edwin Buck Sep 10 '12 at 16:08
  • Yeah, i will keep it in mind... – Ravi Joshi Sep 10 '12 at 16:56

3 Answers3

2

Instead of indexOf(), i would suggest StringTokenizer. basically you can split you String based on some separator (eg: ;)..

Example inside your while loop

        StringTokenizer st = new StringTokenizer(line,";");


        st.nextToken(); //Date
        st.nextToken(); //AM
        String email = st.nextToken();
        String values = st.nextToken();

        StringTokenizer st2 = new StringTokenizer(values," ");


        while (st2.hasMoreElements()) {
            String token = (String) st2.nextElement();
            if(token.startsWith("student=")){
                System.out.println(token.substring("student=".length()));
            }else if(token.startsWith("std=")){
                System.out.println(token.substring("std=".length()));
            }

        }
bhatanant2
  • 576
  • 4
  • 9
  • You seems right. However currently i am only worried about the performance. What do you think about your implementation in terms of execution time? – Ravi Joshi Sep 10 '12 at 17:07
  • The bottleneck on performance will almost certainly not be the parsing code - it will be the disk IO. But yes, StringTokenizer is fast, and this implementation should in theory be faster than using indexOf, as it only scans the input line once. – Joe K Sep 10 '12 at 18:01
  • @JoeK: Let me run both the version in my machine and trigger the execution time – Ravi Joshi Sep 10 '12 at 18:14
  • @JoeK: Sadly, `StringTokenizer` is taking 1.5 times more time rather than `indexOf`. The reason why i go for `indeOf` is just because of it's better performance compare to `StringTokenizer` as i said in my question earlier. – Ravi Joshi Sep 10 '12 at 18:39
1

You don't need to use indexOf for everything. If you want to look at one character, you can use charAt() e.g. where you are checking for a '"'

I would use a method which extracts the value for a field to simplify the code.

Peter Lawrey
  • 498,481
  • 72
  • 700
  • 1,075
1

As I stated in an earlier comment, I am surprised that the parsing is the bottleneck here. But if you wish to know other ways that you could do this, and just try them out and see which is the fastest, here are two more ideas that haven't been posted- using .split:

String[] arr1 = line.split(";");
String dateTime = arr1[0];
String ampm = arr1[1];
String email = arr1[2];
String[] arr2 = arr1[3].split(" ");
String student, std, marks, rollnumber;
student = std = marks = rollnumber = null;
for (String str : arr2) {
    String value = str.substring(str.indexOf("=") + 1);
    switch(str.charAt(2)) {
    case 'u': student = value; break;
    case 'd': std = value; break;
    case 'r': marks = value; break;
    case 'l': rollnumber = value; break;
    }
}

Or using a regex:

private static final Pattern PATTERN = Pattern.compile("([^;]+);([^;]+);([^;]+);student=([^ ]+) std=([^ ]+) marks=([^ ]+) rollnumber=([^ ]+)");

Matcher m = PATTERN.matcher(line);
m.find();
String dateTime = m.group(1);
String ampm = m.group(2);
String email = m.group(3);
String student = m.group(4);
String std = m.group(5);
String marks = m.group(6);
String rollnumber = m.group(7);
Joe K
  • 17,254
  • 1
  • 31
  • 54
  • [Check this..](http://stackoverflow.com/questions/5965767/performance-of-stringtokenizer-class-vs-split-method-in-java) I referred this link before starting my dev. process. – Ravi Joshi Sep 10 '12 at 20:00