10

Write a java program to read input from a file, and then sort the characters within each word. Once you have done that, sort all the resulting words in ascending order and finally followed by the sum of numeric values in the file.

  • Remove the special characters and stop words while processing the data
  • Measure the time taken to execute the code

Lets Say the content of file is: Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.

Output:achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803

Time Taken: 3 milliseconds

My Code takes 15milliseconds to execute.....

please suggest me any fast way to solve this problem...........

Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public class Sorting {

    public static void main(String[] ags)throws Exception
    {
        long st=System.currentTimeMillis();
        int v=0;
        List ls=new ArrayList();
        //To read data from file
        BufferedReader in=new BufferedReader(
                 new FileReader("D:\\Bhive\\File.txt"));
        String read=in.readLine().toLowerCase();
        //Spliting the string based on spaces
        String[] sp=read.replaceAll("\\.","").split(" ");
        for(int i=0;i<sp.length;i++)
        {
            //Check for the array if it matches number
            if(sp[i].matches("(\\d+)"))
                //Adding the numbers
                v+=Integer.parseInt(sp[i]);
            else
            {
                //sorting the characters
                char[] c=sp[i].toCharArray();
                Arrays.sort(c);
                String r=new String(c);
                //Adding the resulting word into list
                ls.add(r);
            }
        }
        //Sorting the resulting words in ascending order
        Collections.sort(ls);
        //Appending the number in the end of the list
        ls.add(v);
        //Displaying the string using Iteartor
        Iterator it=ls.iterator();
        while(it.hasNext())
            System.out.print(it.next()+" ");
        long time=System.currentTimeMillis()-st;
        System.out.println("\n Time Taken:"+time);
    }
}
dantuch
  • 8,697
  • 4
  • 42
  • 65
Kishore Karunakaran
  • 578
  • 1
  • 6
  • 16
  • when I run the above code in my PC it takes only 2 ms.achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803 Time Taken:2 – UVM Jun 21 '12 at 04:53
  • 2
    Does your file contain only one line? – MoraRockey Jun 21 '12 at 05:03
  • 1
    Create the List after the splitting is done. At that point, you know the size and can provide the capacity. Perhaps instead of calling System.out.print every time, you could create the resulting string in memory (using StringBuilder) or creating a BufferedWriter first. But for your small input I am not sure all this would be worth the effort... – Axel Jun 21 '12 at 06:25
  • you can try without using arraylist, directly append `new String(c) + " "; ` to String variable. – swapy Jun 21 '12 at 05:16
  • whats wrong in this? there is unnecessary iteration process in your code.. – swapy Jun 21 '12 at 06:46

5 Answers5

5

Use indexOf() to extract words from your string instead of split(" "). It improves performance.

See this thread: Performance of StringTokenizer class vs. split method in Java

Also, try to increase the size of the output, copy-paste the line Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs. 50,000 times in the text file and measure the performance. That way, you will be able to see considerable time difference when you try different optimizations.

EDIT

Tested this code (used .indexOf())

        long st = System.currentTimeMillis();
        int v = 0;
        List ls = new ArrayList();
        // To read data from file
        BufferedReader in = new BufferedReader(new FileReader("D:\\File.txt"));
        String read = in.readLine().toLowerCase();
        read.replaceAll("\\.", "");
        int pos = 0, end;
        while ((end = read.indexOf(' ', pos)) >= 0) {
            String curString = read.substring(pos,end);
            pos = end + 1;
        // Check for the array if it matches number
            try {
                // Adding the numbers
                v += Integer.parseInt(curString);
            }
            catch (NumberFormatException e) {
                // sorting the characters
                char[] c = curString.toCharArray();
                Arrays.sort(c);
                String r = new String(c);
                // Adding the resulting word into TreeSet
                ls.add(r);
            }
        }
        //sorting the list
        Collections.sort(ls);
        //adding the number
        list.add(v);
        // Displaying the string using Iteartor 
        Iterator<String> it = ls.iterator();
        while (it.hasNext()) {
            System.out.print(it.next() + " ");
        }
        long time = System.currentTimeMillis() - st;
        System.out.println("\n Time Taken: " + time + " ms");

Performance using 1 line in file
Your code: 3 ms
My code: 2 ms

Performance using 50K lines in file
Your code: 45 ms
My code: 32 ms

As you see, the difference is significant when the input size increases. Please test it on your machine and share results.

Community
  • 1
  • 1
Dhwaneet Bhatt
  • 601
  • 1
  • 9
  • 18
  • 1
    The advice to go for a tree set rather than a sort on an arraylist is ludicrous. – nes1983 Jun 21 '12 at 08:41
  • Every time its trying to parse a string and I see that's costly. IMO, its better to check curString.charAtIndex[0]>65 in order to differentiate string from a number. – sgowd Jun 21 '12 at 09:26
  • @sans481 Did you check or are you just talking out of your ass? parseInt should fail very fast for words, though not as fast as your test. See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Integer.java#Integer.parseInt%28java.lang.String%2Cint%29 – nes1983 Jun 21 '12 at 09:43
  • @nes1983: Agreed, I based my results on duplicating the input string 50K times and running the test. But in that case, tree set does not add the additional duplicate elements. I am not sure how I made that blunder. This can be done only if the elements are unique and even in that case ArrayList would perform faster even after having an sort overhead. Thanks for the tip. Edited. – Dhwaneet Bhatt Jun 21 '12 at 12:32
  • @nes1983 so which you think is better? parseInt? If so, why? – sgowd Jun 21 '12 at 12:49
  • @sans481 They have different trade-offs. If you use parseInt, you rely on fast exception throwing and catching, which is the case in modern JVMs, but still feels a bit funny. Your test is really fast, but of course allows for false positives. In real code, I'd go for a regex or parseInt, not for something that accepts false positives. – nes1983 Jun 21 '12 at 14:02
  • Which Java are you using? In Java 7b147, split just calls indexOf for this input. See http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/lang/String.java#String.split%28java.lang.String%2Cint%29 – nes1983 Jun 21 '12 at 22:34
  • @nes1983: I use Java 6u29. I havent switched to Java 7 becuase I am working on GAE which supports only Java 5 an 6 (according to their website, havent yet verified it) – Dhwaneet Bhatt Jun 22 '12 at 04:15
  • Yes, I agree that in real code, we shouldn't use that logic. Just to save time here, I was thinking of that. – sgowd Jun 22 '12 at 04:43
3

The only thing I see: the following line is needlessly expensive:

   System.out.print(it.next()+" ");

That's because print is inefficient, due to all the flushing going on. Instead, construct the entire string using a string builder, and then reduce to one call of print.

nes1983
  • 14,012
  • 4
  • 42
  • 63
1

I removed the list and read it using Arrays only, In my machine the code to 6 msec with your code, by using Arrays only it taking 4 to 5 msec. Run this code in your machine and let me know the time.

import java.io.BufferedReader;

import java.io.FileReader;

import java.util.*;

public class Sorting {
public static void main(String[] ags)throws Exception
{
    long st=System.currentTimeMillis();
    int v=0;
    //To read data from file
    BufferedReader in=new BufferedReader(new FileReader("File.txt"));
    String read=in.readLine().toLowerCase();
    //Spliting the string based on spaces
    String[] sp=read.replaceAll("\\.","").split(" ");
    int j=0;
    for(int i=0;i<sp.length;i++)
    {
        //Check for the array if it matches number
        if(sp[i].matches("(\\d+)"))
            //Adding the numbers
            v+=Integer.parseInt(sp[i]);
        else
        {
            //sorting the characters
            char[] c=sp[i].toCharArray();
            Arrays.sort(c);
            read=new String(c);
            sp[j]= read;
            j++;
        }
    }
    //Sorting the resulting words in ascending order
    Arrays.sort(sp);
    //Appending the number in the end of the list
    //Displaying the string using Iteartor
    for(int i=0;i<j; i++)
        System.out.print(sp[i]+" ");
        System.out.print(v);
    st=System.currentTimeMillis()-st;
    System.out.println("\n Time Taken:"+st);
}

}
Theja
  • 714
  • 1
  • 7
  • 24
  • 1
    Bah. I wouldn't trust any recruiter that preferred the array over the array list. I can see how it's faster, but some things you just don't do for performance :). And the gain is probably really tiny, perhaps non-existent after Hotspot is done optimizing. – nes1983 Jun 21 '12 at 08:43
1

I ran the same code using a PriorityQueue instead of a List. Also, as nes1983 suggested, building the output string first, instead of printing every word individually helps reduce the runtime.

My runtime after these modifications was definitely reduced.

0

I have modified the code like this further by including @Teja logic as well and resulted in 1 millisecond from 2 millisescond:

long st=System.currentTimeMillis();
     BufferedReader in=new BufferedReader(new InputStreamReader(new FileInputStream("D:\\Bhive\\File.txt")));
     String read= in.readLine().toLowerCase();
     String[] sp=read.replaceAll("\\.","").split(" ");
     int v=0;
     int len = sp.length;
     int j=0;
     for(int i=0;i<len;i++)
     {
            if(isNum(sp[i]))
             v+=Integer.parseInt(sp[i]);
             else
            {
              char[] c=sp[i].toCharArray();
              Arrays.sort(c);
              String r=new String(c);
              sp[j] = r;
              j++;
             }
      }
        Arrays.sort(sp, 0, len);
        long time=System.currentTimeMillis()-st;
        System.out.println("\n Time Taken:"+time);
        for(int i=0;i<j; i++)
        System.out.print(sp[i]+" ");
        System.out.print(v); 

Wrote small utility to perform for checking a string contains number instead of regular expression:

private static boolean isNum(String cs){
     char [] s = cs.toCharArray();
     for(char c : s)
     {
      if(Character.isDigit(c))
       {
         return true;
       }
     }
     return false;
 }

Calcluate time before calling System.out operation as this one is blocking operation.

UVM
  • 9,526
  • 5
  • 39
  • 63
  • I think System.out is part of the process...well it should not be ideally, but you should not assume until explicitly stated. – Dhwaneet Bhatt Jun 21 '12 at 16:04
  • Pls explain what part of the process. I think System.out is also wanted to be considered as part of the performance.Is that what you mean? However, if there is a System.out. it is a blocking operation.So to bench mark , you cannot include System.out. – UVM Jun 22 '12 at 04:06