
I have a long list of words and I am trying to print only the nouns to the output. The method I am trying to use is:

    IF THE WORD IS A PROPER NOUN, JUST PRINT IT {
        // THIS IS DONE USING A REGULAR EXPRESSION
    }
    ELSE {
        // FETCH THE ONLINE DICTIONARY PAGE http://www.thefreedictionary.com/WORD
        // AND CHECK WHETHER THE WORD IS A NOUN WITH ANOTHER REGULAR EXPRESSION
        // APPLIED TO THE SOURCE CODE OF THAT PAGE
    }
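
Roughly, the ELSE branch does something like the following sketch (the function name and the noun-marker regular expression are just placeholders; the real pattern depends on the page markup):

    <?php
    // Fetch the dictionary page for one word and look for a noun marker in the HTML.
    function isNounAccordingToDictionary($word)
    {
        $html = file_get_contents('http://www.thefreedictionary.com/' . urlencode($word));
        if ($html === false) {
            return false;
        }
        return (bool) preg_match('/\bn\./', $html); // placeholder noun marker
    }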

I have unit tested the else part and it works fine for individual words. Without the else part, the program prints 4000+ words, but when I integrated the else part, it only prints around 80 words, which is wrong.

Can someone point out what the problem could be? Is there some parallel way of processing these requests for many words?

rkt

4 Answers


Can someone point out what the problem could be?

I assume that's because an HTTP request to the dictionary website takes its time.

Is there some parallel way of processing these requests for many words?

You could build a list of the non-matching words and then process it later / in parallel, but that's not trivial. You could start by sending multiple HTTP requests at once with the cURL library (its multi handle) or another multi-request component.
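
A minimal sketch with PHP's cURL multi handle, assuming $words holds the words that still need a dictionary lookup:

    <?php
    // Send one request per word in parallel and collect the page sources.
    function fetchPagesInParallel(array $words)
    {
        $multi = curl_multi_init();
        $handles = array();

        foreach ($words as $word) {
            $ch = curl_init('http://www.thefreedictionary.com/' . urlencode($word));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            curl_multi_add_handle($multi, $ch);
            $handles[$word] = $ch;
        }

        // Run all transfers until every handle has finished.
        do {
            $status = curl_multi_exec($multi, $running);
            if ($running) {
                curl_multi_select($multi);
            }
        } while ($running && $status === CURLM_OK);

        $pages = array();
        foreach ($handles as $word => $ch) {
            $pages[$word] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($multi, $ch);
            curl_close($ch);
        }
        curl_multi_close($multi);

        return $pages;
    }

With 4000+ words you would still want to batch this (a few dozen handles at a time) rather than opening thousands of connections at once.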

Additionally, instead of sending requests to a website that can only answer one word at a time, you could query a database that contains many words and that you can install on your own system, as suggested here: Extracting nouns from a long list of words.

hakre

Why don't you use a dictionary file like /usr/share/dict/words?
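
A minimal sketch of that, assuming one word per line (note that /usr/share/dict/words carries no part-of-speech information, so by itself it only tells you whether a word exists):

    <?php
    // Load the local word list once and test membership in memory.
    $dictionary = array_flip(
        file('/usr/share/dict/words', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    );

    var_dump(isset($dictionary['apple'])); // bool(true) on most systems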

Nick ODell

Making thousands of requests to a server for each unit test, not to mention the live environment itself, might very well get you banned.

Try doing this some other way, such as using a static dictionary. It's faster, more efficient, and risk-free.

rid

Of course the if branch, with just a bit of regular expression matching, is a lot faster than the network requests. So I think there is no "problem"; it is just slow.

There are native ways to run things in parallel in PHP, but that is not so easy. See http://de.php.net/manual/en/ref.pcntl.php
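
A rough sketch with pcntl_fork (CLI only, requires the pcntl extension; $words and the actual lookup are assumed):

    <?php
    // Split the word list into chunks and let each child process handle one chunk.
    $chunks = array_chunk($words, 500);
    $pids = array();

    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die('fork failed');
        }
        if ($pid === 0) {
            // Child: process its chunk of dictionary lookups, then exit.
            foreach ($chunk as $word) {
                // ... fetch the page and test the word here ...
            }
            exit(0);
        }
        $pids[] = $pid; // Parent: remember the child and start the next one.
    }

    // Wait for all children to finish.
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }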

Another option would be to use one of the exec functions to call a sub-script for every network request and not wait for the response in the main script. See Is there a way to use shell_exec without waiting for the command to complete?
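
For example (lookup_noun.php is a hypothetical helper that fetches the page for one word and records the word if it is a noun):

    <?php
    // Start one background helper per word; shell_exec returns immediately
    // because output is redirected and the command ends with &.
    foreach ($words as $word) {
        $cmd = 'php lookup_noun.php ' . escapeshellarg($word);
        shell_exec($cmd . ' > /dev/null 2>&1 &');
    }

With thousands of words you would want to throttle this, e.g. by launching the helpers in small batches.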

flori