54

First, I checked this question but the answer refers to an obsolete service.

So is there a web-based service (or software, I don't care) that can search internet content with regular expressions?

ilyes kooli
  • 11,448
  • 13
  • 47
  • 78
  • I believe you'd get a more succinct answer if you were to provide [more details](http://whathaveyoutried.com/) around what you are trying to accomplish. – Wil Moore III Jun 20 '12 at 22:50
  • 20
    I am trying to get results based on regular expression, exactly like my question title says! – ilyes kooli Jun 21 '12 at 00:26
  • Google Search is able to find matches of some simple regular expressions. See [this answer](http://webapps.stackexchange.com/questions/19673/is-there-a-way-to-search-in-google-using-regular-expressions-regex/82769#82769) for an example of regular expression searching. – Anderson Green Aug 25 '15 at 17:28
  • 1
    http://webapps.stackexchange.com/questions/19673/is-there-a-way-to-search-in-google-using-regular-expressions-regex – Ciro Santilli新疆棉花TRUMP BAN BAD May 26 '16 at 09:10

5 Answers

20

Let me repeat here an answer from the superuser.com question, since I fully agree with its author:

A quote from Ask Metafilter:

The only possible way to make keyword searching efficient over hundreds of terabytes (or whatever their index is up to these days) is to precompute an index of words.

In fact a full regex engine is turing-complete, and you can write arbitrary regexps that will gobble up near infinite amounts of CPU time and memory. For all these reasons it would be technical insanity for them to offer regex searching to the general public.

Update: as was rightly pointed out, regexps are not Turing-complete. Stay tuned for a more detailed answer:

TBD...

gahcep
  • 4,368
  • 3
  • 31
  • 56
  • Wait, if you mean creating a small web service, then you are right (in some cases such a service would be very useful). But if we are talking about implementing a kind of full-featured web crawler, then I think it is insanity (well, it is possible, but very time- and MIPS-consuming). – gahcep Jun 20 '12 at 12:20
  • 12
    So time out queries that take too long; it doesn't have to be insanity. – Jim W says reinstate Monica Oct 17 '14 at 18:11
  • 2
    @MikeBantegui Eh? There are plenty of services that evaluate expressions in a turing-complete language. If it takes too long, it gives up. – Navin Nov 02 '14 at 10:14
  • @MikeBantegui Are we talking cents-per-search, or national-GDPs-per-search for the CPU time required? "Enterprise search" subscriptions seem like a plausible proposition. – Zaaier Jan 22 '15 at 20:36
  • @Zaaier: It's quite possible to write some regex expressions on relatively small input sizes that can take exponential time to evaluate. So if you don't constrain the search to a limited amount of time or memory, it could use up all the computing resources available. In terms of dollars spent per search, it would all depend on how you valuate a potential denial of search due to a regex search consuming *all* available resources. See [ReDoS](http://en.wikipedia.org/wiki/ReDoS) on Wikipedia. – Mike Bailey Jan 23 '15 at 13:10
  • @Zaaier: To follow up on the cost per search: if you were Google and a search was responsible for maxing out a significant portion of the search backend, it could mean thousands of dollars lost per second in ad revenue. So, choosing between "cents per search" and "GDP per search", it's more towards the "GDP per search" side of things. – Mike Bailey Jan 23 '15 at 13:14
  • 2
    A hybrid version would be nice: First the engine searches for x results the old fashioned way and afterwards it filters based on a regex. A smart interface might be able to convert the regex to a google query first. – Pieter De Bie Jan 28 '15 at 12:25
  • 4
    regex turing complete?! regex can express regular languages that is a strict subset of all the languages accepted by turing machines... see https://en.wikipedia.org/wiki/Chomsky_hierarchy – jakubdaniel Nov 11 '16 at 14:51
  • Rightful comment. Thx for vociferating it. I will provide a valid update. – gahcep Oct 30 '18 at 14:38
  • That's pretty much like saying "you can drive too fast so the government can't allow cars to be available to the public" – Bojidar Stanchev Jun 23 '20 at 08:56
  • Still waiting for the more detailed answer... – pacoverflow Mar 28 '21 at 22:40
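
The hybrid approach suggested in the comments above (run a conventional keyword search first, then filter the hits with a regex) can be sketched roughly as follows. This is only an illustration: `toy_engine`, `TOY_PAGES` and `hybrid_search` are made-up names, and the toy engine is a stand-in for whatever keyword backend would actually be used.

```python
import re
from typing import Callable, List, Tuple

Result = Tuple[str, str]  # (url, page text)

# Hypothetical stand-in data for a conventional keyword search backend.
TOY_PAGES: List[Result] = [
    ("a.example", "Paul wrote to Pauline about the paulownia tree."),
    ("b.example", "Nothing relevant on this page."),
    ("c.example", "paul and paaul are both mentioned here."),
]


def toy_engine(query: str, max_results: int) -> List[Result]:
    """Return up to max_results pages containing the query, ignoring case."""
    hits = [page for page in TOY_PAGES if query.lower() in page[1].lower()]
    return hits[:max_results]


def hybrid_search(
    engine: Callable[[str, int], List[Result]],
    query: str,
    pattern: str,
    max_results: int = 100,
) -> List[str]:
    """Fetch keyword hits first, then keep only those the regex matches."""
    regex = re.compile(pattern)
    return [url for url, text in engine(query, max_results) if regex.search(text)]


if __name__ == "__main__":
    # Keyword "paul" narrows the candidates; the regex does the precise filtering.
    print(hybrid_search(toy_engine, "paul", r"\bpa{2}ul\b"))
```

The regex only ever runs over the bounded set of keyword hits, so its cost is capped by `max_results` rather than by the size of the whole crawl. The trade-off is that pages the keyword query misses are never seen by the regex.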
2

There isn't an instant regex search engine. This is likely due to how pages are indexed. Allowing one to grep the web would take a lot of computational power.

Will Hayworth
  • 1,258
  • 9
  • 22
dayyan
  • 381
  • 3
  • 6
2

dayyan is correct: it's reverse indexes that make search engines fast. There's no way to accelerate regex search over a petabyte of content if you only have 100 terabytes of flash disk. Keyword searches with a reverse index are no problem.
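
To make the difference concrete, here is a minimal sketch of a reverse (inverted) index next to a regex scan. It is a toy under stated assumptions, not how any real search engine is built: `DOCS`, `build_inverted_index`, `keyword_search` and `regex_scan` are illustrative names, and real indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical miniature corpus.
DOCS = {
    1: "regex search over the web",
    2: "keyword search uses an index",
    3: "an index maps words to documents",
}


def build_inverted_index(docs):
    """Precompute word -> set of ids of the documents containing that word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, *words):
    """Intersect the posting sets; only entries for the query words are read."""
    postings = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


def regex_scan(docs, pattern):
    """No precomputed structure helps here: every document must be read."""
    regex = re.compile(pattern)
    return sorted(doc_id for doc_id, text in docs.items() if regex.search(text))


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(keyword_search(index, "search", "index"))
    print(regex_scan(DOCS, r"\bw\w+s\b"))
```

`keyword_search` does work proportional to the posting sets it touches, while `regex_scan` does work proportional to the whole corpus, which is the cost the paragraph above describes.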

blekko's web grep (https://blekko.com/ws/+/webgrep) supports regexes, but most of the searches we get for it are for constant strings, usually ones found in the HTML, because that's what's interesting: who uses microformats? Who uses various JavaScript libraries? Who uses various comment systems? And so forth.

If you sent us a regex, we'd be happy to run it for you.

Running these searches consists of a MapReduce job run over all the html in our crawl. That's why it takes a while (a day or two) to get an answer.
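
For readers unfamiliar with the pattern, a grep expressed as map and reduce steps might look roughly like the single-process sketch below. This is not blekko's actual job: `CRAWL`, `map_phase`, `reduce_phase` and `web_grep` are hypothetical names, and a real MapReduce run would distribute the map step across many machines.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for crawled pages: (url, raw html).
CRAWL = [
    ("a.example", '<script src="jquery-1.7.2.min.js"></script>'),
    ("b.example", '<div class="vcard">hello</div>'),
    ("c.example", '<script src="/js/jquery-1.4.4.js"></script><span class="vcard"></span>'),
]


def map_phase(url, html, regex):
    """Emit a (matched_text, url) pair for every regex match in one page."""
    for match in regex.finditer(html):
        yield match.group(0), url


def reduce_phase(pairs):
    """Group the emitted pairs: matched_text -> sorted list of urls."""
    grouped = defaultdict(set)
    for matched_text, url in pairs:
        grouped[matched_text].add(url)
    return {text: sorted(urls) for text, urls in grouped.items()}


def web_grep(crawl, pattern):
    """Run the map step over every page, then reduce all emitted pairs."""
    regex = re.compile(pattern)
    pairs = (pair for url, html in crawl for pair in map_phase(url, html, regex))
    return reduce_phase(pairs)


if __name__ == "__main__":
    print(web_grep(CRAWL, r"jquery-\d+\.\d+\.\d+"))
```

The map step has to read every page in the crawl, which is consistent with the job taking a long time on a full web crawl.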

Greg Lindahl
  • 388
  • 3
  • 13
0

Although you are unlikely to find a site that offers full regular expression search, Google does have some ability to do pattern matching. Depending on what you're trying to achieve, this might be enough.

GoogleGuide appears to be fairly in depth with some of the options available. Perhaps if you give an example of the kind of query you want to search for, we can find a solution?

  • 1
    I checked this, but it is pretty poor, very poor actually! I cannot run any *simple* (simple compared to what I can do with regular expressions) search, like \paul*\ (googling paul* is **way** different than \paul*\\) or \paul{3}\ and many other cases.. – ilyes kooli Jun 20 '12 at 12:26
  • This is also pretty interesting for power searching http://www.johntedesco.net/blog/2012/06/21/how-to-solve-impossible-problems-daniel-russells-awesome-google-search-techniques/ – MutterMumble Jun 26 '12 at 15:22
0

If it NEEDS to be regex, then I think you're screwed. But, if you're just looking for more search power, http://www.googleguide.com/advanced_operators_reference.html