Questions tagged [search-engine]

A search engine is a program that searches documents for specified keywords and returns a list of the documents where the keywords were found.

Although "search engine" really names a general class of programs, the term is often used specifically for systems like Google, Yahoo!, Yandex and Excite that let users search for documents on the World Wide Web and USENET newsgroups.
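As a rough illustration of the definition above, here is a minimal sketch of keyword search over a handful of in-memory documents using an inverted index. The function names `build_index` and `search`, the sample documents, and the whitespace-based tokenization are all assumptions made for this example; real engines add stemming, ranking, and on-disk storage.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the documents for the first keyword, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents, keyed by an arbitrary id.
docs = {
    "a": "Nutch is a web crawler",
    "b": "Solr is a search server built on Lucene",
    "c": "Sphinx is a full text search server",
}

index = build_index(docs)
print(search(index, "search server"))
print(search(index, "web crawler"))
```

Requiring every keyword to match (an AND query) is only one possible design choice; returning documents that match any keyword would use a set union instead of the intersection.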

2893 questions
21
votes
3 answers

Recommendable Maven repository search engines?

mavensearch.net doesn't know current versions in many cases. mvnrepository.com is a bit more up to date but doesn't show the repositories from which a package can be downloaded, which I would find very useful. What Maven repository search engines do…
deamon
  • 78,414
  • 98
  • 279
  • 415
20
votes
5 answers

An alternative web crawler to Nutch

I'm trying to build a specialised search engine web site that indexes a limited number of web sites. The solution I came up with is: using Nutch as the web crawler, using Solr as the search engine, the front-end and the site logic are coded with…
wassimans
  • 7,496
  • 9
  • 42
  • 57
20
votes
8 answers

How to find Google's IP address?

Google is blocked in some countries. However, there are many ways to access Google, like VPN, agent, and by changing the hosts file. If I want to change the hosts file to access Google, how can I find an available IP address? Update: I can't access…
yorelog
  • 255
  • 1
  • 2
  • 6
19
votes
4 answers

How does a full text search server like Sphinx work?

Can anyone explain in simple words how a full text server like Sphinx works? In plain SQL, one would use SQL queries like this to search for certain keywords in texts: select * from items where name like '%keyword%'; But in the configuration files…
0x4a6f4672
  • 24,450
  • 15
  • 96
  • 130
18
votes
8 answers

Can search engines index JavaScript generated web pages?

Can search engines such as Google index JavaScript-generated web pages? When you right-click and select "view source" in a page that is generated by JavaScript (e.g. using GWT), you do not see the dynamically generated HTML. I suppose that if a search…
Roy
  • 288
  • 3
  • 8
18
votes
5 answers

Internationalization and Search Engine Optimization

I'd like to internationalize my site such that it's accessible in many languages. The language setting will be detected in the request data automatically, and can be overridden in the user's settings / stored in the session. My question pertains to…
Matt Huggins
  • 73,807
  • 32
  • 140
  • 214
18
votes
4 answers

How does "DHT search engine" work?

I'm interested in Btdigg.org, which is called a "DHT search engine". According to this article, it doesn't store any content and even has no database. Then how does it work? Doesn't it need to gather meta info and store it in a database like…
user2025043
  • 181
  • 1
  • 1
  • 3
17
votes
6 answers

SOLR Permissions / Filtering Results depending on Access Rights

For example, I have Documents A, B, C. User 1 must only be able to see Documents A, B. User 2 must only be able to see Document C. Is it possible to do it in SOLR without filtering by metadata? If I use metadata filter, every time there are access…
Manny
  • 6,093
  • 3
  • 28
  • 43
17
votes
1 answer

Is it possible to link directly to Google search results using href?

I would like to link directly to a search results page from a standard link. To give an example of what I'm hoping for, here is some pseudocode: Click here to search…
Frank
  • 1,606
  • 3
  • 15
  • 34
17
votes
4 answers

Strategy for how to crawl/index frequently updated webpages?

I'm trying to build a very small, niche search engine, using Nutch to crawl specific sites. Some of the sites are news/blog sites. If I crawl, say, techcrunch.com, and store and index their frontpage or any of their main pages, then within hours my…
OdieO
  • 6,276
  • 7
  • 46
  • 83
16
votes
1 answer

Precision recall in lucene java

I want to use Lucene to calculate Precision and Recall. I did these steps: Made some index files. To do this I used indexer code and indexed .txt files which exist in this path C:/inn (there are 4 text files in this folder) and take them in "outt"…
BlueGirl
  • 481
  • 9
  • 24
16
votes
1 answer

How to evaluate a search/retrieval engine using trec_eval?

Is there anybody who has used TREC_EVAL? I need a "TREC_EVAL for dummies". I'm trying to evaluate a few search engines to compare parameters like Recall-Precision, ranking quality, etc. for my thesis work. I cannot find how to use TREC_EVAL to…
Babak
  • 161
  • 1
  • 1
  • 4
16
votes
10 answers

How to download google image search results in Python

This question has been asked numerous times before, but all answers are at least a couple years old and currently based on the ajax.googleapis.com API, which is no longer supported. Does anyone know of another way? I'm trying to download a hundred…
xanderflood
  • 666
  • 2
  • 8
  • 20
16
votes
11 answers

How does a search engine rank millions of pages within 1 second?

I understand the basics of search engine ranking, including the ideas of "reverse index", "vector space model", "cosine similarity", "PageRank", etc. However, when a user submits a popular query term, it is very likely that millions of pages…
user1036719
  • 966
  • 2
  • 12
  • 29
16
votes
1 answer

How to configure the synonyms_path in Elasticsearch

I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines in the configuration file: index : analysis : analyzer : synonym : type : custom tokenizer : whitespace …
Rachid Oussanaa
  • 10,348
  • 14
  • 54
  • 83