2

Faceted search is common nowadays, but what algorithm is behind it, and how does it run so fast over large datasets?

I am going to implement faceted search myself, so any tips or pointers are welcome.

Mickey Shine
  • Could you provide some additional information on what exactly you mean by "faceted search"? This could mean many things. Can you give us an example: what would a user of your application search for, and how? – bpgergo Mar 23 '12 at 11:26
  • Look at Solr's implementation of faceted search on top of Lucene. They have server administrators set flags to determine which fields to facet, and then build their faceted search based on which terms show up in that field. Might be worth looking into emulating. – FloppyDisk Mar 23 '12 at 12:01

2 Answers

3

In short: you create several indexes, e.g. one for text, one for dates, one for geolocation, one for numbers, etc. When you add a document to your index, you define how each of its fields should be indexed.
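To make the "one index per field type" idea concrete, here is a minimal in-memory sketch. The class name `FieldIndexes`, the field names (`title`, `category`, `price`) and the choice of a sorted list for the numeric index are all assumptions made for illustration; real engines such as Lucene use far more compact structures.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class FieldIndexes:
    """Hypothetical container holding one index per field type."""

    def __init__(self):
        self.text = defaultdict(set)      # token -> set of doc ids (inverted index)
        self.category = defaultdict(set)  # exact value -> set of doc ids
        self.prices = []                  # sorted list of (price, doc_id) pairs

    def add(self, doc_id, title, category, price):
        # Each field is indexed in the way that suits its type.
        for token in title.lower().split():
            self.text[token].add(doc_id)
        self.category[category].add(doc_id)
        insort(self.prices, (price, doc_id))

    def price_range(self, low, high):
        # Binary search for the slice whose prices fall in [low, high].
        start = bisect_left(self.prices, (low,))
        end = bisect_right(self.prices, (high, float("inf")))
        return {doc_id for _, doc_id in self.prices[start:end]}


idx = FieldIndexes()
idx.add(1, "Red running shoes", "shoes", 60)
idx.add(2, "Blue shoes", "shoes", 120)
idx.add(3, "Red hat", "hats", 20)
```

Every lookup returns a set of document ids, which is what lets results from different indexes be combined later.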

Retrieving the documents usually involves intersecting the results (document IDs) from several indexes, e.g. products containing the word "shoes" within a radius of 100 km and in a price range of 50-100.
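A rough sketch of that step, assuming each index has already returned a plain Python set of document ids. The function names `intersect` and `facet_counts`, and the sample data, are made up for illustration. The counting step is not described above; it is included as an assumption about how the per-value numbers shown next to each facet are commonly derived.

```python
from collections import Counter


def intersect(*id_sets):
    """Return the doc ids present in every set, starting from the smallest."""
    if not id_sets:
        return set()
    ordered = sorted(id_sets, key=len)
    result = set(ordered[0])
    for ids in ordered[1:]:
        result &= ids
        if not result:  # nothing left to match, stop early
            break
    return result


def facet_counts(doc_ids, value_of):
    """Count how many matching docs fall under each value of one facet field."""
    return Counter(value_of[d] for d in doc_ids if d in value_of)


# Hypothetical per-index results for: "shoes", within 100 km, price 50-100.
text_hits = {1, 2, 5, 7}
geo_hits = {1, 2, 3, 7}
price_hits = {2, 7, 9}

matches = intersect(text_hits, geo_hits, price_hits)

# Hypothetical doc id -> brand mapping used to build the "brand" facet.
brand_of = {1: "acme", 2: "acme", 7: "zoom", 9: "zoom"}
brands = facet_counts(matches, brand_of)
```

Starting from the smallest set keeps the intermediate result small, which matters when one filter is much more selective than the others.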

To scale this to huge datasets, you usually use a technique called sharding: each server holds the index data for N documents, and you send the query to all the index servers at once. Each returns its top X results, and you merge and sort those to get the unified top X results.
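The scatter-gather step can be sketched as follows. This is only an illustration: the shards here are plain dicts in one process and are queried one after another, whereas a real deployment would query separate servers in parallel. The term-frequency score and the names `search_shard` and `search_all` are assumptions.

```python
import heapq


def search_shard(shard, query_terms, top_x):
    """Score the docs in one shard and return its top_x (score, doc_id) pairs."""
    scored = []
    for doc_id, text in shard.items():
        tokens = text.lower().split()
        # Toy relevance score: number of times the query terms occur.
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(top_x, scored)


def search_all(shards, query_terms, top_x):
    """Send the query to every shard, then merge the partial top lists."""
    partials = [search_shard(shard, query_terms, top_x) for shard in shards]
    return heapq.nlargest(top_x, (hit for part in partials for hit in part))


# Two hypothetical shards, each holding a slice of the documents.
shards = [
    {1: "red shoes shoes", 2: "blue hat"},
    {3: "shoes", 4: "shoes shoes shoes"},
]
top = search_all(shards, ["shoes"], top_x=2)
```

Because each shard returns at most X hits, the merge step handles at most X times the number of shards entries, regardless of how many documents each shard holds.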

I hope this was the direction you were looking for.

Not_a_Golfer
0

A quick Google Scholar search for "Faceted Search" should turn up some research papers by the Endeca guys.

http://scholar.google.com/scholar?q=faceted+search&hl=en&btnG=Search&as_sdt=1%2C47&as_sdtp=on