Index
A design problem that many sites deploying a search engine will face is whether to use a single IndexSearcher or a MultiSearcher. Lucene exposes its search capability through the Searcher class, which accepts a query and returns a list of Hits, sorted by relevance by default. Searcher is an abstract class, so it is possible to write a customized concrete Searcher. Two implementations are already available: IndexSearcher, which searches a single Lucene index loaded from disk, and MultiSearcher, which searches across a list of Lucene indices. MultiSearcher performs one additional step: it runs a merge sort over the results after the individual indices return them.
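To picture the extra merge step, here is a minimal sketch in plain Java. It is not Lucene's actual implementation: HitMerger, ScoredHit and merge are hypothetical names made up for illustration, and the sketch assumes that each index returns its hits already sorted by descending score and that scores from different indices are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these types are Lucene classes.
class HitMerger {

    /** Stand-in for a search hit: a document id plus its relevance score. */
    static final class ScoredHit {
        final String docId;
        final float score;

        ScoredHit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Tracks how far we have read into one index's result list. */
    private static final class Cursor {
        final List<ScoredHit> hits;
        int pos = 0;

        Cursor(List<ScoredHit> hits) {
            this.hits = hits;
        }

        ScoredHit current() {
            return hits.get(pos);
        }
    }

    /**
     * Merges per-index result lists (each assumed sorted by descending score)
     * into one list of at most topN hits, best score first.
     */
    static List<ScoredHit> merge(List<List<ScoredHit>> perIndexHits, int topN) {
        // The queue always hands back the cursor whose current hit scores highest.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                Comparator.comparingDouble((Cursor c) -> c.current().score).reversed());

        for (List<ScoredHit> hits : perIndexHits) {
            if (!hits.isEmpty()) {
                queue.add(new Cursor(hits));
            }
        }

        List<ScoredHit> merged = new ArrayList<>();
        while (!queue.isEmpty() && merged.size() < topN) {
            Cursor best = queue.poll();
            merged.add(best.current());
            best.pos++;
            // Re-queue the cursor only if its index still has hits left.
            if (best.pos < best.hits.size()) {
                queue.add(best);
            }
        }
        return merged;
    }
}
```

Every merged hit costs one queue operation proportional to the logarithm of the number of indices. This is work that a single IndexSearcher never has to do.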

Why the question of IndexSearcher vs. MultiSearcher arises
Pondering in a meeting room with nothing but an empty drawing board, a design team does not take long to conclude that certain search criteria will be used more than others. The simple response is to build small, manageable indices for those specific criteria and a separate index for general search.

Why not to make this decision at the outset

  • Lucene in its default configuration is fast enough for most search requirements. Don’t split indices as a premature optimization.
  • It is not a good way to distribute indices over many disks. It’s easier to put the disks in a RAID 0 configuration.
  • It’s simpler to maintain a single-index configuration.
  • It incurs the extra cost of running a merge sort.

In some situations it makes sense to distribute indices because the query frequency for a particular search criterion is heavily skewed. Even then, using many indices behind a load balancer would be better. MultiSearcher does fill a certain niche, but for most deployments it is a premature optimization.
