Grapeshot blitz
Grapeshot is an SDK providing advanced concept-based Bayesian search methods that let developers add “implicit search” capabilities to their applications. In plain English, it is a promising search engine library for developers.

The technology section summarizes various aspects of the library that set it apart from similar projects. Some interesting features are:

  • Document clustering
  • Sentences or paragraphs can be used as queries
  • Word ranking

One feature that has been highlighted is its small footprint. Grapeshot claims a binary size of 300K.
[Figure: small footprint]
The bar graph shows what Grapeshot claims to be the binary sizes of various similar software libraries. The footprint of Lucene is of particular interest. Contrary to the 11+ MB claimed by the site, the Lucene core jar file as of version 2.2.0 is only about 526K, and that could be reduced further depending on the user's requirements.

Reducing the binary footprint of Lucene
Although 526K doesn’t seem like a large footprint, as an exercise one can reduce it further for the embedded or mobile devices that Grapeshot targets. To reduce the binary size:

  • Run the Java application of interest with the -verbose:class flag. This prints class loading details to stdout, so redirect the output to a file.
  • Run the saved output through
    cat * |grep lucene-core|cut -f2 -d' '|sort -u|tr '.' '/'| awk '{printf "%s.class\n", $1}'
    This extracts the list of all classes from the Lucene library that were loaded at runtime.
  • Create a custom jar file by deleting all .class files that are not in the list.
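
The filtering step can also be sketched in Python, which may be handier on systems without the Unix tools. This is a minimal sketch, assuming the older HotSpot log format "[Loaded <class> from <source>]" that the cut command above relies on. The function name loaded_class_files and the marker parameter are made up for illustration. Note that only classes loaded during the recorded run are listed, so any code path not exercised in that run will be missing from the result.

```python
def loaded_class_files(log_lines, marker="lucene-core"):
    """Return sorted, de-duplicated .class paths for classes loaded from
    a source whose name contains `marker`.

    Assumes each relevant line looks like:
        [Loaded org.apache.lucene.store.Directory from file:/.../lucene-core-2.2.0.jar]
    """
    paths = set()
    for line in log_lines:
        fields = line.strip().split(" ")
        # Skip anything that is not a "[Loaded <class> from <source>]" line
        # or that was not loaded from the jar of interest.
        if len(fields) < 2 or fields[0] != "[Loaded" or marker not in line:
            continue
        # Second field is the fully qualified class name; turn the
        # package dots into path separators, as the tr step does.
        paths.add(fields[1].replace(".", "/") + ".class")
    return sorted(paths)
```

Passing an open log file (or a list of its lines) to loaded_class_files yields one relative path per class, for example org/apache/lucene/store/Directory.class, which matches the layout of an extracted jar.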

Following this procedure for the demo application bundled with the Lucene core binary, the custom jar was reduced by half to 262K, which is less than the Grapeshot binary.

As a side note, this Python script can be used to delete files from the extracted jar.
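
For illustration, here is a minimal sketch of what such a cleanup script might look like. It is not the original script. It assumes the jar has already been extracted into a directory and that the keep list holds relative paths such as org/apache/lucene/store/Directory.class. The function name prune_classes is hypothetical.

```python
import os


def prune_classes(root, keep):
    """Delete every .class file under `root` whose path relative to `root`
    is not in `keep`. Returns the sorted list of deleted relative paths.
    """
    keep = set(keep)
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".class"):
                continue  # leave the manifest and other resources alone
            full = os.path.join(dirpath, name)
            # Normalise to '/' so paths compare equal on every platform.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel not in keep:
                os.remove(full)
                deleted.append(rel)
    return sorted(deleted)
```

After pruning, the directory can be repacked with the JDK jar tool, for example jar cf lucene-custom.jar -C extracted . (the directory and jar names here are placeholders). Keeping a copy of the original jar is advisable, since a class that was never loaded during the recorded run will be deleted as well.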