Comparison between Solr and Zend Lucene - March 19 2012

Version 1 by Md Abdus Salam
on Mar 20, 2012 08:40.

* Search performance is poor when searching a large index

Experiences shared by software professionals on various websites are presented below.\\
\\

*Experience 1:*
[http://stackoverflow.com/questions/5159892/which-way-you-recommend-zend-lucene-search-with-php-or-lucene-itself-and-port-wi|http://stackoverflow.com/questions/5159892/which-way-you-recommend-zend-lucene-search-with-php-or-lucene-itself-and-port-wi]


* In my experience Zend Lucene is good for small amounts of data, but slows down very quickly as you add more data. I had to research an alternative to Zend Lucene because its performance just wasn't cutting it on my current project. To make a long story short, we went with [Solr|http://lucene.apache.org/solr/], which is built on Apache Lucene. Indexing of 70k+ articles went from hours to minutes.
* We went from 9-10 seconds waiting for search results with Zend_Lucene down to milliseconds with Solr, and that was for 70k records. \-- [Jeff Busby|http://stackoverflow.com/users/140413/jeff-busby] [Mar 4 '11 at 15:16|http://stackoverflow.com/questions/5159892/which-way-you-recommend-zend-lucene-search-with-php-or-lucene-itself-and-port-wi#comment5839678_5161099]\\

*Experience 2:*
[http://stackoverflow.com/questions/2892519/performance-comparision-between-zend-lucene-and-java-lucene|http://stackoverflow.com/questions/2892519/performance-comparision-between-zend-lucene-and-java-lucene]


* [Zend Lucene|http://framework.zend.com/manual/en/zend.search.lucene.html] _and Java Lucene are written in PHP and Java respectively, and PHP is a higher-level language than Java. How big is the performance difference between the two for index building and searching? Would it be much more effective to let Java create and rebuild the index, and let PHP use it?_

Against my better judgment, the company I work for migrated our previous search solution to Zend_Search_Lucene. On pretty heavy-duty hardware, indexing a million documents took several hours, and searches were relatively slow. The indexing process consumed vast amounts of memory, and the indexes frequently became corrupted (using version 1.5.2). A single wildcard search literally brought the web server to its knees, so we disabled that feature. Memory usage was very high for searches, and as a result requests per second declined heavily because we had to reduce the number of Apache child processes.

We have since moved to Solr (a Lucene-based Java search server) and the difference is dramatic. Indexing now takes around 10 minutes and searches are lightning fast. What a difference a language makes.\\
\\

*Experience 3:*



After my adventures with Zend-Lucene-Search, and discovering it isn't all it's cracked up to be when indexing large datasets, I've turned to Solr (thanks to Bill Karwin for that \:) ).

However, when I try to search the index with the Zend port, I run into the following error:

Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message 'Unsupported segments file format' in /var/www/Zend/Search/Lucene.php:407 Stack trace: #0 /var/www/Zend/Search/Lucene.php(555): Zend_Search_Lucene->_readSegmentsFile() #1 /var/www/z_search.php(12): Zend_Search_Lucene->_\_construct('tmp/feeds_index') #2 {main} thrown in /var/www/Zend/Search/Lucene.php on line 407

I've searched around but can't find anything about this problem; everyone else just seems to be able to get them to work.
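The error above suggests that the Zend port could not read the index files that Solr wrote. A likely cause is that Solr writes its index with a newer Java Lucene file format than Zend_Search_Lucene supports. One way to avoid reading Solr's index files directly is to query Solr over HTTP through its standard /select request handler (parameters q, wt and rows). The Python sketch below is illustrative only. The helper names {{build_solr_select_url}}, {{extract_docs}} and {{search_solr}} are hypothetical, and the base URL {{http://localhost:8983/solr}} assumes a default single-core Solr install (multi-core setups add the core name to the path).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_solr_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a query URL for Solr's /select handler (hypothetical helper).

    Assumption: base_url points at a single-core Solr install,
    e.g. "http://localhost:8983/solr".
    """
    params = {"q": query, "wt": "json", "rows": rows}
    # urlencode percent-escapes the query, e.g. ':' becomes %3A
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def extract_docs(response: dict) -> list:
    """Return the matching documents from a Solr JSON (wt=json) response."""
    return response.get("response", {}).get("docs", [])


def search_solr(base_url: str, query: str, rows: int = 10) -> list:
    """Send the query to a running Solr server and return matching docs.

    Solr handles the index itself, so the client never opens the
    index files and their on-disk format does not matter here.
    """
    with urlopen(build_solr_select_url(base_url, query, rows)) as resp:
        return extract_docs(json.load(resp))
```

For example, {{search_solr("http://localhost:8983/solr", "title:zend")}} would return the matching documents as Python dicts, assuming a Solr server is running at that address and has a {{title}} field. The same HTTP request can be sent from PHP, which keeps the web application in PHP while Solr does the indexing and searching.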



h6. Recommendation: Don’t use Zend PHP Lucene\!

So what happened to my project? We ended up leaving the indexing part in PHP for now (I expect this to end eventually, since the constantly growing amount of data will exceed the limits of PHP Lucene). Some post-processing and optimization of the index, along with the querying part, has been ported to Java, after I finally managed to run a small-footprint Jetty “illegally” and unsupported on that server.

*Conclusion:* Don’t ever consider using PHP Lucene (unless your project and amount of data are quite small and its limitations don’t matter).\\
\\
----