
Book Review: Solr 1.4 Enterprise Search Server

Posted by samzenpus
from the read-all-about-it dept.
MassDosage writes "Solr 1.4 Enterprise Search Server written by David Smiley and Eric Pugh provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts as well as those having some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr a lot of hands-on technical advice on how to use and fine-tune many parts of this powerful application." Keep reading for the rest of MassDosage's review.
Solr 1.4 Enterprise Search Server
author: David Smiley and Eric Pugh
pages: 317
publisher: Packt Publishing
rating: 8/10
reviewer: Mass Dosage
ISBN: 978-1-847195-88-3
summary: Enhance your search with faceted navigation, result highlighting, fuzzy queries, ranked scoring, and more.
Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it relates to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search: the first chapter covers only the basics and assumes that readers already know what they are getting into, or that they will read up on search concepts themselves before going further. Solr is free, open-source technology licensed under the Apache license and is available here. The book covers version 1.4 of Solr and was published before that version was actually released, so it is a bit patchy in areas that were still undergoing change, but the authors point this out very clearly in the text where applicable.

The book provides details on downloading and installing Solr, building it from source and the manifold options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download, along with various code examples and a bundled version of Solr 1.4, which is used as the basis for many of the examples referred to throughout the text. In some ways this data set is limited, as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.

The basics of schema design, text analysis, indexing and searching are covered over the next three chapters, which include a wide range of essential search concepts such as tokenizers, stemming, stop words, synonyms, data import handlers, field qualifiers, filters, scoring and sorting. The reader is taken through the process of setting up Solr so that it can index the data to be searched, and then shown how this data can be imported into Solr from a variety of sources such as XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface and the provided sample data.
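To make the text analysis terms above concrete, here is a minimal Python sketch of an analysis chain: tokenize, lowercase, drop stop words, map synonyms, then stem. It is not Solr's implementation and does not come from the book. The stop-word list, the synonym mapping and the crude suffix-stripping stemmer are all assumptions chosen for illustration.

```python
import re

# Illustrative values only; a real deployment would configure these per field.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}
SYNONYMS = {"tv": "television"}  # hypothetical synonym mapping


def tokenize(text):
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def naive_stem(token):
    """Strip a few common English suffixes. This is a crude stand-in for a
    real stemmer such as Porter, not an implementation of one."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the chain: tokenize, lowercase, remove stop words, map synonyms, stem."""
    terms = []
    for token in tokenize(text):
        token = token.lower()
        if token in STOP_WORDS:
            continue
        token = SYNONYMS.get(token, token)
        terms.append(naive_stem(token))
    return terms


print(analyze("The Searching of TV listings"))
```

The same chain would normally be applied both when indexing documents and when parsing queries, so that "Searching" in a query matches "search" in the index.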

More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn't extended very far and the behind-the-scenes algorithms powering search aren't something I've had to directly work with. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search and how a knowledge of mathematics and the data being searched are essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future and I'm sure that people working with search on a daily basis would find some useful advice here for how to get the best out of Solr.

Solr provides much more than just indexing and search, and the fact that various components are available to perform many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover the components which ship with Solr to provide this functionality, as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one were adding search capability to one's own product or web site, as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.
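As a rough illustration of how a client might ask for highlighted results, the Python sketch below builds a query URL using Solr's `hl` and `hl.fl` request parameters. The base URL and the field names are assumptions rather than values from the book, and the URL is only constructed, not sent.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, highlight_fields, rows=10):
    """Build a select URL that requests highlighting on the given fields.

    base_url and the field names are assumptions; adjust them to match
    your own Solr installation and schema.
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",                         # ask for a JSON response
        "hl": "true",                         # turn highlighting on
        "hl.fl": ",".join(highlight_fields),  # fields to highlight
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local instance and schema fields.
url = build_highlight_query(
    "http://localhost:8983/solr/select",
    "smashing pumpkins",
    ["name", "alias"],
)
print(url)
```

Whether a field can actually be highlighted depends on the schema (the field generally has to be stored), so treat this as a starting point only.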

The final three chapters move on to the more practical side of actually using Solr in the "real world" and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr), support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion of three ways to scale Solr: scaling high (optimising a single server through techniques like caching, shingling, clever schema design and indexing strategies), scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
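Two of the scaling techniques named above are easy to picture in a few lines of Python. The sketch below is an illustration only, not code from the book or from Solr. `shingles` builds word n-grams so that a phrase can be matched as a single indexed term, and `shard_for` shows one possible way to route a document to a shard by hashing its id. The function names and the choice of MD5 are assumptions.

```python
import hashlib


def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size, in order."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shard_for(doc_id, shard_count):
    """Pick a shard for a document by hashing its id.

    A stable hash (MD5 here, purely as an example) keeps the routing
    consistent across processes, unlike Python's built-in hash().
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


print(shingles(["the", "quick", "brown", "fox"]))
print(shard_for("artist:12345", 4))
```

Simple modulo routing like this means changing the number of shards moves most documents to a different shard, which is one reason to plan the shard count up front.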

On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. This book does not cover a lot of theory and assumes a fair amount of prior knowledge and is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren't afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn't have to wade through pages of flowery text before getting to the good bits. If you're seriously thinking about using Solr or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.

Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion.

You can purchase Solr 1.4 Enterprise Search Server from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • by Anonymous Coward on Monday March 14, 2011 @03:59PM (#35483558)

    For the people wondering what Solr is for, it runs off of Apache Lucene. You feed it data/text and it processes and indexes it. It has some really neat text processing things. After using it for a while I am constantly critiquing crappy search implementations on websites.

  • "Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.

    • by Anonymous Coward

      Did you RTFT? It's a book review for a book about a search server, Solr, aimed at the enterprise market. That was a lot of information in not a lot of words.

    • You're right, of course. /. editors suck.

      SOLR is [related to] a text search technology that is often used in parallel with a database.

      http://lucene.apache.org/solr/#intro [apache.org]

      • by lwsimon (724555)

        Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?

        I'm interested enough now that I'm going to go read about it - but I'm still not going to read the summary, out of spite.

        • because there is no e in solr, dumbass.

          Quit spouting off about things you know nothing about.

          Solr is actually quite powerful and is a very useful tool for creating awesome searches on your site.

          • by lwsimon (724555)

            Excuse me. I meant "drop the e" as a surrogate for "remove the last vowel in the word, preceding the letter r, which must end the word."

            See also: Flickr

        • by kwerle (39371)

          Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?

          Because the apache foundation is primarily interested in web 2.0-y things, so that's what they want their projects to look/sound like?

          Besides, e-solr may be a little over-the-top. :-)

      • by subk (551165)
        Teh editors do not, in fact, suck. They merely assumed that you--a wiz-kid, tech-mag reader--would be smart enough to perform a simple evaluation before jumping into a topic. 1) Check title for prefixes. This one says "book review". 2) Do I know what Solr (or ) is? If yes, read article. Maybe post a comment. If no, see step 3 if still interested. 3) Google/wiki the technology until you are ready to answer "yes" to step 2.
    • Is this what Web 2.0 means, supporting an attention span where technologies must fit into little soundbites for people unwilling to actually read and understand the underlying complexity? Oh, I guess I'm not "agile enough". sigh....
      (And yeah this could be considered flamebait, but I really am pretty disgusted with the whole "I don't want to deal with complexity" notion. I think one thing that increasingly separates the few competent programmers from the great unwashed masses of hackers is the willingnes

    • "Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.

      Is your google broken, or do you merely enjoy acting like a douchebag?

  • by jnelson4765 (845296) on Monday March 14, 2011 @06:00PM (#35484960) Journal
    Use it at work to replace all the MySQL fulltext indexes we were using for a (rather bad) search interface when we moved to InnoDB. Don't miss the old search at all. I may be grabbing this book, since my boss asked for predictive search in our app soon...
    • by nzadrozny (555073)

      For predictive search, you'll want to get friendly with the Solr TermsComponent [apache.org], which serves up the terms present in your index along with their frequency.

      If you want to get really fancy, you can log your popular queries—particularly the ones that have a high correlation with click-throughs.

  • In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?

    • by nzadrozny (555073)

      In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?

      Including, in fact, the White House [oreilly.com], which is on a LAMP stack of Open Source goodness, including Drupal and Solr. Awesome indeed.

  • I've used it at work with the acts_as_solr plugin for Rails. Simply define in your models which fields in the database you'd like solr to index and it just does it, allowing you to build a nice, robust search capability into your website with not a lot of work. And I'm sure I'm only using about 10% of what Solr actually provides. Look forward to checking out this book and seeing what other tricks it's got.
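The predictive search suggestions made earlier in the thread (the TermsComponent, plus logging popular queries) could be sketched roughly as follows in Python. `terms`, `terms.fl`, `terms.prefix` and `terms.limit` are TermsComponent request parameters. The `/solr/terms` handler path, the field name and the helper function names are assumptions, and the URL is only built, not sent.

```python
from collections import Counter
from urllib.parse import urlencode


def build_terms_query(base_url, field, prefix, limit=10):
    """Build a TermsComponent request for indexed terms starting with prefix.

    The prefix is lowercased on the assumption that the field's analyzer
    lowercases terms at index time.
    """
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix.lower(),
        "terms.limit": limit,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def suggest_from_log(logged_queries, prefix, limit=5):
    """Rank previously logged queries that start with prefix, most popular first."""
    prefix = prefix.lower()
    counts = Counter(q.strip().lower() for q in logged_queries)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical handler URL, field name and query log.
print(build_terms_query("http://localhost:8983/solr/terms", "name", "Sol"))
print(suggest_from_log(["solr", "Solr ", "solaris", "lucene", "solr book"], "sol"))
```

The log-based ranking here counts raw query frequency only; weighting by click-throughs, as suggested in the thread, would need the click data to be logged alongside each query.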
