Book Review: Solr 1.4 Enterprise Search Server
MassDosage writes "Solr 1.4 Enterprise Search Server written by David Smiley and Eric Pugh provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts as well as those having some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr a lot of hands-on technical advice on how to use and fine-tune many parts of this powerful application." Keep reading for the rest of MassDosage's review.
Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it is related to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search: this chapter covers only the basics and assumes the reader either already knows what they are getting into or will read up on search concepts themselves before reading further. Solr is free, open-source technology licensed under the Apache license and is available here. This book covers the 1.4 version of Solr and was published before this version was actually released, so it is a bit patchy in areas which were still undergoing change, but the authors point this out very clearly in the text where applicable.
Solr 1.4 Enterprise Search Server | |
author | David Smiley and Eric Pugh |
pages | 317 |
publisher | Packt Publishing |
rating | 8/10 |
reviewer | Mass Dosage |
ISBN | 978-1-847195-88-3 |
summary | Enhance your search with faceted navigation, result highlighting, fuzzy queries, ranked scoring, and more. |
The book provides details on downloading and installing Solr, building it from source and the manifold options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download along with various code examples and a bundled version of Solr 1.4, which is used as the basis for many of the examples referred to throughout the text. In some ways this dataset is limited, as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.
The basics of schema design, text analysis, indexing and searching are covered over the next three chapters, and these include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring, sorting, etc. The reader is taken through the process of setting up Solr so it can be used to index data that is to be searched, and then how this data can be imported into Solr from a variety of sources like XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface and the provided sample data.
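For readers who have not met tokenizers, stop-words and stemming before, here is a minimal sketch of what such an analysis chain does to a piece of text. It is plain Python and not how Solr or Lucene implement analysis; the stop-word list, the suffix rules and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the full chain: tokenize, drop stop-words, then stem."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("The Searching of Indexes"))
```

The point of running the same chain at index time and at query time is that "Searching" in a document and "search" in a query end up as the same term. A real stemmer is far more careful than this suffix stripper.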
More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn't extended very far and the behind-the-scenes algorithms powering search aren't something I've had to directly work with. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search and how a knowledge of mathematics and the data being searched are essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future and I'm sure that people working with search on a daily basis would find some useful advice here for how to get the best out of Solr.
Solr provides much more than just indexing and search, and the fact that various components are available to do many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover components which ship with Solr to provide this functionality as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one was adding search capability to one's own product or web site, as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.
The final three chapters move on to the more practical side of actually using Solr in the "real world" and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr), support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion of how to scale Solr: scaling high (optimising a single server through techniques like caching, shingling and clever schema design and indexing strategies), scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
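To make the "sharding" part of scaling wide a little more concrete, here is a rough sketch of one common way to decide which server a document goes to: hash its ID and take the remainder. This is an assumption for illustration only, not something taken from the book and not a description of how Solr itself assigns documents; the names `shard_for` and `partition` are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number in a stable, repeatable way."""
    # md5 is used only as a stable hash; Python's built-in hash() for
    # strings changes between runs, which would break routing.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    ids = ["doc-%d" % i for i in range(10)]
    for number, members in enumerate(partition(ids, 3)):
        print(number, members)
```

Each document lands on exactly one shard, so a query has to be sent to every shard and the results merged. Replication is the opposite trade: every server holds everything, and queries are spread between them.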
On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. This book does not cover a lot of theory and assumes a fair amount of prior knowledge and is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren't afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn't have to wade through pages of flowery text before getting to the good bits. If you're seriously thinking about using Solr or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.
Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion.
You can purchase Solr 1.4 Enterprise Search Server from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Re: (Score:2)
I wondered what was up. It's really hard to understand an article summary when I don't know any of the nouns they use. Checking back, Lucene barely had any previous /. coverage, and SOLR gets even less.
Solr is a search server (Score:3, Informative)
For the people wondering what Solr is for, it runs off of Apache Lucene. You feed it data/text and it processes and indexes it. It has some really neat text processing things. After using it for a while I am constantly critiquing crappy search implementations on websites.
Re: (Score:2)
Works fine, but hard to keep data updated.
That depends on how you are interacting with Solr. There are a number of good clients that integrate into popular ORMs and can automatically post updates to Solr as data changes in your application. I'm compiling a list of popular Solr clients over at https://websolr.com/guides/solr/clients [websolr.com].
For some popular examples, there are RSolr and Sunspot for Ruby applications. Haystack is a good one for Django, and there are Drupal and Django extensions as well.
Re: (Score:1)
I've found that Lucene is useful for creating custom search/indexing solutions, but as a server, I much prefer Sphinx [sphinxsearch.com]. It's lighter-weight and compatible with pretty much any language. It's also remarkably fast.
Re: (Score:1)
Can't agree more. It's really a very useful tool.
How about a title that says WTF it is? (Score:2)
"Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.
Re: (Score:1)
Did you RTFT? It's a book review for a book about a search server, Solr, aimed at the enterprise market. That was a lot of information in not a lot of words.
It's a search technology (Score:3)
You're right, of course. /. editors suck.
SOLR is [related to] a text search technology that is often used in parallel with a database.
http://lucene.apache.org/solr/#intro [apache.org]
Re: (Score:2)
Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?
I'm interested enough now that I'm going to go read about it - but I'm still not going to read the summary, out of spite.
Re: (Score:2)
because there is no e in solr, dumbass.
Quit spouting off about things you know nothing about.
Solr is actually quite powerful and is a very useful tool for creating awesome searches on your site.
Re: (Score:2)
Excuse me. I meant "drop the e" as a surrogate for "remove the last vowel in the word, preceding the letter r, which must end the word."
See also: Flickr
Re: (Score:2)
So you think the original name was "soler" rather than "solar"?
Re: (Score:2)
Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?
Because the apache foundation is primarily interested in web 2.0-y things, so that's what they want their projects to look/sound like?
Besides, e-solr may be a little over-the-top. :-)
Re: (Score:2)
Is this what Web 2.0 means, supporting an attention span where technologies must fit into little soundbites for people unwilling to actually read and understand the underlying complexity? Oh, I guess I'm not "agile enough". sigh....
(And yeah this could be considered flamebait, but I really am pretty disgusted with the whole "I don't want to deal with complexity" notion. I think one thing that increasingly separates the few competent programmers from the great unwashed masses of hackers is the willingnes
Re: (Score:2)
"Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.
Is your google broken, or do you merely enjoy acting like a douchebag?
Solr rocks! (Score:3)
Re: (Score:2)
For predictive search, you'll want to get friendly with the Solr TermsComponent [apache.org], which serves up the terms present in your index along with their frequency.
If you want to get really fancy, you can log your popular queries—particularly the ones that have a high correlation with click-throughs.
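To illustrate the idea of serving up index terms along with their frequency, here is a small sketch of prefix suggestions ranked by how often each term occurs. It is plain Python over an in-memory list and does not call Solr or the TermsComponent at all; `build_term_counts` and `suggest` are hypothetical names, and the sample documents are made up.

```python
from collections import Counter


def build_term_counts(documents):
    """Count how often each lowercase, whitespace-separated term occurs."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def suggest(term_counts, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(t, c) for t, c in term_counts.items() if t.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda tc: (-tc[1], tc[0]))
    return [t for t, _ in matches[:limit]]


if __name__ == "__main__":
    docs = ["solr search server", "search sphinx", "solr schema"]
    print(suggest(build_term_counts(docs), "se"))
```

Swapping the term counts for counts of logged queries would give the "popular queries" variant mentioned above, with the same lookup code.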
SOLR is no Drupal. (Score:2)
In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?
Re: (Score:1)
In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?
Including, in fact, the White House [oreilly.com], which is on a LAMP stack of Open Source goodness, including Drupal and Solr. Awesome indeed.
Re: (Score:2)
uh yeah. If you think they're actually using Drupal like a normal person would....wellllllll
Solr is great (Score:1)