My First Faceted Search Example

Before we get into an example, there is one brief thing that we need to understand about the architecture of Solr.

Solr uses a client-server model. The server is a web application (bound to /solr/ in the {BASE_SOLR_INSTALL} that we just created). The client, in our case a JSP page, makes a request to the server and retrieves the results.

What this means is that your Solr instance may sit on a different server from the one that serves up the HTML. Whilst this seems an unimportant detail at the moment, it becomes useful when it comes to scalability and security. We will (hopefully) cover this in later parts of the tutorial.

Requests and responses travel between the client and the server in one of two ways, namely:

  1. XML over HTTP, or
  2. the SolrJ Java client.

Most of the examples out there use the XML request/response mechanism. We, on the other hand, will look almost exclusively at the SolrJ client, which is faster and, more likely than not, will be the future of Solr communications.
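To make the difference concrete, here is a sketch of the XML request/response style from the client's side: a faceted query like the one used later in this article, written out as parameters on a /select URL. The parameter names (q, fq, facet, facet.field, facet.mincount, start, rows, wt) are standard Solr request parameters and also appear in the response header shown later; the class and method names are hypothetical, and the sketch assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: shows how a faceted query looks as plain HTTP parameters sent to
// Solr's /select handler. The class and method names are hypothetical.
class SelectUrlBuilder {

    static String build(String baseUrl, String query, List<String> filterQueries,
                        List<String> facetFields, int minCount, int start, int rows) {
        List<String> params = new ArrayList<>();
        params.add("q=" + encode(query));                 // the main query
        for (String fq : filterQueries) {
            params.add("fq=" + encode(fq));               // one fq per filter, e.g. category:Magnetic
        }
        params.add("facet=true");                         // switch faceting on
        for (String field : facetFields) {
            params.add("facet.field=" + encode(field));   // one facet.field per facet
        }
        params.add("facet.mincount=" + minCount);
        params.add("start=" + start);
        params.add("rows=" + rows);
        params.add("wt=xml");                             // ask for an XML response
        return baseUrl + "select?" + String.join("&", params);
    }

    // URL-encode a parameter value (spaces become '+', ':' becomes %3A)
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Pasting a URL built this way into a browser while the server is running should return the raw XML response. SolrJ sends essentially the same kind of request for you and parses the answer into Java objects, which is why we will not need to deal with the XML directly.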

On with the Example

For this very simple example, we will be looking at a generic widget website that lists widgets, each of which belongs to several categories and is produced in one of three sizes.

The example will configure the Solr server for this data and create two JSP pages: one to populate the server with data, the other to make faceted search requests.

Initially, we will not be looking at keyword searching, just the faceted search part of Solr.

Solr Base Setup

Now that we know what we are doing, we will edit the schema.xml file in the {BASE_SOLR_INSTALL}/bin/solr/conf/ directory. This file configures the fields for the search engine, and we will be using a very cut-down version of it in this example.

Solr Configuration

Open up the {BASE_SOLR_INSTALL}/bin/solr/conf/schema.xml file in your favourite editor and DELETE everything! We will be creating the configuration from scratch. At some point it would be informative to read through the comments in the original file to get a feel for what is going on; for now, we will go through each line of our own version in gruesome detail. (Don't worry, a download link for the final configuration is available further down this page!)

With a blank file to play with, note that the schema is broken into three parts:

  1. Field Types,
  2. Fields, and
  3. Configuration

The schema.xml File

The following lines are the simplest configuration that I could come up with; we will go through them line by line.

01:<?xml version="1.0" encoding="UTF-8" ?>
02:<schema name="example" version="1.1">
03:  <types>
04:    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
05:  </types>
06:
07:
08:  <fields>
09:    <field name="id" type="string" indexed="true" stored="true" required="true" />
10:    <field name="category" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
11:    <field name="size" type="string" indexed="true" stored="true"/>
12:    <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
13:  </fields>
14:
15:  <uniqueKey>id</uniqueKey>
16:  <defaultSearchField>text</defaultSearchField>
17:  <solrQueryParser defaultOperator="AND"/>
18:
19:  <copyField source="id" dest="text"/>
20:  <copyField source="category" dest="text"/>
21:  <copyField source="size" dest="text"/>
22:
23:</schema>

01: The XML declaration.

02: The start of the schema element. Note that the name attribute is only used for display purposes and can be anything you wish. We will leave it as "example" for the moment; we will see how this name is displayed in the administration console later.

03: The start of the field type definitions

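04: The definition of a field type named string, backed by solr.StrField, which indexes each value verbatim as a single term rather than splitting it into words. sortMissingLast="true" sorts documents that have no value for the field after those that do, and omitNorms="true" turns off length normalisation for the field.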

05: The end of the field type definitions

08: The start of the field definitions. Each field that is to be used in Solr must have a corresponding definition within this element.

09: The definition of the id field as a string - note how the type attribute's value of string matches the field type definition in line 04.

10: The definition of the category field as a string

11: The definition of the size field as a string

12: The definition of the text field as a string - Note that this is a special field used as a catch-all field for searching the data. We will be using this pattern frequently throughout the tutorial.

13: The end of the field definitions

15: Declares the id field as the unique identifier for each of the documents stored in the Solr index.

16: The default field to search against - the chosen field text was defined as a special 'catch-all' field and will be referenced in the <copyField /> element (see lines 19 to 21 below).

17: Defines the default operator which may be either AND or OR. We are using AND as the default operator for reasons that we will go into later.

19 to 21: The copyField element (as the name would suggest) copies the information from one field to another. The text field is used so that any data that we wish to search on is indexed within this field. This saves us the hassle of querying three separate fields (i.e. the 'id', 'category' and 'size' fields) when looking for a keyword match.

23: The end of the schema element.
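To picture what lines 19 to 21 do, here is a toy model in plain Java of the three copyField rules. It is only an illustration of which values end up in the text field, not how Solr implements copyField internally; the class and method names are hypothetical, and each document is assumed to be a map from field name to its list of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the three copyField rules (lines 19 to 21). This is NOT how Solr is
// implemented internally; it only shows which values end up in the text field.
class CopyFieldModel {

    // source field -> destination field, mirroring schema.xml
    static final Map<String, String> COPY_FIELDS = new LinkedHashMap<>();
    static {
        COPY_FIELDS.put("id", "text");
        COPY_FIELDS.put("category", "text");
        COPY_FIELDS.put("size", "text");
    }

    static Map<String, List<String>> apply(Map<String, List<String>> document) {
        Map<String, List<String>> indexed = new LinkedHashMap<>();
        // keep the document's own fields unchanged
        for (Map.Entry<String, List<String>> field : document.entrySet()) {
            indexed.put(field.getKey(), new ArrayList<>(field.getValue()));
        }
        // append every source value to its destination field
        for (Map.Entry<String, String> rule : COPY_FIELDS.entrySet()) {
            List<String> values = document.get(rule.getKey());
            if (values != null) {
                indexed.computeIfAbsent(rule.getValue(), k -> new ArrayList<>()).addAll(values);
            }
        }
        return indexed;
    }
}
```

Because one document contributes several values to text, the field has to be declared multiValued="true" (line 12). And since text uses the untokenized string type, a keyword query has to match a whole value, such as Magnetic, exactly.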

You may download the finished schema.xml file and use it to replace the existing {BASE_SOLR_INSTALL}/bin/solr/conf/schema.xml file.

Starting the Server

Now that you have everything set up, it is time to start the server.

Go to the {BASE_SOLR_INSTALL}/bin/ directory and type:

*NIX: ./catalina.sh run

Windows: catalina.bat run

The window should print out something along the following lines:

Start-up configuration

XX:XX:XX XXX XXX XX $ ./catalina.sh run
Using CATALINA_BASE:   /java_servers/tomcat-solr
Using CATALINA_HOME:   /java_servers/tomcat-solr
Using CATALINA_TMPDIR: /java_servers/tomcat-solr/temp
Using JRE_HOME:       /System/Library/Frameworks/JavaVM.framework/Home
...

Deploying the Solr server

...
XXX XX, XXXX XX:XX:XX XX org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive solr.war
XXX XX, XXXX XX:XX:XX XX org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No /solr/home in JNDI
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /java_servers/tomcat-solr/bin/solr/solr.xml
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr/'
...

Setting the Solr home

...
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No /solr/home in JNDI
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr/'
...

Setting up the data directory and configuration

...
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrConfig 
INFO: Loaded SolrConfig: solrconfig.xml
XXX XX, XXXX XX:XX:XX XX org.apache.solr.core.SolrCore 
INFO: Opening new SolrCore at solr/, dataDir=./solr/data/
...

Starting the HTTP listener

...
XXX XX, XXXX XX:XX:XX XX org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
...

The server has started and is ready to service incoming requests.

...
XXX XX, XXXX XX:XX:XX XX org.apache.catalina.startup.Catalina start
INFO: Server startup in 2056 ms

Solr should also have created its data directories inside the {BASE_SOLR_INSTALL}/bin/solr/data/ directory.

Indexing The Data

Now that the server has been successfully started it is time to add some information to the Solr index.

To do this we will add a JSP file and place some code snippets into it (ugly, yes, but it shows you all of the code, which you can then refactor).

Setting up the ROOT context

  1. Shut down the server (Ctrl-C in the server's terminal window)
  2. Download the ROOT.war file and place it in the {BASE_SOLR_INSTALL}/webapps/ directory
  3. Restart the server

Adding Data To Be Indexed

The ROOT.war contains many files and directories; for the moment we will look at the add-data-1.jsp file.

Apart from the class imports, the HTML and a few helper methods, the area of interest to us is the following 21 lines of code:

01: SolrServer solrServer = new CommonsHttpSolrServer("http://localhost:8080/solr/");
02:
03: out.print("<ul>");
04:
05: for(int i = 0; i < 100; i++) {
06: 	SolrInputDocument solrInputDocument = new SolrInputDocument();
07: 	solrInputDocument.addField("id", "widget " + i);
08: 	// add three random categories
09: 	for(int j = 0; j < 3; j++) {
10: 		solrInputDocument.addField("category", getRandomCategory());
11: 	}
12: 	solrInputDocument.addField("size", getRandomSize());
13: 	// this is cheating below - but saves us from the query string...
14: 	solrInputDocument.addField("text", "a");
15: 	solrServer.add(solrInputDocument);
16: 	out.print("<li> Adding: " + solrInputDocument + "</li>\n");
17: }
18:
19: out.print("</ul>");
20:
21: solrServer.commit();

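01: Create a SolrServer client that talks to the server over HTTP. Remember how we renamed apache-solr-1.3.0.war to solr.war: this means that it was deployed on the server bound to the /solr/ context, which is the one that we want to connect to. Note that although CommonsHttpSolrServer talks HTTP, we never handle raw XML ourselves: SolrJ parses the responses for us (the wt=javabin parameter in the response header shown further down shows that the response came back in Solr's binary format rather than XML).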

03: Some HTML output within the code snippet. Yes, this is ugly, but it was done for speed and clarity.

05: We are going to add 100 documents to the index, so a simple loop will suffice.

06: Create a new SolrInputDocument so that we can start adding information to it

07: Add a field to the document, the id field to be precise. Remember that this is the unique key for the index, so we will name each document 'widget n', where n is the for loop index.

08 to 11: Add categories to the document. Here we are adding 3 random categories (some may be duplicates), which gives us a little more to interact with in the search results.

12: Add one size to the widget.

13 to 14: As the comment indicates, this is indeed cheating a little. We are adding a single letter (a) to the field named text, which gives every document a term that the Solr query (q) parameter can match; we will get to this in more detail later.

15: Add the document to the SolrServer

16 to 19: More output HTML.

21: Commit the changes to the server. You will not be able to search on the data until it is committed.
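The JSP calls two helper methods, getRandomCategory() and getRandomSize(), which are among the methods not shown above. Here is a minimal sketch of what they might look like; the value lists are taken from the facet results shown later in this article, the WidgetData class is hypothetical, and passing in a java.util.Random is my own choice so that the test data can be reproduced with a fixed seed.

```java
import java.util.List;
import java.util.Random;

// Hypothetical versions of the getRandomCategory() and getRandomSize() helpers used
// by add-data-1.jsp; the real JSP may implement them differently.
class WidgetData {

    static final List<String> CATEGORIES =
            List.of("Magnetic", "Mechanical", "Solar", "Hydro-Electric", "Electronic");
    static final List<String> SIZES = List.of("small", "medium", "large");

    private final Random random;

    // pass new Random(seed) to get the same "random" widgets on every run
    WidgetData(Random random) {
        this.random = random;
    }

    // picks one category uniformly at random; repeated calls may return duplicates
    String getRandomCategory() {
        return CATEGORIES.get(random.nextInt(CATEGORIES.size()));
    }

    // picks one of the three sizes uniformly at random
    String getRandomSize() {
        return SIZES.get(random.nextInt(SIZES.size()));
    }
}
```

Because each call picks independently, three calls for the same widget can return the same category twice, which is where the duplicate categories mentioned for lines 08 to 11 come from.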

See how easy that all was! A couple of dozen lines of code and Solr has indexed all of the data - ready for you to search against.

Searching The Data

We will now have a look at the simple search results page, search-1.jsp, which is also included in the ROOT.war file. This file is a little more complex, so we will go through it in a little more detail.

The code is pretty nasty; however, it was designed to be quick and dirty, with all of the internal goings-on easily accessible.

Before going any further, make sure that your server has been started and load the data by opening this page: http://localhost:8080/add-data-1.jsp

The page should list each widget as it is added (the categories will be different for you as they are randomly generated).

Querying The Server

The page is designed to showcase faceted search, not a full-blown search engine (which we will be working towards), so let's dive straight in.

26 lines to query the server is not too bad, considering how many options can be set to configure how the results come back to us.

01: SolrQuery solrQuery = new SolrQuery();
02: solrQuery.setFacet(true);
03: solrQuery.setFacetMinCount(1);
04: solrQuery.setRows(100);
05: solrQuery.setStart(0);
06: solrQuery.addFacetField("category");
07: solrQuery.addFacetField("size");
08:
09: // here I am definitely cheating!
10: solrQuery.setQuery("a");
11:
12: Map parameterMap = request.getParameterMap();
13: Set keySet = parameterMap.keySet();
14: Iterator parameterMapIterator = keySet.iterator();
15: while(parameterMapIterator.hasNext()) {
16: 	String key = (String)parameterMapIterator.next();
17: 	String[] values = (String[])parameterMap.get(key);
18: 	for(int i = 0; i < values.length; i++) {
19: 		solrQuery.addFilterQuery(key + ":" + values[i]);
20: 	}
21: }
22:
23: SolrServer solrServer = new CommonsHttpSolrServer("http://localhost:8080/solr/");
24:
25: QueryResponse queryResponse = solrServer.query(solrQuery);
26: pageContext.setAttribute("queryResponse", queryResponse);

01: Create a new SolrQuery object

02: Turn faceting on

03: Set the minimum number of hits a facet value needs before it is returned. In other words, if a facet value (for example the 'Solar' category) matches zero (0) documents in the current results, do NOT return it.

04: Set the number of rows to return per query - here it is 100 - the same number of documents that have been indexed.

05: Set the result number to start at; here it is zero (0), the first result. By default results are sorted by relevancy score, and since every document matches our query equally well, they come back in the order in which they were added to the index: 'widget 0' first, then 'widget 1', and so on.

06 to 07: Set the facet fields that we want to return with the results. This means that, if available, both the category and size facets will be returned.

09 to 10: Remember how I cheated by adding the letter 'a' to the text field of every document? Solr requires a query string to be present for every query, and 'a' matches every document. There are ways around this, and we will look at a much nicer way of doing it later.

12 to 21: Go through all of the request parameters and add each one as a filter query on the solrQuery object (line 19). We are changing a URL parameter of size=medium into something that the Solr server can understand, i.e. size:medium.

23: Create a connection to the Solr server.

25: Execute the query and get the response.

26: Save the above response into the page context for later use.
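To make lines 02, 03, 06 and 07 more concrete, here is a small in-memory sketch of what field faceting computes. It is an illustration under simplifying assumptions, not how Solr counts internally: each document is a map of field names to values, indexedValues stands in for every value present anywhere in the index (which is why a zero count can occur at all), and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// In-memory illustration of field faceting with facet.mincount. NOT Solr's implementation.
class FacetCounter {

    /**
     * @param indexedValues every value of the field that exists anywhere in the index
     * @param matchingDocs  the documents matched by the query and filter queries
     * @param field         the facet field, e.g. "category"
     * @param minCount      the equivalent of facet.mincount
     */
    static Map<String, Integer> facet(List<String> indexedValues,
                                      List<Map<String, List<String>>> matchingDocs,
                                      String field, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : indexedValues) {
            counts.put(value, 0);                        // every indexed value starts at zero
        }
        for (Map<String, List<String>> doc : matchingDocs) {
            // a document is counted once per distinct value, even if a value repeats
            for (String value : new LinkedHashSet<>(doc.getOrDefault(field, List.of()))) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        counts.values().removeIf(count -> count < minCount); // apply facet.mincount
        return counts;
    }
}
```

With minCount set to 1, a value such as Electronic that no matching document contains is dropped, exactly as described for line 03. Note that facet counts are document counts, so a widget whose random categories were Solar, Magnetic, Solar still adds only one to Solar. By default Solr also sorts the returned values by count, highest first, which this sketch does not do.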

And there you have it: 26 lines of code to query the Solr server, once again very straightforward. Up next, we will look at getting the data out of the QueryResponse object and displaying it on the page.

Now that we have been through the code, make sure that your server has been started and open this page: http://localhost:8080/search-data-1.jsp

The page should show the matching widgets together with the category and size facets (the categories will be different for you as they are randomly generated). Selecting a category, such as 'Solar', adds a remove link next to it. Click around the links to see how the search results change. (Note: this is not a perfect example, as the URL parameters are not sanitised, but you should get the idea.)
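One way to address that last point is to accept only known facet fields and to quote each value before it becomes a filter query. The sketch below is my own suggestion rather than the code in the JSP; the class name is hypothetical, it assumes the parameter map has the Map<String, String[]> shape returned by request.getParameterMap(), and it relies on my understanding that on an untokenized string field a quoted value is matched as one exact term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// A hardened variant of lines 12 to 21 (my own suggestion, not the JSP's code):
// only whitelisted fields become filter queries, and each value is quoted.
class FilterQueryBuilder {

    static List<String> fromParameters(Map<String, String[]> parameterMap,
                                       Set<String> allowedFields) {
        List<String> filterQueries = new ArrayList<>();
        // TreeMap only to make the output order predictable
        for (Map.Entry<String, String[]> entry : new TreeMap<>(parameterMap).entrySet()) {
            if (!allowedFields.contains(entry.getKey())) {
                continue;                                // ignore anything that is not a facet field
            }
            for (String value : entry.getValue()) {
                filterQueries.add(entry.getKey() + ":" + quote(value));
            }
        }
        return filterQueries;
    }

    // Wrap a value in double quotes, escaping backslashes and double quotes inside it,
    // so that spaces or query syntax in the value cannot change the meaning of the query.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

In the JSP, each returned string could then be passed to solrQuery.addFilterQuery(...) in place of lines 12 to 21, with the allowed fields set to category and size.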

Displaying the Results

The rest of the page is dedicated to displaying the search results and the returned facets. This is left as an exercise for the reader; all of the code is within the JSP page (although it is relatively messy).

What I will go through instead is the QueryResponse object and how to retrieve data from it.

Parsing The QueryResponse Object

Assuming that we have a QueryResponse object named queryResponse, the following methods are used to get the data out of it:

getElapsedTime()
    Returns: long
    Example: 186
    Note: The elapsed time of the full query in milliseconds, measured on the client (the whole round trip).

getQTime()
    Returns: int
    Example: 97
    Note: The time, in milliseconds, that Solr itself took to perform the query.

getStatus()
    Returns: int
    Example: 0
    Note: This will be 0 if everything is OK, 500 if an exception was thrown during the query.

getFacetDates()
    Returns: List<FacetField>
    Example: n/a
    Note: Returns all facets that are date based.

getFacetDate(String name)
    Returns: FacetField
    Example: n/a
    Note: Returns the date facet named 'name', or null if it doesn't exist.

getFacetFields()
    Returns: List<FacetField>
    Example: [category:[Magnetic (58), Mechanical (25), Solar (24), Hydro-Electric (22), Electronic (18)], size:[large (21), medium (20), small (17)]]
    Note: Returns all facet fields that are available, including those that are already limiting the query.

getFacetField(String name)
    Returns: FacetField
    Example: category:[Magnetic (58), Mechanical (25), Solar (24), Hydro-Electric (22), Electronic (18)]
    Note: Returns the facet named 'name', or null if it doesn't exist.

getLimitingFacets()
    Returns: List<FacetField>
    Example: [category:[Mechanical (25), Solar (24), Hydro-Electric (22), Electronic (18)], size:[large (21), medium (20), small (17)]]
    Note: Returns only the facet values that can still limit the query. Notice how 'Magnetic' is no longer available: when setting up this example, I had already limited the query to the category 'Magnetic'.

getResponse()
    Returns: NamedList<Object>
    Example: {responseHeader={status=0, QTime=1, params={... rows=5, start=0, facet=true, facet.field=[category, size], facet.mincount=1, fq=category:Magnetic, q=a, ...}}, response={numFound=58, start=0, docs=[...]}, facet_counts={...}}
    Note: This is the raw response object, which is useful for debugging. You will get very good at reading it; thankfully, the good people at Apache have implemented the toString() method, which makes reading it a breeze (I have snipped it for brevity). It is easier to use the other methods to extract the information. In the above example you can see the query that was made: we are returning 5 rows, starting at 0, with a filter query (fq) of category:Magnetic.

getResponseHeader()
    Returns: NamedList
    Example: {status=0, QTime=1, params={wt=javabin, rows=5, start=0, facet=true, facet.field=[category, size], facet.mincount=1, fq=category:Magnetic, q=a, version=2.2}}
    Note: Retrieves only the header section, which is also included in the getResponse() output above. This is equivalent to calling getResponse().get("responseHeader").

getResults()
    Returns: SolrDocumentList
    Example: {numFound=58, start=0, docs=[SolrDocument[{category=[Magnetic, Solar, Mechanical], size=medium, id=widget 1}], SolrDocument[{category=[Solar, Magnetic, Solar], size=large, id=widget 3}], SolrDocument[{category=[Solar, Magnetic, Mechanical], size=large, id=widget 5}], SolrDocument[{category=[Magnetic, Magnetic, Hydro-Electric], size=small, id=widget 7}], SolrDocument[{category=[Hydro-Electric, Mechanical, Magnetic], size=large, id=widget 8}]]}
    Note: The results are all of the documents that were found, plus some metadata about the results (numFound, start). This is equivalent to calling getResponse().get("response").
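Comparing getFacetFields() with getLimitingFacets() above suggests the rule being applied: a value is kept only if its count is lower than numFound, because selecting a value that every result already has would not change anything. The sketch below reproduces that rule in plain Java using the category counts from the example; it is my reading of the behaviour rather than SolrJ's source, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the rule behind getLimitingFacets(): a facet value can only narrow the
// results further if it matches fewer documents than the query already returns.
class LimitingFacetSketch {

    static Map<String, Long> limiting(Map<String, Long> facetCounts, long numFound) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : facetCounts.entrySet()) {
            if (entry.getValue() < numFound) {           // selecting it would change the results
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

With numFound=58, Magnetic (58) is dropped and the other four categories remain, matching the getLimitingFacets() example.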

There are other methods available - getHighlighting(), getSortValues() and getSpellCheckResponse() which will be covered later.

Parsing the FacetField Object

Here we will use the queryResponse.getFacetField("category") method call to extract the single facet field for the category. Assuming that we now have a FacetField object named facetField, the following methods are available:

getName()
    Returns: String
    Example: category
    Note: The name of the facet field.

getValueCount()
    Returns: int
    Example: 5
    Note: The number of facet values for this facet; in this example the values are Magnetic, Mechanical, Solar, Hydro-Electric and Electronic.

getValues()
    Returns: List<FacetField.Count>
    Example: [Magnetic (58), Mechanical (25), Solar (24), Hydro-Electric (22), Electronic (18)]
    Note: All of the facet values and their hit counts, each stored in a FacetField.Count bean.

Parsing the FacetField.Count Object

A FacetField.Count object holds the name of a single facet value and the number of documents that match it. To use them, iterate over the list returned by getValues(), for example:

import java.util.Iterator;
import java.util.List;
import org.apache.solr.client.solrj.response.FacetField;

List<FacetField.Count> facetFieldCounts = facetField.getValues();
Iterator<FacetField.Count> facetFieldCountIterator = facetFieldCounts.iterator();
while(facetFieldCountIterator.hasNext()) {
	FacetField.Count count = facetFieldCountIterator.next();

	// do something with the information
	String name = count.getName();   // e.g. "Solar"
	long hits = count.getCount();    // e.g. 24; getCount() returns a long
}
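Inside that loop you will typically turn each value into a link that adds it as a filter, which the search page then turns into a filter query. Here is a minimal sketch; the class and method names are hypothetical, the "name (count)" label copies the format of the examples in the tables above, and the page name is whatever your search page is called.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for rendering one FacetField.Count as a link that adds
// a filter for its value to the current search.
class FacetLinkBuilder {

    // e.g. label("Solar", 24) gives "Solar (24)"
    static String label(String name, long count) {
        return name + " (" + count + ")";
    }

    // Appends field=value to the current query string, which may be null or empty.
    static String addFilterLink(String page, String currentQueryString,
                                String field, String value) {
        String parameter = encode(field) + "=" + encode(value);
        if (currentQueryString == null || currentQueryString.isEmpty()) {
            return page + "?" + parameter;
        }
        return page + "?" + currentQueryString + "&" + parameter;
    }

    // URL-encode a parameter name or value (requires Java 10+ for the Charset overload)
    private static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }
}
```

Inside the while loop, FacetLinkBuilder.addFilterLink("search-1.jsp", request.getQueryString(), facetField.getName(), count.getName()) would give the href, and FacetLinkBuilder.label(count.getName(), count.getCount()) the link text. As noted earlier, values taken from the URL should still be sanitised before they reach Solr.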

Parsing the SolrDocumentList Object

The SolrDocumentList object contains all of the documents that match the search criteria. You can step through them with a simple iterator:

import java.util.Iterator;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

SolrDocumentList solrDocumentList = queryResponse.getResults();
Iterator<SolrDocument> solrDocumentIterator = solrDocumentList.iterator();
while(solrDocumentIterator.hasNext()) {
	SolrDocument solrDocument = solrDocumentIterator.next();
	// do something useful here
}

Each iteration gives you a SolrDocument object, which has the following methods:

getFieldNames()
    Returns: Collection<String>
    Example: [category, size, id]
    Note: Returns the names of all of the fields that were returned with the document.

getFieldValue(String name)
    Returns: Object
    Example: [Hydro-Electric, Mechanical, Magnetic]
    Note: Returns an object; in this case a java.util.ArrayList of all of the values for the specified name.
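Because getFieldValue(...) returns a plain Object (a list for a multi-valued field such as category, but the bare value for a single-valued field such as size, as the getResults() example above shows), display code often normalises it first. Here is a small sketch of such a helper; the FieldValues class is hypothetical and only assumes the two shapes just described.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for the Object returned by SolrDocument.getFieldValue(name):
// multi-valued fields come back as a Collection, single-valued ones as the bare value.
class FieldValues {

    static List<String> asStrings(Object fieldValue) {
        if (fieldValue == null) {
            return Collections.emptyList();           // field not present in this document
        }
        List<String> values = new ArrayList<>();
        if (fieldValue instanceof Collection) {
            for (Object value : (Collection<?>) fieldValue) {
                values.add(String.valueOf(value));    // e.g. each category of a widget
            }
        } else {
            values.add(String.valueOf(fieldValue));   // e.g. size=medium
        }
        return values;
    }
}
```

For example, FieldValues.asStrings(solrDocument.getFieldValue("category")) always yields a list, so the page can loop over categories and sizes in the same way.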

There are other methods available, which the SolrJ JavaDoc documents far better than I could here.

Final Thoughts

At this point you should be able to index and retrieve data from a Solr server. We are still missing a few things that we need to fully appreciate the power of this server, starting with the fields and field types, which we will look at in the next section.