ABAP source code search using Solr (Part 3) – Using multiple cores
In my last post I briefly introduced the search front-end that I developed to go along with my ABAP source code indexing solution. Well, it turns out there was a severe bug in the search program that prevented it from being used with multiple cores (collections). So before we get into the discussion, you may want to grab the latest source code from the GitHub repo.
Thanks also to reader Ken Kirby, who posted a comment on the first post about this solution with a line that sets the content type on the request in the indexing program. It came just as I was working out how to get the solution running on the relatively new Solr 4 release, and it turned out to be exactly what I needed, saving me a lot of searching.
But anyway, this particular post is about collections, or cores as they are referred to, which we would probably (perhaps wrongly) call ‘indexes’. This should work, I think, regardless of whether you are using release 3 or 4 of Solr.
The point of having multiple cores is that you can use the same Solr instance to index multiple separate sources of data, or, in our case, the source code of multiple ABAP systems if we wanted. Of course, it would also be possible to use a single core to index several systems, include the source system ID in the key, and leave it up to the user to specify which set of code they want to search through.
Let us look first at Solr 3:
If you refer back to the first post in this series, you will recall that you had to copy a directory called ‘solr’ from the example directory of the distribution to somewhere on your drive, and then reference this location in the web.xml file of the Solr web application. If you open the solr.xml file inside this directory, you will find a <cores> section in which you can configure any number of cores (or collections, or whatever). By default there is a single core called “collection1” whose instance directory is the current directory (i.e. it lives in the ‘solr’ directory itself). Inside that directory, Solr looks for a ‘conf’ and a ‘data’ directory.
The config also specifies a default core to use when none is given. So when you reference a URL like /solr/select, Solr uses this setting to determine which core to direct the request to.
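For reference, the <cores> section in the solr.xml that ships with the example looks roughly like this (your copy may differ slightly depending on the exact release):

<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="." />
  </cores>
</solr>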
Now, in order to use multiple cores, you need to change the above config. Let’s say you want to use a separate core for each ABAP system whose source you want to index. You could do something like this:
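(In this sketch, ‘npl’ matches the system ID used later in this post, while ‘xyz’ stands in for a second, purely hypothetical system.)

<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="npl">
    <core name="npl" instanceDir="npl" />
    <core name="xyz" instanceDir="xyz" />
  </cores>
</solr>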
Here we have named each core after the system ID of an ABAP system and given each one a subdirectory of the same name under the current directory. To make this work, you have to copy the ‘conf’ directory from the current directory into each of the subdirectories. (Remember that in the first part of this series we made some changes to the config so that source code is stored in the index.) You also need a directory called ‘data’ in each subdirectory; either create it, or copy your existing data directory if it already contains indexed data. (Or Solr might create it for you; I haven’t tried.)
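The resulting layout under the ‘solr’ directory would then look something like this (names follow the sketch above):

solr/
  solr.xml
  npl/
    conf/
    data/
  xyz/
    conf/
    data/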
So now you can use Solr to index the source code of more than one system. To actually use this setup, you need to change the Solr URL configured in the ZSOLR_ABAPSRC_INDEX program from:
http://solrhost.example.com:8080/solr
to
http://solrhost.example.com:8080/solr/npl
where the trailing ‘/npl’ is the name of one of your cores. Now, when the indexing or search program sends an index or search command, it explicitly addresses the core specified in the URL. If you leave it off, the default core from the configuration is used.
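For example, with the URL above, a search request would end up at the core-specific select handler and indexing at the core-specific update handler, along these lines (the query term is just an example):

http://solrhost.example.com:8080/solr/npl/select?q=report
http://solrhost.example.com:8080/solr/npl/update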
On my release 3.6 instance of Solr, I had to restart Tomcat for the changes to take effect. (Maybe I didn’t wait long enough for the system to pick up the change in the filesystem?)
Anyway, on release 4 of Solr, it’s much nicer. You don’t have to play around with config files, because cores can be administered with a slick web interface, under the “Core Admin” menu option.
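If you prefer not to click around, the same can be done with Solr’s CoreAdmin HTTP API; a request along these lines should create a core (using the hypothetical ‘xyz’ core from earlier as an example; the instance directory, including its ‘conf’ directory, must already exist on disk):

http://solrhost.example.com:8080/solr/admin/cores?action=CREATE&name=xyz&instanceDir=xyz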
Neat.
There is still room for improvement, like adding program texts to the index or, what would be particularly useful, being able to search by package, but that will all follow in good time.