Effective (and easy!) ABAP source code search using Solr (Part 1)
If you have been a long-time reader of my blog, you may think I have a bee in my bonnet about being able to search through ABAP source code, as if it were some holy grail. Yet when I consider the following use cases, I think there is a lot of scope for such a solution:
- Finding occurrences of certain hard-coded values
- Finding usages of messages issued without a MESSAGE statement
- If code is documented properly, assisting with troubleshooting by looking for references to e.g. ticket IDs or certain functionality
- Looking for instances of direct table updates
- Implementing a To-do list by annotating code with e.g. #TODO or #FIXME or #STUB tags in the comments
The last use case especially would prove most useful in a large team on a large project, if you could convince developers of the benefit it would have for themselves.
UPDATE: Perhaps you would rather be interested in the following: ABAP Source Code Search using HANA Fulltext search.
Overall, I think having source code search functionality can be very beneficial to a development team, both for development and support purposes.
Since my last attempts documented in earlier blog posts, where I was using a combination of Ruby, the Netweaver RFC SDK and Ferret, I have actually been won over to Solr. The first time I tried Solr, I was not impressed by the speed, and for some reason, I managed to index source faster with Ferret. That, however, has changed.
With Ferret being seemingly out of maintenance for some time, and because I am easily swayed by opinions expressed in forums and the like (in this case opinions against Ferret, not for Solr), I decided to give Solr another try, and though it is not as simple to set up out of the box as Ferret (which just involves the installation of a gem), it is certainly more powerful and customizable, and it suits our purposes nicely.
The best part of this new solution is that it is very easy to set up: all you need is an ABAP program (code provided here) and a running instance of Solr, which you can deploy to Tomcat running (even) on a desktop. You can probably get your 5-year-old nephew to set it up.
For the rest of the article, I am going to walk you through the steps for setting up an ABAP source code search using Tomcat, Solr, and your ABAP system. (I’m doing this on Windows, by the way; for a Linux/Unix installation things will be mostly the same anyway).
First, let’s grab a zip archive of the latest version of Tomcat. (You can choose to install Tomcat as a service, but for the purpose of demonstration we’ll do a non-invasive install). Unzip the file somewhere on your hard drive. I renamed the directory to “tomcat” for less typing. On the command line, navigate to \bin\ in the tomcat distribution and execute startup.bat.
Now open your browser and point it to http://localhost:8080 to check that it is running. The next step is to install Solr.
Grab the Solr distribution from an Apache mirror near you. I tested with 3.6.1. Unzip the archive somewhere and copy the .war file in the root of the distribution to the \webapps directory in your tomcat installation. I renamed the file to “solr.war” so the resulting URL is easier to access.
If everything went well, Tomcat will have auto-deployed the WAR file and you will end up with a correspondingly-named directory in the \webapps directory. Go to http://localhost:8080/solr to check.
Solr is up and running … almost
Now the important bit: Solr is not working just yet. You need to create a Solr home directory and change the config in your web app to point to this directory. Go back to your Solr archive, and look for example\solr. Copy this “solr” directory, which is a sample Solr home directory, to somewhere on your hard drive. I ended up with C:\temp\solr.
Finally, go to the Solr web app, and inside its WEB-INF directory (tomcat\webapps\solr\WEB-INF), edit the web.xml file and look for the solr/home env-entry value. Change the value to wherever your example Solr home ended up, e.g. C:\temp\solr.
Setting the path to the Solr home directory
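For reference, here is a small Python sketch that prints what the solr/home entry should roughly look like once edited. The element layout follows the standard servlet env-entry format; compare it with what is actually in your copy of web.xml rather than pasting it blindly. If the entry is commented out in your copy, uncomment it first. The helper name solr_home_env_entry is made up for this illustration.

```python
from xml.sax.saxutils import escape


def solr_home_env_entry(solr_home: str) -> str:
    """Render a servlet env-entry that points the Solr web app at its home directory.

    The layout is the generic servlet env-entry shape; the path is whatever
    directory you copied the sample Solr home to.
    """
    return (
        "<env-entry>\n"
        "  <env-entry-name>solr/home</env-entry-name>\n"
        f"  <env-entry-value>{escape(solr_home)}</env-entry-value>\n"
        "  <env-entry-type>java.lang.String</env-entry-type>\n"
        "</env-entry>"
    )


# C:\temp\solr is the example location used in this post.
print(solr_home_env_entry(r"C:\temp\solr"))
```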
To test that it is working, go to http://localhost:8080/solr/admin. (The Tomcat servlet container may take a few seconds to register the change, so if it does not work right away, refresh the browser).
Solr admin page
Now we have a working indexing and search engine running, but no data.
The next step is to install the ABAP program. Go to SE38, create an online report (type 1) program and paste the code into the editor and compile. Done. You can access the source code from this gist on Github: https://gist.github.com/3823019 (The source comes complete with selection screen labels inside).
When you execute the program, you will see the following selection screen:
Selection screen of the indexing program
The most important thing is the URL. In this field, enter the URL to the Solr application on your desktop (minding to leave off the trailing slash as the label indicates). Remember to change the host name to your IP address: the request is made from the SAP server, where “localhost” would refer to the server itself and not to your desktop. If you have a problem connecting to the app, you should check the firewall settings on your desktop. I had that problem.
Right now the program is quite crude, but at 326 lines of code (including comments and blank lines), I think it’s quite feature-full. Future versions of the program may include extra admin and configuration options and better error recovery, but I will keep you posted as it progresses.
You can test that the setup is successful by clicking the “Optimize Index” button, which sends an optimize command (very benign) to the server. If you see an info message pop up, you know the connection is working.
We can connect successfully
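If you want to check the connection without going through SAP at all, you can send the same kind of command yourself. Solr's XML update handler accepts an optimize message posted to /update. The sketch below only builds the request; the helper name build_update_request is made up for this example, and it is an illustration of the Solr side, not a transcription of what the ABAP program does internally.

```python
import urllib.request


def build_update_request(solr_url: str, command_xml: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Solr's XML update handler."""
    return urllib.request.Request(
        url=solr_url + "/update",  # solr_url has no trailing slash
        data=command_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_update_request("http://localhost:8080/solr", "<optimize/>")
print(req.get_method(), req.full_url)

# To actually send it against a running Solr instance:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```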
Here is a rundown of the other options on the selection screen:
- The package size and packages per commit are just options for fine-tuning performance. After having played around a bit, I think the default values are reasonable. Package size means how many programs’ sources will be bundled together per update request to Solr; once the number of packages given in packages per commit has been sent, a commit on Solr is triggered.
- The “Resume after program” option allows you to resume indexing from a certain point; the last updated program is tracked while it is running.
- The deltas option will come in handy later, after you have carried out the initial indexing and wish to schedule a periodic program to add programs that have been created or changed since the last index operation.
- “Delete index before starting”, if set, will delete the entire index first when you run the program to start indexing. The same thing can be achieved by clicking the “Delete Index” button. Probably useful if you want to start a clean index.
- “Include line numbers in source” will do just that – include source code line numbering in the extracted source that is indexed; may be useful for showing in search results.
- If the program encounters as many errors during execution as specified in “Max. errors before stopping”, it will stop indexing and terminate.
- The “Store Solr URL” button stores the URL in the database (in the INDX cluster table; so no harm done); consider it a “config” option (more perhaps to follow in future versions of the program)
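To make the package size and packages per commit options more concrete, here is a rough Python sketch of the batching idea: sources are bundled into add messages, and a commit message is issued after every N packages (plus once at the end for any remainder). This is an illustration of the concept, not a port of the ABAP code. The field names id and text, and the function names add_payload and plan_requests, are assumptions for this example.

```python
from xml.sax.saxutils import escape


def add_payload(programs):
    """Bundle (name, source) pairs into one Solr <add> message."""
    docs = []
    for name, source in programs:
        docs.append(
            "<doc>"
            f'<field name="id">{escape(name)}</field>'
            f'<field name="text">{escape(source)}</field>'
            "</doc>"
        )
    return "<add>" + "".join(docs) + "</add>"


def plan_requests(programs, package_size, packages_per_commit):
    """Yield the XML bodies to post, in order: adds with periodic commits."""
    programs = list(programs)
    packages_sent = 0
    for start in range(0, len(programs), package_size):
        yield add_payload(programs[start:start + package_size])
        packages_sent += 1
        if packages_sent % packages_per_commit == 0:
            yield "<commit/>"
    # Commit whatever was added since the last periodic commit.
    if packages_sent % packages_per_commit != 0:
        yield "<commit/>"


sample = [(f"ZPROG{i}", f"REPORT zprog{i}.") for i in range(5)]
for body in plan_requests(sample, package_size=2, packages_per_commit=2):
    print(body[:60])
```

Larger packages mean fewer HTTP round trips, and fewer commits mean less work for Solr, at the cost of newly added programs becoming searchable a little later.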
If you look at the code, you will notice that the report has a set of programs hard-coded that represents objects in the customer ‘Z’ and ‘Y’ namespaces. If you want to export other source code objects, you will need to change the SELECT statement inside the program.
One more thing before you get going: Using the default Solr schema, the “text” field, which will hold the program source, is indexed but not stored (that is, you can perform a search on it, but you won’t see the source in a search result). You can change this by editing the file conf\schema.xml in your Solr home, looking for the definition of the “text” field, and changing the “stored” attribute to “true”. WARNING: This will dramatically increase the size of your index on disk, but for just indexing your custom objects, you should be fine.
Making the text stored
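As a sketch of what the change amounts to, the Python snippet below flips the stored attribute on a cut-down sample of field definitions. The two sample lines are an assumption about what the default schema roughly contains; check the attributes in your own schema.xml. For the real file, editing by hand is the safer route, since ElementTree drops XML comments when it rewrites a document.

```python
import xml.etree.ElementTree as ET

# Cut-down, assumed sample of the <fields> section of schema.xml.
SAMPLE_FIELDS = """<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>"""


def make_field_stored(fields_xml: str, field_name: str) -> str:
    """Return the XML with stored="true" set on the named field."""
    root = ET.fromstring(fields_xml)
    for field in root.iter("field"):
        if field.get("name") == field_name:
            field.set("stored", "true")
    return ET.tostring(root, encoding="unicode")


print(make_field_stored(SAMPLE_FIELDS, "text"))
```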
Once you are ready to start, you can execute the program in the background. Go to SM37 to check that it is running. You can monitor the progress on the Solr server by pointing your browser to http://localhost:8080/solr/admin/stats.jsp and watching the number of items in the index.
Monitoring documents added to the index from the stats page
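If you would rather poll the document count from a script than watch the stats page, one alternative is a match-all query that returns zero rows: the numFound value in the response is the number of documents in the index (as of the last commit). The sketch below builds such a URL and reads numFound from a sample JSON response. The sample response is hand-written for illustration, and the function names are made up.

```python
import json
from urllib.parse import urlencode


def count_url(solr_url: str) -> str:
    """URL for a match-all query that returns no rows, only the hit count."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return solr_url + "/select?" + urlencode(params)


def num_found(response_body: str) -> int:
    """Extract the total hit count from a Solr JSON response."""
    return json.loads(response_body)["response"]["numFound"]


# Hand-written sample of the response shape, not real output.
SAMPLE_RESPONSE = (
    '{"responseHeader":{"status":0,"QTime":1},'
    '"response":{"numFound":1234,"start":0,"docs":[]}}'
)

print(count_url("http://localhost:8080/solr"))
print(num_found(SAMPLE_RESPONSE))
```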
As programs get added you can start to search through them. I am working on an ABAP report from which to do a search, which I will reveal in the next post, but for now it will suffice to just use your browser. From the admin page (http://localhost:8080/solr/admin/), you can enter a search term and do a search. (See the screenshot above).
You can additionally just play around with the URL. If you set the text to stored above, you will get the source code returned in the results, which allows you to do highlighting of the search terms. There is a plethora of options for searching which you can read about on the Solr wiki.
Searching ABAP code in the browser
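As an example of playing around with the URL, the sketch below assembles a search on the text field with highlighting switched on. The parameters q, fl, rows, hl, hl.fl and wt are standard Solr request parameters; the id field name and the search_url helper are assumptions for this example, and highlighting only returns snippets if the text field is stored.

```python
from urllib.parse import urlencode


def search_url(solr_url: str, term: str, rows: int = 10) -> str:
    """Build a select URL that searches the source text and highlights hits."""
    params = {
        "q": f"text:{term}",  # search the field holding the source code
        "fl": "id",           # only return the document id in the result list
        "rows": rows,
        "hl": "true",         # switch highlighting on
        "hl.fl": "text",      # highlight matches in the source text
        "wt": "json",
    }
    return solr_url + "/select?" + urlencode(params)


print(search_url("http://localhost:8080/solr", "FIXME"))
```

Paste the printed URL into the browser, or swap wt for xml if you prefer the default response format.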
Some issues I have found so far (when indexing the entire source modules on my Netweaver demo system): There are program names in REPOSRC that prove problematic (either for Solr or the XML parser) because of special characters. Also, some programs contain characters that are considered invalid XML. (I need to investigate that; the solution might be as simple as converting from UTF-16LE to UTF-8). However, neither of these will probably apply to your code if you are just indexing custom objects.
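One possible workaround for the invalid-character problem, independent of any encoding conversion, is to strip characters that XML 1.0 does not allow before the source is sent. The Python sketch below shows the idea; the same filtering would have to be done in ABAP in the indexing program, which is not shown here. The function name is made up, and replacing with a space is just one choice.

```python
import re

# Everything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = " ") -> str:
    """Replace characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub(replacement, text)


dirty = "WRITE 'a\x00b\x0bc'.\n"
print(repr(strip_invalid_xml_chars(dirty)))
```

Note that this is separate from escaping: characters such as < and & are valid XML characters but still need to be escaped as entities in the update message.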
Performance wasn’t too bad though: On my home demo Netweaver system, it took about 1 1/2 hours to index the just over 1.1 million source objects (granted that there were a few failures due to the issues I just mentioned).
Anyway, as promised, in the next post I will provide an ABAP-based frontend for searching through the index. This will allow you to perform searches from the SAP system, and additionally navigate directly to code modules.