ABAP source code search continued

It’s been about half a year since I wrote about building a custom ABAP source code search using Ruby and Ferret. The other day I had a little time to resurrect the project, and I thought I should tell you a little about my findings.

When I last worked on this, I was working on Windows, and intended to do a full index of all the source code on an ECC system. However, the indexing always bombed out at a certain point with some memory or IO error (the exact details of which I cannot recall).

This time, however, I decided to run the process on a Linux VM, using VirtualBox to host an Ubuntu Server guest. It ran flawlessly. The whole indexing took the better part of an afternoon. On the second run, which I timed, it took 2 hours and 40 minutes to retrieve and index just over 1.6 million source code objects along with their texts.
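For a sense of scale, the timed run works out to roughly 167 objects per second. Here is a trivial Ruby sketch of that arithmetic, with the object count rounded down to an even 1.6 million:

```ruby
# Rough throughput from the timed second run.
objects = 1_600_000          # "just over 1.6 million" source objects, rounded down
seconds = (2 * 60 + 40) * 60 # 2 hours 40 minutes = 9,600 seconds

per_second = objects.to_f / seconds
puts format("%.0f objects/s, %.1f ms per object", per_second, 1000.0 / per_second)
```

At that rate, each object costs about 6 ms of wall-clock time on average, covering both retrieval and indexing.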

Some observations: I mounted a separate 20GB partition on my home directory to house the index, as the partition on which I had installed the VM was too small. The final index was about 7.6GB in size, but during indexing, disk usage climbed to about 12GB as Ferret reorganized the index (at least I think that's what it was doing; honestly, I have no idea).

The Ubuntu server had only 384MB of memory, and both CPU and memory usage stayed low. Monitoring with top, memory usage sat at around 22%, while CPU usage hovered around 20-30% for most of the run. This suggests that with some threading one could speed up the whole process, depending of course on what Ferret allows. The RFC calls to the SAP system are certainly not a bottleneck.
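To illustrate what that threading might look like, here is a minimal sketch using only Ruby's built-in Thread and Queue: several worker threads fetch sources concurrently, while a single thread does all the index writes, so nothing depends on how Ferret handles concurrent writers. The fetch_source method, the ZPROG names and the plain Array standing in for the Ferret index are all hypothetical placeholders; this is not how my indexer currently works.

```ruby
# Hypothetical stand-in for an RFC call that reads one program's source and texts.
# A real version would go through the SAP RFC connector; this stub just fabricates text.
def fetch_source(name)
  sleep 0.01 # pretend to wait on the network
  { name: name, source: "REPORT #{name}." }
end

# Fetch sources on several threads, but write to the index from one thread only.
def index_in_parallel(names, workers: 4)
  jobs    = Queue.new
  results = Queue.new
  names.each { |n| jobs << n }
  workers.times { jobs << :done } # one stop marker per worker

  fetchers = Array.new(workers) do
    Thread.new do
      while (name = jobs.pop) != :done
        results << fetch_source(name)
      end
      results << :done # tell the writer this worker has finished
    end
  end

  index = [] # hypothetical stand-in for the Ferret index
  finished = 0
  while finished < workers
    doc = results.pop
    if doc == :done
      finished += 1
    else
      index << doc # the only place the index is written to
    end
  end
  fetchers.each(&:join)
  index
end

docs = index_in_parallel(%w[ZPROG1 ZPROG2 ZPROG3 ZPROG4 ZPROG5])
puts docs.size
```

Whether this helps in practice depends on where the time really goes: if index writes dominate, the single writer thread simply becomes the new bottleneck.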

Spot tests on the finished index are pleasing: it contains entries for all types of source units, including, for example, class components and type pools. Screen flow logic sources, however, are not included.

This is, of course, an extremely basic implementation of what could become a more advanced and comprehensive solution. The most important improvement would be incorporating deltas, updating the index only with what has changed since the last run rather than re-indexing everything each time.
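The core of a delta run is simply picking out the objects changed since the previous run. Here is a minimal sketch, assuming the system can report a last-changed timestamp for each source object; the SourceObject struct and the sample names and dates are hypothetical.

```ruby
require "time"

# Hypothetical catalogue entry: object name plus its last-changed timestamp.
SourceObject = Struct.new(:name, :changed_at)

# Pick only the objects changed since the previous run; everything else
# already in the index is left alone.
def delta_since(objects, last_run)
  objects.select { |obj| obj.changed_at > last_run }
end

last_run = Time.parse("2009-06-01 00:00:00")
catalogue = [
  SourceObject.new("ZOLD_REPORT",    Time.parse("2009-01-15 10:00:00")),
  SourceObject.new("ZNEW_REPORT",    Time.parse("2009-06-03 14:30:00")),
  SourceObject.new("ZCHANGED_CLASS", Time.parse("2009-06-02 09:12:00"))
]

delta_since(catalogue, last_run).each { |obj| puts obj.name }
# prints ZNEW_REPORT, then ZCHANGED_CLASS
```

In a real index, the existing entry for each changed object would also have to be removed or replaced. Deleted objects would need a separate pass, since they never show up in a list of changes.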

I have been meaning to do a complete project out of this, with a web-based frontend to manage the indexes etc., but I always wonder how much real-world use there is for such a source code search. If nothing else, it is amusing to see what expletives have been used in SAP’s code!

After having learned a bit more about how Document Management works in SAP, I have decided to index documents from an ECC system, hopefully including everything from the IMG tree. But for that we will have to wait for another day.

Of course, I must not forget to once again say a big Thank You to Piers Harding, without whose hard work on the Ruby connectors for RFC none of this fun would have been possible, and to Dave Balmain for his excellent work on Ferret.
