Custom ABAP source search with saprfc and Ferret
Today we are going to build our own search engine to search through ABAP source code on an SAP system using our favourite language – Ruby! (With the help of some nice libraries). Sure, there is the “Find in source code” option in SE38, and apparently you can use TREX as well, but this is much more fun.
UPDATE (19 June 2009): Refer to the next post for an improved version of the solution.
Synopsis
What we will be doing is creating a Ruby script that connects to an SAP server, retrieves the source of many programs, and builds an index on your computer, which you can subsequently search. This requires, of course, that you first write an RFC-enabled function module on the SAP system to facilitate retrieving the code.
For starters, we are going to extract and index a subset of objects. I haven’t been brave enough yet to index all of the source code for all programs including the standard ones, although that was the initial intention. On an ECC system, that can amount to over a million objects!
By the way, I first tried this using Python and Whoosh and Pysaprfc, but after struggling a lot, I eventually gave up. That is to say, there were problems with Pysaprfc that I couldn’t resolve. The connector has not been maintained for a long time, it seems, so that may explain why. Besides, after having used Whoosh, Ferret was much simpler and more pleasant to use (and so is Ruby, for that matter).
So, here is a little recipe (but let’s hope not a recipe for disaster).
Prerequisites
My setup looks as follows:
- Windows XP SP2
- Ruby 1.8.6 (One-click installer)
- Ferret Search Library
- saprfc Ruby extension by Piers Harding
- The librfc32.dll DLL
To start off with, you must have Ruby installed. You can grab the one-click installer from Rubyforge. After you have installed that, you need to install ferret, which you can simply do by issuing the following from the command line:
gem install ferret
Next, download the saprfc extension for Ruby from Piers Harding’s website. It seems also to be quite old (this was last maintained in June 2007 and made it to 0.37), but it worked for me.
From the command line, cd to the directory where you downloaded the gem, and issue:
gem install saprfc-0.37-mswin32.gem
Finally, you will need to obtain the librfc32.dll DLL. So head over to the SAP Software Distribution Center (SWDC) and go to Download -> Support Packages and Patches -> Search for Support Packages and Patches. Search for “RFC SDK” and choose “SAP RFC SDK 7.00” from the result list (this is the classic RFC SDK, not the NetWeaver version). Download the Windows Server on IA32 32bit version (unless you are doing this on e.g. Linux).
You will also need SAPCAR to extract the file (it is compressed in SAP’s proprietary archiving format – SAP Archive or SAR). This you should also be able to find if you search for “SAPCAR” in the SWDC. Once you have that, you can extract the SAR from the command line with something like:
sapcar -xf packagefile.SAR
You now need to take the librfc32.dll DLL and copy it to somewhere in your PATH. The best place to put it is probably the C:\WINDOWS\System32 directory.
Backend: Extractor function module
That was the prerequisites taken care of. Now we need to do some work on the backend. Log on to your SAP system and go to SE11 to create two structures:
Our first structure, YSRCEX2, has just one component, SOURCE_LINE, of type TEXT1000 to hold a line of source code (STRINGs are not allowed in RFC, otherwise we would have created a table with each entry holding an entire program’s source code).
YSRCEX3, our next structure, contains the attributes of each program we will retrieve. In this case, we are storing the creation and modification details, as we will be adding those to our index. The last two components, LINE_FROM and LINE_TO, indicate in which lines of the source table the code for the given program is to be found. In detail:
Component   Type
PROGNAME    PROGNAME
CNAM        CNAM
CDAT        RDIR_CDATE
UNAM        UNAM
UDAT        RDIR_UDATE
LINE_FROM   INT4
LINE_TO     INT4
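To make the LINE_FROM / LINE_TO bookkeeping a bit more concrete, here is a small Ruby sketch of how a client could cut one program's lines out of the flat source table. The sample data and the helper name lines_for are made up for illustration; only the idea of 1-based, inclusive line ranges comes from the structure described above.

```ruby
# Hypothetical sample data: one flat table of source lines shared by all
# programs, plus one attribute row per program pointing into that table
# (line numbers are 1-based and inclusive, as in ABAP internal tables).
sources = [
  "REPORT zfirst.",
  "WRITE 'one'.",
  "REPORT zsecond.",
  "WRITE 'two'.",
  "WRITE 'three'."
]

progs = [
  { "PROGNAME" => "ZFIRST",  "LINE_FROM" => 1, "LINE_TO" => 2 },
  { "PROGNAME" => "ZSECOND", "LINE_FROM" => 3, "LINE_TO" => 5 }
]

# Return the source lines that belong to one program row.
# Ruby arrays start at index 0, hence the "- 1" on both ends.
def lines_for(prog, sources)
  sources[(prog["LINE_FROM"] - 1)..(prog["LINE_TO"] - 1)]
end

progs.each do |prog|
  puts "#{prog["PROGNAME"]}: #{lines_for(prog, sources).size} lines"
end
```

Keeping all the lines in one table and pointing into it keeps both RFC tables flat, which is what the no-STRINGs restriction forces on us anyway.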
The next thing to do is to create an RFC-enabled function module on your ABAP system that will extract the code. Below you will find the source of the function module that utilizes the two structures we just created. Just some things to note:
- An OPEN CURSOR WITH HOLD with a package-wise select sounds nice in theory, but I have found that it does not work: an implicit commit between RFC calls closes the cursor. On one of our boxes this did not seem to happen at first, but eventually I ran into the same problem there too. So, although inelegant, the best solution I could think of was to keep track inside the function module of where we had got to with the selection.
- Be warned that READ REPORT does not read the source code of all objects, e.g. type pools. Getting every object’s source code seems to take a bit more effort; we won’t worry about that now and will focus instead on extracting the source and indexing it for our search.
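The "keep track of where we had got to" idea from the first note can be sketched in plain Ruby as follows. This is only a simulation with made-up program names and a hypothetical helper next_package; it does not talk to SAP, it just shows reading package by package while remembering the last name seen.

```ruby
# Hypothetical stand-in for the repository table: just a list of names.
all_programs = %w[ZA ZB ZC ZD ZE]

# Return the next package: up to package_size names greater than last_name,
# in ascending order. The sort matters, otherwise "greater than the last
# name seen" could skip entries.
def next_package(names, last_name, package_size)
  names.sort.select { |name| name > last_name }.first(package_size)
end

package_size = 2
last_name = ""   # plays the role of the STATICS variable in the function module
packages = []

loop do
  package = next_package(all_programs, last_name, package_size)
  packages << package unless package.empty?
  break if package.size < package_size   # short package: nothing left to read
  last_name = package.last
end

p packages
```

The stop condition is the same one the function module uses: a package that comes back smaller than requested means there is nothing more to fetch.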
Now create a function group and add the following function module. (The comments in the code tell you what parameters with what types are required):
FUNCTION YSRCEX_EXTRACTOR.
*"----------------------------------------------------------------------
*"*"Local Interface:
*" IMPORTING
*" VALUE(PACKAGE_SIZE) TYPE I DEFAULT 200
*" EXPORTING
*" VALUE(NO_MORE_DATA) TYPE CHAR1
*" TABLES
*" SOURCES STRUCTURE YSRCEX2
*" PROGS STRUCTURE YSRCEX3
*"----------------------------------------------------------------------
data: line type YSRCEX3.
data: source type table of text1000 with header line.
data: line_from type i.
data: table_lines type i.
statics: last_progname type progname.
statics: s_no_more_data type char1.
clear: sources[], progs[].
* If we have previously (from last call) determined that there
* is no more data, exit the function
if s_no_more_data = 'X'.
no_more_data = 'X'. "Keep informing caller
return.
endif.
* Read a number of source objects specified by package_size
select progname cnam cdat unam udat from reposrc
into CORRESPONDING FIELDS OF TABLE progs
up to package_size rows
where progname > last_progname
* Note: The following list is probably not comprehensive,
* it's just for demonstration purposes:
and ( progname like 'Z%' or progname like 'Y%'
or progname like 'SAPMZ%' or progname like 'SAPMY%'
or progname like 'SAPLZ%' or progname like 'SAPLY%'
or progname like 'LZ%' or progname like 'LY%' )
and r3state = 'A' "Active sources only
* Sort by name, so that last_progname marks how far we have read
order by progname.
* Check whether we should stop selecting yet
describe table progs lines table_lines.
if table_lines lt package_size.
s_no_more_data = 'X'.
endif.
* Process the selected programs
loop at progs into line.
* The following does not work e.g. for type pools
read report line-PROGNAME into source.
* Add sources to table, keep track of from and to line numbers
loop at source.
add 1 to line_from.
at first. "Track from line number
line-line_from = line_from.
endat.
at last. "Track to line number
line-line_to = line_from.
endat.
sources-source_line = source.
append sources.
endloop.
* Omit programs for which we did not obtain sources
if sy-subrc ne 0. "No source
delete progs.
else.
modify progs from line.
endif.
last_progname = line-progname.
endloop.
* Return determined value of no_more_data indicator
no_more_data = s_no_more_data.
ENDFUNCTION.
As noted in the comments of the source code above, this will not produce a complete list of all custom ABAP source code objects, but we are just demonstrating here anyway.
The indexing client
Now we are done with the backend stuff, and we turn our attention back to our computer, where we installed all those goodies earlier.
Open up a text editor (or any editor; I personally prefer NetBeans for Ruby development) and create a Ruby script called “abapsrcindex.rb” (the name doesn’t matter though) with the following content:
require 'SAP/Rfc'
require 'ferret'
require 'stringio'
include Ferret

PACKAGE_SIZE = 100

index = Index::Index.new(:path => 'abapsrc')
rfc = SAP::Rfc.new(:ashost => "myhost.com",
                   :sysnr => 00,
                   :lang => "EN",
                   :client => 800,
                   :user => "MYUSER",
                   :passwd => "password",
                   :trace => 0)
func = rfc.discover("YSRCEX_EXTRACTOR")
func.package_size = PACKAGE_SIZE

until func.no_more_data.value == "X"
  func.progs.reset
  func.sources.reset
  rfc.call(func)
  # Progress output: first program of the package and number of source lines
  puts func.progs.rows[0]["PROGNAME"] if func.progs.rows[0]
  puts func.sources.rows.size
  func.progs.nextHashRow {|h|
    sio = StringIO.new
    # LINE_FROM/LINE_TO are 1-based, Ruby arrays are 0-based
    func.sources.value[(h["LINE_FROM"]-1)..(h["LINE_TO"]-1)].each {|line|
      # rstrip, not rstrip!: the bang version returns nil if nothing was stripped
      sio << line.rstrip
    }
    index << {:title=>h["PROGNAME"].rstrip,
              :content=>sio.string,
              :created_by=>h["CNAM"].rstrip,
              :created_on=>h["CDAT"].rstrip,
              :changed_by=>h["UNAM"].rstrip,
              :changed_on=>h["UDAT"].rstrip,
    }
    sio.close
  }
end
rfc.close()
Admittedly this solution is a little “quick and dirty”: if you run it more than once, it adds the same programs to the index again instead of updating them. Ferret’s documentation explains how you would go about doing this properly. For now, if you need to rerun the script, first delete the “abapsrc” directory it creates.
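If you would rather not delete the directory by hand each time, something like the following could go at the top of the script. This is a minimal sketch using only Ruby's standard library; the helper name reset_index_dir is my own invention, and the demonstration runs in a temporary directory so that no real index is touched.

```ruby
require 'fileutils'
require 'tmpdir'

# Remove a stale index directory so that a rerun starts from an empty index.
# Returns true if a directory was deleted, false if there was nothing there.
def reset_index_dir(path)
  return false unless File.directory?(path)
  FileUtils.rm_rf(path)
  true
end

# Demonstration in a temporary directory.
Dir.mktmpdir do |tmp|
  index_path = File.join(tmp, 'abapsrc')
  FileUtils.mkdir_p(index_path)
  File.write(File.join(index_path, 'dummy'), 'x')  # hypothetical index file
  puts reset_index_dir(index_path)  # the directory exists and is removed
  puts reset_index_dir(index_path)  # nothing left to remove
end
```

In the indexing script, a call such as reset_index_dir('abapsrc') would have to come before the index is opened. It throws the whole index away, so it is a workaround rather than a proper update.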
The above script does two things:
- It creates an index called “abapsrc” (a folder with a number of files inside), where we will store our ABAP code and related properties.
- It then connects to the SAP server, calls our function and retrieves programs with their sources in packages of the size specified by PACKAGE_SIZE. The source of each program is concatenated into a single string (trailing whitespace is stripped off) and added to the index. This carries on until the function returns NO_MORE_DATA = “X”.
(The few puts littered throughout merely give us an indication of the progress we are making.)
If you are new to Ruby, it may interest you to know that we are using a StringIO object, which is pretty much like a StringBuffer in Java, to concatenate the source code. This is to help with performance; otherwise we would be creating a lot of Strings in memory, which is probably highly undesirable.
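Here is the StringIO idea in isolation, as a small sketch with made-up sample lines and a hypothetical helper join_lines. It mimics what the indexing script does with each program: strip the padding from every fixed-width line and append the result to one buffer.

```ruby
require 'stringio'

# Join fixed-width lines into one string, dropping the trailing padding.
# rstrip (without the bang) always returns a string, whereas rstrip!
# returns nil when there is nothing to strip.
def join_lines(lines)
  sio = StringIO.new
  lines.each { |line| sio << line.rstrip }
  sio.string
end

# Hypothetical sample: two padded lines and one without trailing spaces.
padded = ["REPORT zdemo.   ", "WRITE 'hi'.     ", "* end"]
puts join_lines(padded)
```

Note that the lines are glued together without any separator, just as in the indexing script.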
Run the script:
ruby abapsrcindex.rb
When I ran this, it took roughly 10 minutes to index 6,700-odd objects. During that time the memory consumption of the Ruby process varied between 100 and 300 MB. It didn’t consume an awful lot of CPU, though; usage varied a lot too, but always stayed at safe levels. I wonder whether threading could reduce the time it takes to go through the lot: the RFC call is always very short, while the concatenation and indexing take time (though if there can only be one lock on the index, as is the case with Whoosh, it depends on how much of the bottleneck is the concatenation).
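For what it's worth, the threading idea could look roughly like the sketch below: one thread fetches packages while another one processes them, with a queue in between. Everything here is a stand-in. fetch_package only simulates an RFC call and the "index" is a plain array, so this shows the shape of the approach and nothing more. Whether it would actually save time with saprfc and Ferret is untested; Ruby 1.8 threads are green threads, so a blocking call into a C extension may well stall the whole process.

```ruby
require 'thread'

# Hypothetical stand-in for one RFC call: returns a package of program
# names, or nil when there is no more data.
def fetch_package(number)
  number <= 3 ? ["PROG#{number}A", "PROG#{number}B"] : nil
end

def run_pipeline
  queue = Queue.new
  indexed = []

  # Producer: fetches packages one after another and hands them over.
  producer = Thread.new do
    number = 1
    while (package = fetch_package(number))
      queue << package
      number += 1
    end
    queue << :done   # sentinel: no more packages
  end

  # Consumer: the only thread that touches the "index", so a single
  # writer lock on the index would not get in the way.
  consumer = Thread.new do
    while (package = queue.pop) != :done
      indexed.concat(package)
    end
  end

  [producer, consumer].each(&:join)
  indexed
end

p run_pipeline
```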
Searching our ABAP code
If everything went well, you now have a searchable index of your ABAP source code (or at least a subset of it). You can create a simple script to search through the index. Create a new Ruby file (e.g. “searchabap.rb”) and put the following in it. This one just takes your input from the command line and uses it as the search term for the index.
require 'ferret'
include Ferret

index = Index::Index.new(:path => 'abapsrc')
print "Enter search term: "
term = gets.chomp # user enters search term on command line
query = "content:(#{term})"
index.search_each(query) do |id, score|
  puts "Document #{index[id][:title]} found with a score of #{score}"
  highlights = index.highlight(query, id,
                               :field => :content,
                               :pre_tag => "<<<",
                               :post_tag => ">>>")
  highlights.each {|hi| puts hi}
  puts "\n"
end
If you run it, you could get something like the following:
ruby searchabap.rb
Enter search term: miller
Document ZCOUNT_CODE found with a score of 0.340764164924622
** Download ABAP programs, function modules, classes to a text file** Author: SJ <<>> (SAP Africa)*** Todo:* - handle fugr top includes, etc*REPORT zabap_download.*----------------------------------------------------------------------** Constants...
Created by AMCKAY
Document ZABAP_DOWNLOAD found with a score of 0.204458490014076
** Download ABAP programs, function modules, classes to a text file** Author: SJ <<>> (SAP Africa)*** Todo:* - handle fugr top includes, etc*REPORT zabap_download.*----------------------------------------------------------------------** Constants...
Created by HVENTER
Now this isn’t glorious, but it’s functional. To make a quick web-based interface, we can use WEBrick, which comes standard with Ruby, to write a little script that will do just that:
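require 'webrick'
require 'ferret'
require 'stringio'
require 'cgi'
include WEBrick
include Ferret

$index = Index::Index.new(:path => 'abapsrc')

class SearchServlet < HTTPServlet::AbstractServlet
  def do_GET(req, res)
    body = StringIO.new
    # The HTML of the original listing was lost. The markup, and everything
    # after the search_each call, is an assumed minimal reconstruction.
    body << "<html><head><title>Search ABAP Code</title></head><body>"
    body << "<h1>Search ABAP Code</h1>"
    body << "<form method='get'><input type='text' name='q'/>"
    body << "<input type='submit' value='Search'/></form>"
    if req.query["q"]
      srchterm = req.query["q"].to_s
      puts "Searching for #{srchterm}"
      hits = $index.search_each("content:(#{srchterm})") do |id, score|
        body << "<p>#{CGI.escapeHTML($index[id][:title].to_s)} (score #{score})</p>"
      end
      body << "<p>#{hits} hit(s)</p>"
    end
    body << "</body></html>"
    res['Content-Type'] = 'text/html'
    res.body = body.string
  end
end

# Port 8000 is an arbitrary choice
server = HTTPServer.new(:Port => 8000)
server.mount("/", SearchServlet)
trap("INT") { server.shutdown }
server.start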
(Just another note for those new to Ruby: I am using a global variable ($index) for the index object. In general, declaring global variables is poor form, but it was easier than trying to find out how to use WEBrick properly.)
When you run it:
ruby websearch.txt
You can expect something like the following:
Well, that’s all for now. Of course we could play with this a little more and make it fancy, but at least we can now find that elusive dynamic usage of message 038!
Before I forget, a quick thanks to Vitlalie Cherpec for his suggestion to use the WP-Syntax plugin for code highlighting. As you can see, it has done wonders for the appearance of this blog!