Improved ABAP Source Code Search
In my last post I showed you how to create your own searchable index of ABAP source code using Ruby in conjunction with the Ferret and saprfc extensions. Today I am going to show you a hugely improved version that will reduce the indexing time and give you a nicer search interface. (Amazingly, this whole thing came in rather handy for me in the last week!)
First things first: I need to backtrack on something I said earlier. I told you that it is not possible to use strings in an RFC interface. Well, that is not entirely true. You can’t use deep structures (which may include strings) for a TABLE parameter, but you can use them as IMPORT or EXPORT parameters.
The problem is that the saprfc extension does not cater for these (because, I think, the classic RFC SDK does not handle those). However, Piers Harding has also written a Ruby extension called sapnwrfc to use the Netweaver RFC SDK, which can in fact handle deep structures and Strings and things.
The benefit we gain from this is that, instead of returning multiple lines of source to the client, which then first has to concatenate everything to create a source listing, we offload all that work onto the ABAP server. Whether it is because ABAP is so good at this kind of crunching, or because our server is so powerful, I don’t know, but our server did not feel the difference at all.
To get going, we will first install the sapnwrfc extension from Piers’ site. The last gem that was packaged for Windows is sapnwrfc-0.19-mswin32.gem, but that worked fine for me. The latest version is 0.21, but I gave up trying to compile it on Windows. I suspect the extconf.rb file just needs some tweaking.
CD to the directory where you downloaded it and issue the following to install it:
gem install sapnwrfc-0.19-mswin32.gem
Now of course the extension will not work without the required DLLs, so we need to head over to the SAP Software Distribution Center again and this time download the Netweaver RFC SDK. To do so, choose Download -> Support Packages and Patches -> Search for Support Packages and Patches and search for “rfc sdk”. Choose SAP NW RFC SDK 7.10 from the result list and download the Windows Server on IA32 32bit version. (You will also have to find and download SAPCAR from SWDC to extract the archive).
Once you have extracted the archive, take the DLLs from the \lib directory and put them somewhere in your PATH so that they can be found when you use the sapnwrfc extension. C:\WINDOWS\System32 is a good place.
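If the extension later fails to load, a quick way to check whether the DLLs really are reachable is to scan the PATH from Ruby. This is only a minimal sketch: the helper name dirs_containing is made up for illustration, and sapnwrfc.dll is assumed to be one of the files from the SDK's \lib directory.

```ruby
# Returns the directories on the given PATH string that contain `filename`.
# (Hypothetical helper, not part of sapnwrfc.)
def dirs_containing(filename, path = ENV['PATH'].to_s)
  dirs = path.split(File::PATH_SEPARATOR).reject { |dir| dir.empty? }
  dirs.select { |dir| File.file?(File.join(dir, filename)) }
end

# Assumption: sapnwrfc.dll is one of the DLLs shipped with the NW RFC SDK.
%w[sapnwrfc.dll].each do |dll|
  hits = dirs_containing(dll)
  puts(hits.empty? ? "#{dll}: not found on PATH" : "#{dll}: #{hits.first}")
end
```

Add any other DLL names from the \lib directory to the list to check them as well.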
The rest of the requirements are the same as described in the previous post. Make sure you are using Ruby 1.8.6 (the one-click installer) and have installed Ferret.
Now we are going to write a new function module on our SAP backend to retrieve program sources.
Firstly we need to create a structure, based on which we will create a table type, that we can pass in our RFC function.
What I did here was to define separate strings for the program source and texts we will be exporting, but in hindsight, there is nothing stopping you from just lumping everything together into one string.
Now we create a table type to use in our function module:
Next we create our function module. Remember to make it RFC-enabled. The generated comments at the top show you which parameters to create for the function module:
function zsrcex_extractor .
*"----------------------------------------------------------------------
*"*"Local Interface:
*" IMPORTING
*" VALUE(PACKAGE_SIZE) TYPE I DEFAULT 200
*" VALUE(SELECT_AFTER) TYPE PROGNAME OPTIONAL
*" VALUE(LANGU) TYPE SYLANGU DEFAULT SY-LANGU
*" EXPORTING
*" VALUE(EXTRACT) TYPE ZSRCEX_T
*" VALUE(NO_MORE_DATA) TYPE CHAR1
*"----------------------------------------------------------------------
data: table_lines type i.
statics: last_progname type progname.
statics: s_no_more_data type char1.
data: progs type table of progname with header line.
data: extract_line type zsrcex.
data: texts type table of textpool with header line.
data: source type table of text1000 with header line.
data: nl type abap_char1 value cl_abap_char_utilities=>newline.
data: tab type abap_char1 value cl_abap_char_utilities=>horizontal_tab.
clear: extract[].
* If we have previously (from last call) determined that there
* is no more data, exit the function
if s_no_more_data = 'X'.
no_more_data = 'X'. "Keep informing caller
return.
endif.
* Start selecting after specified program name, if supplied
if not select_after is initial.
last_progname = select_after.
endif.
* Read a number of source objects specified by package_size
select progname from reposrc
into table progs
up to package_size rows
where progname > last_progname
* Note: The following list is probably not comprehensive,
* it's just for demonstration purposes:
and ( progname like 'Z%' or progname like 'Y%'
or progname like 'SAPMZ%' or progname like 'SAPMY%'
or progname like 'SAPLZ%' or progname like 'SAPLY%'
or progname like 'LZ%' or progname like 'LY%' )
* To retrieve EVERYTHING, just comment out the above 4 lines
and r3state = 'A' "Active sources only
order by progname. "Paging by progname needs sorted rows
* Check whether we should stop selecting yet
describe table progs lines table_lines.
if table_lines lt package_size.
s_no_more_data = 'X'.
endif.
* Process the selected programs
loop at progs.
clear: extract_line, texts[].
extract_line-progname = progs.
* The following does not work e.g. for type pools
read report progs into source.
read textpool progs into texts language langu.
* Don't pass back programs with neither texts nor source
if source[] is initial and texts[] is initial.
continue.
endif.
* Put source into one string into EXTRACT
loop at source.
concatenate extract_line-source source nl into extract_line-source.
endloop.
* Put texts into single string
loop at texts.
concatenate extract_line-texts texts-id tab texts-key tab
texts-entry nl
into extract_line-texts.
endloop.
* Store program title separately
read table texts with key id = 'R'.
if sy-subrc = 0.
extract_line-title = texts-entry.
endif.
append extract_line to extract.
endloop.
* Return determined value of no_more_data indicator
no_more_data = s_no_more_data.
endfunction.
As indicated previously, this will return a subset of custom sources on the system (and probably not all, especially if you are developing in a namespace). By the way, I tried several times (varying the package size each time) retrieving all the source code by commenting out the 4 lines indicated, but at some undetermined point, Ruby would quit with a segmentation fault while executing the remote function call. If it had run successfully, it would have taken about 5-6 hours to retrieve and index the sources for all 1.6 million program objects on the system.
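Since the function module packs every text pool entry into one line of the form ID, tab, KEY, tab, ENTRY, the TEXTS string can be taken apart again on the Ruby side if you ever need the individual entries. Here is a minimal sketch; the helper name parse_texts and the sample data are made up for illustration.

```ruby
# Splits the TEXTS string back into records.
# Each non-empty line is expected to look like "ID<TAB>KEY<TAB>ENTRY".
def parse_texts(texts)
  lines = texts.to_s.split("\n").reject { |line| line.strip.empty? }
  lines.map do |line|
    # Limit of 3 keeps any tabs inside the entry text itself intact.
    id, key, entry = line.split("\t", 3)
    { :id => id.to_s, :key => key.to_s, :entry => entry.to_s }
  end
end

# Invented sample: a program title ('R') and one text symbol ('I').
sample = "R\t\tCustomer report\nI\t001\tNo data found\n"
parse_texts(sample).each do |t|
  puts "#{t[:id]} | #{t[:key]} | #{t[:entry]}"
end
```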
Now we are ready to call our function from Ruby to retrieve the program sources and index them:
require 'sapnwrfc'
require 'ferret'
include Ferret
PACKAGE_SIZE = 100
index = Index::Index.new(:path => 'abapsrc')
conn = SAPNW::Base.rfc_connect(:ashost => "badnews.com",
:sysnr => 00,
:lang => "EN",
:client => 900,
:user => "DUDE",
:passwd => "passwort",
:trace => 0)
no_more_data = nil
last_prog = ""
func = conn.discover("ZSRCEX_EXTRACTOR")
until no_more_data
fc = func.new_function_call
fc.PACKAGE_SIZE = PACKAGE_SIZE
fc.SELECT_AFTER = last_prog
fc.invoke
fc.EXTRACT.each {|row|
print row["PROGNAME"]
# Use rstrip, not rstrip!: the bang version returns nil when nothing was stripped
progname = row["PROGNAME"].to_s.rstrip
index << {:progname=>progname,
:title=>row["TITLE"].to_s.rstrip,
:content=>row["SOURCE"].to_s.rstrip,
:texts=>row["TEXTS"].to_s.rstrip
}
last_prog = progname
}
no_more_data = true if fc.NO_MORE_DATA == "X"
index.flush
end
conn.close
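One Ruby detail worth keeping in mind when trimming the padded fields that come back over RFC: String#rstrip! returns nil when there was nothing to strip, whereas String#rstrip always returns a string. A small sketch of the difference (the helper name clean is made up for illustration):

```ruby
# Hypothetical helper: trims trailing blanks and tolerates nil values.
def clean(value)
  value.to_s.rstrip
end

padded  = "ZTEST_REPORT        "
trimmed = "ZTEST_REPORT"

p padded.dup.rstrip!   # "ZTEST_REPORT"
p trimmed.dup.rstrip!  # nil, because nothing was stripped
p clean(trimmed)       # "ZTEST_REPORT"
p clean(nil)           # ""
```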
Something to note: I noticed with the sapnwrfc extension (which wasn’t the case with the saprfc extension), that each call seems to be in a new session context (perhaps something to do with the required new_function_call method?) and as a result, the function was not keeping track of the last program read. Instead I had to keep track of it in my Ruby script and pass it in the SELECT_AFTER parameter, so the function would know from where to select.
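The bookkeeping described above can be tried out without an SAP system. The following is only a sketch: fetch_package stands in for the remote call and ALL_PROGRAMS for the repository table, both invented for illustration. Note that the approach relies on each package coming back in ascending program name order.

```ruby
# Invented stand-in data for the repository.
ALL_PROGRAMS = %w[ZAAA ZBBB ZCCC ZDDD ZEEE].freeze

# Stand-in for the remote function: returns up to package_size names greater
# than select_after, plus a flag that is true once fewer rows than requested
# were found.
def fetch_package(select_after, package_size)
  rows = ALL_PROGRAMS.select { |name| name > select_after }.sort.first(package_size)
  [rows, rows.length < package_size]
end

# The caller remembers the last name it saw and passes it to the next call.
def fetch_all(package_size)
  collected = []
  last_prog = ""
  loop do
    rows, no_more_data = fetch_package(last_prog, package_size)
    collected.concat(rows)
    last_prog = rows.last unless rows.empty?
    break if no_more_data
  end
  collected
end

p fetch_all(2)
```

When the number of programs is an exact multiple of the package size, one extra (empty) call is needed before the stop flag is set.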
Another thing is that, not knowing Ferret too well, I don’t know when and how best to use the Index’s flush method. (Last time I didn’t even use it, and it seemed to work fine). It’s akin to a commit, and in fact has an alias named “commit”. Even if you leave it out, it will work OK. Ferret’s documentation does seem to indicate that you shouldn’t commit too often, as that will hamper performance.
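If committing after every package turns out to be too frequent, one option is to flush only every few packages. The sketch below illustrates the idea under stated assumptions: FakeIndex is an invented stand-in that merely counts flush calls so the example runs without Ferret, and index_packages is a made-up helper name.

```ruby
# Invented stand-in for an index; it only records documents and counts flushes.
class FakeIndex
  attr_reader :docs, :flushes

  def initialize
    @docs = []
    @flushes = 0
  end

  def <<(doc)
    @docs << doc
    self
  end

  def flush
    @flushes += 1
  end
end

# Adds documents package by package, flushing every `flush_every` packages
# and once more at the end if the last packages have not been flushed yet.
def index_packages(index, packages, flush_every)
  packages.each_with_index do |package, i|
    package.each { |doc| index << doc }
    index.flush if (i + 1) % flush_every == 0
  end
  index.flush unless packages.length % flush_every == 0
  index
end

packages = [["A", "B"], ["C"], ["D", "E"], ["F"], ["G"]]
idx = index_packages(FakeIndex.new, packages, 2)
puts "#{idx.docs.length} documents, #{idx.flushes} flushes"
```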
If you have successfully managed to index the source code, you will of course require a way to search it. For that, I wrote an improved little web-based search built on Webrick as before, but this time providing better search results: previously I showed only 10 results; now I show all of them, with paging when necessary.
(Some lines have been split up for readability).
require 'webrick'
require 'ferret'
require 'stringio'
include WEBrick
include Ferret
$index = Index::Index.new(:path => 'abapsrc')
class SearchServlet < HTTPServlet::AbstractServlet
def do_GET(req, res)
body = StringIO.new
body << ""
body << "
"
body << "
Search ABAP Code
"
page_size = 10 #No. of results per page, 10 is the default in Ferret anyway
if req.query["q"]
# Highlighting code for returned search terms
highlight_pre = ""
highlight_post = ""
# Carry out search
srchterm = req.query["q"].to_s
query = "content:(#{srchterm}) texts:(#{srchterm})"
# Was an offset passed? If not, make it 0:
offset = req.query["offset"] ? req.query["offset"].to_i : 0
topdocs = $index.search(query, {:offset=>offset})
body << "
When you run this script, you can point your browser to http://localhost:2000/ and have a nice little web interface to search your index.
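The paging itself boils down to a bit of arithmetic on the offset that is passed to the search. Here is a minimal sketch of that arithmetic, assuming the total number of hits is known from the search result; the helper names page_offsets and neighbours are made up for illustration and are not part of the servlet.

```ruby
# Returns the start offset of every result page, e.g. for building page links.
def page_offsets(total_hits, page_size)
  return [] if total_hits <= 0
  (0...total_hits).step(page_size).to_a
end

# Returns [previous_offset, next_offset]; nil means there is no such page.
def neighbours(offset, total_hits, page_size)
  prev_offset = offset > 0 ? [offset - page_size, 0].max : nil
  next_offset = offset + page_size < total_hits ? offset + page_size : nil
  [prev_offset, next_offset]
end

p page_offsets(25, 10)     # [0, 10, 20]
p neighbours(10, 25, 10)   # [0, 20]
```

Each offset can then be turned into a link that carries the search term and the offset back to the servlet as query parameters.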
The nice thing, which I also didn’t mention last time, is that you have access to Ferret’s advanced query language, which allows you to search for complex terms. A simple example is searching for word1 AND word2 to look for program sources containing both words.
This actually proved useful to me!
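To illustrate how a term such as word1 AND word2 ends up being applied to both indexed fields, here is a minimal sketch of the query string construction, following the content:(...) texts:(...) pattern from the servlet. The helper name build_query is made up for illustration.

```ruby
# Wraps the user's search term so that it is applied to each listed field.
def build_query(term, fields = %w[content texts])
  fields.map { |field| "#{field}:(#{term})" }.join(" ")
end

puts build_query("word1 AND word2")
# prints: content:(word1 AND word2) texts:(word1 AND word2)
```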