Custom ABAP source search with saprfc and Ferret

Today we are going to build our own search engine for the ABAP source code on an SAP system, using our favourite language, Ruby (with the help of some nice libraries)! Sure, there is the “Find in source code” option in SE38, and apparently you can use TREX as well, but this is much more fun.

UPDATE (19 June 2009): Refer to the next post for an improved version of the solution.

Synopsis

We will create a Ruby script that connects to an SAP server, retrieves the source code of many programs and builds an index on your computer, which you can subsequently search. Of course, this requires that you first write an RFC-enabled function module on the SAP system to retrieve the code.

For starters, we are going to extract and index a subset of objects. I haven’t been brave enough yet to index all of the source code for all programs including the standard ones, although that was the initial intention. On an ECC system, that can amount to over a million objects!

By the way, I first tried this using Python with Whoosh and Pysaprfc, but after a lot of struggling I gave up: there were problems with Pysaprfc that I couldn’t resolve. It seems the connector has not been maintained for a long time, which may explain why. Besides, having used Whoosh, I found Ferret much simpler and more pleasant to use (and the same goes for Ruby).

So, here is a little recipe (but let’s hope not a recipe for disaster).

Prerequisites

My setup looks as follows:

  • Windows XP SP2
  • Ruby 1.8.6 (One-click installer)
  • Ferret Search Library
  • saprfc Ruby extension by Piers Harding
  • The librfc32.dll DLL

To start off with, you must have Ruby installed. You can grab the one-click installer from RubyForge. Once that is installed, install Ferret, which you can simply do by issuing the following from the command line:

gem install ferret

Next, download the saprfc extension for Ruby from Piers Harding’s website. It also seems to be quite old (it was last maintained in June 2007 and made it to version 0.37), but it worked for me.

From the command line, cd to the directory where you downloaded the gem, and issue:

gem install saprfc-0.37-mswin32.gem

Finally, you will need to obtain the librfc32.dll DLL. Head over to the SAP Software Distribution Center (SWDC) and go to Download -> Support Packages and Patches -> Search for Support Packages and Patches. Search for “RFC SDK” and choose “SAP RFC SDK 7.00” from the result list (this is the classic RFC SDK, not the NetWeaver version). Download the Windows Server on IA32 32bit version (unless you are doing this on e.g. Linux).

You will also need SAPCAR to extract the file (it is compressed in SAP’s proprietary archive format, SAP Archive or SAR). You should be able to find SAPCAR in the SWDC as well by searching for “SAPCAR”. Once you have it, you can extract the SAR from the command line with something like:

sapcar -xf packagefile.SAR

You now need to copy librfc32.dll to a directory in your PATH. The best place for it is probably C:\WINDOWS\System32.
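If the saprfc extension later complains that it cannot load the DLL, it is worth checking whether the file really is on your PATH. Here is a small sketch for that; find_on_path is a hypothetical helper of my own, not part of saprfc. It simply walks the PATH directories in order and returns the first one containing the file.

```ruby
# Hypothetical helper: returns the first directory in a PATH-style string
# that contains a file with the given name, or nil if none does.
def find_on_path(filename, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    return dir if File.file?(File.join(dir, filename))
  end
  nil
end

if __FILE__ == $0
  dir = find_on_path('librfc32.dll')
  puts(dir ? "librfc32.dll found in #{dir}" : "librfc32.dll is not on your PATH")
end
```

Save it as a script and run it with ruby from the same command prompt you will use for the indexer, so that it sees the same PATH.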

Backend: Extractor function module

That takes care of the prerequisites. Now we need to do some work on the backend. Log on to your SAP system and go to SE11 to create two structures:

Container for source code

Our first structure, YSRCEX2, has just one component, SOURCE_LINE, of type TEXT1000 to hold a line of source code (STRINGs are not allowed in RFC; otherwise we would have created a table in which each entry holds an entire program’s source code).

Container for program attributes

YSRCEX3, our next structure, contains the attributes of each program we will retrieve. In this case, we are storing the creation and modification details, as we will be adding those to our index. The last two components, LINE_FROM and LINE_TO, indicate which lines of the source table hold the code for the given program. In detail:

Component Type
PROGNAME  PROGNAME
CNAM      CNAM
CDAT      RDIR_CDATE
UNAM      UNAM
UDAT      RDIR_UDATE
LINE_FROM INT4
LINE_TO   INT4
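To see how the two structures work together, here is a small Ruby model of the layout the function module returns (ProgRow, pack_sources and unpack_source are hypothetical names for illustration only). Every program's lines are appended to one flat SOURCES table, and each PROGS row records the 1-based, inclusive LINE_FROM/LINE_TO range of its own lines.

```ruby
# Models a YSRCEX3 row; only the fields needed for the line ranges.
ProgRow = Struct.new(:progname, :line_from, :line_to)

# Flattens { progname => [lines] } into one source table plus one ProgRow
# per program. Ranges are 1-based and inclusive, like LINE_FROM/LINE_TO.
# Programs without any lines are skipped, as the function module does.
def pack_sources(programs)
  sources = []
  progs = []
  programs.each do |name, lines|
    next if lines.empty?
    from = sources.size + 1
    sources.concat(lines)
    progs << ProgRow.new(name, from, sources.size)
  end
  [sources, progs]
end

# Recovers one program's lines; the -1 converts the 1-based ABAP line
# numbers into Ruby's 0-based array indexes.
def unpack_source(sources, row)
  sources[(row.line_from - 1)..(row.line_to - 1)]
end

sources, progs = pack_sources({ 'ZFOO'   => ["REPORT zfoo.", "WRITE 'foo'."],
                                'ZEMPTY' => [],
                                'ZBAR'   => ["REPORT zbar."] })
progs.each { |row| puts "#{row.progname}: #{row.line_from}-#{row.line_to}" }
# ZFOO: 1-2
# ZBAR: 3-3
```

The client side only ever needs unpack_source: the range tells it which slice of the flat table belongs to which program.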

The next thing to do is to create an RFC-enabled function module on your ABAP system that will extract the code. Below is the source of the function module, which uses the two structures we just created. A few things to note:

  • Though doing an OPEN CURSOR WITH HOLD and a package select sounds nice in theory, I have found that it does not work, due to an implicit commit between RFC calls which closes your cursor. On one of our boxes it seemed not to have this effect, but eventually I ran into the same problem there too. So, although inelegant, the best solution I could think of was to keep track inside the function module of where we had gotten to with the selection.
  • I have to warn you that READ REPORT does not read the source code for all objects, e.g. type pools. Getting every object’s source code looks like it involves a bit more effort, but we won’t worry about that too much now and will focus on extracting the source and indexing it for our search instead.
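Keeping track of where we got to amounts to keyset pagination: rather than holding a cursor open across RFC calls, each call selects the next PACKAGE_SIZE names greater than the last name it returned. The following Ruby simulation is a sketch only; FakeExtractor is a hypothetical stand-in for the function module, with instance variables playing the role of its STATICS. Note that the approach only works if each package is taken in name order, so the selection has to be sorted by PROGNAME.

```ruby
# Hypothetical stand-in for YSRCEX_EXTRACTOR. The instance variables play
# the role of the STATICS, which survive between calls in one RFC session.
class FakeExtractor
  def initialize(all_prognames)
    @all = all_prognames
    @last_progname = ''
    @no_more_data = false
  end

  # Returns [package, no_more_data], like PROGS and NO_MORE_DATA.
  def call(package_size)
    return [[], true] if @no_more_data
    # Equivalent of: WHERE progname > last_progname
    #                ORDER BY progname, UP TO package_size ROWS
    package = @all.select { |name| name > @last_progname }.sort.first(package_size)
    @no_more_data = package.size < package_size
    @last_progname = package.last unless package.empty?
    [package, @no_more_data]
  end
end

extractor = FakeExtractor.new(%w[ZC ZA YB ZD ZE])
fetched = []
loop do
  package, done = extractor.call(2)
  fetched.concat(package)
  break if done
end
p fetched  # => ["YB", "ZA", "ZC", "ZD", "ZE"]
```

Without the sort, a package could contain a "later" name before an "earlier" one, and remembering only the last name would then silently skip programs.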

Now create a function group and add the following function module. (The comments in the code tell you what parameters with what types are required):

FUNCTION YSRCEX_EXTRACTOR.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     VALUE(PACKAGE_SIZE) TYPE  I DEFAULT 200
*"  EXPORTING
*"     VALUE(NO_MORE_DATA) TYPE  CHAR1
*"  TABLES
*"      SOURCES STRUCTURE  YSRCEX2
*"      PROGS STRUCTURE  YSRCEX3
*"----------------------------------------------------------------------

  data: line type YSRCEX3.
  data: source type table of text1000 with header line.
  data: line_from type i.
  data: table_lines type i.
  statics: last_progname type progname.
  statics: s_no_more_data type char1.

  clear: sources[], progs[].

* If we have previously (from last call) determined that there
* is no more data, exit the function
  if s_no_more_data = 'X'.
    no_more_data = 'X'. "Keep informing caller
    return.
  endif.

* Read a number of source objects specified by package_size
  select progname cnam cdat unam udat from reposrc
    into CORRESPONDING FIELDS OF TABLE progs
    up to package_size rows
    where progname > last_progname
* Note: The following list is probably not comprehensive,
* it's just for demonstration purposes:
      and ( progname like 'Z%' or progname like 'Y%'
       or progname like 'SAPMZ%' or progname like 'SAPMY%'
       or progname like 'SAPLZ%' or progname like 'SAPLY%'
       or progname like 'LZ%' or progname like 'LY%' )
      and r3state = 'A' "Active sources only
    order by progname. "Needed so that LAST_PROGNAME works as resume key

* Check whether we should stop selecting yet
  describe table progs lines table_lines.
  if table_lines lt package_size.
    s_no_more_data = 'X'.
  endif.

* Process the selected programs
  loop at progs into line.
* The following does not work e.g. for type pools
    refresh source. "Start each program with an empty SOURCE table
    read report line-PROGNAME into source.
* Add sources to table, keep track of from and to line numbers
    loop at source.
      add 1 to line_from.
      at first. "Track from line number
        line-line_from = line_from.
      endat.
      at last. "Track to line number
        line-line_to = line_from.
      endat.
      sources-source_line = source.
      append sources.
    endloop.
* Omit programs for which we did not obtain sources
    if sy-subrc ne 0. "No source
      delete progs.
    else.
      modify progs from line.
    endif.
    last_progname = line-progname.
  endloop.

* Return determined value of no_more_data indicator
  no_more_data = s_no_more_data.

ENDFUNCTION.

As noted in the comments of the source code above, this will not produce a complete list of all custom ABAP source code objects, but we are just demonstrating here anyway.
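If you want to check in advance which program names the WHERE clause will pick up, its LIKE patterns can be mirrored with a Ruby regular expression. This is a hypothetical helper for experimenting outside the SAP system (the constant and method names are my own). It also shows one of the gaps: objects in a registered namespace such as /ABC/... match none of these prefixes.

```ruby
# Every LIKE pattern in the function module is a plain prefix followed by %,
# so the condition reduces to "starts with one of these prefixes".
CUSTOM_PREFIXES = %w[Z Y SAPMZ SAPMY SAPLZ SAPLY LZ LY]
CUSTOM_NAME = /\A(?:#{CUSTOM_PREFIXES.join('|')})/

def custom_program?(progname)
  !!(progname =~ CUSTOM_NAME)
end

%w[ZREPORT SAPLZFG LZFGTOP SAPMV45A /ABC/REPORT].each do |name|
  puts "#{name}: #{custom_program?(name)}"
end
# ZREPORT, SAPLZFG and LZFGTOP match; SAPMV45A and /ABC/REPORT do not.
```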

The indexing client

Now we are done with the backend stuff, and we turn our attention back to our computer, where we installed all those goodies earlier.

Open up a text editor (or any editor; I personally prefer NetBeans for Ruby development) and create a Ruby script called “abapsrcindex.rb” (the name doesn’t matter, though) with the following content:

require 'SAP/Rfc'
require 'ferret'
require 'stringio'
include Ferret

PACKAGE_SIZE = 100
index = Index::Index.new(:path => 'abapsrc')

rfc = SAP::Rfc.new(:ashost => "myhost.com",
  :sysnr  => 00,
  :lang   => "EN",
  :client => 800,
  :user   => "MYUSER",
  :passwd => "password",
  :trace  => 0)

func = rfc.discover("YSRCEX_EXTRACTOR")
func.package_size = PACKAGE_SIZE
until func.no_more_data.value == "X"
  func.progs.reset
  func.sources.reset
  rfc.call(func)
  # Progress output: first program in this package, number of source lines
  puts func.progs.rows[0]["PROGNAME"] if func.progs.rows[0]
  puts func.sources.rows.size
  func.progs.nextHashRow {|h|
    sio = StringIO.new
    # LINE_FROM/LINE_TO are 1-based; Ruby array indexes are 0-based
    func.sources.value[(h["LINE_FROM"]-1)..(h["LINE_TO"]-1)].each {|line|
      # rstrip, not rstrip!: rstrip! returns nil when there is nothing to
      # strip. The newline stops the last word of one line from fusing
      # with the first word of the next.
      sio << line.rstrip << "\n"
    }
    index << {:title=>h["PROGNAME"].rstrip,
      :content=>sio.string,
      :created_by=>h["CNAM"].rstrip,
      :created_on=>h["CDAT"].rstrip,
      :changed_by=>h["UNAM"].rstrip,
      :changed_on=>h["UDAT"].rstrip
    }
    sio.close
  }
end

rfc.close()

Admittedly, this solution is a little “quick and dirty”: if you run it more than once, it adds the same programs to the index again instead of updating them. Ferret has extensive documentation on how you would go about doing this properly. Until then, if you need to rerun the script, first delete the “abapsrc” directory it creates.
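One way to avoid the duplicates is to treat the program name as a unique key, so that adding a program that is already in the index replaces the old entry. If I read the Ferret documentation correctly, Index::Index.new accepts a :key option (e.g. :key => :title) for exactly this purpose, but I have not tried it here. The sketch below only illustrates the replace-on-same-key behaviour with a hypothetical in-memory KeyedIndex class, so it runs without Ferret installed.

```ruby
# Hypothetical in-memory stand-in for an index whose documents are keyed
# by one field: adding a document with an existing key replaces it.
class KeyedIndex
  def initialize(key)
    @key = key
    @docs = {}
  end

  def <<(doc)
    @docs[doc.fetch(@key)] = doc
    self
  end

  def size
    @docs.size
  end

  def [](key_value)
    @docs[key_value]
  end
end

index = KeyedIndex.new(:title)
index << {:title => 'ZFOO', :content => 'REPORT zfoo.'}
index << {:title => 'ZBAR', :content => 'REPORT zbar.'}
# A second run re-adds ZFOO with changed source; it replaces the old entry.
index << {:title => 'ZFOO', :content => "REPORT zfoo.\nWRITE 'changed'."}
puts index.size  # => 2
```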

The above script does two things:

It creates an index called “abapsrc” (which is a folder with a lot of other files inside) where we will store our ABAP code and related properties.

It then connects to the SAP server, calls our function and retrieves programs with their sources in packages of PACKAGE_SIZE programs each. The source of each program is concatenated into a single string (trailing whitespace is stripped off) and added to the index. This carries on until the function returns NO_MORE_DATA = “X”.

(The few puts statements littered throughout merely give us an indication of the progress we are making.)

If you are new to Ruby, it may interest you to know that we are using a StringIO object, which is much like a StringBuffer in Java, to concatenate the source code. This is for performance: building the string with + would create a new String object for every line, which is wasteful for programs with thousands of lines.
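The difference is easy to observe in plain Ruby: String#+ (and therefore +=) returns a brand-new String each time, whereas << appends to the receiver in place, and StringIO#<< likewise appends to its internal buffer. The snippet below checks object identity to show this.

```ruby
require 'stringio'

lines = ["REPORT zfoo.", "WRITE 'x'."]

# String#+ (used by +=) builds a new String on every step.
s = ''
versions = []
lines.each { |line| s += line; versions << s }
puts versions[0].equal?(versions[1])   # => false: two distinct objects

# String#<< appends to the receiver in place, so every reference to the
# original object sees the appended text.
t = String.new('')
original = t
lines.each { |line| t << line }
puts original == "REPORT zfoo.WRITE 'x'."   # => true

# StringIO#<< appends to an internal buffer; #string returns the result.
sio = StringIO.new
lines.each { |line| sio << line << "\n" }
puts sio.string.inspect   # => "REPORT zfoo.\nWRITE 'x'.\n"
```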

Run the script:

ruby abapsrcindex.rb

When I ran this, it took roughly 10 minutes to index some 6,700 objects. During that time the memory consumption of the Ruby process varied between 100 and 300 MB. It didn’t consume an awful lot of CPU, though; CPU usage fluctuated a lot too, but always stayed at safe levels. I wonder whether threading could reduce the time it takes to get through the lot: the RFC calls are always very short, while the concatenation and indexing take time (though if there can only be one lock on the index, as is the case with Whoosh, the benefit depends on the extent to which the concatenation is the bottleneck).
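To experiment with the threading idea, one option is a producer/consumer pipeline: one thread keeps fetching packages while the main thread turns them into documents, with a bounded queue in between. This is a hypothetical sketch; fetch_package stands in for the RFC call, and the documents are collected in an array instead of being added to Ferret. In MRI only one thread runs Ruby code at a time, so threads mainly help by overlapping waiting (such as network I/O) with work; given how short the RFC calls are, the gain may be modest. Keeping a single thread as the only writer also sidesteps index locking.

```ruby
require 'thread'  # Queue/SizedQueue; needed on Ruby 1.8, harmless later

# Hypothetical stand-in for one RFC call: returns [rows, no_more_data].
def fetch_package(all_rows, offset, size)
  rows = all_rows[offset, size] || []
  [rows, offset + size >= all_rows.size]
end

all_rows = (1..7).map { |i| {'PROGNAME' => format('Z%03d', i), 'SOURCE' => ["* line #{i}"]} }
queue = SizedQueue.new(2)   # at most two packages buffered at a time

producer = Thread.new do
  offset = 0
  loop do
    rows, done = fetch_package(all_rows, offset, 3)
    queue << rows
    offset += rows.size
    break if done
  end
  queue << :done            # sentinel: tells the consumer to stop
end

documents = []              # in the real script: the Ferret index
while (rows = queue.pop) != :done
  rows.each do |row|
    documents << {:title => row['PROGNAME'], :content => row['SOURCE'].join("\n")}
  end
end
producer.join

puts documents.size   # => 7
```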

Searching our ABAP code

If everything went well, you now have a searchable index of your ABAP source code (or at least a subset of it). You can create a simple script to search through the index. Create a new Ruby file (e.g. “searchabap.rb”) and put the following in it. This one just reads a search term from the command line and uses it to search the index.

require 'ferret'
include Ferret

index = Index::Index.new(:path => 'abapsrc')

print "Enter search term: "
term = $stdin.gets.to_s.chomp # user enters search term on command line
query = "content:(#{term})"

index.search_each(query) do |id, score|
  puts "Document #{index[id][:title]} found with a score of #{score}"
  highlights = index.highlight(query, id,
                               :field => :content,
                               :pre_tag => "<<<",
                               :post_tag => ">>>")
  (highlights || []).each {|hi| puts hi}
  puts "\n"
end

If you run it, you could get something like the following:

ruby searchabap.rb
Enter search term: miller
Document ZCOUNT_CODE found with a score of 0.340764164924622

** Download ABAP programs, function modules, classes to a text file** Author: SJ <<>> (SAP Africa)*** Todo:*  - handle fugr top includes, etc*REPORT  zabap_download.*----------------------------------------------------------------------** Constants...

Created by AMCKAY

Document ZABAP_DOWNLOAD found with a score of 0.204458490014076

** Download ABAP programs, function modules, classes to a text file** Author: SJ <<>> (SAP Africa)*** Todo:*  - handle fugr top includes, etc*REPORT  zabap_download.*----------------------------------------------------------------------** Constants...

Created by HVENTER

Now this isn’t glorious, but it’s functional. For a quick web-based interface, we can use WEBrick, which comes standard with Ruby, to write a little script that does just that:

require 'webrick'
require 'ferret'
require 'stringio'

include WEBrick
include Ferret

$index = Index::Index.new(:path => 'abapsrc')

class SearchServlet < HTTPServlet::AbstractServlet

  def do_GET(req, res)
    body = StringIO.new
    body << "<html><head><title>Search ABAP Code</title></head><body>"
    body << "<h1>Search ABAP Code</h1>"
    # Search form; the term is submitted as query parameter q
    body << "<form method='get' action='/'>"
    body << "<input type='text' name='q' size='40'/> <input type='submit' value='Search'/>"
    body << "</form>"
    if req.query["q"]
      srchterm = req.query["q"].to_s
      puts "Searching for #{srchterm}"
      query = "content:(#{srchterm})"
      # search_each returns the total number of hits
      hits = $index.search_each(query) do |id, score|
        body << "<p>"
        body << "<b>#{$index[id][:title]}</b> - "
        # Show only first three occurrences in each program:
        highlights = $index.highlight(query, id,
                                      :field => :content,
                                      :num_excerpts => 3,
                                      :pre_tag => "<b>",
                                      :post_tag => "</b>")
        (highlights || []).each {|hig| body << hig }
        body << "</p>"
      end
      body << "<p>Your search returned #{hits} hits.</p>"
    end
    body << "</body></html>"
    res.body = body.string
    body.close # Close StringIO
    res['Content-Type'] = "text/html"
  end
end

s = HTTPServer.new(:Port => 2000)
s.mount("/", SearchServlet)
trap("INT") { s.shutdown }
s.start

(Just another note for those new to Ruby: I am using a global variable ($index) for the index object. In general, global variables are poor form, but using one was easier than working out how to use WEBrick properly.)
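One more thing to watch out for in the web version: the excerpts contain raw ABAP source, so anything that looks like an HTML tag, such as a field symbol <fs>, will be swallowed by the browser. A safer pattern is to HTML-escape the text first and only then turn the matches into tags. The sketch below assumes you highlight with marker characters instead of HTML (PRE_MARK/POST_MARK are hypothetical; presumably they could be passed as Ferret's :pre_tag and :post_tag, which I have not tested). mark_terms is a simplified stand-in that matches plain substrings case-insensitively, not Ferret's token-based highlighter.

```ruby
require 'erb'

# Hypothetical marker characters used instead of HTML tags while the
# excerpt is still raw text; they are very unlikely to occur in ABAP source.
PRE_MARK  = "\x01"
POST_MARK = "\x02"

# Simplified stand-in for the highlighter: wraps every case-insensitive
# occurrence of term in the markers.
def mark_terms(text, term)
  text.gsub(/#{Regexp.escape(term)}/i) { |match| PRE_MARK + match + POST_MARK }
end

# Escapes the raw source first, then turns the markers into <b> tags, so
# that something like <fs> is displayed instead of being parsed as a tag.
def html_excerpt(marked)
  ERB::Util.html_escape(marked).gsub(PRE_MARK, '<b>').gsub(POST_MARK, '</b>')
end

puts html_excerpt(mark_terms("ASSIGN lv_value TO <fs>.", 'assign'))
# => <b>ASSIGN</b> lv_value TO &lt;fs&gt;.
```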

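Save the script (for example as websearch.rb) and run it:

ruby websearch.rb

Then point your browser at http://localhost:2000/ (the port configured in the script). You can expect something like the following: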

Search results from browser search

Well, that’s all for now. Of course we could play with this a little more and make it fancy, but at least we can now find that elusive dynamic usage of message 038!

Before I forget, a quick thanks to Vitalie Cherpec for his suggestion to use the WP-Syntax plugin for code highlighting. As you can see, it has done wonders for the appearance of this blog!


  • Hey,
    though i dunno ABAP, neither any programming language, what you have done impressed me :)
    why not sharing this in SAP SDN Blogs ?

    https://www.sdn.sap.com/irj/scn/weblogs

    i think may people response ur guide.

    wish u my best

    cheers~

  • admin

    Hey eddai, that is in fact just what I did! https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/14665 . Thanks very much!

  • Hey Martin, you are too clever! And must have loads of spare time, especially since you could have done this in SE80 “find in source” as you mention!

  • admin

    Hey John, nice of you to drop by! I hope you read the next post, as it gets better! The thing with “find in source” is that you can only search for a single string, whereas this allows you to use complex search criteria (e.g. foo AND bar; searching for two terms in a piece of source code).

  • I just followed to https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/14665.. anyway, I can not display this page.. will try again

  • admin

    The link is still valid (I just looked at it), but maybe because of the full stop you’ve got at the end?
    Anyway, it just links back to this blog entry, and this one has been superseded by the next post.