Improved ABAP Source Code Search

In my last post I showed you how to create your own searchable index of ABAP source code using Ruby in conjunction with the Ferret and saprfc extensions. Today I am going to show you a hugely improved version that will reduce the indexing time and give you a nicer search interface. (Amazingly, this whole thing came in rather handy for me in the last week!)

First things first: I need to backtrack on something I said earlier. I told you that it is not possible to use strings in an RFC interface. Well, that is not entirely true. You can’t use deep structures (which may include strings) for a TABLE parameter, but you can use them as IMPORT or EXPORT parameters.

The problem is that the saprfc extension does not cater for these (because, I think, the classic RFC SDK does not handle those). However, Piers Harding has also written a Ruby extension called sapnwrfc to use the Netweaver RFC SDK, which can in fact handle deep structures and Strings and things.

The benefit we gain from this is that the server no longer returns many individual lines of source, which the client then has to concatenate into a source listing; instead, we offload all that work onto the ABAP server. Whether it is because ABAP is so good at doing this kind of crunching, or because our server is so powerful, I don’t know, but our server did not feel the difference at all.

To get going, we will first install the sapnwrfc extension from Piers’ site. The last gem that was packaged for Windows is sapnwrfc-0.19-mswin32.gem, but that worked fine for me. The latest version is 0.21, but I gave up trying to compile it on Windows. I suspect the extconf.rb file just needs some tweaking.

CD to the directory where you downloaded it and issue the following to install it:

gem install sapnwrfc-0.19-mswin32.gem

Now of course the extension will not work without the required DLLs, so we need to head over to the SAP Software Distribution Center again and this time download the Netweaver RFC SDK. To do so, choose Download -> Support Packages and Patches -> Search for Support Packages and Patches and search for “rfc sdk”. Choose SAP NW RFC SDK 7.10 from the result list and download the Windows Server on IA32 32bit version. (You will also have to find and download SAPCAR from SWDC to extract the archive).

Once you have extracted the archive, take the dlls from the \lib directory and put them somewhere in your PATH so that they can be found when you use the sapnwrfc extension. C:\WINDOWS\System32 is a good place.
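
Before going further, it can be worth checking that the DLLs really are visible on your PATH. The following is a minimal sketch: find_on_path is a hypothetical helper of my own, and the two DLL names are assumptions, so compare them with what is actually in the \lib directory of your download.

```ruby
# Return the full path of `filename` in the first PATH directory that
# contains it, or nil if it is not found anywhere on the PATH.
def find_on_path(filename, path_env = ENV["PATH"].to_s)
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, filename)
    return candidate if File.file?(candidate)
  end
  nil
end

# Assumed DLL names: adjust to match the contents of the SDK's \lib directory.
%w[sapnwrfc.dll libsapucum.dll].each do |dll|
  location = find_on_path(dll)
  puts(location ? "#{dll}: found at #{location}" : "#{dll}: NOT found on PATH")
end
```

If a DLL is reported as not found, requiring sapnwrfc will most likely fail as well, so this is a quick way to rule out a PATH problem first.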

The rest of the requirements are the same as described in the previous post. Make sure you are using Ruby 1.8.6 (the one-click installer) and have installed Ferret.

Now we are going to write a new function module on our SAP backend to retrieve program sources.

Firstly we need to create a structure, based on which we will create a table type, that we can pass in our RFC function.

Structure to pass program sources and texts


What I did here was to define separate strings for the program source and texts we will be exporting, but in hindsight, there is nothing stopping you from just lumping everything together into one string.

Now we create a table type to use in our function module:

Table type to use in function


Next we create our function module. Remember to make it RFC-enabled. The generated comments at the top show you which parameters to create for the function module:

function zsrcex_extractor .
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     VALUE(PACKAGE_SIZE) TYPE  I DEFAULT 200
*"     VALUE(SELECT_AFTER) TYPE  PROGNAME OPTIONAL
*"     VALUE(LANGU) TYPE  SYLANGU DEFAULT SY-LANGU
*"  EXPORTING
*"     VALUE(EXTRACT) TYPE  ZSRCEX_T
*"     VALUE(NO_MORE_DATA) TYPE  CHAR1
*"----------------------------------------------------------------------

  data: table_lines type i.
  statics: last_progname type progname.
  statics: s_no_more_data type char1.
  data: progs type table of progname with header line.
  data: extract_line type zsrcex.
  data: texts type table of textpool with header line.
  data: source type table of text1000 with header line.
  data: nl type abap_char1 value cl_abap_char_utilities=>newline.
  data: tab type abap_char1 value cl_abap_char_utilities=>horizontal_tab.

  clear: extract[].

* If we have previously (from last call) determined that there
* is no more data, exit the function
  if s_no_more_data = 'X'.
    no_more_data = 'X'. "Keep informing caller
    return.
  endif.

* Start selecting after specified program name, if supplied
  if not select_after is initial.
    last_progname = select_after.
  endif.

* Read a number of source objects specified by package_size
  select progname from reposrc
    into table progs
    up to package_size rows
    where progname > last_progname
* Note: The following list is probably not comprehensive,
* it's just for demonstration purposes:
      and ( progname like 'Z%' or progname like 'Y%'
       or progname like 'SAPMZ%' or progname like 'SAPMY%'
       or progname like 'SAPLZ%' or progname like 'SAPLY%'
       or progname like 'LZ%' or progname like 'LY%' )
* To retrieve EVERYTHING, just comment out the above 4 lines
      and r3state = 'A' "Active sources only
* Sort by name so that paging on progname > last_progname skips nothing
    order by progname.

* Check whether we should stop selecting yet
  describe table progs lines table_lines.
  if table_lines lt package_size.
    s_no_more_data = 'X'.
  endif.

* Process the selected programs
  loop at progs.
    clear: extract_line, texts[], source[].
    extract_line-progname = progs.
* The following does not work e.g. for type pools
    read report progs into source.
    read textpool progs into texts language langu.
* Don't pass back programs with neither texts nor source
    if source[] is initial and texts[] is initial.
      continue.
    endif.
* Put source into one string into EXTRACT
    loop at source.
      concatenate extract_line-source source nl into extract_line-source.
    endloop.
* Put texts into single string
    loop at texts.
      concatenate extract_line-texts texts-id tab texts-key tab
                  texts-entry nl
                  into extract_line-texts.
    endloop.
* Store program title separately
    read table texts with key id = 'R'.
    if sy-subrc = 0.
      extract_line-title = texts-entry.
    endif.
    append extract_line to extract.
  endloop.

* Return determined value of no_more_data indicator
  no_more_data = s_no_more_data.

endfunction.

As indicated previously, this will return only a subset of the custom sources on the system (and probably not all of them, especially if you are developing in a namespace). By the way, I tried several times, varying the package size each time, to retrieve all the source code by commenting out the 4 lines indicated, but at some undetermined point Ruby would quit with a segmentation fault while executing the remote function call. Had it run successfully, it would have taken about 5-6 hours to retrieve and index the sources for all 1.6 million program objects on the system.

Now we are ready to call our function from Ruby to retrieve the program sources and index them:

require 'sapnwrfc'
require 'ferret'
include Ferret

PACKAGE_SIZE = 100

index = Index::Index.new(:path => 'abapsrc')

conn = SAPNW::Base.rfc_connect(:ashost => "badnews.com",
  :sysnr  => 00,
  :lang   => "EN",
  :client => 900,
  :user   => "DUDE",
  :passwd => "passwort",
  :trace  => 0)

no_more_data = nil
last_prog = ""
func = conn.discover("ZSRCEX_EXTRACTOR")
until no_more_data
  fc = func.new_function_call
  fc.PACKAGE_SIZE = PACKAGE_SIZE
  fc.SELECT_AFTER = last_prog
  fc.invoke
  fc.EXTRACT.each {|row|
    print row["PROGNAME"]
    # Use rstrip, not rstrip!: the bang version returns nil
    # when there is nothing to strip
    progname = row["PROGNAME"].rstrip
    index << {:progname=>progname,
      :title=>row["TITLE"].to_s.rstrip,
      :content=>row["SOURCE"].to_s.rstrip,
      :texts=>row["TEXTS"].to_s.rstrip
    }
    last_prog = progname
  }
  no_more_data = true if fc.NO_MORE_DATA == "X"
  index.flush
end

conn.close
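
A Ruby detail worth knowing when cleaning up the blank-padded fields that come back from the RFC call: rstrip always returns a string, whereas rstrip! returns nil when there was nothing to strip. The sketch below illustrates the difference; clean_field is a hypothetical helper of my own, not part of sapnwrfc.

```ruby
padded   = "ZREPORT   "  # CHAR field padded with trailing blanks
unpadded = "ZREPORT"     # value that already fills its field

stripped_copy   = padded.rstrip         # new string without the padding
in_place_result = unpadded.dup.rstrip!  # nil, because nothing was stripped

# Hypothetical helper: always returns a string, even for nil input.
def clean_field(value)
  value.to_s.rstrip
end

puts stripped_copy.inspect
puts in_place_result.inspect
puts clean_field(nil).inspect
```

This matters for a program name that fills its field completely: rstrip! would then hand back nil, and that nil would be passed on as the next SELECT_AFTER value.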

Something to note: I noticed with the sapnwrfc extension (which wasn’t the case with the saprfc extension), that each call seems to be in a new session context (perhaps something to do with the required new_function_call method?) and as a result, the function was not keeping track of the last program read. Instead I had to keep track of it in my Ruby script and pass it in the SELECT_AFTER parameter, so the function would know from where to select.
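
The paging protocol between the script and the function can be tried out without an SAP system. The sketch below is a pure-Ruby simulation under my own assumptions: fetch_package is a hypothetical stand-in for the remote function (it is not a sapnwrfc call), and it sorts the names so that "everything after the last name" is well defined.

```ruby
# Stand-in for the remote function: returns up to package_size names that
# sort after select_after, plus a flag saying whether this was the last package.
def fetch_package(all_names, select_after, package_size)
  rows = all_names.sort.select { |name| name > select_after }.first(package_size)
  [rows, rows.size < package_size]
end

# Client-side loop: remember the last name received and pass it back in,
# so that no state has to survive on the server between calls.
def fetch_all(all_names, package_size)
  collected = []
  last_prog = ""
  no_more_data = false
  until no_more_data
    rows, no_more_data = fetch_package(all_names, last_prog, package_size)
    collected.concat(rows)
    last_prog = rows.last unless rows.empty?
  end
  collected
end

puts fetch_all(%w[ZC ZA ZB ZE ZD], 2).inspect
```

Because the cursor lives in the client, each call is self-contained, which is why it does not matter whether the server keeps its session between calls.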

Another thing is that, not knowing Ferret too well, I don’t know when and how to use the Index’s flush method. (Last time I didn’t even use it, and it seemed to work fine). It’s akin to a commit, and in fact has an aliased “commit” method. Even if you leave it out, it will work OK. Ferret’s documentation does seem to indicate that you shouldn’t commit too much as it will hamper performance.
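
If you want to control how often the index is committed independently of the package size, a small wrapper can count documents and flush only every so many. This is just a sketch: BatchedIndexer is a hypothetical class of my own, and it assumes nothing more than an index object that responds to << and flush, as the Ferret index does in the script above.

```ruby
# Wraps any index-like object and flushes it once per batch_size documents.
class BatchedIndexer
  attr_reader :flush_count

  def initialize(index, batch_size)
    @index = index
    @batch_size = batch_size
    @pending = 0
    @flush_count = 0
  end

  # Add a document; flush automatically when the batch is full.
  def <<(doc)
    @index << doc
    @pending += 1
    flush if @pending >= @batch_size
    self
  end

  # Flush only if something has been added since the last flush.
  def flush
    return if @pending == 0
    @index.flush
    @pending = 0
    @flush_count += 1
  end
end
```

In the indexing script you would wrap the index once, for example batched = BatchedIndexer.new(index, 500), add documents with batched << {...}, and call batched.flush one last time before closing the connection. Whether a larger batch actually helps is something I would measure rather than assume.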

If you have successfully managed to index the source code, you will of course need a way to search it. For that, I wrote an improved little web-based search, based on WEBrick as before, but this time providing better search results: previously I showed only the first 10 results; now I show all of them, with paging when necessary.

(Some lines have been split up for readability).

require 'webrick'
require 'ferret'
require 'stringio'

include WEBrick
include Ferret

$index = Index::Index.new(:path => 'abapsrc')

class SearchServlet < HTTPServlet::AbstractServlet

  def do_GET(req, res)
    body = StringIO.new
    # The HTML tags inside the strings below are a minimal reconstruction;
    # adapt the markup to your own taste.
    body << "<html>"
    body << "<body>"
    body << "<h1>Search ABAP Code</h1>"
    body << "<form method=\"get\"><input type=\"text\" name=\"q\"/> " \
            "<input type=\"submit\" value=\"Search\"/></form>"
    body << "<hr/>"
    page_size = 10 #No. of results per page, 10 is the default in Ferret anyway
    if req.query["q"]
      # Highlighting code for returned search terms
      highlight_pre = "<b>"
      highlight_post = "</b>"
      # Carry out search
      srchterm = req.query["q"].to_s
      query = "content:(#{srchterm}) texts:(#{srchterm})"
      # Search term escaped for use in the paging links
      link_term = WEBrick::HTTPUtils.escape_form(srchterm)
      # Was an offset passed? If not, make it 0:
      offset = req.query["offset"] ? req.query["offset"].to_i : 0
      topdocs = $index.search(query, {:offset=>offset})
      body << "<p>Your search returned #{topdocs.total_hits} hits. "
      body << "Showing #{offset+1} to #{offset+topdocs.hits.size}:</p>"
      #Prepare paging code which will appear at top and bottom of results
      paging_code = "<p>"
      # Previous Link
      prev_offset = offset >= page_size ? offset - page_size : 0
      paging_code << \
        "<a href=\"?q=#{link_term}&amp;offset=#{prev_offset}\">&lt;&lt; Previous</a> " \
        if offset > 0
      # Next Link
      next_offset = offset + topdocs.hits.size < topdocs.total_hits ? offset \
        + page_size : 0
      paging_code << \
        "<a href=\"?q=#{link_term}&amp;offset=#{next_offset}\"> Next &gt;&gt;</a>" \
        if offset + page_size < topdocs.total_hits
      paging_code << "</p>"
      body << paging_code
      # Show page of results
      topdocs.hits.each {|hit|
        # Output of single result
        body << "<p>"
        body << \
          "<b>#{$index[hit.doc][:progname]}</b> -"
        highlights = $index.highlight("content:(#{srchterm})", hit.doc,
          :field => :content,
          :pre_tag => highlight_pre,
          :post_tag => highlight_post)
        highlights.each {|hig| body << hig } if highlights
        highlights = $index.highlight("texts:(#{srchterm})", hit.doc,
          :field => :texts,
          :pre_tag => highlight_pre,
          :post_tag => highlight_post)
        highlights.each {|hig| body << hig } if highlights
        body << "</p>"
      }
      body << paging_code
    end
    body << "</body></html>"
    res.body = body.string
    body.close #Close StringIO
    res['Content-Type'] = "text/html"
  end
end

s = HTTPServer.new( :Port => 2000 )
s.mount("/", SearchServlet)
trap("INT"){ s.shutdown }
s.start

When you run this script, you can point your browser to http://localhost:2000/ and have a nice little web interface to search your index.

Results with paging
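
The paging arithmetic behind the Previous and Next links can be looked at in isolation. The following is a small sketch rather than the servlet code itself: paging_offsets is a hypothetical helper of my own that returns nil where the servlet would simply leave out the link.

```ruby
# Work out the offsets for the "Previous" and "Next" links.
# Returns nil for a link that should not be shown.
def paging_offsets(offset, page_size, total_hits)
  prev_offset = offset > 0 ? [offset - page_size, 0].max : nil
  next_offset = offset + page_size < total_hits ? offset + page_size : nil
  { :prev => prev_offset, :next => next_offset }
end

# 25 hits shown 10 at a time: first, middle and last page
[0, 10, 20].each do |offset|
  puts "offset #{offset}: #{paging_offsets(offset, 10, 25).inspect}"
end
```

The first page has no Previous link and the last page has no Next link, which is the behaviour the two "if" conditions in the servlet are there to produce.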


The nice thing, which I also didn’t mention last time, is that you have access to Ferret’s advanced query language, which allows you to search for complex terms. A simple example is searching for word1 AND word2 to look for program sources containing both words.
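
To see what such a search looks like by the time it reaches Ferret, here is a sketch of how the query string is put together. build_query is a hypothetical helper of my own; it simply mirrors the "content:(...) texts:(...)" pattern used in the servlet.

```ruby
# Combine several search words with an operator and apply the result
# to both indexed fields, in the same pattern the servlet uses.
def build_query(terms, operator = "AND")
  expression = terms.join(" #{operator} ")
  "content:(#{expression}) texts:(#{expression})"
end

puts build_query(["word1", "word2"])
# prints: content:(word1 AND word2) texts:(word1 AND word2)
```

Typing word1 AND word2 into the search box produces exactly this string, because the servlet inserts whatever you type between the parentheses.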

Search with AND


This actually proved useful to me!


  • Hi Martin.

    Two things:
    1. My first attempt at commenting was blocked because I am apparently behind a proxy. I have no proxy configured and accessed the site via Vodacom 3G.
    2. You may want to remove the [client name removed] references from your source code…

    Cheers

    Johan.

  • admin

    Oops, that was a major oversight. Thanks for pointing that out!

  • Hi Martin,

    is it possible that you provide your Function Module Source code as a SAPlink Nugget or Slinkee?

    Best regards
    Gregor

  • admin

    Hi Gregor,
    That is a good idea. Before doing that, however, I would like to add at least a feature to retrieve deltas for indexing. I will bundle everything together and offer it as a download in a new post.
    Thanks,
    Martin