REGEX

Updated March 2017; see History

Pre-compiling only:

xcall REGEX, pattern, patno

General use:

xcall REGEX, pattern, status, subject {, flags, stpos, match {, subcnt, submatch1, ... submatchn}}

xcall REGEX, pattern, status, subject, flags, stpos, match, subcnt, submatch(n)

REGEX.SBR provides a detailed XCALL interface to A-Shell's regular expression processor. Other interfaces to the same processor are provided in the INSTR() function and INFLD.

The first general syntax shown above is the traditional form, which uses individual submatch variables to receive the submatch results. The second, introduced in A-Shell 5.1.1210 of March 2011, uses an array to receive the submatch results.

Parameters

pattern  (String)  [in]

Regular expression, without the Perl-style leading and trailing slashes. To use a pre-compiled pattern, set pattern to chr$(1) through chr$(20), one for each of the 20 pre-compiled pattern slots. Note that when the same pattern string is used consecutively, explicit pre-compiling is superfluous: every pattern string is compiled internally, and whenever the current pattern string is the same as the previous one, the previously compiled pattern is reused automatically. Explicit pre-compilation only makes sense when you repeatedly alternate between two or more pattern strings.
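As a minimal sketch of the alternating case described above, the following pre-compiles two patterns into slots 1 and 2 and then selects each by its slot number. The variable names, pattern strings and subject text are hypothetical, chosen only for illustration.

```basic
map1 pattern,s,100
map1 subject,s,100
map1 patno,f
map1 status,f

pattern = "[0-9]+"                  ! hypothetical pattern: a run of digits
patno = 1
xcall REGEX, pattern, patno         ! pre-compile into slot 1
if patno <= 0 then
    print "pre-compile of slot 1 failed, status ";patno
    end
endif

pattern = "[A-Z]+"                  ! hypothetical pattern: a run of capitals
patno = 2
xcall REGEX, pattern, patno         ! pre-compile into slot 2
if patno <= 0 then
    print "pre-compile of slot 2 failed, status ";patno
    end
endif

subject = "ABC 123"                 ! hypothetical subject text

pattern = chr$(1)                   ! select pre-compiled slot 1
xcall REGEX, pattern, status, subject
print "slot 1 status: ";status

pattern = chr$(2)                   ! select pre-compiled slot 2
xcall REGEX, pattern, status, subject
print "slot 2 status: ";status
end
```

In a real program the two selections would typically sit inside a loop over many subject strings, which is where the pre-compilation pays off.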

status  (F or I)  [out]

Value           Returned status values

>0              Matching success (starting position of match)

-101 to -199    Pattern compilation errors

-99             Too few parameters

-98             Unable to load PCRE library (pcre3.dll under Windows)

-97             Unable to allocate memory for library

-96             Unable to link to pcre_compile2 or pcre_exec function in library

-95             Invalid pre-compiled pattern number (1-20)

-94             No such pre-compiled pattern (not previously compiled)

-93             Error outputting to dynamic variable (out of memory?)

-2 to -25       Matching errors
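One way to act on the status ranges listed above is sketched below. The pattern and subject are hypothetical, a flags value of 0 is assumed to mean "no options", and the messages are illustrative only. Status values that the table does not list are deliberately left unhandled here.

```basic
map1 pattern,s,100
map1 subject,s,200
map1 match,s,200
map1 status,f
map1 flags,b,4
map1 stpos,f

pattern = "[0-9]+"                  ! hypothetical pattern
subject = "invoice 4711"            ! hypothetical subject
flags = 0                           ! assumption: 0 means no option flags
stpos = 1

xcall REGEX, pattern, status, subject, flags, stpos, match

if status > 0 then
    print "matched at position ";status;": ";match
endif
if status <= -101 and status >= -199 then
    print "pattern compilation error ";status;": ";match
endif
if status <= -93 and status >= -99 then
    print "parameter, library or memory error ";status
endif
if status <= -2 and status >= -25 then
    print "matching error ";status
endif
end
```

On errors the match parameter may hold the text of the error description, which is why it is printed in the compilation-error branch.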

 

patno  (F or I)  [in/out]

For pre-compiling, must be set on input to an integer from 1 to 20 (identifying one of the 20 numbered pre-compiled patterns). On output, it retains the same value on success, or is set to a value <=0 on error; see the status codes above. See the note under pattern regarding pre-compilation.

subject  (String)  [in]

Subject string to test against the pattern.

flags  (F or B,4)  [in]

A bitmap of option flags whose symbols are defined in ASHINC:REGEX.DEF. See Options Flags. Note that you can use Perl-style internal option settings within a pattern to change options that would otherwise require using the flags parameter.

stpos  (Num)  [in]

Optional starting position in subject, base 1. If not specified or zero, it is treated the same as 1 (i.e. start at the beginning of subject).
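A common use of stpos is to walk through every match in a subject by restarting the search just past the previous match. The sketch below assumes that the returned status is the match position measured from the start of subject (not from stpos), and that a flags value of 0 means "no options". Both are assumptions, and the pattern and subject are hypothetical.

```basic
map1 pattern,s,100
map1 subject,s,200
map1 match,s,200
map1 status,f
map1 flags,b,4
map1 stpos,f

pattern = "[0-9]+"                  ! hypothetical pattern
subject = "a1 b22 c333"             ! hypothetical subject
flags = 0                           ! assumption: 0 means no option flags
stpos = 1

do while stpos <= len(subject)
    xcall REGEX, pattern, status, subject, flags, stpos, match
    if status <= 0 then exit        ! stop on no match or error
    print "position ";status;": ";match
    ! assumption: status is relative to the start of subject
    if len(match) > 0 then
        stpos = status + len(match) ! resume just past this match
    else
        stpos = status + 1          ! empty match: step forward to avoid looping
    endif
loop
end
```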

match  (String)  [out]

Returns the string within the subject that was matched. On errors, it may return the text of the error description.

subcnt  (Num)  [in/out]

For the array syntax (where the subexpression matches are returned into an array instead of individual parameters), you must set subcnt to the number of elements in the array, starting from the element supplied in the parameter. In other words, if you pass submatch(1), then set subcnt to the total number of elements in the submatch() array; if you pass submatch(2), set subcnt to one less than the total.

For the non-array syntax, subcnt is ignored on input; the maximum number of possible matches is limited instead by the number of submatch1 .. submatchn parameters passed.

For either syntax, on return, subcnt is set to the number of sub-expression matches in the subject, even if that exceeds the number of submatch parameters passed or the specified size of the submatch(n) array. When processing the returned sub-expression matches, therefore, do not rely on the returned subcnt alone: it may lead you to access more submatch parameters than were passed to the routine or, worse, more array elements than exist. For example:

flags = flags or PCREX_SUBMATCH_ARRAY              ! required for the array syntax
subcnt = .extent(submatch())                       ! limit matches to extent of array
xcall REGEX, pattern, status, subject, flags, stpos, match, subcnt, submatch(1)
if status > 0 then                                 ! on success...
    for i = 1 to (subcnt min .extent(submatch()))  ! process sub-matches (up to limit)
        ...

 

submatch1 ... submatchn  (String)  [out]

These return the subexpression matches, up to the number specified by subcnt and to a maximum of 100 (increased from 20 in A-Shell 5.1.1210). Note that null matches are quite possible; that is, subcnt may return N, but some of those N, and not necessarily only the ones at the end, may be empty.
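To illustrate a null match that is not at the end, the sketch below uses a hypothetical pattern whose first subexpression is optional. It assumes that submatch1 and submatch2 correspond to the first and second parenthesized groups, and that a flags value of 0 means "no options". Under those assumptions the first submatch would be expected to come back empty while the second is not.

```basic
map1 pattern,s,100
map1 subject,s,100
map1 match,s,100
map1 sub1,s,20
map1 sub2,s,20
map1 status,f
map1 flags,b,4
map1 stpos,f
map1 subcnt,f

pattern = "(a)?(b)"                 ! hypothetical: first group is optional
subject = "b"                       ! hypothetical subject with no "a"
flags = 0                           ! assumption: 0 means no option flags
stpos = 1

xcall REGEX, pattern, status, subject, flags, stpos, match, subcnt, sub1, sub2
if status > 0 then
    print "subcnt = ";subcnt
    print "sub1 = [";sub1;"]"       ! brackets make an empty result visible
    print "sub2 = [";sub2;"]"
endif
end
```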

submatch(n)  (String array)  [out]

A single starting element in a string array, e.g. submatch(1), may be used in place of several individual submatchn parameters. The array must have a fixed number of elements, but its elements can be dynamic strings (e.g. map1 submatch(50),s,0). This is particularly useful where you are dealing with many expressions and thus don't know at the time of writing the code how many or how large they might be.

Note: Since the xcall interface does not pass information about the entire array, in order to use the array syntax, you must set subcnt to the number of elements in the array (see subcnt above), AND you must also set the PCREX_SUBMATCH_ARRAY (&h40000000) flag in the flags parameter. If the flag is not set, REGEX will treat your submatch(1) parameter as a single variable (as in the first syntax).
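Putting the two requirements of the note together, a complete call using the array syntax might look like the sketch below. The pattern, subject and variable names are hypothetical, and the ++include line assumes the flag symbols are brought in from ASHINC:REGEX.DEF in the usual way.

```basic
++include ashinc:regex.def          ! defines PCREX_SUBMATCH_ARRAY

map1 pattern,s,100
map1 subject,s,200
map1 match,s,200
map1 submatch(50),s,0               ! fixed element count, dynamic strings
map1 status,f
map1 flags,b,4
map1 stpos,f
map1 subcnt,f
map1 maxsub,f
map1 i,f

pattern = "([0-9]+)-([0-9]+)-([0-9]+)"   ! hypothetical: three subexpressions
subject = "Order 2017-03-15 shipped"     ! hypothetical subject
flags = PCREX_SUBMATCH_ARRAY        ! tell REGEX that submatch(1) starts an array
stpos = 1
subcnt = .extent(submatch())        ! number of elements available from submatch(1)

xcall REGEX, pattern, status, subject, flags, stpos, match, subcnt, submatch(1)
if status > 0 then
    maxsub = subcnt min .extent(submatch())   ! never index past the array
    for i = 1 to maxsub
        print i;": [";submatch(i);"]"
    next i
endif
end
```

Clamping the loop limit to the array extent guards against the case where the returned subcnt exceeds the number of elements actually passed.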

 

History

2017 March, A-Shell 6.4.1546:  Remove 1024-byte limit on subject string length.

2011 March, A-Shell 5.1.1210:  Allow an array in place of individual sub-match parameters. Increase number of allowed sub-matches from 20 to 100. Revise and review documentation.

2008 April, A-Shell 5.1.1108:  Re-vamped, re-organized and re-documented.

2007 Nov, A-Shell 5.1.1100:  Add routine to A-Shell.