Substring search right-to-left #19161 17 Sep 11 02:27 PM
Jack McGregor Offline OP
As everyone knows, INSTR(spos,string,pattern) searches left-to-right, starting at spos, for the first occurrence of pattern in string. But how do you search for the LAST occurrence (i.e. how do you search right-to-left)?

This problem comes up frequently in directory/file processing (e.g. separating a full path into its directory and file). There is an existing SOSLIB function, Fn'Name'Ext$(path), which returns the file.ext part of the path, from which you can then, working backwards, get the directory. But in general, what's the most straightforward way to do this?

It's tempting to implement a new variation of INSTR() that supports a "find last match" option, but any such new function would introduce backward/forward compatibility issues, making it somewhat less desirable.

Perhaps a standard function should be added to the SOSFUNC: collection? Or is that redundant, given REGEX capabilities?

For example, to solve the problem described above, of extracting the directory from the full path, you could use the following (assuming Windows-style backslash directory separators):

Code
pattern$ = "(^.*\\){0,1}([^\\]+$)"
xcall REGEX,pattern$,status,path$,flags,1,match$,subcnt,dir$,file$
The REGEX call above should split a path$ (e.g. "a\b\c\file.ext") into the dir$ ("a\b\c\", including the trailing backslash, since that is part of the first subgroup) and the file$ ("file.ext").

The pattern is actually composed of two subgroups, one for the dir$ - "(^.*\\)" - and one for the file$ - "([^\\]+$)". The repetition option, "{0,1}", following the first subgroup indicates we can match it zero or one times. (That is, there can be a directory or no directory, but not multiple directories).

The first subgroup, anchored at the beginning of path$, matches zero or more characters followed by a backslash; because .* is greedy, that backslash is the last one in the string. The second subgroup matches one or more characters from the set of everything except backslash, ending at the end of the string.
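To make the behavior of the two subgroups concrete, here is a minimal sketch of the same pattern in Python rather than A-Shell BASIC (Python's re module handles these constructs the same way). The split_path name and the convention of returning an empty string for a missing directory are assumptions for illustration only, not part of SOSLIB or XCALL REGEX.

```python
import re

# Same pattern as in the post: optional directory subgroup, then file subgroup.
PATH_RE = re.compile(r"(^.*\\){0,1}([^\\]+$)")

def split_path(path):
    """Hypothetical helper: return (dir, file), or None if nothing matches.

    dir keeps its trailing backslash, because the backslash is inside the
    first subgroup. dir is '' when the path contains no backslash at all.
    """
    m = PATH_RE.search(path)
    if m is None:
        return None  # e.g. the path ends in a backslash, so there is no file part
    return (m.group(1) or "", m.group(2))

print(split_path("a\\b\\c\\file.ext"))
print(split_path("file.ext"))
```

A path that ends in a backslash has no file part, so the pattern does not match at all in that case.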

That accomplishes the goal of separating a path into the directory and file.ext, but takes us a bit away from the original question of how to make INSTR() return the position of the last match, not the first match.

For that, since we only care about the position (and don't have to return any submatch strings), we can simply construct a new pattern which starts with the original pattern, followed by any number of characters that don't contain the old pattern. In the simple case of a single character pattern (like the last backslash in the string), we could use:

Code
pattern$ = "\\[^\\]*$"
x = INSTR(1,path$,pattern$,0)
The pattern above starts with a backslash (escaped as \\), followed by zero or more (*) characters from the set of anything but backslash ([^\\]), followed by an end of string.

(The 4th argument, 0, is needed just to force INSTR to treat the pattern as a regular expression.)
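As a quick cross-check of the single-character case, the same pattern can be tried in Python (an illustration only, not A-Shell code). The last_backslash name is hypothetical, and returning a 1-based position with 0 for "not found" is an assumption meant to mirror the usual INSTR convention.

```python
import re

def last_backslash(path):
    """Hypothetical helper: 1-based position of the last backslash, 0 if none."""
    # A backslash, then zero or more non-backslash characters, then end of string.
    m = re.search(r"\\[^\\]*$", path)
    return m.start() + 1 if m else 0

print(last_backslash("a\\b\\c\\file.ext"))
print(last_backslash("file.ext"))
```

The search still runs left-to-right; it is the "no more backslashes before the end" requirement that forces the match onto the last one.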

For a multi-character pattern, we need to fix up the second part of the pattern (i.e. the part that matches a string NOT containing the pattern). I think the following works in general:

Code
last$ = pattern$ + "(?!(.*" + pattern$ + ".*))"
x = INSTR(1,string$,last$,0)

The part following the original pattern$ is a negative lookahead expression, which succeeds only if the rest of the string does not include the original pattern. As an example, if the original pattern is "abc", the new pattern would be "abc(?!(.*abc.*))", which reads as "abc" followed by a string that fails to match (?! ... ) the pattern .*abc.* (i.e. zero or more characters, followed by "abc", followed by zero or more characters).

The caveat is that the original pattern must be a simple string (not itself a regular expression), and any special regex characters within it must be escaped (preceded by a backslash).
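The construction and the escaping caveat can be sketched together in Python (again only an illustration of the regex logic, not A-Shell code). The last_instr name is hypothetical, re.escape stands in for whatever escaping you would do by hand, and the 1-based result with 0 for "not found" is an assumption.

```python
import re

def last_instr(string, pattern):
    """Hypothetical helper: 1-based position of the last match of a literal pattern."""
    literal = re.escape(pattern)  # escape regex specials such as "." or "\"
    # Literal pattern, then a negative lookahead: no further copy of it may follow.
    last = literal + "(?!(.*" + literal + ".*))"
    m = re.search(last, string)
    return m.start() + 1 if m else 0

print(last_instr("abcxxabcxx", "abc"))
print(last_instr("a.b.c", "."))
```

One thing to watch: the lookahead only inspects the text after the current match, so an occurrence that overlaps it goes unnoticed. Searching "aaa" for "aa" reports position 1 here, although a second, overlapping "aa" starts at position 2.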

Because of those complications (adjusting the pattern and escaping it), you'd probably want to wrap this in a function anyway, e.g. Fn'Last'Instr(spos,string,pattern), to get the convenience of a built-in that returns the last match. And in that case, you may decide that implementing the last match by repeating the normal next-match logic until it fails is just as easy. (But what fun would that be?)
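For comparison, the "repeat the normal next match until it fails" approach might look like the following Python sketch. It is an illustration of the loop only; the last_instr_loop name is hypothetical, and the 1-based spos and the 0 return for "not found" are assumptions modeled on INSTR.

```python
def last_instr_loop(string, pattern, spos=1):
    """Hypothetical helper: 1-based position of the last occurrence, 0 if none."""
    last = 0
    pos = string.find(pattern, spos - 1)  # str.find is 0-based, returns -1 on failure
    while pos != -1:
        last = pos + 1                    # remember the most recent hit (1-based)
        pos = string.find(pattern, pos + 1)  # resume one character past it
    return last

print(last_instr_loop("abcxxabcxx", "abc"))
print(last_instr_loop("a\\b\\c\\file.ext", "\\"))
```

Because each search resumes just one character past the previous hit, this version also finds overlapping occurrences, and it needs no escaping since the pattern is never treated as a regular expression.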

Re: Substring search right-to-left #19162 17 Sep 11 04:36 PM
Jack McGregor Offline OP
I posted a Fn'Instr(spos,string,pattern{flags}) here which supports the "last match" (or right-to-left searching) if you specify a negative starting position.

Example:

Fn'Instr(-1,"\vm\miame\miame.ini","") returns -10


Moderated by  Jack McGregor, Ty Griffin 
