STRTOK

Updated April 2010

xcall STRTOK, opcode, string, fdelim, fld1 {, rdelim, fld2, ...fldn}

STRTOK.SBR is both convenient and very efficient (compared to the equivalent logic in BASIC) for parsing string data, but may take a little practice to get used to. See the following topics for a couple of real examples of using it.

STRTOK.SBR parses the string variable according to the delimiters specified and returns one or more fields in the fld1...fldn variables. You can call it once for each field (changing the field delimiter characters as you go), or have it return several parsed fields in one call. A multi-field call ends when it hits a delimiter listed in rdelim, reaches the end of string, or runs out of fldx parameters.
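The three ways a multi-field call can end are easier to see in a small model. The following Python sketch is not A-Shell code and not the actual implementation; the function name strtok_model, the explicit pos argument (standing in for the position that the real routine tracks between opcode 0 and opcode 1 calls), and the treatment of adjacent delimiters as empty fields are all assumptions made for illustration.

```python
def strtok_model(string, pos, fdelim, rdelim, nfields):
    """Hypothetical model: return (fields, next_pos, last_delimiter)."""
    fields = []
    last = ""
    while len(fields) < nfields and pos < len(string):
        start = pos
        # Scan to the next field or record delimiter (or end of string).
        while pos < len(string) and string[pos] not in fdelim + rdelim:
            pos += 1
        fields.append(string[start:pos])
        if pos >= len(string):
            last = ""                 # ran out of string
            break
        last = string[pos]            # delimiter that ended this field
        pos += 1
        # A record delimiter ends the call; one that is also a field
        # delimiter is treated as a field delimiter only.
        if last in rdelim and last not in fdelim:
            break
    return fields, pos, last


# Several fields per call, stopping at the record delimiter ";".
rec1, pos, last1 = strtok_model("A,B,C;D,E", 0, ",", ";", 5)
rec2, pos, last2 = strtok_model("A,B,C;D,E", pos, ",", ";", 5)

# One field per call: the call ends because the field slots ran out.
one, pos1, last3 = strtok_model("A,B,C;D,E", 0, ",", ";", 1)
```

In this model the returned last delimiter plays the role that the updated rdelim plays in the real routine: ";" means a complete record was read, "," means the field parameters ran out first, and an empty value means the string was exhausted.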

Two examples are provided: "Almost Comma Delimited" and "Large Packets, Multiple Delimiters".

Parameters

opcode  (Num)  [in/out]

should be set to 0 for the initial call, and 1 for subsequent calls using the original string. (It will be updated automatically from 0 to 1 to make this easy.) See History, below, for information on the +2 and +4 values.

string  (String)  [in/out]

is the string to be parsed. Must be null terminated! string will be modified by the routine, which will replace the delimiter characters with null bytes.
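Because the routine overwrites delimiters in place, it is usually worth parsing a scratch copy when the original text is still needed. The Python sketch below only models that side effect on a byte buffer; take_field and its arguments are hypothetical names, not part of A-Shell.

```python
def take_field(buf, pos, fdelim):
    """Return (field, next_pos); overwrite the consumed delimiter with a null."""
    start = pos
    # Stop at a delimiter, at the terminating null, or at the end of the buffer.
    while pos < len(buf) and buf[pos] != 0 and buf[pos] not in fdelim:
        pos += 1
    field = bytes(buf[start:pos])
    if pos < len(buf) and buf[pos] != 0:
        buf[pos] = 0          # the delimiter is replaced by a null byte
        pos += 1
    return field, pos


original = b"A,B;C\x00"       # null-terminated source string
work = bytearray(original)    # scratch copy that the parser may modify
f1, pos = take_field(work, 0, b",;")
f2, pos = take_field(work, pos, b",;")
f3, pos = take_field(work, pos, b",;")
```

After the three calls the scratch buffer holds the three fields separated by null bytes, while the original bytes are untouched.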

fdelim  (String)  [in]

is a list of one or more field delimiter characters.

rdelim  (String)  [in/out]

is a list of one or more "record" delimiter characters.

(The subroutine terminates when it hits one of these, whereas after a field delimiter it keeps going until all the fldx parameters are used up.) The parameter is irrelevant if you just want to get one field, and even with multiple fields you can set it to "".

Note, however, that if rdelim is specified as a variable, it is returned updated with the actual last delimiter character processed (i.e. the one that terminated the xcall). This can be very useful when there is more than one possible delimiter, or when you want to determine whether you got a complete record or just ran out of fldx parameters.

If you do not care about record delimiters, either specify rdelim as a literal "" (in which case it cannot be updated) or remember to clear it prior to each XCALL STRTOK. Otherwise, it will get updated to match the field delimiter. The XCALL should be smart enough to ignore record delimiters that are also field delimiters, but it could nonetheless lead to confusion.

Note that prior to Build 905.3, rdelim was assumed to be 2 or more bytes long and its second byte was cleared on return, which would clobber the next variable if rdelim was mapped as only one byte.

fld1...fldn  (String)  [out]

will return the parsed fields or tokens from string, according to the specified delimiters.

 

History

2009 October, A-Shell 5.1.1163:  opcode flag +4 enables "CSV mode", which attempts to parse the source string as if it were comma-separated (CSV) or tab-separated fields. This mode is similar to the existing quoted mode (+2), but with the following differences:

  While +2 mode recognizes quoted sub-fields anywhere within a field, +4 (CSV) mode only considers quotes as possible delimiters if the first non-blank character in the field is a quote. Any quotes which occur mid-field are treated as data characters. A subsequent quote will only be considered as the trailing quote of the field if it is followed by a field or record delimiter.

  Leading and trailing spaces are removed from the returned field.

  Pairs of adjacent double-quote characters ("") are coalesced into single ones, i.e. 6"" Banana becomes 6" Banana.
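The three CSV-mode rules above can be modeled in a few lines of Python. This is a sketch of the described behavior, not the routine itself: split_csv_mode is a hypothetical name, the closing quote is only recognized when a delimiter or the end of the string follows it immediately, and leaving spaces inside a quoted field untouched is an assumption the text does not settle.

```python
def split_csv_mode(s, fdelim=","):
    """Hypothetical model of +4 (CSV) mode field splitting."""
    fields = []
    i, n = 0, len(s)
    while True:
        while i < n and s[i] == " ":      # skip leading blanks
            i += 1
        buf = []
        if i < n and s[i] == '"':         # first non-blank char is a quote
            i += 1
            while i < n:
                if s[i] == '"' and i + 1 < n and s[i + 1] == '"':
                    buf.append('"')       # "" pair becomes one literal quote
                    i += 2
                elif s[i] == '"' and (i + 1 == n or s[i + 1] in fdelim):
                    i += 1                # closing quote, delimiter follows
                    break
                else:
                    buf.append(s[i])      # data, including quoted delimiters
                    i += 1
            field = "".join(buf)
        else:
            while i < n and s[i] not in fdelim:
                buf.append(s[i])          # mid-field quotes are plain data
                i += 1
            field = "".join(buf).strip(" ").replace('""', '"')
        fields.append(field)
        if i >= n:
            return fields
        i += 1                            # step over the field delimiter


parsed = split_csv_mode('"Smith, John",  6"" Banana ,say "hi" now')
```

The sample line exercises each rule once: a quoted field containing a comma, an unquoted field with surrounding blanks and a doubled quote, and a field whose mid-field quotes are kept as data.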

2006 November, A-Shell 4.9.971:  The opcode parameter now supports a +2 flag to cause it to recognize quoted arguments. For example, consider the following string:

"DSK0:A.B[111,222]",ARG1,ARG2

If the field delimiter is a comma, normally STRTOK would parse the arguments like this:

"DSK0:A.B[111

222]"

ARG1

ARG2

 

By adding +2 to the opcode parameter, it will parse it like this:

DSK0:A.B[111,222]

ARG1

ARG2

 

Note that it removes the quotes.
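The difference between the two parses can be reproduced with a short Python model. It is only an illustration of the behavior described here (quotes recognized anywhere in a field, delimiters ignored while inside quotes, quote characters dropped); split_quoted is a hypothetical name and the real routine may differ in edge cases such as unbalanced quotes.

```python
def split_quoted(s, fdelim=","):
    """Hypothetical model of +2 (quoted) mode field splitting."""
    fields, cur, in_quotes = [], [], False
    for ch in s:
        if ch == '"':
            in_quotes = not in_quotes     # toggle state, drop the quote
        elif ch in fdelim and not in_quotes:
            fields.append("".join(cur))   # delimiter outside quotes ends field
            cur = []
        else:
            cur.append(ch)
    fields.append("".join(cur))
    return fields


line = '"DSK0:A.B[111,222]",ARG1,ARG2'
plain = line.split(",")        # comma splitting with no quote handling
quoted = split_quoted(line)    # quote-aware splitting
```

Here plain has four elements, with the file specification broken at its internal comma, while quoted has the three arguments shown above.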

2003 November, A-Shell 4.9.854:  routine added to A-Shell