INPUT CSV

Rewritten October 2011

INPUT CSV #CH, var1 {,var2 {,…varn}}}

INPUT CSV #CH acts much line INPUT #CH (input from a file), except that it contains additional parsing logic designed to deal with various forms of CSV (Comma Separated Values) format, which can sometimes be troublesome for the standard INPUT #CH statement. (Despite the name, CSV files sometimes use delimiters other than commas, such as tab, colon, or semi-colon. The INPUT CSV operation scans the first line of the file to identify the delimiter character, based on the most commonly appearing character not within pairs of quotes, from the set of: comma, tab, colon, and semi-colon. The standard INPUT #CH statement only supports comma delimiters.)

Another important difference between the two input statements is that INPUT CSV is line oriented rather the field oriented. Each INPUT CSV statement will process a single line of the input file, regardless of how many variables are listed in the INPUT CSV statement or how many fields appear in the actual data. If there are more variables than data fields, the extra variables are set to null. If there are more data fields than variables, the extra data fields are ignored. (Note that line breaks which occur within quoted fields are treated as mere data characters and do not terminate the INPUT CSV operation.)

Unfortunately, there seems to be no universal standard for CSV files, particularly as it relates to the handling of quotes within fields. INPUT CSV uses various heuristics to attempt to determine whether each double-quote character is a merely a field delimiter or part of the data. For example, the quotes surrounding a field such as { "Smith, John" } would be treated as delimiters and removed from the data returned in the associated field. Without those quotes, the comma would probably be treated as a delimiter, such that the data would be considered as two fields. When a quote appears mid-field, such as { "6" bolt" }, with or without the quotes around the outside of the field, it will be treated as a raw data character; the associated return variable would receive { 6" bolt }. Double double-quotes are generally treated as single double-quotes, as in BASIC strings, so { "7' 5"" center" } would be interpreted as { 7' 5" center }.

Comments

You can disable all of the special quote handling logic by using the AS_CSVNOQUOTE flag in the ASFLAG.SBR subroutine.

The maximum logical line length is 4K.

Warning: the variables receiving the fields must each be long enough to hold the entire field as it appears in the data file, including outer quotes. Otherwise, fields which are quoted in the data may end up being returned with the leading quote character included in the returned variable, with a corresponding character truncated from the end. Dynamic string variables (s,0) are supported.

History

2011 May, A-Shell 5.1.1215:  The auto-detection of the delimiter character has been expanded to support comma, tab, colon and semi-colon. The decision is based which of the four characters is the most common (outside of quoted fields) on the first line of the file. Previously, only comma and tab were supported, and the decision was based on which character appeared first in the file.

2009 October, A-Shell 5.1.1163:  AF_CSVNOQUOTE flag now stifles the automatic removal of the outer quotes surrounding a field.

2008 August, A-Shell 5.1.1122:  Fields enclosed in quotes are now automatically de-quoted, even when AF_CSVNOQUOTE is set. AF_CSVNOQUOTE is still useful for eliminating all other quote-matching logic in files that have non-standard quoting. Also, AF_CSVSMARTQUOTE no longer does anything. Its logic is now automatic. Previously, it only served to prevent the improper concatenation of additional input lines in order to try to match up quotes that occur in the middle of a field (i.e. don't require matches). There doesn't seem to be any value in not doing that all the time.

2008 February, A-Shell 995.0.7:  Fields with embedded line breaks are now supported. The total extended input line length is still limited to 4K.