INPUT CSV

Updated November 2019; recognize pipe as delimiter

INPUT CSV #CH, var1 {,var2 {,…varn}}

INPUT CSV #ch, ary()

The second syntax option was added in A-Shell 6.5.1629 and is discussed in detail in the subtopic INPUT CSV Into Array. This topic applies in general to both variations.

INPUT CSV #CH acts much like INPUT #CH (input from a file), except that it contains additional parsing logic designed to deal with various forms of CSV (Comma Separated Values) format, which can sometimes be troublesome for the standard INPUT #CH statement.

Despite the name, CSV files sometimes use delimiters other than commas, such as tab, colon, semi-colon, or vertical bar (pipe). The INPUT CSV operation scans the first line of the file to identify the delimiter character, choosing whichever character from the set comma, tab, colon, semi-colon, and pipe appears most often outside of quoted text. The standard INPUT #CH statement supports only comma delimiters.

Another important difference between the two input statements is that INPUT CSV is line oriented rather than field oriented. Each INPUT CSV statement processes a single line of the input file, regardless of how many variables are listed in the statement or how many fields appear in the actual data. If there are more variables than data fields, the extra variables are set to null. If there are more data fields than variables, the extra data fields are ignored. (Note that line breaks occurring within quoted fields are treated as ordinary data characters and do not terminate the INPUT CSV operation.)
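For example, the following sketch (the file name, channel, and variable names are hypothetical) reads a three-field file one line at a time; the delimiter is detected automatically from the first line of the file:

map1 custname,s,0                 ! dynamic strings; see the sizing warning below
map1 city,s,0
map1 balance,f,6

open #1, "CUST.CSV", input
READLP:
    input csv #1, custname, city, balance
    if eof(1) <> 0 then goto EXITLOOP    ! stop once the end of the file is reached
    print custname; " / "; city; " / "; balance
    goto READLP
EXITLOOP:
close #1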

Unfortunately, there seems to be no universal standard for CSV files, particularly as it relates to the handling of quotes within fields. INPUT CSV uses various heuristics to attempt to determine whether each double-quote character is merely a field delimiter or part of the data. For example, the quotes surrounding a field such as { "Smith, John" } would be treated as delimiters and removed from the data returned in the associated field. Without those quotes, the comma would probably be treated as a delimiter, so the data would be treated as two fields. When a quote appears mid-field, such as { "6" bolt" }, with or without the quotes around the outside of the field, it is treated as a raw data character; the associated return variable would receive { 6" bolt }. Double double-quotes are generally treated as single double-quotes, as in BASIC strings, so { "7' 5"" center" } would be interpreted as { 7' 5" center }.
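Combining those examples, suppose an input line contains:

    "Smith, John","6" bolt","7' 5"" center"

Reading it with a hypothetical { input csv #1, a, b, c } would, per the rules just described, return { Smith, John } in a, { 6" bolt } in b, and { 7' 5" center } in c.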

Comments

You can disable all of the special quote handling logic by setting the AF_CSVNOQUOTE flag via the ASFLAG.SBR subroutine.
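A minimal sketch, assuming AF_CSVNOQUOTE is available as a symbolic constant from the standard A-Shell include definitions and that the flag value can be passed directly to ASFLAG.SBR (check the ASFLAG.SBR documentation for the exact calling sequence):

++include ashinc:ashell.def       ! assumed source of the AF_ flag definitions
xcall ASFLAG, AF_CSVNOQUOTE       ! disable the INPUT CSV quote handling logic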

There is no specific limit to the number of fields or the overall length of a line. When using the first syntax option on lines containing an extremely large number of fields and/or variables with very long names, however, you could run up against the compiler's maximum source code line length. See "Limits and Capacities" in the Appendix.

Warning: the variables receiving the fields must each be long enough to hold the entire field as it appears in the data file, including outer quotes. Otherwise, fields which are quoted in the data may end up being returned with the leading quote character included in the returned variable, with a corresponding character truncated from the end. Dynamic string variables (s,0) are supported.
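For example (names and sizes hypothetical):

map1 shortname,s,30      ! fixed length: must allow for the full field, outer quotes included
map1 longname,s,0        ! dynamic string: sized automatically, so truncation is not an issue

If a quoted field in the data file is longer than 30 characters, shortname may come back with the leading quote still attached and the tail of the field cut off.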

Notes on Parsing

As it stands, the CSV extension on INPUT just affects the parsing of the line into individual fields. When it comes to converting/storing each individual field into a variable, we revert to the age-old INPUT rules, which, for numeric variables, essentially consist of var = val(field$). In that context, the conversion from string format to number skips leading spaces and then stops at the first character that isn't a digit, a decimal point, a plus or minus sign, or an E or e (for exponential notation). That has been the standard since way back when.

Note that the determination of the decimal point character is based on the LDF, so comma could be interpreted as the decimal point.

A-Shell's version of INPUT does, however, recognize one extension: you can prefix the number with "&h" or "&o", in which case the string will be interpreted as hex or octal, respectively.
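For instance, the following hypothetical field values would be converted to a numeric variable as shown:

{   12.5kg }    ->  12.5    (leading spaces skipped; conversion stops at the "k")
{ 1.5e3 }       ->  1500    (exponential notation accepted)
{ abc }         ->  0       (no leading numeric characters)
{ &h1f }        ->  31      (hexadecimal, via the "&h" prefix)
{ &o17 }        ->  15      (octal, via the "&o" prefix)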

See Also

•   MX_CSVDELIM for a way to change the delimiter

History

2019 November, A-Shell 6.5.1671:  Recognize the pipe character "|" as a delimiter.

2018 February, A-Shell 6.5.1629:  Add second syntax line INPUT CSV #ch, ary().

2011 May, A-Shell 5.1.1215:  The auto-detection of the delimiter character has been expanded to support comma, tab, colon and semi-colon. The decision is based on which of the four characters is the most common (outside of quoted fields) on the first line of the file. Previously, only comma and tab were supported, and the decision was based on which character appeared first in the file.

2009 October, A-Shell 5.1.1163:  AF_CSVNOQUOTE flag now stifles the automatic removal of the outer quotes surrounding a field.

2008 August, A-Shell 5.1.1122:  Fields enclosed in quotes are now automatically de-quoted, even when AF_CSVNOQUOTE is set. AF_CSVNOQUOTE is still useful for eliminating all other quote-matching logic in files that have non-standard quoting. Also, AF_CSVSMARTQUOTE no longer does anything; its logic is now automatic. Previously, the flag served only to prevent the improper concatenation of additional input lines in an attempt to match up quotes occurring in the middle of a field (i.e. it made such quotes not require matches); since there seems to be no value in ever allowing that concatenation, the behavior is now always in effect.

2008 February, A-Shell 995.0.7:  Fields with embedded line breaks are now supported. The total extended input line length is still limited to 4K.