This page is for an old version of Hexaly Optimizer. We recommend that you update your version and read the documentation for the latest stable release.

CSV module

This module implements the CSV (Comma-Separated Values) format, partly specified by RFC 4180.

Note

To use the features of this module, you must add the following import statement at the beginning of your LSP file: use csv;

Module functions

csv.parse(reader)
csv.parse(reader, options)
csv.parse(filename)
csv.parse(filename, options)
csv.parse(filename, charset)
csv.parse(filename, charset, options)

Reads the CSV file and returns a CsvContent. You can provide the filename to parse or you can directly provide the stream to read, previously opened with io.openRead(). When using a filename, you can also specify the encoding. If no encoding is provided, ISO-8859-1 is assumed.

Several options can be used to customize the behavior of the parser. These options must be specified in a map. The supported options are detailed at the end of this page.

Parameters:
  • filename (string) – Path to the CSV file to parse.

  • reader (streamreader) – Stream previously opened with io.openRead().

  • charset (charset) – Encoding used to convert bytes to characters.

  • options (map) – Optional parameters to customize the behavior of the parser.

Return type:

CsvContent
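As an illustration, a call to csv.parse() with an options map could look like the sketch below. The file name data.csv is hypothetical, and the options map is built by assignment to stay within basic map syntax.

```lsp
use csv;

function main(args) {
    // "data.csv" is a hypothetical file name used for illustration.
    options = {};
    options["delimiter"] = ";";
    options["trimWhitespace"] = true;

    // Parse the whole file at once and get a CsvContent back.
    content = csv.parse("data.csv", options);
    println("Rows: " + content.nbRows);
    println("Columns: " + content.nbCols);
}
```

Since the whole file is loaded into a CsvContent, this form suits files that fit comfortably in memory.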

csv.deserialize(content)
csv.deserialize(content, options)

Identical to parse() but the CSV content is taken from a string instead of a file.

Parameters:
  • content (string) – CSV content to parse.

  • options (map) – Optional parameters to customize the behavior of the parser.

Return type:

CsvContent
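A minimal sketch of csv.deserialize() on an in-memory string follows. The sample data is invented for illustration, the delimiter is set explicitly rather than guessed, and row numbering is assumed to start at 0 (the page does not state this for rows).

```lsp
use csv;

function main(args) {
    // Invented sample: one header row followed by two data rows.
    text = "name,qty\nbolt,3\nnut,5\n";

    options = {};
    options["delimiter"] = ",";

    content = csv.deserialize(text, options);

    // Assumption: data rows are numbered from 0.
    firstRow = content.rowsByColName[0];
    println(firstRow["name"]);
    println(firstRow["qty"]);
}
```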

csv.read(reader)
csv.read(reader, options)
csv.read(filename)
csv.read(filename, options)
csv.read(filename, charset)
csv.read(filename, charset, options)

Opens the CSV file and returns a CsvReader, which reads the file row by row.

As with parse(), you can provide the filename to read or you can directly provide the stream to read, previously opened with io.openRead(). When using a filename, you can also specify the encoding. If no encoding is provided, ISO-8859-1 is assumed.

Several options can be used to customize the behavior of the parser. These options must be specified in a map. The supported options are detailed at the end of this page.

Parameters:
  • filename (string) – Path to the CSV file to read.

  • reader (streamreader) – Stream previously opened with io.openRead().

  • charset (charset) – Encoding used to convert bytes to characters.

  • options (map) – Optional parameters to customize the behavior of the parser.

Return type:

CsvReader

csv.readString(content)
csv.readString(content, options)

Identical to read() but the CSV content is taken from a string instead of a file.

Parameters:
  • content (string) – CSV content to parse.

  • options (map) – Optional parameters to customize the behavior of the parser.

Return type:

CsvReader

Classes

class CsvContent
nbRows

Returns the number of rows.

Return type:

int

nbCols

Returns the number of columns.

Return type:

int

colNames

Returns the column names. If no column names were found or provided, nil is returned.

Return type:

Map (array of strings) or nil

cols

Returns all columns as a map, indexed by column number. For each column, rows are indexed by number.

Return type:

Map

colsByName

Returns all columns as a map, indexed by column name. For each column, rows are indexed by number.

If no column names were found or provided, this method will throw an error.

Return type:

Map

rows

Returns all rows as a map, indexed by row number. For each row, columns are indexed by column number.

Return type:

Map

rowsByColName

Returns all rows as a map, indexed by row number. For each row, columns are indexed by column name.

If no column names were found or provided, this method will throw an error.

Return type:

Map
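The CsvContent accessors can be combined to walk through the parsed data. The sketch below prints every field, row by row. The file name data.csv is hypothetical, and the loops assume that rows and columns are numbered from 0, which this page does not state explicitly.

```lsp
use csv;

function main(args) {
    // "data.csv" is a hypothetical file name used for illustration.
    content = csv.parse("data.csv");
    rows = content.rows;

    // Assumption: row and column numbers start at 0.
    for [i in 0...content.nbRows] {
        for [j in 0...content.nbCols] {
            print(rows[i][j]);
            print(" ");
        }
        println();
    }
}
```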

class CsvReader
rowNumber

Returns the number of rows read so far.

Return type:

int

colNames

Returns the column names. If no column names were found or provided, nil is returned.

Return type:

Map (array of strings) or nil

nextRow()

Reads the next row of the CSV file. Returns the row indexed by column number, or nil if the end of the file has been reached.

Return type:

Map or nil

nextRowByColName()

Reads the next row of the CSV file. Returns the row indexed by column name, or nil if the end of the file has been reached.

If no column names were found or provided, this method will throw an error.

Return type:

Map or nil
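A CsvReader is typically used in a loop that stops when nextRow() returns nil. The following sketch shows one way to write it. The file name data.csv is hypothetical, and indexing the first column with 0 is an assumption.

```lsp
use csv;

function main(args) {
    // "data.csv" is a hypothetical file name used for illustration.
    reader = csv.read("data.csv");

    row = reader.nextRow();
    while (row != nil) {
        // Assumption: the first column has number 0.
        println(row[0]);
        row = reader.nextRow();
    }

    println("Rows read: " + reader.rowNumber);
}
```

Unlike csv.parse(), this approach keeps only one row in memory at a time, which may be preferable for large files.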

Options summary

Global options

The following options apply to the entire CSV file.

Each option is listed with its type and default value, followed by its description.

delimiter

Type: string (length 1). Default value: nil.

Character used to delimit columns. When the value is nil, the parser automatically guesses the most likely delimiter among comma (,), semicolon (;), tab (\t) and pipe (|).

decimal

Type: string (length 1). Default value: . (period).

Character to recognize as the decimal point. If the column delimiter is a semicolon (guessed or defined) and this option is not overridden, a comma (,) is used instead.

quote

Type: string (length 1). Default value: " (double quote).

Character used to denote the start and end of a quoted item. Column and row delimiters inside a quoted item are kept as part of the string and are not interpreted as delimiters.

escape

Type: string (length 1). Default value: " (double quote).

Character used to escape other characters.

headerRow

Type: int. Default value: 0.

Number of the row that contains the column names and marks the start of the data. If column names are specified in columnOptions, they override the names found in this row. If a negative number is set, no column headers are parsed from the file and only the names specified in columnOptions are used (if present).

skipLines

Type: map (array of ints). Default value: nil.

Line numbers to skip (0-indexed).

skipEmptyLines

Type: bool. Default value: true.

True to ignore empty lines between records, false to translate empty lines into empty records. An empty line is a blank line with no fields or a line with only empty fields. Note that if the useDefaultEmpty policy is activated on at least one column, the fields are not considered empty and the line is therefore not ignored.

trimWhitespace

Type: bool. Default value: false.

Trims leading and trailing spaces from each string field that is not between quotes.

longLinePolicy

Type: string. Default value: ignoreCols.

Specifies what to do upon encountering a line with too many fields. Allowed values are:

  • ignoreCols: ignore the extra columns;

  • addMissingCols: add new columns with nil values to the other rows;

  • skipLine: skip the whole line;

  • throwError: throw an error.

shortLinePolicy

Type: string. Default value: fillMissingCols.

Specifies what to do upon encountering a line with too few fields. Allowed values are:

  • fillMissingCols: fill missing columns with values matching the emptyValuePolicy of the column;

  • skipLine: skip the whole line;

  • throwError: throw an error.

nanValues

Type: map (array of strings).

Values to consider as nan. Default is {"#N/A", "#N/A N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA", "NaN", "n/a", "nan"}.

infValues

Type: map (array of strings).

Values to consider as inf. Default is {"inf", "Inf"}.

trueValues

Type: map (array of strings).

Values to consider as true. Default is {"true", "True", "1"}.

falseValues

Type: map (array of strings).

Values to consider as false. Default is {"false", "False", "0"}.

nilValues

Type: map (array of strings).

Values to consider as nil. Default is {"nil", "null", "NULL"}.

columnOptions

Type: map. Default value: nil.

Options for each column, indexed by column number (see below).

internStrings

Type: bool. Default value: false.

Tells the parser to reuse identical strings rather than creating duplicates. This option reduces the memory consumption of large CSV files but slightly decreases the parsing speed.

For the parser to work properly, the options must meet the following conditions:

  • The characters used in delimiter, decimal and quote must all be different. In addition, they must not represent a line break. The line breaks supported are LF (\n) and CRLF (\r\n).

  • If a string is present in one of the arrays representing the values true, false, nan, inf or nil, then it cannot be present in another of these arrays.

  • The column names must be all different and of type string. If nil is found in the header row, the column name will be created as Unnamed: {column_index}.
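To make these options concrete, here is a sketch of an options map for headerless data that uses a semicolon as delimiter and a comma as decimal point. The sample data is invented, and the three characters used for delimiter, decimal and quote are all different, as required above.

```lsp
use csv;

function main(args) {
    options = {};
    options["delimiter"] = ";";
    options["decimal"] = ",";
    // A negative headerRow means that no column names are read from the data.
    options["headerRow"] = -1;
    options["longLinePolicy"] = "throwError";

    // Invented sample: two rows of two decimal values, no header row.
    content = csv.deserialize("1,5;2,5\n3,0;4,0\n", options);

    println("Rows: " + content.nbRows);
    if (content.colNames == nil) {
        println("No column names available");
    }
}
```

Setting the delimiter explicitly avoids relying on automatic detection, which may be less reliable on very short inputs.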

Column options

The following options are applied per column, and must be specified with the global option columnOptions.

Each option is listed with its type and default value, followed by its description.

name

Type: string. Default value: nil.

Column name to use. When the value is nil, the name is automatically parsed from the header row.

type

Type: string. Default value: nil.

Type of values expected in the column. Allowed values are "bool", "int", "float" and "string". If the type is nil, the parser automatically guesses the type according to the parsed value.

errorValuePolicy

Type: string. Default value: setNil.

Specifies what to do upon encountering a value that cannot be parsed in the specified type. Allowed values are:

  • useDefaultError: use the value specified in the defaultErrorValue parameter;

  • setNil: set the value as nil;

  • throwError: throw an error.

Note that this parameter has no effect if the type of the column is not specified, or if the type is "string".

defaultErrorValue

Value to be used when the useDefaultError policy is activated and an error is encountered. The type of the default value must be the same as the column type. If not overridden, the following values are used according to the column type:

  • float: NaN;

  • int: 0;

  • bool: false.

Note that this parameter has no effect if the type of the column is not specified, or if the type is "string".

emptyValuePolicy

Type: string. Default value: setNil.

Specifies what to do upon encountering an empty value without quotes. Allowed values are:

  • useDefaultEmpty: use the value specified in the defaultEmptyValue parameter;

  • setNil: set the value as nil;

  • throwError: throw an error.

defaultEmptyValue

Value to be used when the useDefaultEmpty policy is activated and an empty value is encountered. The type of the default value must be the same as the column type. If not overridden, the following values are used according to the column type:

  • float: NaN;

  • int: 0;

  • bool: false;

  • string: an empty string ("");

  • type not specified: nil.
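Column options are nested inside the global options map under columnOptions. The sketch below forces the second column to be an integer and asks for a fallback value when a field cannot be parsed. The sample data and the fallback value -1 are invented for illustration, and column and row numbering are assumed to start at 0.

```lsp
use csv;

function main(args) {
    // Options for one column: integer type, fallback value on parse errors.
    qtyOptions = {};
    qtyOptions["type"] = "int";
    qtyOptions["errorValuePolicy"] = "useDefaultError";
    qtyOptions["defaultErrorValue"] = -1;

    // Assumption: columns are numbered from 0, so 1 is the second column.
    columnOptions = {};
    columnOptions[1] = qtyOptions;

    options = {};
    options["delimiter"] = ",";
    options["columnOptions"] = columnOptions;

    // Invented sample: "abc" cannot be parsed as an int.
    content = csv.deserialize("name,qty\nbolt,3\nnut,abc\n", options);

    // Under the useDefaultError policy, the unparsable field should
    // take the value given in defaultErrorValue.
    println(content.colsByName["qty"][1]);
}
```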