Syntax and lexical analysis

Global structure

The Hexaly modeling language is a strongly typed, block-structured programming language. A Hexaly Modeler program is made of a sequence of functions or classes located in one or more HXM files called "modules". Each function contains a block of code consisting of a list of expressions or statements surrounded by braces {}. The modeler is case-sensitive. Instructions are separated from one another by the character ;.

Encoding

The modeler handles many encodings and charsets (UTF-8, all ISO-8859 standards and many Windows code pages). For the complete list, please consult the Charset module.

The default encoding for a file is ISO-8859-1 (latin1), except if it starts with a byte-order mark (BOM). In that case, Hexaly Modeler assumes that the file is encoded with UTF-8 or UTF-16 accordingly. If an unmappable character or an invalid byte sequence is encountered by the parser, an error will be thrown.

You can declare a different encoding with a special comment placed at the beginning of your file. This comment must start with # and contain the word coding, followed by a colon or an equals sign, followed by the name of the encoding you want. This special comment must appear on the first or the second line of the file. If it is on the second line, the first line must be a shebang comment. The following pattern is an example of a valid encoding declaration:

# coding: <encoding-name>

where <encoding-name> must be a valid and recognized encoding name. For the complete list of supported encodings and their aliases, please consult the Charset module.
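The placement rules above can be sketched in Python. This is only an illustration, not the modeler's actual implementation: the function name declared_encoding is hypothetical, the regular expression is the one given in the Comments section with a capture group added, and byte-order marks are not handled.

```python
import re

# Pattern from the "Comments" section of this page, with a capture group
# added (an addition of this sketch) to extract the encoding name.
CODING_RE = re.compile(r"coding[=:]\s*([-\w]+)")

def declared_encoding(source_bytes):
    """Return the encoding named in a leading '# coding: ...' comment, or None."""
    # Decode as latin1 (the documented default) just to inspect the first two lines.
    lines = source_bytes.decode("latin-1").splitlines()[:2]
    for index, line in enumerate(lines):
        if not line.startswith("#"):
            return None  # special comments are only allowed at the beginning
        if line.startswith("#!"):
            if index == 0:
                continue  # shebang on line 1: the declaration may be on line 2
            return None
        match = CODING_RE.search(line)
        return match.group(1) if match else None
    return None
```

A declaration on the second line is only honored here when the first line is a shebang, mirroring the rule stated above.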

Comments

Comments have no impact on the execution of the program. Three kinds of comments are allowed:

  • Single-line comments. They are prefixed with the characters //. Anything between these two characters and the end of the line is ignored.

  • Multi-line comments. They start with the characters /* and end when the characters */ are encountered. A multi-line comment must be closed. Nesting multi-line comments is forbidden, thus /* Comment 1 /* Comment 2 */ */ is forbidden.

  • Special declaration comments. These declarations must start with # and are only allowed at the beginning of HXM files. For now, two kinds of declarations are supported:

    1. The ‘shebang’ declaration starting with #!. This declaration is only allowed on the first line of the program. Under Unix systems it allows specifying the interpreter executing the program (here Hexaly Modeler).

    2. The encoding declaration, which must follow the regex pattern coding[=:]\s*[-\w]+. As detailed above, this declaration switches the encoding of the file.
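To see why nested multi-line comments fail, here is a minimal Python sketch of a comment stripper. It is an illustration under stated assumptions, not the modeler's parser: the function name strip_comments is hypothetical, comments are simply deleted, and the special # declarations are not handled.

```python
def strip_comments(source):
    """Remove // and /* */ comments, leaving string literals untouched."""
    out = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("//", i):
            # Single-line comment: skip up to (not including) the line break.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            # Multi-line comment: ends at the FIRST */, so comments do not nest.
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated multi-line comment")
            i = end + 2
        elif source[i] == '"':
            # String literal: copy verbatim, skipping over backslash escapes.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated string literal")
            out.append(source[i:j + 1])
            i = j + 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

Applied to /* Comment 1 /* Comment 2 */ */, the comment closes at the first */ and the trailing */ is left behind as stray text, which a parser would then reject.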

Identifiers

Identifiers are used as variable names, function names, class names or module names in HXM files. An identifier can only be composed of alphanumeric characters (Latin letters and digits) or underscores. It cannot start with a digit. Identifiers, like the rest of the modeler, are case-sensitive. They are described by the following lexical definition:

identifier :  ("_" | letter) ("_" | letter | digit)*
letter     :  lowercase | uppercase
lowercase  :  "a".."z"
uppercase  :  "A".."Z"
digit      :  "0".."9"

Thus,

  • identifier is different from IdeNtiFier

  • _ident is a valid identifier

  • 0ident is not a valid identifier (it starts with a digit)

  • àÀéÉùÛ is not a valid identifier (accented characters are not supported for identifiers).

  • 안녕하세요 is not a valid identifier (non-latin letters are not supported for identifiers).

  • for is not a valid identifier (reserved keyword, see below)
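The lexical definition and the examples above can be checked with a short Python sketch. The names IDENTIFIER_RE, KEYWORDS and is_valid_identifier are hypothetical; the keyword set is copied from the two lists in the Keywords section, on the assumption that words reserved for future use are rejected as well.

```python
import re

# identifier : ("_" | letter) ("_" | letter | digit)*  with ASCII letters only
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Keywords with a specific significance, plus those reserved for future use.
KEYWORDS = {
    "true", "false", "nil", "nan", "inf",
    "function", "local", "return", "this", "use",
    "while", "do", "break", "continue",
    "for", "in", "if", "else",
    "class", "override", "final", "static", "constructor",
    "new", "super",
    "minimize", "maximize", "constraint",
    "try", "throw", "catch",
    "is", "typeof", "with",
    "const", "var", "import", "goto", "switch", "case", "object",
}

def is_valid_identifier(word):
    """True if word matches the identifier grammar and is not a keyword."""
    return IDENTIFIER_RE.fullmatch(word) is not None and word not in KEYWORDS
```

With this sketch, is_valid_identifier("_ident") holds, while "0ident", "àÀéÉùÛ" and "for" are all rejected.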

Keywords

Keywords are reserved words having a specific significance for the modeler. You cannot use these keywords as variable or function names. Their use is subject to syntactic rules described later in this document. Some keywords are reserved for future use.

Keywords having a specific significance:

true        false       nil         nan         inf
function    local       return      this        use
while       do          break       continue
for         in          if          else
class       override    final       static      constructor
new         super
minimize    maximize    constraint
try         throw       catch
is          typeof      with

Changed in version 3.5: Keywords try, throw and catch added to implement exceptions.

Changed in version 5.5: Keyword use added to implement modules, keyword this added to refer to the current object, and keywords is and typeof added to implement type introspection.

Changed in version 11.5: with keyword added.

Changed in version 12.0: pragma keyword added.

Changed in version 12.5: class, constructor, override, final, static, new, super keywords added to implement the class system.

Keywords reserved for future use:

const       var         import
goto        switch      case        object

Contextual keywords

Contextual keywords are words that have a particular meaning in certain contexts but are otherwise treated as ordinary identifiers (usable as variable, class, module or function names).

Contextual keywords having a specific significance:

as          from        extends     pragma

Changed in version 12.5: as, from and extends introduced as contextual keywords. pragma became a contextual keyword.

Literals

Literals represent constant values of some built-in types.

String literals

A string starts with the character " and ends with the character ". A string can span several lines. No limit is set on the length of the string. Unlike identifiers, any Unicode character is allowed between the two quotes of the string, except backslashes and quotes, which must be introduced through escape sequences (see below).

Thus,

  • "Simple literal" is valid.

  • "こんにちは (hello)" is valid.

  • "안녕하세요 (hello)" is valid.

  • "string literal \ invalid" is not valid: a backslash may only appear as the start of an escape sequence.

Escape sequences

Some characters can be introduced through escape sequences. Escape sequences are also the only way to write backslashes or quotes in a string. An escape sequence starts with the character \ (backslash) followed by one or more ASCII characters.

The recognized escape sequences are:

Escape sequence    Associated character
\\                 Backslash (\)
\'                 Single quote
\"                 Double quotes
\b                 ASCII Backspace (U+0008)
\t                 ASCII Horizontal Tabulation (U+0009)
\n                 ASCII Linefeed (U+000A)
\f                 ASCII Formfeed (U+000C)
\r                 ASCII Carriage return (U+000D)
\uxxxx             Unicode character with 16-bit hexadecimal value
\Uxxxxxxxx         Unicode character with 32-bit hexadecimal value

If the parser encounters an unrecognized escape sequence or an invalid Unicode character, it throws an error. Thus "foo \c" throws an error since "\c" is not recognized as a valid escape sequence. The same applies to "\uDBFF", which throws an error since it is not a valid Unicode character (U+DBFF is a surrogate code point).
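As an illustration of these rules, the following Python sketch decodes the text found between the quotes of a string literal. It is not the modeler's implementation: the names SIMPLE_ESCAPES and decode_escapes are hypothetical, and two behaviors are assumptions of this sketch, namely that hexadecimal digits are accepted in either case and that every surrogate code point (U+D800 to U+DFFF) is rejected.

```python
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
}
HEX_DIGITS = "0123456789abcdefABCDEF"

def _code_point(body, start, width):
    """Read exactly `width` hex digits at `start` and return the character."""
    digits = body[start:start + width]
    if len(digits) != width or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("malformed unicode escape")
    value = int(digits, 16)
    # Assumption: surrogates and values above U+10FFFF are not characters.
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError("not a valid Unicode character")
    return chr(value)

def decode_escapes(body):
    """Decode escape sequences in the text between the quotes of a literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped quote inside string literal")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "u":
            out.append(_code_point(body, i + 2, 4))
            i += 6
        elif kind == "U":
            out.append(_code_point(body, i + 2, 8))
            i += 10
        else:
            raise ValueError("unrecognized escape sequence \\" + kind)
    return "".join(out)
```

In this sketch, both the body of "foo \c" and the body of "\uDBFF" raise a ValueError, matching the two error cases described above.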

Integer literals

An integer literal is a sequence of digits 0-9 that does not start with 0, unless it is the single digit 0. Only the decimal form is allowed. Integer literals are described by the following lexical definition:

integer        :  nonzerodigit digit* | "0"
digit          :  "0".."9"
nonzerodigit   :  "1".."9"

If a number written in the HXM file exceeds the allowed capacity, an error will be thrown when parsing this number. Note that integer literals do not include a sign. Thus, -42 is actually an expression composed of the unary operator - and the integer literal 42.

Thus,

  • 1234 is a valid integer literal

  • 01234 is not a valid integer literal (it starts with 0)

  • 100000000000000000000000 is not a valid integer literal (exceeds allowed capacity)
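The grammar and the capacity check can be sketched in Python as follows. The names are hypothetical, and the limit is an assumption: this page does not state the allowed capacity, so the sketch uses the largest signed 64-bit value.

```python
import re

# integer : nonzerodigit digit* | "0"
INTEGER_RE = re.compile(r"[1-9][0-9]*|0")

# Assumption of this sketch: integers are limited to signed 64-bit values.
MAX_INT = 2**63 - 1

def parse_integer_literal(text):
    """Return the value of an integer literal, or raise if it is invalid."""
    if INTEGER_RE.fullmatch(text) is None:
        raise ValueError("not an integer literal: " + repr(text))
    value = int(text)
    if value > MAX_INT:
        raise OverflowError("integer literal exceeds the allowed capacity")
    return value
```

Because the sign is not part of the literal, parse_integer_literal("-42") is rejected here; a full parser would read it as the unary operator - applied to the literal 42.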

Floating point literals

Hexaly Modeler handles double precision floating point numbers with point notation (e.g. 3.467) or exponential notation (e.g. 8.75e-11). They are described by the following lexical definition:

float          :  pointfloat | expfloat | "inf" | "nan"
pointfloat     :  digit* "." digit+
expfloat       :  (digit+ | pointfloat) "e" ["+" | "-"] digit+
digit          :  "0".."9"

The literal inf denotes infinity. The literal nan denotes the special floating point value "not a number" (NaN), which represents an undefined or unrepresentable value (see the IEEE 754 floating-point standard for more details). Note that floating point literals do not include a sign: -42.45 is actually an expression composed of the unary operator - and the floating point literal 42.45.

Thus,

  • 12.45 is a valid floating point literal

  • .4522 is a valid floating point literal

  • 4566e-12 is a valid floating point literal

  • .e-45 is not a valid floating point literal
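The lexical definition above translates directly into a regular expression. The following Python sketch uses hypothetical names and follows the grammar literally, so the exponent marker is a lowercase e and at least one digit is required after the point.

```python
import re

# pointfloat : digit* "." digit+
POINTFLOAT = r"[0-9]*\.[0-9]+"
# expfloat   : (digit+ | pointfloat) "e" ["+" | "-"] digit+
EXPFLOAT = rf"(?:[0-9]+|{POINTFLOAT})e[+-]?[0-9]+"
# float      : pointfloat | expfloat | "inf" | "nan"
FLOAT_RE = re.compile(rf"{EXPFLOAT}|{POINTFLOAT}|inf|nan")

def is_float_literal(text):
    """True if text is a floating point literal according to the grammar."""
    return FLOAT_RE.fullmatch(text) is not None
```

Under this reading of the grammar, .e-45 is rejected because no digit follows the point, and a plain 42 is rejected because it is an integer literal rather than a floating point literal.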

White spaces

White spaces and line breaks have no particular meaning in the modeler. They are simply ignored. Nonetheless, a whitespace character is necessary to separate two keywords, two identifiers or two literals.