# Chapter 2 Lexical Structure

(July 13, 2021)

This chapter describes several of the basic building blocks of Modelica such as characters and lexical units including identifiers and literals. Without question, the smallest building blocks in Modelica are single characters belonging to a character set. Characters are combined to form lexical units, also called tokens. These tokens are detected by the lexical analysis part of the Modelica translator. Examples of tokens are literal constants, identifiers, and operators. Comments are not really lexical units since they are eventually discarded. On the other hand, comments are detected by the lexical analyzer before being thrown away.

The information presented here is derived from the more formal specification in appendix A.

## 2.1 Character Set

The character set of the Modelica language is Unicode, but restricted to the Unicode characters corresponding to 7-bit ASCII characters in several places; for details see section A.1.

## 2.2 Comments

Modelica has two kinds of comments. They are not lexical units in the language and are therefore treated as white-space by a Modelica translator. The white-space characters are space, tabulator, and line separators (carriage return and line feed). White-space cannot occur inside tokens, e.g., <= must be written as two characters without space or comments between them. The following comment variants are available:

| Syntax | Description |
|---|---|
| `// comment` | Characters from `//` to the end of the line are ignored. |
| `/* comment */` | Characters between `/*` and `*/` are ignored, including line terminators. |

[The comment syntax is identical to that of C++.]

Modelica comments do not nest, i.e., /* */ cannot be embedded within /* */. The following is invalid:

/* Commented out - erroneous comment, invalid nesting of comments!
/* This is an interesting model */
model interesting
...
end interesting;
*/
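
The non-nesting rule follows from the way a lexical analyzer can recognize comments: a block comment ends at the first */ that follows its opening /*. The Python sketch below is one possible way to model this and is not taken from any Modelica translator; the names `_LEXEME` and `strip_comments` are hypothetical. It replaces each comment by a single space and leaves string literals and quoted identifiers untouched.

```python
import re

# Hypothetical helper, not part of any Modelica tool. String literals and
# quoted identifiers are matched first so that comment-like text inside
# them is not mistaken for a comment.
_LEXEME = re.compile(
    r'''
      "(?:\\.|[^"\\])*"     # string literal
    | '(?:\\.|[^'\\])*'     # quoted identifier
    | //[^\n]*              # line comment: up to the end of the line
    | /\*.*?\*/             # block comment: up to the FIRST closing marker
    ''',
    re.VERBOSE | re.DOTALL,
)


def strip_comments(source):
    """Replace every comment in `source` by one space (comments act as white-space)."""
    def replace(match):
        text = match.group(0)
        return " " if text.startswith(("//", "/*")) else text
    return _LEXEME.sub(replace, source)


nested = """/* Commented out
/* This is an interesting model */
model interesting
end interesting;
*/"""
remaining = strip_comments(nested)
```

In this sketch the outer comment already ends after the inner `*/`, so `remaining` still contains the text of the model followed by a stray `*/`, which is why the nested form is invalid.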

There is also a description-string, which is part of the Modelica language and is therefore not ignored by the Modelica translator. Such a description-string may occur at the end of a declaration, equation, or statement, or at the beginning of a class definition. For example:

model TempResistor "Temperature dependent resistor"
...
parameter Real R "Resistance for reference temp.";
...
end TempResistor;

## 2.3 Identifiers, Names, and Keywords

Identifiers are sequences of letters, digits, and other characters such as underscore, which are used for naming various items in the language. Certain combinations of letters are keywords represented as reserved words in the Modelica grammar and are therefore not available as identifiers.

### 2.3.1 Identifiers

Modelica identifiers, used for naming classes, variables, constants, and other items, are of two forms. The first form always starts with a letter or underscore ('_'), followed by any number of letters, digits, or underscores. Case is significant, i.e., the identifiers Inductor and inductor are different. The second form (Q-IDENT) starts with a single quote, followed by a sequence of printable ASCII characters, where a single quote must be preceded by a backslash, and is terminated by a single quote, e.g., '12H', '13\'H', '+foo'. Control characters in quoted identifiers have to use string escapes. The single quotes are part of the identifier, i.e., 'x' and x are distinct identifiers. The redundant escapes ('\?' and '\"') are the same as the corresponding non-escaped variants ('?' and '"'), but are only for use in Modelica source code. A full BNF definition of the Modelica syntax and lexical units is available in appendix A.

IDENT = NON-DIGIT { DIGIT | NON-DIGIT } | Q-IDENT
Q-IDENT = "'" { Q-CHAR | S-ESCAPE } "'"
NON-DIGIT = "_" | letters "a" ... "z" | letters "A" ... "Z"
DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Q-CHAR = NON-DIGIT | DIGIT | "!" | "#" | "$" | "%" | "&" | "(" | ")"
  | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | "<" | ">" | "="
  | "?" | "@" | "[" | "]" | "^" | "{" | "}" | "|" | "~" | " " | """
S-ESCAPE = "\'" | "\"" | "\?" | "\\"
  | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
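
The IDENT rule can be transcribed into a regular expression. The following Python sketch is an illustration only: the names `Q_CHAR`, `S_ESCAPE`, `IDENT`, and `is_ident` are hypothetical, and the check does not reject keywords, which are treated separately in section 2.3.3.

```python
import re

# Character classes transcribed from the grammar rules Q-CHAR and S-ESCAPE.
# Q-CHAR deliberately excludes the single quote and the backslash.
Q_CHAR = r"""[A-Za-z0-9_!#$%&()*+,\-./:;<>=?@\[\]^{}|~ "]"""
S_ESCAPE = r"""\\['"?\\abfnrtv]"""

IDENT = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*"           # first form: NON-DIGIT { DIGIT | NON-DIGIT }
    rf"|'(?:{Q_CHAR}|{S_ESCAPE})*'"     # second form: Q-IDENT
)


def is_ident(text):
    """True if `text` as a whole matches the IDENT rule (keywords are not checked)."""
    return IDENT.fullmatch(text) is not None


valid = ["Inductor", "inductor", "_x1", "'12H'", "'13\\'H'", "'+foo'", "'x'"]
invalid = ["1x", "'a'b'", "'unterminated", "a-b"]
```

Under these assumptions every entry of `valid` is accepted and every entry of `invalid` is rejected; `'a'b'` fails because the inner single quote is not preceded by a backslash.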

### 2.3.2 Names

A name is an identifier with a certain interpretation or meaning. For example, a name may denote an Integer variable, a Real variable, a function, a type, etc. A name may have different meanings in different parts of the code, i.e., different scopes. The interpretation of identifiers as names is described in more detail in chapter 5. The meaning of package names is described in more detail in chapter 13.

[Example: A name: Ele.Resistor]

A component reference is an expression containing a sequence of identifiers and indices. A component reference is equivalent to the referenced object, which must be a component. A component reference is resolved (evaluated) in the scope of a class (section 4.4), or expression for the case of a local iterator variable (section 10.6.9).

[Example: A component reference: Ele.Resistor.u[21].r]

### 2.3.3 Modelica Keywords

The following Modelica keywords are reserved words and shall not be used as identifiers, except as listed in section A.1:

| | | | | |
|---|---|---|---|---|
| algorithm | discrete | false | loop | pure |
| and | each | final | model | record |
| annotation | else | flow | not | redeclare |
| | elseif | for | operator | replaceable |
| block | elsewhen | function | or | return |
| break | encapsulated | if | outer | stream |
| class | end | import | output | then |
| connect | enumeration | impure | package | true |
| connector | equation | in | parameter | type |
| constant | expandable | initial | partial | when |
| constrainedby | extends | inner | protected | while |
| der | external | input | public | within |
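
A tool that classifies identifier-like tokens could keep the reserved words in a set and look up each candidate. The Python sketch below is a minimal illustration with hypothetical names (`KEYWORDS`, `is_reserved`); it does not model the exceptions listed in section A.1.

```python
# The reserved words from the list above, one row of the list per line.
KEYWORDS = frozenset("""
    algorithm discrete false loop pure
    and each final model record
    annotation else flow not redeclare
    elseif for operator replaceable
    block elsewhen function or return
    break encapsulated if outer stream
    class end import output then
    connect enumeration impure package true
    connector equation in parameter type
    constant expandable initial partial when
    constrainedby extends inner protected while
    der external input public within
""".split())


def is_reserved(word):
    """True if `word` is spelled exactly like one of the reserved words."""
    return word in KEYWORDS
```

Because the comparison is exact, `is_reserved("model")` is true while `is_reserved("Model")` is false, in line with case being significant in Modelica.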

## 2.4 Literal Constants

Literals (or literal constants) are unnamed constants used to build expressions, and have different forms depending on their type. Each of the predefined types in Modelica has a way of expressing unnamed constants of the corresponding type, which is presented in the ensuing subsections. Additionally, array literals and record literals can be expressed.

### 2.4.1 Floating Point Numbers

A floating point number is expressed as a decimal number in the form of a sequence of decimal digits followed by a decimal point, followed by decimal digits, followed by an exponent indicated by E or e followed by a sign and one or more decimal digits. The various parts can be omitted; see UNSIGNED-REAL in section A.1 for details and also the examples below. The minimal recommended range is that of IEEE double precision floating point numbers, for which the largest representable positive number is $1.7976931348623157\times 10^{308}$ and the smallest positive normalized number is $2.2250738585072014\times 10^{-308}$. For example, the following are floating point number literal constants:

22.5, 3.141592653589793, 1.2E-35

The same floating point number can be represented by different literals. For example, all of the following literals denote the same number:

13., 13E0, 1.3e1, 0.13E2, .13E2

The last variant shows that the leading zero is optional (in that case, decimal digits must be present). Note that 13 is not in this list, since it is not a floating point number, but can be converted to a floating point number.
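
The literal forms shown in this section can be checked with a small recognizer. The pattern in the Python sketch below is an assumption written to agree with the examples in this section; the authoritative definition is UNSIGNED-REAL in section A.1, and the names `UNSIGNED_REAL` and `is_real_literal` are hypothetical.

```python
import re
import sys

# Assumed pattern, consistent with the examples in this section only.
UNSIGNED_REAL = re.compile(
    r"""
      [0-9]+ \. [0-9]* (?: [eE] [+-]? [0-9]+ )?   # 13.   22.5   1.3e1
    | [0-9]+ [eE] [+-]? [0-9]+                    # 13E0
    | \. [0-9]+ (?: [eE] [+-]? [0-9]+ )?          # .13E2
    """,
    re.VERBOSE,
)


def is_real_literal(text):
    """True if `text` as a whole has one of the floating point literal forms."""
    return UNSIGNED_REAL.fullmatch(text) is not None


same_number = ["13.", "13E0", "1.3e1", "0.13E2", ".13E2"]
values = {float(s) for s in same_number}   # all five denote the same number

# On platforms that use IEEE double precision for Python floats:
largest = sys.float_info.max               # largest finite double
smallest_normalized = sys.float_info.min   # smallest positive normalized double
```

In this sketch `is_real_literal("13")` is false, which matches the remark that 13 is not a floating point literal, and `values` collapses to a single element.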

### 2.4.2 Integer Literals

Literals of type Integer are sequences of decimal digits, e.g., as in the integer numbers 33, 0, 100, 30030044. The range of supported Integer literals shall be at least large enough to represent the largest positive IntegerType value, see section 4.8.2.

[Negative numbers are formed by unary minus followed by an integer literal.]
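
As a small illustration of the note above, the Python sketch below treats an Integer literal as a plain digit sequence, so that a text such as -33 is split into a unary minus and a literal. The names `is_integer_literal` and `split_sign` are hypothetical.

```python
import re

UNSIGNED_INTEGER = re.compile(r"[0-9]+")


def is_integer_literal(text):
    """True if `text` consists only of decimal digits."""
    return UNSIGNED_INTEGER.fullmatch(text) is not None


def split_sign(text):
    """Split text such as '-33' into the tokens '-' and '33'."""
    return re.findall(r"[0-9]+|-", text)


tokens = split_sign("-33")
```

Here `is_integer_literal("-33")` is false, while `tokens` holds the operator and the literal as two separate entries.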

### 2.4.3 Boolean Literals

The two Boolean literal values are true and false.

### 2.4.4 Strings

String literals appear between double quotes as in "between". Any character in the Modelica language character set (see section A.1 for allowed characters) apart from double quote (") and backslash (\), including new-line, can be directly included in a string without using an escape sequence. Certain characters in string literals can be represented using escape sequences, i.e., the character is preceded by a backslash (\) within the string. Those characters are:

| Character | Description |
|---|---|
| `\'` | Single quote, may also appear without backslash in string constants |
| `\"` | Double quote |
| `\?` | Question-mark, may also appear without backslash in string constants |
| `\\` | Backslash itself |
| `\a` | Alert (bell, code 7, ctrl-G) |
| `\b` | Backspace (code 8, ctrl-H) |
| `\f` | Form feed (code 12, ctrl-L) |
| `\n` | Newline (code 10, ctrl-J), same as literal newline |
| `\r` | Carriage return (code 13, ctrl-M) |
| `\t` | Horizontal tab (code 9, ctrl-I) |
| `\v` | Vertical tab (code 11, ctrl-K) |
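
The escape sequences in the table can be read as a mapping from the character after the backslash to the character it denotes. The Python sketch below decodes a string literal under that reading; it is an illustration with hypothetical names (`_ESCAPES`, `decode_string_literal`) and makes the assumption that any other character after a backslash is an error.

```python
# Escape sequences from the table, keyed by the character after the backslash.
_ESCAPES = {
    "'": "'", '"': '"', "?": "?", "\\": "\\",
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}


def decode_string_literal(literal):
    """Return the string value denoted by a double-quoted string literal."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be enclosed in double quotes")
    body = literal[1:-1]
    chars = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Assumption: a backslash must be followed by a listed character.
            if i + 1 >= len(body) or body[i + 1] not in _ESCAPES:
                raise ValueError("invalid escape sequence")
            chars.append(_ESCAPES[body[i + 1]])
            i += 2
        elif ch == '"':
            raise ValueError("unescaped double quote inside string literal")
        else:
            chars.append(ch)  # any other character, including a literal new-line
            i += 1
    return "".join(chars)


value = decode_string_literal(r'"col1\tcol2\n"')
```

With this sketch, `value` holds the two words separated by a tab character and terminated by a single new-line character.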

For example, a string literal containing a tab, the words: This is, double quote, space, the word: between, double quote, space, the word: us, and new-line, would appear as follows:

"\tThis is\" between\" us\n"

Concatenation of string literals in certain situations (see the Modelica grammar) is denoted by the + operator in Modelica, e.g. "a" + "b" becomes "ab". This is useful for expressing long string literals that need to be written on several lines.

The "\n" character is used to conceptually indicate the end of a line within a Modelica string. Any Modelica program that needs to recognize line endings can check for a single "\n" character to do so on any platform. It is the responsibility of a Modelica implementation to make any necessary transformations to other representations when writing to or reading from a text file.

[For example, a "\n" is written and read as-is in a Unix or Linux implementation, but written as "\r\n" pair, and converted back to "\n" when read in a Windows implementation.]
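
The transformation described in the note can be sketched outside of Modelica. The Python example below uses the standard `io.TextIOWrapper` class, whose `newline` argument controls the translation; the helper names `write_text` and `read_text` are hypothetical and only illustrate the idea that the program itself sees a single "\n" on every platform.

```python
import io


def write_text(text, line_ending):
    """Encode `text`, writing each "\\n" as `line_ending` ("\\n" or "\\r\\n")."""
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding="utf-8", newline=line_ending)
    writer.write(text)
    writer.flush()
    return raw.getvalue()


def read_text(data):
    """Decode `data`, converting "\\r\\n" (and "\\r") back to "\\n"."""
    reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=None)
    return reader.read()


windows_bytes = write_text("first\nsecond\n", "\r\n")
unix_bytes = write_text("first\nsecond\n", "\n")
round_trip = read_text(windows_bytes)
```

In this sketch the two byte sequences differ only in their line terminators, and reading the Windows-style bytes back yields the original text with single "\n" characters.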

[For long string comments, e.g., the info annotation to store the documentation of a model, it would be very inconvenient if the string concatenation operator had to be used for every line of documentation. It is assumed that a Modelica tool supports the non-printable newline character when browsing or editing a string literal. For example, the following statement defines one string that contains (non-printable) newline characters:

assert(noEvent(length > s_small),
"The distance between the origin of frame_a and the origin of frame_b
of a LineForceWithMass component became smaller as parameter s_small
(= a small number, defined in the