Lecture 15

Data type
- implicit context for operations
- limits the set of operations that may be performed in a semantically valid program

Type system
- a mechanism to define types and associate them with certain language constructs
- a set of rules for type equivalence, type compatibility, and type inference
  > type equivalence: determine whether the types of two values are the same
  > type compatibility: determine whether a value of a given type can be used in a given context
  > type inference: determine the type of an expression based on the types of its constituent parts or the surrounding context

Definition of types
- denotational: a type is simply a set (or domain) of values; a value has a given type if it belongs to the set
  > a data type is a set of values
- constructive: a type is either one of a small collection of built-in types, or a composite type created by applying a type constructor to simpler types
- abstraction-based: a type is an interface consisting of a set of operations with well-defined and mutually consistent semantics
  > a data type is a set of values, together with a set of operations on those values having certain properties

Classification of types
- built-in (or primitive) types: integers, characters, Booleans, and real numbers
- enumeration types
- subrange types
- composite types
  > records (Cartesian products)
  > variant records (unions)
  > arrays
  > sets
  > pointers
  > lists
  > files

Note: in declarations of types, there are two kinds:
  o named types, and
  o anonymous types (type constructors are applied without giving them names)
e.g., in the following C code, variable a has a data type with two names; variable b's type has one name, RecB; and variable c's type has no name:
  struct RecA { char x; int y; } a;
  typedef struct RecA RecA;
  typedef struct { char x; int y; } RecB;
  RecB b;
  struct { char x; int y; } c;

Type checking
- static: C, ML
- dynamic: Lisp, Scheme, Smalltalk
- strong: ML, Ada
- weak: C, Scheme

benefits of strong typing:
  o static type information allows compilers to allocate memory efficiently
  o saves compilers from generating code for run-time type checking
  o readability
  o reliability

- type equivalence:
  > structural equivalence: two types are the same if they consist of the same components, put together in the same way
    e.g., in Algol-68, C, and ML
    how to determine:
      step 1. expand each type definition recursively, replacing embedded type names with their respective definitions, until only built-in types remain
      step 2. compare the two expanded strings; if they are the same, then the types are equivalent
    e.g.,
      type foo = record
        a : integer
        b : integer
      end;
      type bar = record
        b : integer
        a : integer
      end;
    If structural equivalence is adopted, foo and bar are not the same type (the fields appear in a different order). Similarly, struct RecA { char x; int y; } and struct RecB { char a; int b; } are not considered equivalent (the field names differ).
    pros: straightforward and easy to implement
    cons:
      - implementation-oriented
      - cannot distinguish between types that programmers may think of as distinct, but which happen by coincidence to have the same internal structure
    e.g.,
      type student = record
        name, address : string
        age : integer
      end;
      type school = record
        name, address : string
        age : integer
      end;
      x : student;
      y : school;
      .....
      x := y;    (legal under structural equivalence, but almost certainly a mistake)
  > name equivalence: each type definition introduces a new type (this is stricter than structural equivalence)
    e.g., in Java, Pascal, and Ada
    e.g.,
      struct RecA { char x; int y; };
      typedef struct RecA RecA;
      struct RecA a;
      RecA b;
      struct RecA c;
      struct { char x; int y; } d;
    All of the variables a, b, c, and d are structurally equivalent. Under strict name equivalence, a and c are name equivalent, and they are not name equivalent to b and d.
    b and d are not name equivalent to any other variables.
    caveat: how should aliased types (e.g., those introduced by typedef) be treated?
      aliases distinct   --> strict name equivalence
      aliases equivalent --> loose name equivalence
  > type conversion and casts: use a value of one type where another type is expected
    e.g., x = (int) (2.3 + (double) (x / 2));

- Type compatibility and coercion
  S is compatible with T iff one of the following is true:
    1) S and T are equivalent,
    2) one is a subtype of the other, or
    3) both are arrays with the same numbers and types of elements in each dimension
  a type coercion is just an implicit conversion allowed by type compatibility
  e.g.,
    int x = 3;
    ...
    x = 2.3 + x / 2;
  variable x has the value 3 after the assignment, which involves two implicit conversions:
    int -> double (1 -> 1.0)    widening
    double -> int (3.3 -> 3)    narrowing

- Type inference
  can be checked using syntax trees
  e.g., (in C-like syntax) the expression a[i] + i

          +
         / \
       []   i
      /  \
     a    i

  Hindley-Milner type checking
  > assign type variables to all the names that do not yet have known types
  > instantiation: once a type variable is replaced by an actual type, all instances of that variable (subject to scope rules) must be updated to that same type
  > unification:
    e.g.,
      E1 : 'a * int
      E2 : string * 'b
      if x then E1 else E2
    By unifying what is known about E1 and E2, we arrive at the conclusion that both E1 and E2 are of type string * int. It is a kind of pattern matching.
  In Hindley-Milner type checking, types can remain as general as possible ==> polymorphism
  e.g., a[i] = b[i]
    type checking narrows the types of a and b to "array of 'a", but 'a is still generic
  Note:
    1) This is parametric polymorphism (versus overloaded functions).
    2) This is implicit parametric polymorphism (versus explicit polymorphism), e.g.,
         typedef struct StackNode {
           ?? data;    // how to make this polymorphic?
           struct StackNode * next;
         } * Stack;
       In ML this can be done as follows:
         datatype 'a Stack = EmptyStack
                           | Stack of 'a * ('a Stack);
    3) subtype polymorphism (will be discussed with OO programming in the next 3 lectures)