Data types#

Basic data types#

Let’s briefly look at the most common data types in Python. All data is represented by objects. The type of an object, type(some_object), determines which operations it supports. For instance, all sequence objects (e.g. lists) have a length, len(some_list). Another important distinction is mutability: an object is called mutable if its value can be changed after its creation, and immutable otherwise.
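As a minimal sketch of both ideas (the variable names are chosen here purely for illustration), the type decides which operations exist, and mutability decides whether an object can be changed in place:

```python
some_list = [1, 2, 3]
some_tuple = (1, 2, 3)

# The type determines the supported operations; both are sequences with a length.
assert type(some_list) is list
assert type(some_tuple) is tuple
assert len(some_list) == len(some_tuple) == 3

# A list is mutable: it can be changed in place after its creation.
some_list[0] = 10

# A tuple is immutable: item assignment raises a TypeError.
try:
    some_tuple[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(some_list, some_tuple, tuple_changed)
```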

NoneType#

The built-in name None signifies the absence of a value. For example, it is returned by any function without an explicit return statement. A common mistake is to loop over a variable that may be None,

data_rows = [[1, 2], None]

for row in data_rows:
    for element in row:
        print(element)

which fails with the TypeError: 'NoneType' object is not iterable error message.
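One possible way to avoid this error is to check for None explicitly before iterating. The sketch below assumes that skipping missing rows is the desired behaviour; the helper function no_return and the list elements are introduced here only for illustration.

```python
def no_return():
    """A function without a return statement implicitly returns None."""
    pass


result = no_return()
assert result is None

data_rows = [[1, 2], None]

elements = []
for row in data_rows:
    if row is None:        # None is a singleton, so compare with "is"
        continue           # skip missing rows instead of failing
    for element in row:
        elements.append(element)

print(elements)
```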

Numbers#

The numeric types include int for integers, its subclass bool for Booleans, as well as float and complex for real and complex numbers respectively.

ten_thousand = 10_000     # large int with optional thousands separator
z = 1.0 + 2.0j            # complex numbers
assert type(7/2) == float # floating point division of two integers
assert int(7/2) == 3      # convert float to integer
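A few further properties of the numeric types can be checked in the same way. The following is only an illustrative sketch; the variable names are not part of the original example.

```python
import math

# bool is a subclass of int, so True and False behave like 1 and 0.
assert issubclass(bool, int)
true_sum = True + True

# Floor division rounds towards negative infinity,
# while int() truncates towards zero.
floored = -7 // 2
truncated = int(-7 / 2)

# Floats are binary approximations, so compare them with a tolerance.
exactly_equal = (0.1 + 0.2 == 0.3)
nearly_equal = math.isclose(0.1 + 0.2, 0.3)

# Complex numbers expose their real and imaginary parts as floats.
z = 1.0 + 2.0j
magnitude = abs(3 + 4j)

print(true_sum, floored, truncated, exactly_equal, nearly_equal, magnitude)
```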

Sequences#

Sequences represent finite ordered sets indexed by non-negative integers. They support len() and slicing, a[i:j]. Examples of immutable sequences are str (strings), tuple and range. Conversely, list (and array.array from the standard library) are mutable sequences.

a_string = "Lorem Ipsum"
a_tuple = (1,)            # tuple with single element (defined by comma)
a_list = [1, 2, 3]
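The operations mentioned above can be tried out directly. This is a short sketch; not_a_tuple and the other additional names are only illustrative.

```python
a_string = "Lorem Ipsum"
a_tuple = (1,)
a_list = [1, 2, 3]

# All sequences have a length and support slicing.
string_length = len(a_string)
first_word = a_string[0:5]
last_word = a_string[-5:]            # negative indices count from the end
range_slice = list(range(5)[1:4])

# Lists are mutable: elements can be replaced or appended in place.
a_list[0] = 0
a_list.append(4)

# Without the trailing comma, the parentheses only group an expression.
not_a_tuple = (1)

print(string_length, first_word, last_word, range_slice, a_list)
```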

Set types#

Set types represent unordered, finite sets of unique, immutable objects. Common uses are fast membership testing, removing duplicates and set operations like intersection, union or difference. The type set is mutable, whereas frozenset is immutable.

empty_set = set()       # as {} is reserved for empty dict
a_set = set("AABC")     # set of unique characters in a string
b_set = set(["AABC"])   # set with whole string as single element
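The common uses listed above might look as follows. This is a minimal sketch; other_set, unique_sorted and lookup are names made up for this example.

```python
a_set = set("AABC")                  # {"A", "B", "C"}
other_set = {"B", "C", "D"}

# Set operations
intersection = a_set & other_set
union = a_set | other_set
difference = a_set - other_set

# Fast membership testing
has_a = "A" in a_set

# Removing duplicates (sorted, because sets have no defined order)
unique_sorted = sorted(set([3, 1, 3, 2, 1]))

# A frozenset is immutable and hashable, so it can serve as a dict key.
lookup = {frozenset(a_set): "letters"}

print(intersection, union, difference, has_a, unique_sorted)
```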

Mapping types#

Mapping types are mutable objects that map hashable keys to arbitrary objects as values. Most of Python’s immutable built-in objects are hashable. Mutable built-in containers such as list, dict and set, however, are not hashable. The dict (dictionary) is the only built-in mapping type.

a_dict = { "a": 1, "b": 2}
assert a_dict["a"] == 1    # retrieve value of key "a"
# a_dict["c"]              # would raise a KeyError, as "c" is not a key
a_dict.get("c", "3")       # returns value or the default "3"
a_dict.keys()              # returns keys in a dict_keys object
a_dict.values()            # returns values in a dict_values object

for key, value in a_dict.items():   # loop over key value tuples
    print(f"{key=} -> {value=}")

nested_dict = {}
nested_dict["first"] = {}  # nested keys need to be initialized first
nested_dict["first"]["second"] = "value"
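Two standard-library alternatives can spare the explicit initialization of nested keys: dict.setdefault() and collections.defaultdict. The sketch below also illustrates the hashability requirement for keys; the names auto_nested, point_names and list_key_ok are only illustrative.

```python
from collections import defaultdict

# setdefault() returns the existing value or inserts the given default first.
nested_dict = {}
nested_dict.setdefault("first", {})["second"] = "value"

# A defaultdict creates missing values on first access via the given factory.
auto_nested = defaultdict(dict)
auto_nested["first"]["second"] = "value"

# Keys must be hashable: a tuple works, a list raises a TypeError.
point_names = {(0, 0): "origin"}
try:
    {[0, 0]: "origin"}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(nested_dict, dict(auto_nested), list_key_ok)
```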

Data structure#

In practice, the actual data structure is often some nested mixture of the above basic data types.

john = {
    'firstname': 'John',
    'surname': 'Doe',
    'age': 42,
    'emails': [{'email': 'johndoe@example.com', 'type': 'private'}],
    'married': False,
    'children': None,
}
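Values in such a nested structure are reached by chaining keys and indices. The following sketch repeats the example record so that it runs on its own; treating a children value of None as "no children" is an assumption made here for illustration.

```python
john = {
    'firstname': 'John',
    'surname': 'Doe',
    'age': 42,
    'emails': [{'email': 'johndoe@example.com', 'type': 'private'}],
    'married': False,
    'children': None,
}

# Chain a dict key, a list index and another dict key.
first_email = john['emails'][0]['email']

# Filter the list of nested dicts with a list comprehension.
private_emails = [e['email'] for e in john['emails'] if e['type'] == 'private']

# Guard against None before calling len() on an optional value.
number_of_children = len(john['children'] or [])

print(first_email, private_emails, number_of_children)
```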

Dataclass#

When opting for object-oriented programming in scientific computing, writing out the class definition with the initialization of all attributes involves a lot of boilerplate code. A noteworthy addition, available since Python 3.7, are dataclasses, which provide a shorthand notation. The @dataclass decorator implicitly adds an __init__() method (among others) for the attributes.

from dataclasses import dataclass


@dataclass
class Person:
    name: str
    weight: float
    children: int = 0


john = Person(name="John", weight=74.0)
john.children = 1
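Besides __init__(), the decorator also generates __repr__() and __eq__() by default. The sketch below illustrates this; the emails attribute is a hypothetical extension of the example above, added only to show that mutable defaults have to be declared with field(default_factory=...).

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    name: str
    weight: float
    children: int = 0
    # Hypothetical extra attribute: a mutable default needs a factory,
    # so that every instance gets its own list.
    emails: list = field(default_factory=list)


john = Person(name="John", weight=74.0)
same_john = Person("John", 74.0)

# The generated __eq__() compares the attributes field by field.
are_equal = (john == same_john)

# The generated __repr__() lists all attributes with their values.
description = repr(john)

# asdict() converts the instance into a plain dictionary.
as_dictionary = asdict(john)

print(description, are_equal, as_dictionary)
```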

Scientific data types#

The scientific libraries bring a variety of additional data types for numerical computing and data analysis. The details are beyond our scope and presented in lectures on scientific computing.

NumPy data types
NumPy ndarray
Pandas data structures Series and DataFrame