Data types#
Basic data types#
Let’s briefly look at the most common data types in Python. All data is represented by objects. The type of an object, type(some_object), determines which operations are supported. For instance, all sequence objects (e.g. lists) have a length, len(some_list), defined. Another important distinction is mutability: depending on whether the value of an object can be changed after its creation, the object is called mutable or immutable.
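As a minimal sketch of these two ideas, the snippet below inspects the type of an object and contrasts a mutable list with an immutable tuple. The variable names are illustrative only.

```python
some_list = [1, 2, 3]
some_tuple = (1, 2, 3)

# The type of an object determines which operations are available
assert type(some_list) is list
assert len(some_list) == 3

# Lists are mutable: the object can be changed in place
some_list[0] = 10
assert some_list == [10, 2, 3]

# Tuples are immutable: item assignment raises a TypeError
try:
    some_tuple[0] = 10
except TypeError as error:
    message = str(error)

assert some_tuple == (1, 2, 3)  # the tuple is unchanged
```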
See also
The Python documentation contains a section on the data model providing a good overview of the data types and their hierarchy.
NoneType#
The built-in name None is used to signify the absence of a value. For example, it is returned by any function without an explicit return statement. A common mistake is to loop over a variable that may be None,
data_rows = [[1, 2], None]
for row in data_rows:
    for element in row:
        print(element)
which fails with the error message TypeError: 'NoneType' object is not iterable.
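One possible way to avoid this error is to test for None explicitly before iterating. The sketch below reuses data_rows from above; the helper no_return and the list elements are hypothetical names introduced for illustration.

```python
def no_return():
    """A function without an explicit return statement."""
    pass

assert no_return() is None  # the implicit return value is None

data_rows = [[1, 2], None]
elements = []
for row in data_rows:
    if row is None:  # skip missing rows instead of iterating over them
        continue
    for element in row:
        elements.append(element)

print(elements)  # prints [1, 2]
```

Comparing with `is None` rather than relying on truthiness keeps empty lists (which are iterable) distinct from missing values.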
Numbers#
The numeric types include int for integers, its subclass bool for Booleans, as well as float and complex for real and complex numbers, respectively.
ten_thousand = 10_000 # large int with optional thousands separator
z = 1.0 + 2.0j # complex numbers
assert type(7/2) == float # floating point division of two integers
assert int(7/2) == 3 # convert float to integer (truncates toward zero)
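A few further properties of the numeric types can be checked directly. The following is a short sketch, not an exhaustive overview; the use of math.isclose for comparing floats is a common convention rather than something prescribed by the text above.

```python
import math

# bool is a subclass of int, so Booleans behave like 0 and 1
assert isinstance(True, int)
assert True + True == 2

# Floor division rounds toward negative infinity, int() truncates toward zero
assert 7 // 2 == 3
assert -7 // 2 == -4
assert int(-3.5) == -3

# Floats have finite precision, so compare them with a tolerance
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)

# Complex numbers expose their real and imaginary parts
z = 1.0 + 2.0j
assert z.real == 1.0 and z.imag == 2.0
assert abs(3 + 4j) == 5.0
```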
Sequences#
Sequences represent finite ordered sets indexed by non-negative numbers. They support len() and index slicing a[i:j]. Examples of immutable sequences are str (strings), tuple and range. Conversely, list (and array) are mutable sequences.
a_string = "Lorem Ipsum"
a_tuple = (1,) # tuple with single element (defined by comma)
a_list = [1, 2, 3]
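The common sequence operations can be sketched as follows, reusing the names from the example above. The name not_a_tuple is introduced here only to illustrate why the trailing comma matters.

```python
a_string = "Lorem Ipsum"
a_list = [1, 2, 3]

# len() and slicing work on every sequence type
assert len(a_string) == 11
assert a_string[0:5] == "Lorem"
assert a_string[-5:] == "Ipsum"   # negative indices count from the end
assert list(range(3)) == [0, 1, 2]

# Lists are mutable and can grow in place
a_list.append(4)
assert a_list == [1, 2, 3, 4]

# Parentheses alone do not create a tuple, the comma does
not_a_tuple = (1)
assert type(not_a_tuple) is int
assert type((1,)) is tuple
```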
Set types#
Set types represent unordered, finite sets of unique, immutable objects. Common uses are fast membership testing, removing duplicates or set operations like intersection, union or difference. The type set is mutable, whereas frozenset is immutable.
empty_set = set() # as {} is reserved for empty dict
a_set = set("AABC") # set of unique characters in a string
b_set = set(["AABC"]) # set with whole string as single element
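The uses mentioned above can be illustrated with a short sketch. The names left, right, unique_sorted and frozen are hypothetical and chosen for this example only.

```python
a_set = set("AABC")
b_set = set(["AABC"])
assert a_set == {"A", "B", "C"}   # duplicates are dropped
assert b_set == {"AABC"}

left = {1, 2, 3}
right = {3, 4}
assert 2 in left                     # membership test
assert left & right == {3}           # intersection
assert left | right == {1, 2, 3, 4}  # union
assert left - right == {1, 2}        # difference

# Removing duplicates from a list (sets are unordered, hence the sort)
unique_sorted = sorted(set([3, 1, 3, 2, 1]))
assert unique_sorted == [1, 2, 3]

# A frozenset is immutable and hashable, so it can be an element of a set
frozen = frozenset(left)
set_of_sets = {frozen}
assert frozen in set_of_sets
```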
Mapping types#
Mapping types are mutable objects that map hashable keys to arbitrary objects as values. Most of Python’s immutable built-in objects are hashable, whereas mutable containers such as lists or dictionaries are not. The dict (dictionary) is the only built-in mapping type.
a_dict = { "a": 1, "b": 2}
assert a_dict["a"] == 1 # retrieve value of key "a"
# a_dict["c"] would raise a "KeyError" exception, as the key "c" is missing
a_dict.get("c", "3") # returns value or the default "3"
a_dict.keys() # returns keys in a dict_keys object
a_dict.values() # returns values in a dict_values object
for key, value in a_dict.items(): # loop over key value tuples
    print(f"{key=} -> {value=}")
nested_dict = {}
nested_dict["first"] = {} # nested keys need to be initialized first
nested_dict["first"]["second"] = "value"
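The hashability requirement for keys, and two standard-library alternatives to initializing nested keys by hand, can be sketched as follows. The names grid, auto_nested and plain are hypothetical; dict.setdefault and collections.defaultdict are shown as options, not as the approach this text prescribes.

```python
from collections import defaultdict

# A tuple is hashable and can serve as a key
grid = {(0, 0): "origin"}
assert grid[(0, 0)] == "origin"

# A list is mutable and unhashable, so using it as a key raises a TypeError
try:
    grid[[0, 1]] = "fails"
except TypeError as error:
    message = str(error)

# defaultdict creates the inner dictionary on first access
auto_nested = defaultdict(dict)
auto_nested["first"]["second"] = "value"

# setdefault achieves the same with a plain dict
plain = {}
plain.setdefault("first", {})["second"] = "value"

assert auto_nested == plain == {"first": {"second": "value"}}
```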
Data structure#
In practice, the actual data structure is often some nested mixture of the above basic data types.
john = {
    'firstname': 'John',
    'surname': 'Doe',
    'age': 42,
    'emails': [{'email': 'johndoe@example.com', 'type': 'private'}],
    'married': False,
    'children': None,
}
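Reading values from such a nested structure combines the access patterns of the individual types. The sketch below repeats the dictionary from above so that it runs on its own. Treating None as "no children" is an assumption made for this example.

```python
john = {
    'firstname': 'John',
    'surname': 'Doe',
    'age': 42,
    'emails': [{'email': 'johndoe@example.com', 'type': 'private'}],
    'married': False,
    'children': None,
}

# Index into the list first, then into the inner dictionary
first_email = john['emails'][0]['email']
assert first_email == 'johndoe@example.com'

# Filter the list of dictionaries with a list comprehension
private_emails = [
    entry['email'] for entry in john['emails'] if entry['type'] == 'private'
]

# Guard against None before calling len()
number_of_children = len(john['children'] or [])
assert number_of_children == 0
```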
Dataclass#
When opting for object-oriented programming in scientific computing, writing out a class definition that initializes all attributes involves a lot of boilerplate code. A noteworthy addition in Python 3.7 is the dataclass, which provides a shorthand notation. The @dataclass decorator implicitly adds an __init__() method (among others) for the attributes.
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    weight: float
    children: int = 0

john = Person(name="John", weight=74.0)
john.children = 1
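Besides __init__(), the decorator also generates __repr__() and __eq__() by default. The following sketch repeats the Person class from above to stay self-contained and additionally uses dataclasses.asdict to convert an instance into a dictionary. The second instance, other, is introduced for illustration only.

```python
from dataclasses import asdict, dataclass

@dataclass
class Person:
    name: str
    weight: float
    children: int = 0

john = Person(name="John", weight=74.0)
other = Person("John", 74.0)  # positional arguments work as well

# The generated __eq__ compares instances field by field
assert john == other

# The generated __repr__ lists all fields with their values
description = repr(john)
assert "name='John'" in description

# asdict converts the instance into a plain dictionary
as_dictionary = asdict(john)
assert as_dictionary == {"name": "John", "weight": 74.0, "children": 0}
```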
Scientific data types#
The scientific libraries bring a variety of additional data types for numerical computing and data analysis. The details are beyond our scope and presented in lectures on scientific computing.
Pandas data structures: Series and DataFrame