Text formats#
When printing results to the console is no longer sufficient, a natural first choice is to export them to some variant of text file. The advantage of text formats is that they are human-readable, non-proprietary, cross-platform and future-proof. We will discuss how Python can interact with some common text formats.
CSV#
The canonical choice is to dump all numbers with a chosen separator between, resulting in either Comma-Separated Values (CSV) or Tab-Separated Values (TSV). This format is best for flat, tabular data structures. It’s best practice to include a header as first line with the names of the columns.
```
A,B,C
1,2,3
```
Standard Library#
The `csv` module provides `reader` and `writer` objects for CSV files.
Tip

The example below uses an `io.StringIO` object as a proxy for a real CSV file. In a nutshell, this is a string-like object that behaves like a file, but lives in memory instead of on the filesystem. This allows our snippets to contain the data directly inside the code, making the examples self-contained and hopefully easier to grasp. Feel free to adapt the `with csv_file as f:` contexts to `with open("file.csv") as f:` in order to work with actual files.
```python
import csv
import io

csv_file = io.StringIO(
    "A,B,C\n"
    "1,2,3\n"
    "a,\"b,b\nb\",c\n"
)

with csv_file as f:
    csv_reader = csv.reader(f)
    # Save the first row with the headers
    csv_header = next(csv_reader)
    # Save all other rows as nested list
    data = [row for row in csv_reader]

assert data == [["1", "2", "3"], ["a", "b,b\nb", "c"]]

# Export again as csv (with the same contents as csv_file)
# newline="" lets the csv module handle line endings itself
with open("file.csv", "w", newline="") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(csv_header)
    csv_writer.writerows(data)
```
Note how the second element of the second row is quoted, in order to support nested newlines and commas.
There are also dedicated `DictReader` and `DictWriter` classes to work with rows as dictionaries instead of lists.
```python
import csv

data = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
]

with open("file_dict.csv", "w", newline="") as f:
    # Use the dict keys of the first row as field names
    csv_writer = csv.DictWriter(f, fieldnames=data[0].keys())
    # Write header with field names and the data
    csv_writer.writeheader()
    csv_writer.writerows(data)
```
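The reading counterpart could look like the following sketch, again using `io.StringIO` as a stand-in for a real file. Note that `DictReader` returns all values as strings:

```python
import csv
import io

csv_file = io.StringIO(
    "A,B,C\n"
    "1,2,3\n"
    "4,5,6\n"
)

with csv_file as f:
    # Field names are taken from the first line by default
    csv_reader = csv.DictReader(f)
    data = list(csv_reader)

assert data == [
    {"A": "1", "B": "2", "C": "3"},
    {"A": "4", "B": "5", "C": "6"},
]
```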
NumPy#
When working with scientific libraries, it's recommended to use their helper methods to import/export text data instead of the `csv` standard library module. The `numpy` library offers `np.loadtxt()` as a fast reader for simple text files. In addition, there's `np.genfromtxt()` with more sophisticated handling of missing values. Conversely, `np.savetxt()` can export data to a text file.
```python
import numpy as np
import io

csv_file = io.StringIO(
    "value_x,value_y\n"
    "2.0,3.0\n"
    "2.2,3.3\n"
)

data = np.loadtxt(
    csv_file,       # csv file to import as ndarray
    delimiter=",",  # use comma as separator
    skiprows=1,     # skip the one line with the header
)
```
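The export direction with `np.savetxt()` could be sketched as follows. The `header` and `fmt` arguments are optional and only used here to produce a file matching the input above (the filename is made up for the example):

```python
import numpy as np

data = np.array([[2.0, 3.0], [2.2, 3.3]])

np.savetxt(
    "file_numpy.csv",
    data,
    delimiter=",",             # use comma as separator
    header="value_x,value_y",  # write column names as first line
    comments="",               # drop the default "# " prefix before the header
    fmt="%.1f",                # format floats with one decimal place
)
```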
Pandas#
The data analysis library `pandas` offers `read_csv()` and `to_csv()` to import/export CSV files. These methods can offer better performance and more configuration options than the `numpy` equivalents. They can also readily handle and parse datetime values.
```python
import pandas as pd
import io

csv_file = io.StringIO(
    "date , value_x, value_y\n"
    "1-2-2023, 2.0, 3.0\n"
    "2-2-2023, 2.2, 3.3\n"
)

df = pd.read_csv(
    csv_file,                           # csv file to import as dataframe
    header=0,                           # treat the first csv line as header
    names=["datecol", "col1", "col2"],  # override header and assign custom names
    skipinitialspace=True,              # ignore space after comma in csv
    engine="c",                         # use faster C engine for parsing
    parse_dates=["datecol"],            # parse given column as date
    dayfirst=True,                      # dates start with day (eg DD-MM-YYYY)
    compression=None,                   # on-the-fly decompression (zip, tar, ...)
)
```
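The reverse direction could look like the sketch below, where `index=False` omits the row index that pandas would otherwise write as an extra first column (the filename is made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"col1": [2.0, 2.2], "col2": [3.0, 3.3]})

# Export without the row index column
df.to_csv("file_pandas.csv", index=False)
```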
JSON#
The JavaScript Object Notation (JSON) can encode more complicated, nested data structures. Its notation is very close to the basic data types in Python. Moreover, it has become the standard data exchange format of most HTTP and REST APIs, for instance when submitting measurement data to the InfluxDB time series database.
```json
{
    "firstname": "John",
    "surname": "Doe",
    "age": 42,
    "married": false,
    "children": null,
    "emails": [
        {
            "type": "private",
            "email": "johndoe@example.com"
        }
    ]
}
```
The following Python snippet uses the built-in `json` module to import the above JSON, modify it, and export the changes.
```python
import json
from pprint import pprint

with open("johndoe.json") as f:
    johndoe = json.load(f)

pprint(johndoe)
# {'age': 42,
#  'children': None,
#  'emails': [{'email': 'johndoe@example.com', 'type': 'private'}],
#  'firstname': 'John',
#  'married': False,
#  'surname': 'Doe'}

johndoe["married"] = True
johndoe["spouse"] = "Jane Doe"

with open("johndoe.json", "w") as f:
    json.dump(johndoe, f, sort_keys=False, indent=4)
```
Watch out: not all data types can be directly serialized as JSON. In particular, when expanding the above example to naively include a date, `json.dump` fails with `TypeError: Object of type date is not JSON serializable`. A simple fix is to add the `default=str` argument, so that `str()` is applied to non-serializable objects first, yielding the "YYYY-MM-DD" string representation of the date.
```python
import json
from datetime import datetime

johndoe = {}
johndoe["weddingdate"] = datetime.now().date()

with open("johndoe.json", "w") as f:
    json.dump(johndoe, f, sort_keys=False, indent=4, default=str)
```
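If more control is needed than the blanket `str()` conversion, `default` also accepts a custom function. The `encode_dates` helper below is a hypothetical name, sketching explicit ISO 8601 serialization of dates while still rejecting other non-serializable types:

```python
import json
from datetime import date, datetime

def encode_dates(obj):
    # Handle date/datetime explicitly; raise for anything else,
    # mirroring the default behaviour of json.dump
    if isinstance(obj, (date, datetime)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

johndoe = {"weddingdate": date(2023, 2, 1)}
print(json.dumps(johndoe, default=encode_dates))
# {"weddingdate": "2023-02-01"}
```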
XML#
The Extensible Markup Language (XML) is a close relative to the well-known HTML of the world wide web.
```xml
<?xml version="1.0" encoding="utf-8"?>
<data>
    <row>
        <shape>square</shape>
        <sides>4</sides>
    </row>
    <row>
        <shape>triangle</shape>
        <sides>3</sides>
    </row>
    <row>
        <shape>circle</shape>
        <sides/>
    </row>
</data>
```
Different XML libraries exist in the Python world:

- The standard library ships the `xml` package containing the `xml.etree.ElementTree` parser.
- Noteworthy mentions on PyPI are the very lightweight `xmltodict` and the full-fledged `lxml`.
- Pandas includes the `read_xml()` and `to_xml()` methods to import/export dataframes to/from XML.
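As a minimal sketch of the standard library approach, the shapes document above could be parsed with `xml.etree.ElementTree` like this. Note that `findtext()` returns an empty string for the empty `<sides/>` element:

```python
import xml.etree.ElementTree as ET

xml_string = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<data>"
    "<row><shape>square</shape><sides>4</sides></row>"
    "<row><shape>triangle</shape><sides>3</sides></row>"
    "<row><shape>circle</shape><sides/></row>"
    "</data>"
)

# Parse the XML string into an element tree and collect all rows
root = ET.fromstring(xml_string)
rows = [(row.findtext("shape"), row.findtext("sides")) for row in root.iter("row")]

assert rows == [("square", "4"), ("triangle", "3"), ("circle", "")]
```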
Others#
- The Python standard library includes the `configparser` module to read/write Initialization (INI) config files.
- Since Python 3.11, `tomllib` is included to parse Tom's Obvious Minimal Language (TOML) files.
- The third-party `ruamel.yaml` provides a powerful interface to YAML Ain't Markup Language (YAML) files.
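As a tiny illustration of the first point, INI-style configuration can be read with `configparser`, which stores all values as strings but offers typed getters. The section and option names below are made up for the example:

```python
import configparser

config = configparser.ConfigParser()
config.read_string(
    "[simulation]\n"
    "steps = 100\n"
    "tolerance = 1e-6\n"
)

# Values are stored as strings; typed getters convert on access
steps = config.getint("simulation", "steps")
tolerance = config.getfloat("simulation", "tolerance")

assert steps == 100
assert tolerance == 1e-6
```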