pathlib#

To access a file, one must first specify its path on the file system, since not all files reside in the same folder as the Python script. Python 3.4, released in 2014, introduced the pathlib module as a high-level, object-oriented approach to this task. It overlaps considerably with the older os.path module, but pathlib is nowadays the recommended library for handling file system paths.

Path#

Note

The following code snippets presume that Path has been imported.

from pathlib import Path

The most basic usage is to define a path object from a string representation.

PROJECT_DIR = Path("path/to/project")

Strictly speaking, the path separator depends on the operating system: / on Linux/macOS and \ on Windows. Nevertheless, the above code works on all of these platforms, because pathlib accepts forward slashes everywhere; it returns a pathlib.PosixPath object on Linux/macOS and a pathlib.WindowsPath object on Windows. There are also dedicated ways to join paths without having to worry about the separator.

PROJECT_DIR = Path("path") / Path("to") / Path("project")
PROJECT_DIR = Path("path", "to", "project")
PROJECT_DIR = Path("path").joinpath("to", "project")
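As a quick check, the following sketch confirms that the three spellings build the same path. It uses PurePosixPath (a variant of Path that never touches the file system) only so that the printed result is identical on every platform; the variable names are illustrative.

```python
from pathlib import PurePosixPath

# Three equivalent ways of joining path components
with_operator = PurePosixPath("path") / "to" / "project"
with_constructor = PurePosixPath("path", "to", "project")
with_joinpath = PurePosixPath("path").joinpath("to", "project")

print(with_operator)  # path/to/project
print(with_operator == with_constructor == with_joinpath)  # True
```

Note that the / operator also accepts plain strings on its right-hand side, so wrapping every component in Path is optional.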

Tip

Use raw string literals for Windows paths.

Given that the Windows path separator \ is also used as an escape character, it’s best to use raw strings such as r"C:\new_folder", which always treat the backslash as a literal character and avoid accidental escape sequences such as the newline \n.
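The difference can be illustrated with a small sketch. PureWindowsPath is used here so that the example behaves the same on any operating system; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

plain = "C:\new_folder"   # "\n" is interpreted as a newline character
raw = r"C:\new_folder"    # the backslash is kept as a literal character

print("\n" in plain)         # True
print("\n" in raw)           # False
print(len(plain), len(raw))  # 12 13

folder = PureWindowsPath(raw)
print(folder.name)           # new_folder
```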

Path-like objects#

Before Python 3.6, most standard library functions expected file system paths as strings. With the introduction of the os.PathLike protocol in Python 3.6, most of them were adapted to accept path-like objects as arguments, which can be either the path represented as a str or an object implementing the os.PathLike protocol. This means that, for example, the open() function now works with "file.txt" as well as Path("file.txt"), as we will shortly see when discussing file handling.
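A minimal sketch of this behaviour is given below. It writes to a temporary directory so that it leaves no files behind; the file name and variable names are illustrative.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "file.txt"

    # open() accepts a Path object ...
    with open(file_path, "w") as f:
        f.write("hello")

    # ... as well as the equivalent string
    with open(str(file_path)) as f:
        content = f.read()

    # os.fspath() returns the string form of a path-like object
    same_string = os.fspath(file_path) == str(file_path)

print(content)      # hello
print(same_string)  # True
```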

Path methods#

We briefly mention a selection of the methods and attributes that path objects support.

some_path = Path.cwd()      # path of current working directory

some_path.exists()          # return True if path exists
some_path.is_dir()          # return True if path is a directory
some_path.resolve()         # resolve as absolute path
# The following are attributes, so they are accessed without parentheses
some_path.parent            # parent directory
some_path.parents           # immutable sequence of all ancestors
some_path.name              # file or directory name
some_path.suffix            # file extension (or empty string)
some_path.stem              # file name without suffix
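To make the values concrete, here is a small sketch on a hypothetical file path. Note that parent, parents, name, suffix and stem are attributes and are accessed without parentheses. PurePosixPath is used so that the output does not depend on the platform or on the file actually existing.

```python
from pathlib import PurePosixPath

# Hypothetical path, chosen only for illustration
photo = PurePosixPath("/data/images/photo.jpg")

print(photo.name)    # photo.jpg
print(photo.suffix)  # .jpg
print(photo.stem)    # photo
print(photo.parent)  # /data/images

# parents lists every ancestor, from the closest up to the root
ancestors = [str(p) for p in photo.parents]
print(ancestors)     # ['/data/images', '/data', '/']
```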

Some methods return generators that can be looped over.

# Loop over the contents of a given directory
for contents_path in some_path.iterdir():
    print(contents_path)

# Loop over files with a given extension in this directory and all subfolders
for img_path in some_path.glob("**/*.jpg"):
    print(img_path)
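The difference between the two loops can be seen in the following sketch, which builds a small throwaway directory tree first. The file and folder names are made up for the example, and the results are sorted because neither method guarantees an order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / "album").mkdir()
    (root / "cover.jpg").touch()
    (root / "notes.txt").touch()
    (root / "album" / "photo.jpg").touch()

    # iterdir() lists the direct children only
    children = sorted(p.name for p in root.iterdir())

    # "**/*.jpg" matches in the directory itself and in nested subfolders
    images = sorted(
        p.relative_to(root).as_posix() for p in root.glob("**/*.jpg")
    )

print(children)  # ['album', 'cover.jpg', 'notes.txt']
print(images)    # ['album/photo.jpg', 'cover.jpg']
```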

Another common use case is to reference the base directory of the project, /path/to/project/, starting from the location of the executed script, /path/to/project/src/cli/cli.py.

BASE_DIR = Path(__file__).parents[2]

Also note that since Python 3.9, __file__ holds the absolute path of the executed script, so we no longer need to add resolve() to make it absolute.
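To see why index 2 is the right one here, the sketch below walks up the ancestors of the script path from the example. It uses PurePosixPath with the literal path instead of __file__, so that it can be run anywhere.

```python
from pathlib import PurePosixPath

script = PurePosixPath("/path/to/project/src/cli/cli.py")

# parents[0] is the containing folder; each further index goes one level up
print(script.parents[0])  # /path/to/project/src/cli
print(script.parents[1])  # /path/to/project/src
print(script.parents[2])  # /path/to/project

base_dir = script.parents[2]
```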

Replacing os with pathlib#

The following table summarizes the correspondence between pathlib and os methods.

pathlib                  os
Path.cwd()               os.getcwd()
Path.readlink()          os.readlink(path)
Path.unlink()            os.remove(path)
Path.resolve()           os.path.realpath(path)
Path.stat()              os.stat(path)
Path.home()              os.path.expanduser("~")
Path.chmod(mode)         os.chmod(path, mode)
Path.mkdir()             os.mkdir(path)
path1.joinpath(path2)    os.path.join(path1, path2)
path1 / path2            os.path.join(path1, path2)
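A few of these correspondences can be checked side by side. The sketch below works inside a temporary directory; the folder and file names are illustrative, and only a subset of the table is covered.

```python
import os
import tempfile
from pathlib import Path

# Path.cwd() <-> os.getcwd()
cwd_same = Path.cwd() == Path(os.getcwd())

with tempfile.TemporaryDirectory() as tmp_dir:
    # Path.mkdir() <-> os.mkdir(path)
    new_dir = Path(tmp_dir) / "data"
    new_dir.mkdir()
    made_dir = os.path.isdir(new_dir)

    # path1 / path2 <-> os.path.join(path1, path2)
    file_path = new_dir / "a.txt"
    joined_same = str(file_path) == os.path.join(new_dir, "a.txt")

    # Path.stat() <-> os.stat(path)
    file_path.write_text("abc")
    size_same = file_path.stat().st_size == os.stat(file_path).st_size

    # Path.unlink() <-> os.remove(path)
    file_path.unlink()
    removed = not os.path.exists(file_path)

print(cwd_same, made_dir, joined_same, size_same, removed)
```

Because Path objects are path-like, they can be passed directly to the os functions, which makes a gradual migration from os to pathlib possible.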