pathlib
To access a file, one must first specify its path on the file system, since not all files reside in the same folder as the Python script. Python 3.4, released in 2014, introduced the pathlib module as a high-level, object-oriented approach to this task. As such, it overlaps considerably with the previously existing os.path module. Nevertheless, pathlib is nowadays the recommended library for handling file system paths.
Path
Note
The following code snippets presume that Path has been imported.
from pathlib import Path
The most basic usage is to define a path object from a string representation.
PROJECT_DIR = Path("path/to/project")
Strictly speaking, the path separator depends on the operating system, namely / on Linux/macOS and \ on Windows. Nevertheless, the code above works on both platforms and returns a pathlib.PosixPath or a pathlib.WindowsPath object, respectively. There are also dedicated ways to join paths without having to worry about the separator.
PROJECT_DIR = Path("path") / Path("to") / Path("project")
PROJECT_DIR = Path("path", "to", "project")
PROJECT_DIR = Path("path").joinpath("to", "project")
Tip
Use raw string literals for Windows paths.
Given that the Windows path separator \ is also used as an escape character, it’s best to use raw strings such as r"C:\new_folder". They always treat the backslash literally and avoid any confusion with non-printable characters like the newline \n.
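As a quick check of what goes wrong without the r prefix, consider the regular literal below: the \n in it is silently turned into a newline character. The sketch uses PureWindowsPath so it also runs on Linux/macOS. It also shows that forward slashes, which Windows paths accept as well, sidestep the problem entirely.

```python
from pathlib import PureWindowsPath

regular = "C:\new_folder"  # "\n" is interpreted as a newline character!
raw = r"C:\new_folder"     # the backslash is kept literally

print(len(regular), len(raw))     # 12 13
print("\n" in regular)            # True
print(PureWindowsPath(raw).name)  # new_folder

# Forward slashes give the same Windows path without any escaping issues.
print(PureWindowsPath("C:/new_folder") == PureWindowsPath(raw))  # True
```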
Path-like objects
Before Python 3.6, file system paths were typically represented as strings. With the addition of the object-oriented pathlib representation, most functions of the standard library were adapted to accept path-like objects as arguments. A path-like object is either a path represented as a str (or bytes) or an object implementing the os.PathLike protocol. This means, for example, that the open() function works with "file.txt" as well as Path("file.txt"), as we will shortly see when discussing file handling.
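The following sketch illustrates three kinds of path-like objects accepted by open(): a str, a Path, and an instance of a hypothetical ProjectFile class. That class implements the os.PathLike protocol through its __fspath__() method. The standard function os.fspath() converts any of them to a string. A temporary directory is used so that the example does not leave files behind.

```python
import os
import tempfile
from pathlib import Path


class ProjectFile:
    """Hypothetical class implementing the os.PathLike protocol."""

    def __init__(self, directory, name):
        self.directory = directory
        self.name = name

    def __fspath__(self):
        # The protocol only requires returning a str (or bytes) path.
        return os.path.join(self.directory, self.name)


with tempfile.TemporaryDirectory() as tmp:
    as_str = os.path.join(tmp, "file.txt")    # plain string
    as_path = Path(tmp) / "file.txt"          # pathlib object
    as_custom = ProjectFile(tmp, "file.txt")  # custom os.PathLike object

    as_path.write_text("hello")

    # open() accepts every kind of path-like object.
    contents = []
    for p in (as_str, as_path, as_custom):
        with open(p) as f:
            contents.append(f.read())
    print(contents)  # ['hello', 'hello', 'hello']

    # os.fspath() turns any path-like object into its string form.
    same = os.fspath(as_str) == os.fspath(as_path) == os.fspath(as_custom)
    print(same)  # True
```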
Path methods
We briefly mention a selection of the methods and properties that path objects support. Note that parent, parents, name, suffix and stem are properties rather than methods, so they are accessed without parentheses.
some_path = Path.cwd() # path of current working directory
some_path.exists() # return True if path exists
some_path.is_dir() # return True if path is a directory
some_path.resolve() # make path absolute, resolving symbolic links
some_path.parent # parent directory
some_path.parents # immutable sequence of all ancestors
some_path.name # file or directory name
some_path.suffix # file extension (or empty string)
some_path.stem # file name without suffix
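To make the distinction between methods and properties concrete, here is a small sketch on a hypothetical file data/photo.tar.gz created inside a temporary directory. For a file with several extensions, suffix returns only the last one, and stem strips only that last one.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file with two extensions inside a "data" folder.
    photo = Path(tmp) / "data" / "photo.tar.gz"
    photo.parent.mkdir()
    photo.write_text("dummy content")

    # Methods (called with parentheses) query the file system.
    checks = (photo.exists(), photo.is_dir(), photo.parent.is_dir())
    print(checks)  # (True, False, True)

# Properties (no parentheses) only look at the path itself,
# so they still work after the temporary directory is gone.
print(photo.name)         # photo.tar.gz
print(photo.suffix)       # .gz
print(photo.stem)         # photo.tar
print(photo.parent.name)  # data
print(photo.exists())     # False, the directory was cleaned up
```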
Some methods return generators that can be looped over.
# Loop over the contents of a given directory
for contents_path in some_path.iterdir():
    print(contents_path)
# Loop over .jpg files in the directory and, recursively, all its subfolders
for img_path in some_path.glob("**/*.jpg"):
    print(img_path)
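The sketch below builds a small, hypothetical directory tree in a temporary folder. It shows the difference between iterdir(), which only lists direct children, and the recursive pattern "**/*.jpg". That pattern also matches files at the top level, because ** stands for zero or more directories. Path.rglob("*.jpg") is an equivalent shorthand.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one image at the top level, one nested two levels deep.
    (root / "raw" / "2024").mkdir(parents=True)
    for rel in ("cover.jpg", "raw/2024/img_01.jpg", "raw/notes.txt"):
        (root / rel).touch()

    # iterdir() is not recursive: it only yields the direct children.
    children = sorted(p.name for p in root.iterdir())
    print(children)  # ['cover.jpg', 'raw']

    # "**" matches zero or more directories, so cover.jpg is found as well.
    jpgs = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.jpg"))
    print(jpgs)  # ['cover.jpg', 'raw/2024/img_01.jpg']

    # rglob("*.jpg") is a shorthand for glob("**/*.jpg").
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jpg"))
    print(via_rglob == jpgs)  # True
```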
Another common use case is to reference the base directory of the project, /path/to/project/, starting from the location of the executed script, /path/to/project/src/cli/cli.py.
BASE_DIR = Path(__file__).parents[2]
Also note that since Python 3.9, __file__ holds the absolute path of the executed script, so we no longer need to call resolve() first (unless symbolic links need to be resolved).
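Counting the index in parents can be error-prone, so it may help to enumerate all ancestors of the example script path. The sketch uses a PurePosixPath built from the example path in the text rather than __file__, so it runs anywhere. In a real script, Path(__file__).parents[2] selects the corresponding ancestor in the same way.

```python
from pathlib import PurePosixPath

# The example layout from the text; a pure path avoids touching the file system.
script = PurePosixPath("/path/to/project/src/cli/cli.py")

for i, ancestor in enumerate(script.parents):
    print(i, ancestor)
# 0 /path/to/project/src/cli
# 1 /path/to/project/src
# 2 /path/to/project
# 3 /path/to
# 4 /path
# 5 /

BASE_DIR = script.parents[2]
print(BASE_DIR)  # /path/to/project
```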
Replacing os with pathlib
The following table summarizes the correspondence between pathlib and os methods.
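A few well-known correspondences can also be checked side by side. The following sketch is not an exhaustive list. It performs the same operations with os/os.path and with pathlib on a temporary directory and compares the results. The directory and file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os / os.path style: build a nested directory and a file inside it.
    os_dir = os.path.join(tmp, "a", "b")
    os.makedirs(os_dir, exist_ok=True)
    os_file = os.path.join(os_dir, "report.csv")
    with open(os_file, "w") as f:
        f.write("x")

    # pathlib style: refer to the same directory and file.
    pl_dir = Path(tmp) / "a" / "b"
    pl_dir.mkdir(parents=True, exist_ok=True)  # no error, it already exists
    pl_file = pl_dir / "report.csv"

    # Each entry pairs an os/os.path call with its pathlib counterpart.
    pairs = {
        "getcwd": (os.getcwd(), str(Path.cwd())),
        "exists": (os.path.exists(os_file), pl_file.exists()),
        "isdir": (os.path.isdir(os_dir), pl_dir.is_dir()),
        "basename": (os.path.basename(os_file), pl_file.name),
        "dirname": (os.path.dirname(os_file), str(pl_file.parent)),
        "splitext": (os.path.splitext(os_file)[1], pl_file.suffix),
        "listdir": (os.listdir(os_dir), [p.name for p in pl_dir.iterdir()]),
    }
    for key, (via_os, via_pathlib) in pairs.items():
        print(f"{key:10} {via_os == via_pathlib}")

all_equal = all(via_os == via_pathlib for via_os, via_pathlib in pairs.values())
```

The dictionary keys name the os-side function. For example, os.path.splitext(p)[1] corresponds to Path.suffix, and os.makedirs(..., exist_ok=True) to Path.mkdir(parents=True, exist_ok=True).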