Developer Guide
This document describes the internal architecture and the main concepts behind
PyScaffold. It assumes the reader has some experience in using PyScaffold
(especially its command line interface, putup) and some familiarity with
Python’s package ecosystem.
Please note that this document does not target PyScaffold’s users; instead, it provides internal documentation for those involved in PyScaffold’s development.
Architecture
As indicated in the figure below, PyScaffold can be divided into two main execution blocks: a pure Python API and the command line interface wrapping it as an executable program that runs on the shell.
The CLI is responsible for defining all arguments putup accepts and parsing
the user input accordingly. The result is a dict that contains options
expressing the user preference and can be fed into PyScaffold’s main API,
create_project.
This function is responsible for combining the provided options dict
with pre-existing project configurations that might be available in the project
directory (the setup.cfg file, if present) and globally defined default
values (via PyScaffold’s own configuration file).
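The layering of option sources described above can be pictured with a small standalone sketch. It does not use PyScaffold's code: the option names, the values, and the precedence order (command line over setup.cfg over global defaults) are all assumptions made for illustration.

```python
from collections import ChainMap

# Hypothetical option sources; names and values are illustrative only.
global_defaults = {"license": "MIT", "author": "Jane Doe"}
existing_config = {"license": "BSD-3-Clause"}  # e.g. values read from setup.cfg
cli_options = {"project_path": "my_project"}   # e.g. values parsed by the CLI

# ChainMap looks keys up from left to right, so earlier mappings win.
# This precedence is an assumption, not PyScaffold's documented rule.
combined = dict(ChainMap(cli_options, existing_config, global_defaults))
```

Here the license found in the existing configuration shadows the global default, while the author falls through to the default because no other source defines it.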
It will then create an (initially empty) in-memory representation of the
project structure and run PyScaffold’s action pipeline, which in turn will
(among other tasks) write customized versions of PyScaffold’s templates to
the disk as project files, according to the combined scaffold options.
The project representation and the action pipeline are two key concepts in PyScaffold’s architecture and are described in detail in the following sections.
Project Structure Representation
Each Python package project is internally represented by PyScaffold as a tree
data structure that directly relates to a directory entry in the file system.
This tree is implemented as a simple (and possibly nested) dict in which
keys indicate the path where files will be generated, while values indicate
their content. For instance, the following dict:
{
    "folder": {
        "file.txt": "Hello World!",
        "another-folder": {
            "empty-file.txt": ""
        }
    }
}
represents a project directory in the file system that contains a single
directory named folder. In turn, folder contains two entries.
The first entry is a file named file.txt with content Hello World!
while the second entry is a sub-directory named another-folder. Finally,
another-folder contains an empty file named empty-file.txt.
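To make the mapping between the nested dict and file paths concrete, the following is a minimal sketch that walks such a dict and lists the files it describes. The helper name list_files is hypothetical and is not part of PyScaffold; the sketch also ignores the tuple values discussed later in this section.

```python
from pathlib import PurePosixPath


def list_files(structure, base=PurePosixPath(".")):
    """Yield (path, content) pairs for every file in a nested dict."""
    for name, value in structure.items():
        path = base / name
        if isinstance(value, dict):
            # A nested dict stands for a sub-directory: recurse into it.
            yield from list_files(value, path)
        else:
            # Any other value is treated here as the content of a file.
            yield path, value


structure = {
    "folder": {
        "file.txt": "Hello World!",
        "another-folder": {"empty-file.txt": ""},
    }
}

files = {str(path): content for path, content in list_files(structure)}
```

Each key contributes one path segment, so the depth of nesting in the dict matches the depth of the file in the directory tree.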
Note
Changed in version 4.0: Prior to version 4.0, the project structure included the top level directory of the project. Now it considers everything under the project folder.
Additionally, tuple values are also allowed in order to specify a
file operation (or simply file op) that will be used to produce the file.
In this case, the first element of the tuple is the file content, while the
second element will be a function (or more generally a callable object)
responsible for writing that content to the disk. For example, the dict:
from pyscaffold.operations import create

{
    "src": {
        "namespace": {
            "module.py": ('print("Hello World!")', create)
        }
    }
}
represents a src/namespace/module.py file, under the project directory,
with content print("Hello World!"), that will be written to the disk.
When no operation is specified (i.e. when using a simple string instead of a
tuple), PyScaffold will assume create by default.
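The rule "a plain string means the default operation" can be sketched as a small normalization step. Everything below is a stand-in written for illustration: split_leaf, default_op and custom_op are hypothetical names, and the (path, contents, opts) signature is an assumption rather than something stated in this guide.

```python
def default_op(path, contents, opts):
    """Placeholder for the default file op (hypothetical signature)."""
    return path


def custom_op(path, contents, opts):
    """Placeholder for an explicitly chosen file op."""
    return None


def split_leaf(value):
    """Return (content, file_op) whether or not an op was given explicitly."""
    if isinstance(value, tuple):
        content, file_op = value
        return content, file_op
    # A bare value carries no operation, so the default one is assumed.
    return value, default_op


explicit = split_leaf(('print("Hello World!")', custom_op))
implicit = split_leaf('print("Hello World!")')
```

After normalization every leaf has the same shape, which lets the code that writes files treat both notations uniformly.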
Note
The create function simply writes a text file to the disk using UTF-8
encoding and the default file permissions. This behaviour can be modified
by wrapping create within other functions/callables, for example:
from pyscaffold.operations import create, no_overwrite
{"file": ("content", no_overwrite(create))}
will prevent the file from being written if it already exists. See
pyscaffold.operations for more information on how to write your own
file operation and other options.
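The wrapping idea can be illustrated without PyScaffold at all. The sketch below defines a stand-in file op and a wrapper that skips files already present on disk; write_text and skip_existing are hypothetical names, and the (path, contents, opts) signature is an assumption, so pyscaffold.operations remains the authority on the real interface.

```python
import tempfile
from pathlib import Path


def write_text(path, contents, opts):
    """Hypothetical file op: write `contents` to `path` as UTF-8 text."""
    path.write_text(contents, encoding="utf-8")
    return path


def skip_existing(file_op):
    """Wrap `file_op` so that files already on disk are left untouched."""
    def wrapper(path, contents, opts):
        if path.exists():
            return None  # nothing is written
        return file_op(path, contents, opts)
    return wrapper


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file"
    target.write_text("original", encoding="utf-8")
    # The wrapped op sees the existing file and does not overwrite it.
    result = skip_existing(write_text)(target, "content", {})
    preserved = target.read_text(encoding="utf-8")
```

Because the wrapper has the same signature as the op it wraps, several such wrappers can be stacked around a single base operation.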
Finally, while it is simple to represent file contents as a string directly,
most of the time we want to customize them according to the parameters of the
project being created (e.g. package or author’s name). So PyScaffold also
accepts string.Template objects and functions (with a single dict
argument and a str return value) to be used as contents. These templates
and functions will be called with PyScaffold's options when it is time to
write the file to the disk.
Note
string.Template objects will have safe_substitute called (not simply substitute).
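The three kinds of content (plain string, string.Template and function) can be resolved along the lines of the sketch below. The resolve helper and the option names are hypothetical and only illustrate the behaviour described above; they are not PyScaffold's implementation.

```python
from string import Template

opts = {"name": "my_project", "author": "Jane Doe"}  # hypothetical options


def resolve(content, opts):
    """Turn a str, Template or function into the final file content."""
    if isinstance(content, Template):
        # safe_substitute leaves unknown placeholders untouched
        # instead of raising KeyError as substitute would.
        return content.safe_substitute(opts)
    if callable(content):
        return content(opts)
    return content


from_template = resolve(Template("${name} by ${author}, ${unknown}"), opts)
from_function = resolve(lambda o: f"# {o['name']}\n", opts)
from_string = resolve("static text", opts)
```

In this sketch the ${unknown} placeholder survives in the output, which is the practical difference between safe_substitute and substitute.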
This tree representation is often referred to in this document as project structure or simply structure.
Action Pipeline
PyScaffold organizes the generation of a project into a series of steps with
well-defined purposes. As shown in the figure below,
each step is called an action and is implemented as a
simple function that receives two arguments: a project structure and a dict
with options (some of them parsed from command line arguments, others from
default values).
An action MUST return a tuple also composed of a project structure and a
dict with options. The return values, thus, are usually modified versions
of the input arguments. Additionally, an action can also have side effects, like
creating directories or adding files to version control. The following
pseudo-code illustrates a basic action:
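def action(project_structure, options):
    # `modify` and `some_side_effect` are placeholders, not real functions
    new_struct, new_opts = modify(project_structure, options)
    some_side_effect()  # e.g. create a directory or run a git command
    return new_struct, new_opts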
The output of each action is used as the input of the subsequent action,
forming a pipeline. Initially the structure argument is just an empty dict.
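The chaining described above amounts to a fold over the list of actions. The sketch below is a self-contained illustration with made-up actions; run_pipeline, add_readme and mark_done are hypothetical names and do not correspond to PyScaffold's own functions.

```python
from functools import reduce


def add_readme(struct, opts):
    """Action that adds a file to the in-memory structure."""
    return {**struct, "README.md": f"# {opts['name']}\n"}, opts


def mark_done(struct, opts):
    """Action that only changes the options."""
    return struct, {**opts, "done": True}


def run_pipeline(actions, struct, opts):
    """Feed the (struct, opts) output of each action into the next one."""
    return reduce(lambda state, action: action(*state), actions, (struct, opts))


# The structure starts as an empty dict, as described in the text.
struct, opts = run_pipeline([add_readme, mark_done], {}, {"name": "demo"})
```

Both actions return new dicts instead of mutating their arguments, which keeps each step easy to test in isolation.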
Each action is uniquely identified by a string in the format
<module name>:<function name>, similarly to the convention used for a
setuptools entry point.
For example, if an action is defined in the action function of the
extras.py file that is part of the pyscaffoldext.contrib project,
the action identifier is pyscaffoldext.contrib.extras:action.
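An identifier in this format can be derived from the function object itself, as the following sketch shows. The helper name action_id is hypothetical, and the module name is set by hand only to reproduce the example from the text.

```python
def action_id(action):
    """Build a '<module name>:<function name>' identifier for a function."""
    return f"{action.__module__}:{action.__name__}"


def action(struct, opts):
    return struct, opts


# Pretend the function lives in the module used in the example above.
action.__module__ = "pyscaffoldext.contrib.extras"
identifier = action_id(action)
```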
By default, the sequence of actions taken by PyScaffold is the one given by
pyscaffold.actions.DEFAULT.
The project structure is usually empty until define_structure.
This action just loads the in-memory dict representation, which is only written
to disk by the create_structure action.
Note that this sequence varies according to the command line options.
To retrieve an updated list, please use putup --list-actions or
putup --dry-run.
Extensions
Extensions are a mechanism provided by PyScaffold to modify its action pipeline
at runtime and the preferred way of adding new functionality.
There are built-in extensions (e.g. pyscaffold.extensions.cirrus)
and external extensions (e.g. pyscaffoldext-dsproject), but both types
of extensions work in exactly the same way.
This division is purely based on the fact that some of PyScaffold’s features are
implemented as extensions that ship by default with the pyscaffold package,
while others require the user to install additional Python packages.
Extensions are required to add at least one CLI argument that allows users
to opt in to their behaviour. When putup runs, PyScaffold will
dynamically discover installed extensions via setuptools entry points and
add their defined arguments to the main CLI parser. Once activated, an
extension can use the helper functions defined in pyscaffold.actions to
manipulate PyScaffold’s action pipeline and therefore the project structure.
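Conceptually, an activated extension ends up inserting its own actions at a chosen point of the pipeline. The sketch below shows that idea with plain lists; insert_after and add_extra_file are hypothetical stand-ins and not the helpers from pyscaffold.actions, whose exact signatures should be checked in the module reference.

```python
def define_structure(struct, opts):
    return struct, opts


def create_structure(struct, opts):
    return struct, opts


def add_extra_file(struct, opts):
    """Action contributed by a hypothetical extension."""
    return {**struct, "EXTRA.txt": "added by an extension"}, opts


def insert_after(pipeline, new_action, reference):
    """Return a copy of `pipeline` with `new_action` right after `reference`."""
    position = [a.__name__ for a in pipeline].index(reference) + 1
    return pipeline[:position] + [new_action] + pipeline[position:]


pipeline = [define_structure, create_structure]
extended = insert_after(pipeline, add_extra_file, "define_structure")
names = [a.__name__ for a in extended]
```

Placing the new action between the two existing ones means its file is added to the in-memory structure before anything is written to disk.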
For more details on extensions, please consult our Extending PyScaffold guide.
Code base Organization
PyScaffold is organized in a series of internal Python modules, the main ones being:
api: top level functions for accessing PyScaffold functionality, by combining together the other modules
cli: wrapper around the API to create a command line executable program
actions: default action pipeline and helper functions for manipulating it
structure: functions specialized in defining the in-memory project structure representation and in taking this representation and creating it as part of the file system
update: steps required for updating projects generated with old versions of PyScaffold
extensions: main extension mechanism and subpackages corresponding to the built-in extensions
Additionally, a series of internal auxiliary libraries is defined in:
dependencies: processing and manipulating of package dependencies and requirements
exceptions: custom PyScaffold exceptions and exception handlers
file_system: wrappers around file system functions that make them easy to be used from PyScaffold
identification: creating and processing of project/package/function names and other general identifiers
info: general information about the system, user and package being generated
log: custom logging infrastructure for PyScaffold, specialized in its verbose execution
operations: file operations that can be embedded in the in-memory project structure representation
repo: wrapper around the git command
shell: helper functions for working with external programs
termui: basic support for ANSI code formatting
toml: thin adapter layer around third-party TOML parsing libraries, focused on API stability
For more details about each module and its functions and classes, please consult our module reference.
When contributing to PyScaffold, please try to maintain this overall
project organization by respecting each module’s own purpose.
Moreover, when introducing new files or renaming existing ones, please
try to use meaningful naming and avoid terms that are too generic, e.g.
utils.py (when in doubt, Peter Hilton has a great article about naming
smells and a nice presentation about how to name things).