Feb 12th, 2021 - written by Kimserey.
Last week we looked at how compilers work in general. We saw that they are mostly composed of two parts, a front end and a back end: the front end compiles programming source code into an intermediate representation, and the back end is the runtime. In today’s post we will look specifically at how Python gets compiled and interpreted with the default implementation, CPython.
In this post, we assume that we are using CPython, installed and available in our bin, whether on the computer itself or in a virtual environment. For example, on my machine:
❯ where python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
❯ file /Library/Frameworks/Python.framework/Versions/3.8/bin/python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3: Mach-O 64-bit executable x86_64
We can see that python3 is a Mach-O 64-bit executable x86_64 file, which is the executable format for macOS.
On an Ubuntu machine I would see the following:
❯ file ./python3.8
./python3.8: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=02526282ea6c4d6eec743ad74a1eeefd035346a3, for GNU/Linux 3.2.0, stripped
When you download Python from the official site https://www.python.org/downloads/, you get the CPython implementation. An easy way to verify that is to fire up the interpreter and run the following:
❯ python3
Python 3.8.4 (v3.8.4:dfa645a65e, Jul 13 2020, 10:45:06)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.python_implementation()
'CPython'
We can see that platform.python_implementation() returns CPython, which is the default implementation of Python. CPython is an implementation of Python written in C. The executable created from the compilation of CPython contains a compiler, which compiles the programming code to Python Bytecode, and an interpreter, which interprets the Python Bytecode and executes machine code instructions based on the resulting interpretation.
CPython can be downloaded from the official GitHub repository, and if we build it locally, we’ll get python.exe, which would be the same executable as the one installed from the official site (minus the version difference).
In order to run a script, we do:
❯ python3 my_file.py
or to run a module as script, we do:
❯ python3 -m my_module
Python will then first compile the source to Bytecode, then interpret the Bytecode in order to execute the commands requested.
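The two steps can also be reproduced by hand from within Python. The snippet below is a minimal sketch using the built-in compile() and exec() functions; the source string and the "<example>" file name are made up for illustration.

```python
# Step 1: compile source text into a code object, which holds the bytecode.
source = "result = 6 * 7"
code_object = compile(source, "<example>", "exec")

# The raw bytecode is stored as bytes on the code object.
raw_bytecode = code_object.co_code

# Step 2: hand the code object to the interpreter for execution.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])  # 42
```

Running a script with python3 goes through the same compile-then-execute sequence, only without us having to call anything explicitly.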
Now that we understand what the python3 executable is, we can look at what the compilation and interpretation steps are.
The first step, compilation, converts Python programming source code into Python Bytecode. When a module is imported, its Bytecode is then cached in a file with a .pyc extension (the script passed directly on the command line is only compiled in memory). Those files can be found under __pycache__/ folders. The Python Bytecode is the machine code understood by the CPython VM - or CPython interpreter.
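To see where the cached file for a given source file would live, we can ask importlib. This is a small sketch; my_module.py is a hypothetical file name and does not need to exist for the path to be computed.

```python
import importlib.util
import sys

# my_module.py is a hypothetical file name used for illustration.
pyc_path = importlib.util.cache_from_source("my_module.py")
print(pyc_path)

# The tag embedded in the file name identifies the implementation and
# version, for example "cpython-38" on CPython 3.8.
print(sys.implementation.cache_tag)
```

The tag in the file name is what allows several Python versions to keep their own Bytecode side by side in the same __pycache__/ folder.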
When we do python3 -m my_module, if the Bytecode was already generated and the files haven’t changed, the compilation step is skipped and the interpretation step starts right away. Usually compilation and interpretation are done in one go, but we can also force the compilation separately using compileall:
❯ python3 -m compileall .
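The same module can be driven from code. The sketch below compiles a throwaway directory with compileall.compile_dir and lists the resulting files; the greet.py module is invented for this example, and it assumes the default __pycache__ location is in use.

```python
import compileall
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module; the name greet.py is made up for this example.
    source_file = pathlib.Path(tmp) / "greet.py"
    source_file.write_text("def greet():\n    return 'hello'\n")

    # Compile every .py file under the directory without printing progress.
    succeeded = compileall.compile_dir(tmp, quiet=1)

    # Collect the names of the generated bytecode files.
    pyc_files = [path.name for path in pathlib.Path(tmp).rglob("*.pyc")]

print(succeeded, pyc_files)
```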
Python Bytecode is a special language understood by the CPython VM, but we also have access to a human readable version with the dis module, whose name stands for disassembler. A disassembler is a program that converts machine code to assembly language - as opposed to an assembler, which converts assembly language to machine code. The assembly language in question here is a human readable form of the Python Bytecode. For example, we can create a function:
>>> def say_hello():
...     print("Hello World")
...
And we can then use dis to look into the generated Python Bytecode:
>>> from dis import dis
>>> dis(say_hello)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
Each line corresponds to an instruction, identified by an opcode such as LOAD_GLOBAL, which will be understood by the CPython VM.
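If we want to work with the instructions rather than just print them, dis.get_instructions yields them as objects. This is a short sketch reusing the say_hello function from above; the exact opcodes vary between Python versions, so the output may not match the 3.8 listing line for line.

```python
import dis


def say_hello():
    print("Hello World")


# Each Instruction carries its byte offset, opcode name and argument.
for instruction in dis.get_instructions(say_hello):
    print(instruction.offset, instruction.opname, instruction.argrepr)

# Keep only the opcode names, in execution order.
opnames = [instruction.opname for instruction in dis.get_instructions(say_hello)]
```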
The code used to compile Python code is present in CPython under cpython/Python/compile.c. It contains the compiler and assembler.
Once the Python Bytecode is generated, it is fed into the VM, which runs it through the eval function. The evaluation can be found under cpython/Python/ceval.c, which contains the evaluation loop, taking care of frames, setting up the environment with variables, and interpreting the opcodes with the appropriate function calls.
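To get a feel for what an evaluation loop does, here is a toy version written in Python. It is only a sketch and not how ceval.c is actually written: the run function is hypothetical, it handles just the opcodes from the say_hello disassembly shown earlier, and it stores names and constants directly in the instructions, whereas the real Bytecode stores indexes into lookup tables.

```python
def run(instructions, global_names):
    """Evaluate a list of (opname, argument) pairs on a value stack."""
    stack = []
    for opname, argument in instructions:
        if opname == "LOAD_GLOBAL":
            # Look the name up and push the resulting object.
            stack.append(global_names[argument])
        elif opname == "LOAD_CONST":
            stack.append(argument)
        elif opname == "CALL_FUNCTION":
            # argument is the number of positional arguments on the stack.
            args = [stack.pop() for _ in range(argument)][::-1]
            function = stack.pop()
            stack.append(function(*args))
        elif opname == "POP_TOP":
            stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")


# The same sequence of instructions as the say_hello disassembly.
program = [
    ("LOAD_GLOBAL", "print"),
    ("LOAD_CONST", "Hello World"),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]
result = run(program, {"print": print})
```

The value stack is the key idea: each instruction pushes or pops values, and a call simply consumes its arguments and the function from the stack before pushing the result back.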
Python is known for being “Batteries included”, an expression referring to products that ship with their batteries and are therefore usable straight away: Python ships with a large set of modules precompiled and available to be used. This set of modules is known as the Python Standard Library, and its Python implementation can be found under cpython/Lib (the modules written in C live under cpython/Modules).
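On an installed Python, we can locate the standard library without looking at the CPython repository. The sketch below uses sysconfig and the json module as an arbitrary example; the paths printed depend on the machine.

```python
import json
import sysconfig

# Directory holding the pure-Python part of the standard library.
stdlib_dir = sysconfig.get_paths()["stdlib"]
print(stdlib_dir)

# A standard library module written in Python reports its source file.
json_file = json.__file__
print(json_file)
```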
Extra modules can be installed with pip install when we create applications. For regular Python modules, their respective Bytecode is loaded into the process by the interpreter at runtime, when they are imported. In the case of Python modules with a C implementation, the packages ship with .so files, which are compiled shared libraries that are dynamically loaded at runtime.
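The import system exposes which file suffixes it recognises for each kind of module, which gives a quick way to tell them apart. In this sketch, module_origin is a hypothetical helper written for illustration, and json is just an example module.

```python
import importlib.machinery
import importlib.util

# File suffixes the import system recognises for each kind of module.
source_suffixes = importlib.machinery.SOURCE_SUFFIXES        # source files
bytecode_suffixes = importlib.machinery.BYTECODE_SUFFIXES    # cached Bytecode
extension_suffixes = importlib.machinery.EXTENSION_SUFFIXES  # compiled C extensions
print(source_suffixes, bytecode_suffixes, extension_suffixes)


def module_origin(name):
    """Return where a module would be loaded from (hypothetical helper)."""
    spec = importlib.util.find_spec(name)
    return spec.origin


print(module_origin("json"))
```

On Linux and macOS the extension suffixes end in .so, while on Windows they end in .pyd.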
And that concludes how Python code gets compiled and run!
CPython is the default compiler and interpreter used for Python. It is used to run Python scripts or modules. In this post we saw a general picture of what happens when we run a Python script. We saw that we use the CPython executable, which compiles the source into Python Bytecode, which then gets interpreted by the CPython VM. I hope you liked this post, and I’ll see you in the next one!