Packaging a Python application into a single binary

2024-01-15 19:23

I've been working on a new CLI tool for Corgea and I was curious if we could write it in Python and still distribute it easily.

If you've ever used the awscli alongside another Python application, you know that packaging Python code for easy installation and use is not always simple. The Python ecosystem is not without its warts. I've spent countless hours over the years helping coworkers debug their Python environments. Despite its unfriendliness at times, it's still one of the most popular languages right now.

Can we package a Python application into a single binary that doesn't depend on any external environment?

Yes!

Sort of... mostly... with some caveats.

This is not a new problem in the Python ecosystem, and there are various projects that will help you accomplish this, such as py2app, PyInstaller, and Nuitka.

For this example, I chose Nuitka because it seems well-maintained and pretty popular.

Here is a simple Python application that calls a URL, parses the JSON response, and prints the ID of the first result.

main.py
import lib.api as api

print("This is a binary.")

data = api.get_data()
print(data[0]["id"])

lib/api.py
import requests

URL = "https://api.github.com/users/abronte/repos"


def get_data():
    resp = requests.get(URL)
    return resp.json()

Simple, right? Let's try to compile it with:
python -m nuitka --include-package=requests --onefile --standalone main.py

Nuitka-Options:INFO: Used command line options: --include-package=requests --onefile --standalone main.py
Nuitka:INFO: Starting Python compilation with Nuitka '1.9.7' on Python '3.11' commercial grade 'not installed'.
Nuitka-Plugins:INFO: anti-bloat: Not including '_bisect' automatically in order to avoid bloat, but this may cause: may slow down by using fallback implementation.
Nuitka-Plugins:INFO: anti-bloat: Not including '_json' automatically in order to avoid bloat, but this may cause: may slow down by using fallback implementation.
Nuitka:INFO: Completed Python level compilation and optimization.
Nuitka:INFO: Generating source code for C backend compiler.
Nuitka:INFO: Running data composer tool for optimal constant value handling.
Nuitka-DataComposer:WARNING: Problem with constant file 'module.urllib3.response.const'.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/nuitka/tools/data_composer/__main__.py", line 40, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/nuitka/tools/data_composer/DataComposer.py", line 422, in main
    with open(fullpath, "rb") as const_file:
         ^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'main.build/module.urllib3.response.const'
FATAL: Error executing data composer, please report the above exception.

Uh oh, looks like that didn't work. Nuitka doesn't seem to play nice with urllib3, which requests depends on. Let's try again using only standard library modules (with the help of ChatGPT here).

import json
import http.client

URL = "https://api.github.com/users/abronte/repos"

def get_data():
    conn = http.client.HTTPSConnection("api.github.com")

    # Create the request header
    headers = {"User-Agent": "Python HTTP Client"}

    # Send a GET request
    resource = "/users/abronte/repos"
    conn.request("GET", resource, headers=headers)

    # Get the response
    response = conn.getresponse()

    # Check if the response status is OK
    if response.status == 200:
        # Read and decode the response
        data = response.read().decode()
        # Convert the JSON data to a Python object
        repos = json.loads(data)
        return repos
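    else:
        # Raise instead of returning a string, so main.py fails with a clear message
        raise RuntimeError(f"GitHub API request failed with status {response.status}")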

Now compile with:
python -m nuitka --onefile --standalone main.py

Yay, it worked!
Nuitka-Options:INFO: Used command line options: --onefile --standalone main.py
Nuitka:INFO: Starting Python compilation with Nuitka '1.9.7' on Python '3.11' commercial grade 'not installed'.
Nuitka-Plugins:INFO: anti-bloat: Not including '_bisect' automatically in order to avoid bloat, but this may cause: may slow down by using fallback implementation.
Nuitka-Plugins:INFO: anti-bloat: Not including '_json' automatically in order to avoid bloat, but this may cause: may slow down by using fallback implementation.
Nuitka:INFO: Completed Python level compilation and optimization.
Nuitka:INFO: Generating source code for C backend compiler.
Nuitka:INFO: Running data composer tool for optimal constant value handling.
Nuitka:INFO: Running C compilation via Scons.
Nuitka-Scons:INFO: Backend C compiler: gcc (gcc 12).
Nuitka-Scons:INFO: Backend linking program with 9 files (no progress information available for this stage).
Nuitka-Scons:WARNING: You are not using ccache, re-compilation of identical code will be slower than necessary. Use your OS package manager to install it.
Nuitka-Postprocessing:INFO: Creating single file from dist folder, this may take a while.
Nuitka-Onefile:INFO: Running bootstrap binary compilation via Scons.
Nuitka-Scons:INFO: Onefile C compiler: gcc (gcc 12).
Nuitka-Scons:INFO: Onefile linking program with 1 files (no progress information available for this stage).
Nuitka-Scons:WARNING: You are not using ccache, re-compilation of identical code will be slower than necessary. Use your OS package manager to install it.
Nuitka-Onefile:INFO: Using compression for onefile payload.
Nuitka-Onefile:INFO: Onefile payload compression ratio (26.41%) size 43454699 to 11477256.
Nuitka-Onefile:INFO: Keeping onefile build directory 'main.onefile-build'.
Nuitka:INFO: Keeping dist folder 'main.dist' for inspection, no need to use it.
Nuitka:INFO: Keeping build directory 'main.build'.
Nuitka:INFO: Successfully created 'main.bin'.

Now we have a single 12 MB file, main.bin, which we can run to see its output.
root@93c6753650cd:/app# ./main.bin
This is a binary.
88233154
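One way to double-check that the code is really running as a compiled binary, rather than through the interpreter that is still installed here, is to look for the `__compiled__` attribute. As far as I understand the Nuitka documentation, Nuitka adds it to the globals of each compiled module. The helper name `running_compiled` is made up for this sketch.

```python
def running_compiled(module_globals):
    """Return True if the given module globals carry Nuitka's __compiled__ marker.

    Assumption: Nuitka injects a `__compiled__` name into compiled modules;
    under a regular interpreter the name is simply absent.
    """
    return "__compiled__" in module_globals


if __name__ == "__main__":
    # Under plain `python main.py` this prints False.
    print("compiled by Nuitka:", running_compiled(globals()))
```

Dropping the print into main.py and rebuilding would show whether the binary or the system interpreter is doing the work.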

Great, but Python is still installed here. Is this truly portable? Let's test it on a fresh Debian Docker container.
root@4ff614b44b25:/app# ./main.bin
This is a binary.
Traceback (most recent call last):
  File "/tmp/onefile_9_1705345900_197091/main.py", line 5, in <module>
    data = api.get_data()

  File "/tmp/onefile_9_1705345900_197091/lib/api.py", line 29, in get_data
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 1286, in request
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 1332, in _send_request
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 1281, in endheaders
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 1041, in _send_output
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 979, in send
  File "/tmp/onefile_9_1705345900_197091/http/client.py", line 1458, in connect
  File "/tmp/onefile_9_1705345900_197091/ssl.py", line 517, in wrap_socket
  File "/tmp/onefile_9_1705345900_197091/ssl.py", line 1075, in _create
  File "/tmp/onefile_9_1705345900_197091/ssl.py", line 1346, in do_handshake
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

Uh oh, looks like the HTTPS request failed because the container is missing the CA certificates needed to verify the server. Let's install those:
apt-get update && apt-get install -y ca-certificates

Trying again...
root@4ff614b44b25:/app# ./main.bin
This is a binary.
88233154
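The certificate error above comes from OpenSSL not finding any trusted CA certificates on the machine. A minimal sketch for diagnosing this, using only the standard library's `ssl.get_default_verify_paths()`, is shown here. The function name `describe_ca_sources` and the shape of the report are my own choices, and the paths it prints are the defaults compiled into whichever OpenSSL the interpreter (or binary) was built against.

```python
import os
import ssl


def describe_ca_sources():
    """Report the default CA file and directory OpenSSL will look in."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for label, location in (
        ("cafile", paths.openssl_cafile),
        ("capath", paths.openssl_capath),
    ):
        report[label] = {
            "path": location,
            # False here is a hint that the CA bundle is not installed
            "exists": bool(location) and os.path.exists(location),
        }
    return report


if __name__ == "__main__":
    for label, info in describe_ca_sources().items():
        print(f"{label}: {info['path']} (exists: {info['exists']})")
```

If both entries report `exists: False`, certificate verification will most likely fail the same way it did in the fresh container.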

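Yay, it works! Now we should be able to distribute this single binary to any machine with a compatible OS and architecture, without needing an existing Python installation (though, as we just saw, it still relies on the system's CA certificates).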

Is this actually a viable option?

Maybe. It really depends on your use case, which third-party libraries you rely on, and whether they are compatible with your compiler.