PEP 711 – PyBI: a standard format for distributing Python Binaries (2024)

Author:
Nathaniel J. Smith <njs at pobox.com>
PEP-Delegate:
TODO
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Topic:
Packaging
Created:
06-Apr-2023
Post-History:
06-Apr-2023
Table of Contents
  • Abstract
  • Motivation
  • Examples
  • Specification
    • Filename
    • File contents
      • Pybi-specific core metadata
    • Symlinks
      • Representing symlinks in zip files
      • Representing symlinks in RECORD files
      • Storing symlinks in pybi files
      • Limitations
  • Non-normative comments
    • Why not just use conda?
    • Sdists (or not)
    • What packages should be bundled inside a pybi?
  • Backwards Compatibility
  • Security Implications
  • How to Teach This
  • Copyright

Abstract

“Like wheels, but instead of a pre-built python package, it’s a pre-built python interpreter”

Motivation

End goal: Pypi.org has pre-built packages for all Python versions on all popular platforms, so automated tools can easily grab any of them and set it up. It becomes quick and easy to try Python prereleases, pin Python versions in CI, make a temporary environment to reproduce a bug report that only happens on a specific Python point release, etc.

First step (this PEP): define a standard packaging file format to hold pre-built Python interpreters, that reuses existing Python packaging standards as much as possible.

Examples

Example pybi builds are available at pybi.vorpus.org. They’re zip files, so you can unpack them and poke around inside if you want to get a feel for how they’re laid out.

You can also look at the tooling I used to create them.

Specification

Filename

Filename: {distribution}-{version}[-{build tag}]-{platform tag}.pybi

This matches the wheel file format defined in PEP 427, except dropping the {python tag} and {abi tag} and changing the extension from .whl to .pybi.

For example:

  • cpython-3.9.3-manylinux_2014.pybi
  • cpython-3.10b2-win_amd64.pybi

Just like for wheels, if a pybi supports multiple platforms, you can separate them by dots to make a “compressed tag set”:

  • cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi

(Though in practice this probably won’t be used much, e.g. the above filename is more idiomatically written as cpython-3.9.5-macosx_11_0_universal2.pybi.)
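As a rough sketch of the naming rule above, the following Python splits a pybi filename into its parts. It assumes, as wheel tooling does, that no individual component contains a hyphen, and it does not validate the version or build tag. `PybiFilename` and `parse_pybi_filename` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class PybiFilename(NamedTuple):
    distribution: str
    version: str
    build_tag: Optional[str]
    platform_tags: tuple[str, ...]


def parse_pybi_filename(filename: str) -> PybiFilename:
    """Split '{distribution}-{version}[-{build tag}]-{platform tag}.pybi'.

    Assumption (borrowed from wheels): no component contains a '-'.
    """
    if not filename.endswith(".pybi"):
        raise ValueError(f"not a .pybi filename: {filename!r}")
    parts = filename[: -len(".pybi")].split("-")
    if len(parts) == 3:
        distribution, version, platform = parts
        build_tag = None
    elif len(parts) == 4:
        distribution, version, build_tag, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    # A "compressed tag set" joins several platform tags with dots.
    return PybiFilename(distribution, version, build_tag, tuple(platform.split(".")))
```

For example, `cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi` yields two platform tags and no build tag.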

File contents

A .pybi file is a zip file that can be unpacked directly into an arbitrary location and then used as a self-contained Python environment. There’s no .data directory or install scheme keys, because the Python environment knows which install scheme it’s using, so it can just put things in the right places to start with.

The “arbitrary location” part is important: the pybi can’t contain any hardcoded absolute paths. In particular, any preinstalled scripts MUST NOT embed absolute paths in their shebang lines.

Similar to wheels’ <package>-<version>.dist-info directory, the pybi archive must contain a top-level directory named pybi-info/. (Rationale: calling it pybi-info instead of dist-info makes sure that tools don’t get confused about which kind of metadata they’re looking at; leaving off the {name}-{version} part is fine because only one pybi can be installed into a given directory.) The pybi-info/ directory contains at least the following files:

  • .../PYBI: metadata about the archive itself, in the same RFC822-ish format as METADATA and WHEEL files:
    Pybi-Version: 1.0
    Generator: {name} {version}
    Tag: {platform tag}
    Tag: {another platform tag}
    Tag: {...and so on...}
    Build: 1 # optional
  • .../RECORD: same as in wheels, except see the note about symlinks, below.
  • .../METADATA: In the same format as described in the current core metadata spec, except that the following keys are forbidden because they don’t make sense:
    • Requires-Dist
    • Provides-Extra
    • Requires-Python

    And also there are some new, required keys described below.
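Because PYBI and METADATA use the same RFC822-style layout as wheel metadata, Python’s standard `email.parser` can read them. A minimal sketch, assuming the file contents have already been read out of the archive as text; the helper names (and any `Generator` value used with them) are hypothetical.

```python
from email.parser import HeaderParser

FORBIDDEN_METADATA_KEYS = ("Requires-Dist", "Provides-Extra", "Requires-Python")


def parse_pybi_file(text: str) -> dict:
    """Parse the RFC822-style pybi-info/PYBI file into a plain dict."""
    msg = HeaderParser().parsestr(text)
    return {
        "version": msg["Pybi-Version"],
        "generator": msg["Generator"],
        "tags": msg.get_all("Tag", []),  # Tag may repeat; keep every value
        "build": msg["Build"],  # None when the optional key is absent
    }


def check_metadata_keys(text: str) -> None:
    """Reject a pybi-info/METADATA file that uses a key forbidden for pybis."""
    msg = HeaderParser().parsestr(text)
    # Header lookups on email.message.Message are case-insensitive.
    bad = [key for key in FORBIDDEN_METADATA_KEYS if key in msg]
    if bad:
        raise ValueError(f"forbidden METADATA keys in a pybi: {bad}")
```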

Pybi-specific core metadata

Here’s an example of the new METADATA fields, before we give the full details:

Pybi-Environment-Marker-Variables: {"implementation_name": "cpython", "implementation_version": "3.10.8", "os_name": "posix", "platform_machine": "x86_64", "platform_system": "Linux", "python_full_version": "3.10.8", "platform_python_implementation": "CPython", "python_version": "3.10", "sys_platform": "linux"}
Pybi-Paths: {"stdlib": "lib/python3.10", "platstdlib": "lib/python3.10", "purelib": "lib/python3.10/site-packages", "platlib": "lib/python3.10/site-packages", "include": "include/python3.10", "platinclude": "include/python3.10", "scripts": "bin", "data": "."}
Pybi-Wheel-Tag: cp310-cp310-PLATFORM
Pybi-Wheel-Tag: cp310-abi3-PLATFORM
Pybi-Wheel-Tag: cp310-none-PLATFORM
Pybi-Wheel-Tag: cp39-abi3-PLATFORM
Pybi-Wheel-Tag: cp38-abi3-PLATFORM
Pybi-Wheel-Tag: cp37-abi3-PLATFORM
Pybi-Wheel-Tag: cp36-abi3-PLATFORM
Pybi-Wheel-Tag: cp35-abi3-PLATFORM
Pybi-Wheel-Tag: cp34-abi3-PLATFORM
Pybi-Wheel-Tag: cp33-abi3-PLATFORM
Pybi-Wheel-Tag: cp32-abi3-PLATFORM
Pybi-Wheel-Tag: py310-none-PLATFORM
Pybi-Wheel-Tag: py3-none-PLATFORM
Pybi-Wheel-Tag: py39-none-PLATFORM
Pybi-Wheel-Tag: py38-none-PLATFORM
Pybi-Wheel-Tag: py37-none-PLATFORM
Pybi-Wheel-Tag: py36-none-PLATFORM
Pybi-Wheel-Tag: py35-none-PLATFORM
Pybi-Wheel-Tag: py34-none-PLATFORM
Pybi-Wheel-Tag: py33-none-PLATFORM
Pybi-Wheel-Tag: py32-none-PLATFORM
Pybi-Wheel-Tag: py31-none-PLATFORM
Pybi-Wheel-Tag: py30-none-PLATFORM
Pybi-Wheel-Tag: py310-none-any
Pybi-Wheel-Tag: py3-none-any
Pybi-Wheel-Tag: py39-none-any
Pybi-Wheel-Tag: py38-none-any
Pybi-Wheel-Tag: py37-none-any
Pybi-Wheel-Tag: py36-none-any
Pybi-Wheel-Tag: py35-none-any
Pybi-Wheel-Tag: py34-none-any
Pybi-Wheel-Tag: py33-none-any
Pybi-Wheel-Tag: py32-none-any
Pybi-Wheel-Tag: py31-none-any
Pybi-Wheel-Tag: py30-none-any
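A minimal sketch of reading these three fields back out of a METADATA file with only the standard library, and of locating the interpreter from `Pybi-Paths`. It assumes each JSON value sits on a single header line (no RFC822 folding); `read_pybi_metadata` and `interpreter_relpath` are hypothetical helper names.

```python
import json
import posixpath
from email.parser import HeaderParser


def read_pybi_metadata(text: str) -> tuple[dict, dict, list[str]]:
    """Return (marker variables, install paths, wheel tag templates)."""
    msg = HeaderParser().parsestr(text)
    marker_vars = json.loads(msg["Pybi-Environment-Marker-Variables"])
    paths = json.loads(msg["Pybi-Paths"])
    # Repeated header: get_all keeps file order, which is preference order.
    wheel_tags = msg.get_all("Pybi-Wheel-Tag", [])
    return marker_vars, paths, wheel_tags


def interpreter_relpath(paths: dict) -> str:
    """Path of `python`, relative to the root of the unpacked pybi."""
    # Pybi-Paths values always use forward slashes, so posixpath is the
    # right module even when this code itself runs on Windows.
    return posixpath.join(paths["scripts"], "python")
```

With the example above, `interpreter_relpath` would return `bin/python`.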

Specification:

  • Pybi-Environment-Marker-Variables: The value of all PEP 508 environment marker variables that are static across installs of this Pybi, as a JSON dict. So for example:
    • python_version will always be present, because a Python 3.10 package always has python_version == "3.10".
    • platform_version will generally not be present, because it gives detailed information about the OS where Python is running, for example:
      #60-Ubuntu SMP Thu May 6 07:46:32 UTC 2021

      platform_release has similar issues.

    • platform_machine will usually be present, except for macOS universal2 pybis: these can potentially be run in either x86-64 or arm64 mode, and we don’t know which until the interpreter is actually invoked, so we can’t record it in static metadata.

    Rationale: In many cases, this should allow a resolver running on Linux to compute package pins for a Python environment on Windows, or vice-versa, so long as the resolver has access to the target platform’s .pybi file. (Note that Requires-Python constraints can be checked by using the python_full_version value.) While we have to leave out a few keys sometimes, they’re either fairly useless (platform_version, platform_release) or can be reconstructed by the resolver (platform_machine).

    The markers are also just generally useful information to have accessible. For example, if you have a pypy3-7.3.2 pybi, and you want to know what version of the Python language that supports, then that’s recorded in the python_version marker.

    (Note: we may want to deprecate/remove platform_version and platform_release? They’re problematic and I can’t figure out any cases where they’re useful. But that’s out of scope of this particular PEP.)

  • Pybi-Paths: The install paths needed to install wheels (same keys as sysconfig.get_paths()), as relative paths starting at the root of the zip file, as a JSON dict.

    These paths MUST be written in Unix format, using forward slashes as a separator, not backslashes.

    It must be possible to invoke the Python interpreter by running {paths["scripts"]}/python. If there are alternative interpreter entry points (e.g. pythonw for Windows GUI apps), then they should also be in that directory under their conventional names, with no version number attached. (You can also have a python3.11 symlink if you want; there’s no rule against that. It’s just that python has to exist and work.)

    Rationale: Pybi-Paths and Pybi-Wheel-Tag (see below) are together enough to let an installer choose wheels and install them into an unpacked pybi environment, without invoking Python. Besides, we need to write down the interpreter location somewhere, so it’s two birds with one stone.

  • Pybi-Wheel-Tag: The wheel tags supported by this interpreter, in preference order (most-preferred first, least-preferred last), except that the special platform tag PLATFORM should replace any platform tags that depend on the final installation system.

    Discussion: It would be nice™ if installers could compute a pybi’s corresponding wheel tags ahead of time, so that they could install wheels into the unpacked pybi without needing to actually invoke the python interpreter to query its tags – both for efficiency and to allow for more exotic use cases like setting up a Windows environment from a Linux host.

    But unfortunately, it’s impossible to compute the full set of platform tags supported by a Python installation ahead of time, because they can depend on the final system:

    • A pybi tagged manylinux_2_12_x86_64 can always use wheels tagged as manylinux_2_12_x86_64. It also might be able to use wheels tagged manylinux_2_17_x86_64, but only if the final installation system has glibc 2.17+.
    • A pybi tagged macosx_11_0_universal2 (= x86-64 + arm64 support in the same binary) might be able to use wheels tagged as macosx_11_0_arm64, but only if it’s installed on an “Apple Silicon” machine and running in arm64 mode.

    In these two cases, an installation tool can still work out the appropriate set of wheel tags by computing the local platform tags, taking the wheel tag templates from Pybi-Wheel-Tag, and swapping in the actual supported platforms in place of the magic PLATFORM string.

    However, there are other cases that are even more complicated:

    • You can (usually) run both 32- and 64-bit apps on 64-bit Windows. So a pybi installer might compute the set of allowable pybi tags on the current platform as [win32, win_amd64]. But you can’t then just take that set and swap it into the pybi’s wheel tag template or you get nonsense:
      [ "cp39-cp39-win32", "cp39-cp39-win_amd64", "cp39-abi3-win32", "cp39-abi3-win_amd64", ...]

      To handle this, the installer needs to somehow understand that a manylinux_2_12_x86_64 pybi can use a manylinux_2_17_x86_64 wheel as long as those are both valid tags on the current machine, but a win32 pybi can’t use a win_amd64 wheel, even if those are both valid tags on the current machine.

    • A pybi tagged macosx_11_0_universal2 might be able to use wheels tagged as macosx_11_0_x86_64, but only if it’s installed on an x86-64 machine or it’s installed on an ARM machine and the interpreter is invoked with the magic incantation that tells macOS to run a binary in x86-64 mode. So how the installer plans to invoke the pybi matters too!

    So actually using Pybi-Wheel-Tag values is less trivial than it might seem, and they’re probably only useful with fairly sophisticated tooling. But, smart pybi installers will already have to understand a lot of these platform compatibility issues in order to select a working pybi, and for the cross-platform pinning/environment building case, users can potentially provide whatever information is needed to disambiguate exactly what platform they’re targeting. So, it’s still useful enough to include in the PyBI metadata – tools that don’t find it useful can simply ignore it.
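The template substitution described above (the easy part, once a tool has decided which platforms apply) might look like the following sketch. `expand_wheel_tags` is a hypothetical helper. It assumes the caller has already filtered `platforms` down to the ones this particular pybi can really use on the target machine, which, per the cases above, is the hard part. Keeping template order first and trying platforms in the caller’s order within each template is also an assumption of this sketch.

```python
def expand_wheel_tags(templates: list[str], platforms: list[str]) -> list[str]:
    """Replace the PLATFORM placeholder with concrete platform tags.

    templates: Pybi-Wheel-Tag values, most-preferred first.
    platforms: platform tags this pybi can actually use on the target
        machine, most-preferred first (choosing them is the caller's job).
    """
    expanded: list[str] = []
    for template in templates:
        interpreter, abi, platform = template.split("-")
        if platform == "PLATFORM":
            expanded.extend(f"{interpreter}-{abi}-{p}" for p in platforms)
        else:
            expanded.append(template)  # e.g. py3-none-any is kept as-is
    return expanded
```

For a manylinux_2_12_x86_64 pybi on a glibc 2.17 system, a tool might pass `["manylinux_2_17_x86_64", "manylinux_2_12_x86_64"]`; for a win32 pybi it would pass only `["win32"]`, even on 64-bit Windows.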

You can probably generate these metadata values by running this script on the built interpreter:

import packaging.markers
import packaging.tags
import sysconfig
import os.path
import json
import sys

marker_vars = packaging.markers.default_environment()
# Delete any keys that depend on the final installation
del marker_vars["platform_release"]
del marker_vars["platform_version"]
# Darwin binaries are often multi-arch, so play it safe and
# delete the architecture marker. (Better would be to only
# do this if the pybi actually is multi-arch.)
if marker_vars["sys_platform"] == "darwin":
    del marker_vars["platform_machine"]

# Copied and tweaked version of packaging.tags.sys_tags
tags = []
interp_name = packaging.tags.interpreter_name()
if interp_name == "cp":
    tags += list(packaging.tags.cpython_tags(platforms=["xyzzy"]))
else:
    tags += list(packaging.tags.generic_tags(platforms=["xyzzy"]))
tags += list(packaging.tags.compatible_tags(platforms=["xyzzy"]))

# Gross hack: packaging.tags normalizes platforms by lowercasing them,
# so we generate the tags with a unique string and then replace it
# with our special uppercase placeholder.
str_tags = [str(t).replace("xyzzy", "PLATFORM") for t in tags]

(base_path,) = sysconfig.get_config_vars("installed_base")
# For some reason, macOS framework builds report their
# installed_base as a directory deep inside the framework.
while "Python.framework" in base_path:
    base_path = os.path.dirname(base_path)
paths = {
    key: os.path.relpath(path, base_path).replace("\\", "/")
    for (key, path) in sysconfig.get_paths().items()
}

json.dump({"marker_vars": marker_vars, "tags": str_tags, "paths": paths}, sys.stdout)

This emits a JSON dict on stdout with separate entries for each kind of pybi-specific metadata: the marker variables, the wheel tags, and the install paths.
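Going one step further, a build tool could turn that JSON into the corresponding METADATA lines. This is a hypothetical helper, not part of the script above; it serializes each field in the layout of the METADATA example earlier in this section and keeps the tag order the script produced.

```python
import json


def metadata_lines(script_output: str) -> list[str]:
    """Turn the script's JSON output into pybi-specific METADATA lines."""
    data = json.loads(script_output)
    lines = [
        # Compact, single-line JSON so each value stays on one header line.
        "Pybi-Environment-Marker-Variables: " + json.dumps(data["marker_vars"]),
        "Pybi-Paths: " + json.dumps(data["paths"]),
    ]
    # One Pybi-Wheel-Tag line per tag, preserving preference order.
    lines += ["Pybi-Wheel-Tag: " + tag for tag in data["tags"]]
    return lines
```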

Symlinks

Currently, symlinks are used by default in all Unix Python installs (e.g., bin/python3 -> bin/python3.9). And furthermore, symlinks are required to store macOS framework builds in .pybi files. So, unlike wheel files, we absolutely have to support symlinks in .pybi files for them to be useful at all.

Representing symlinks in zip files

The de-facto standard for representing symlinks in zip files is the Info-Zip symlink extension, which works as follows:

  • The symlink’s target path is stored as if it were the file contents.
  • The top 4 bits of the Unix permissions field are set to 0xa, i.e.: permissions & 0xf000 == 0xa000
  • The Unix permissions field, in turn, is stored as the top 16 bits of the “external attributes” field.

So if using Python’s zipfile module, you can check whether a ZipInfo represents a symlink by doing:

(zip_info.external_attr >> 16) & 0xf000 == 0xa000

Or if using Rust’s zip crate, the equivalent check is:

fn is_symlink(zip_file: &zip::ZipFile) -> bool {
    match zip_file.unix_mode() {
        Some(mode) => mode & 0xf000 == 0xa000,
        None => false,
    }
}

If you’re on Unix, your zip and unzip commands probably understand this format already.
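For the writing side, which the check above doesn’t cover, here is a sketch using Python’s `zipfile` module. `add_symlink` sets the Unix file-type bits to `stat.S_IFLNK` (0o120000, i.e. 0xa000) and shifts the mode into the top 16 bits of `external_attr`; `read_symlinks` applies the same check shown above. The helper names and the 0o777 permission bits are assumptions for illustration.

```python
import io
import stat
import zipfile

# stat.S_IFLNK == 0o120000 == 0xa000: the Unix file-type bits for a symlink.
SYMLINK_MODE = stat.S_IFLNK | 0o777


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store `name -> target` using the Info-Zip symlink convention."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so extractors interpret the mode bits
    info.external_attr = SYMLINK_MODE << 16  # mode lives in the top 16 bits
    zf.writestr(info, target)  # the target path is stored as the file contents


def read_symlinks(zf: zipfile.ZipFile) -> dict[str, str]:
    """Return {symlink name: target} for every symlink entry."""
    return {
        info.filename: zf.read(info).decode("utf-8")
        for info in zf.infolist()
        if (info.external_attr >> 16) & 0xF000 == 0xA000
    }


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bin/python3.9", b"(interpreter binary)")
        add_symlink(zf, "bin/python3", "python3.9")
    with zipfile.ZipFile(buf) as zf:
        print(read_symlinks(zf))  # {'bin/python3': 'python3.9'}
```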

Representing symlinks in RECORD files

Normally, a RECORD file lists each file + its hash + its length:

my/favorite/file,sha256=...,12345

For symlinks, we instead write:

name/of/symlink,symlink=path/to/symlink/target,

That is: we use a special “hash function” called symlink, and then store the actual symlink target as the “hash value”. And the length is left empty.

Rationale: we’re already committed to the RECORD file containing a redundant check on everything in the main archive, so for symlinks we at least need to store some kind of hash, plus some kind of flag to indicate that this is a symlink. Given that symlink target strings are roughly the same size as a hash, we might as well store them directly. This also makes the symlink information easier to access for tools that don’t understand the Info-Zip symlink extension, and makes it possible to losslessly unpack and repack a Unix pybi on a Windows system, which someone might find handy at some point.
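A sketch of producing both kinds of RECORD rows. The regular-file row follows the wheel convention (SHA-256 digest, URL-safe base64 with the trailing `=` padding stripped, then the size in bytes); the symlink row follows the format above. The function names are hypothetical.

```python
import base64
import csv
import hashlib
import io


def record_row_for_file(path: str, data: bytes) -> list[str]:
    """Wheel-style row: path, sha256=<urlsafe base64, no padding>, size."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return [path, f"sha256={encoded}", str(len(data))]


def record_row_for_symlink(path: str, target: str) -> list[str]:
    """Pybi-style row: the target goes in the hash slot, the size is empty."""
    return [path, f"symlink={target}", ""]


def render_record(rows: list[list[str]]) -> str:
    """Serialize rows as CSV, one entry per line."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```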

Storing symlinks in pybi files

When a pybi creator stores a symlink, they MUST use both of the mechanisms defined above: storing it in the zip archive directly using the Info-Zip representation, and also recording it in the RECORD file.

Pybi consumers SHOULD validate that the symlinks in the archive and RECORD file are consistent with each other.

We also considered using only the RECORD file to store symlinks, but then the vanilla unzip tool wouldn’t be able to unpack them, and that would make it hard to install a pybi from a shell script.
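One way a consumer might perform the consistency check: collect symlinks from the Info-Zip mode bits and from the RECORD rows, then compare the two mappings. This sketch assumes the RECORD file lives at `pybi-info/RECORD` (as described under File contents) and that both sides use identical Unix-style path strings; the function names are hypothetical.

```python
import csv
import io
import zipfile

RECORD_PATH = "pybi-info/RECORD"


def symlinks_in_zip(zf: zipfile.ZipFile) -> dict[str, str]:
    """Symlinks according to the Info-Zip mode bits."""
    return {
        info.filename: zf.read(info).decode("utf-8")
        for info in zf.infolist()
        if (info.external_attr >> 16) & 0xF000 == 0xA000
    }


def symlinks_in_record(record_text: str) -> dict[str, str]:
    """Symlinks according to `name,symlink=target,` rows in RECORD."""
    links = {}
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) >= 2 and row[1].startswith("symlink="):
            links[row[0]] = row[1][len("symlink="):]
    return links


def check_symlink_consistency(zf: zipfile.ZipFile) -> None:
    """Raise ValueError if the archive and its RECORD disagree about symlinks."""
    in_zip = symlinks_in_zip(zf)
    in_record = symlinks_in_record(zf.read(RECORD_PATH).decode("utf-8"))
    problems = [
        f"{name}: zip says {in_zip.get(name)!r}, RECORD says {in_record.get(name)!r}"
        for name in sorted(in_zip.keys() | in_record.keys())
        if in_zip.get(name) != in_record.get(name)
    ]
    if problems:
        raise ValueError("inconsistent symlinks:\n" + "\n".join(problems))
```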

Limitations

Symlinks enable a lot of potential messiness. To keep things under control, we impose the following restrictions:

  • Symlinks MUST NOT be used in .pybis targeting Windows, or other platforms that are missing first-class symlink support.
  • Symlinks MUST NOT be used inside the pybi-info directory. (Rationale: there’s no need, and it makes things simpler for resolvers that need to extract info from pybi-info without unpacking the whole archive.)
  • Symlink targets MUST be relative paths, and MUST be inside the pybi directory.
  • If A/B/... is recorded as a symlink in the archive, then there MUST NOT be any other entries in the archive named like A/B/.../C.

    For example, if an archive has a symlink foo -> bar, and then later in the archive there’s a regular file named foo/blah.py, then a naive unpacker could potentially end up writing a file called bar/blah.py. Don’t be naive.

Unpackers MUST verify that these rules are followed, because without them attackers could create evil symlinks like foo -> /etc/passwd or foo -> ../../../../../etc + foo/passwd -> ... and cause havoc.

Non-normative comments

Why not just use conda?

This isn’t really in the scope of this PEP, but since conda is a popular way to distribute binary Python interpreters, it’s a natural question.

The simple answer is: conda is great! But, there are lots of python users who aren’t conda users, and they deserve nice things too. This PEP just gives them another option.

The deeper answer is: the maintainers who upload packages to PyPI are the backbone of the Python ecosystem. They’re the first audience for Python packaging tools. And one thing they want is to upload a package once, and have it be accessible across all the different ways Python is deployed: in Debian and Fedora and Homebrew and FreeBSD, in Conda environments, in big companies’ monorepos, in Nix, in Blender plugins, in RenPy games, … you get the idea.

All of these environments have their own tooling and strategies for managing packages and dependencies. So what’s special about PyPI and wheels is that they’re designed to describe dependencies in a standard, abstract way, that all these downstream systems can consume and convert into their local conventions. That’s why package maintainers use Python-specific metadata and upload to PyPI: because it lets them address all of those systems simultaneously. Every time you build a Python package for conda, there’s an intermediate wheel that’s generated, because wheels are the common language that Python package build systems and conda can use to talk to each other.

But then, if you’re a maintainer releasing an sdist+wheels, then you naturally want to test what you’re releasing, which may depend on arbitrary PyPI packages and versions. So you need tools that build Python environments directly from PyPI, and conda is fundamentally not designed to do that. So conda and pip are both necessary for different cases, and this proposal happens to be targeting the pip side of that equation.

Sdists (or not)

It might be cool to have an “sdist” equivalent for pybis, i.e., some kind of format for a Python source release that’s structured enough to let tools automatically fetch and build it into a pybi, for platforms where prebuilt pybis aren’t available. But, this isn’t necessary for the MVP and opens a can of worms, so let’s worry about it later.

What packages should be bundled inside a pybi?

Pybi builders have the power to pick and choose what exactly goes inside. For example, you could include some preinstalled packages in the pybi’s site-packages directory, or prune out bits of the stdlib that you don’t want. We can’t stop you! Though if you do preinstall packages, then it’s strongly recommended to also include the correct metadata (.dist-info etc.), so that it’s possible for pip or other tools to figure out what’s going on.

For my prototype “general purpose” pybis, what I chose is:

  • Make sure site-packages is empty.

    Rationale: for traditional standalone python installers that are targeted at end-users, you probably want to include at least pip, to avoid bootstrapping issues (PEP 453). But pybis are different: they’re designed to be installed by “smart” tooling that consumes the pybi as part of some kind of larger automated deployment process. It’s easier for these installers to start from a blank slate and then add whatever they need, than for them to start with some preinstalled packages that they may or may not want. (And besides, you can still run python -m ensurepip.)

  • Include the full stdlib, except for test.

    Rationale: the top-level test module contains CPython’s own test suite. It’s huge (CPython without test is ~37 MB, then test adds another ~25 MB on top of that!), and essentially never used by regular user code. Also, as precedent, the official nuget packages, the official manylinux images, and multiple Linux distributions all leave it out, and this hasn’t caused any major problems.

    So this seems like the best way to balance broad compatibility with reasonable download/install sizes.

  • I’m not shipping any .pyc files. They take up space in the download, can be generated on the final system at minimal cost, and dropping them removes a source of location-dependence. (.pyc files store the absolute path of the corresponding .py file and include it in tracebacks; but, pybis are relocatable, so the correct path isn’t known until after install.)
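A build tool reproducing these choices might use a file filter along these lines. This is a hypothetical sketch: it assumes the caller walks the installed tree and passes each file’s Unix-style path relative to the install root, together with the Pybi-Paths `stdlib` value. It drops only the top-level `test` package and any bytecode, and leaves checking that site-packages is empty to the caller.

```python
import posixpath


def should_ship(relpath: str, stdlib: str) -> bool:
    """Decide whether a file from the built interpreter goes into the pybi.

    relpath: Unix-style path relative to the install root.
    stdlib: the Pybi-Paths "stdlib" value, e.g. "lib/python3.10".
    """
    # No bytecode: it is cheap to regenerate and embeds absolute paths.
    if relpath.endswith(".pyc") or "/__pycache__/" in f"/{relpath}":
        return False
    # Only the top-level `test` package is dropped; nested ones such as
    # unittest/test are left alone.
    test_dir = posixpath.join(stdlib, "test")
    if relpath == test_dir or relpath.startswith(test_dir + "/"):
        return False
    return True
```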

Backwards Compatibility

No backwards compatibility considerations.

Security Implications

No security implications, beyond the fact that anyone who takes it upon themselves to distribute binaries has to come up with a plan to manage their security (e.g., whether they roll a new build after an OpenSSL CVE drops). But collectively, we core Python folks are already maintaining binary builds for all major platforms (macOS + Windows through python.org, and Linux builds through the official manylinux image), so even if we do start releasing official CPython builds on PyPI it doesn’t really raise any new security issues.

How to Teach This

This isn’t targeted at end-users; their experience will simply be that e.g. their pyenv or tox invocation magically gets faster and more reliable (if those projects’ maintainers decide to take advantage of this PEP).

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
