Python binary packages (and origin story of récitale)

Don't care about your origin story, gimme the good stuff

I used Prosopopée for a while for my photo blog and contributed some fixes back then. This software lets you define albums (aka galleries) containing photos (plus videos, audio files, text, HTML, iframes, ...) and then creates thumbnails for those photos so that your website does not make users load 500MiB of data for a 20-photo album.

However, with 25+ albums and 1000+ images, it was no longer the small photo blog it used to be. After a hiccup on the server hosting my blog, I had to reinstall everything from scratch, and with my slow upload link, a few GiB of pictures and thumbnails would have been just too much to upload. So instead, I uploaded the originals and "compiled" the blog on the server (as I had done for years already, though making use of the previous build cache). The thing is... it took FIVE AND A HALF HOURS to build it from scratch.

And this is where all this madness began.

I discovered that the whole Python project was single-threaded, but that it used the subprocess Python module to call GraphicsMagick, which happens to be multi-threaded. This was sub-optimal: GraphicsMagick could run faster by converting all thumbnails for a single picture in one command instead of loading the base image once per thumbnail. I also discovered that there is a nice image manipulation library in Python called Pillow. I created a small proof of concept and it was slower than GraphicsMagick... because Pillow is single-threaded.

So, time to go for multithreading. This is where I learned about the Python Global Interpreter Lock (aka GIL), which means that threads in Python are only really useful for IO-intensive tasks, when the CPU is waiting on some peripheral. So, threads... no go, since I want all cores running at 100% at the same time. Then I discovered the multiprocessing Python module, which side-steps the GIL. Proof of concept... and as hoped, multiple processes, each creating the thumbnails for its own image, were much faster than GraphicsMagick. In my homemade benchmarks, run on my Asus C101-PA Chromebook (Rockchip RK3399, 6-core ARM64 SoC), multiprocessing + Pillow was 5 to 8 times faster than GraphicsMagick.
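The approach can be sketched in a few lines. This is a minimal illustration, not récitale's actual code: the function and file names are made up, and the real Pillow call is only hinted at in a comment so the sketch stays self-contained.

```python
from multiprocessing import Pool
import os

def make_thumbnail(path):
    # Placeholder for the real work, which would be something like:
    #   from PIL import Image
    #   im = Image.open(path); im.thumbnail((800, 600)); im.save(out_path)
    return f"thumb-{os.path.basename(path)}"

def build_all(paths):
    # One worker process per CPU core; each process has its own
    # interpreter (and its own GIL), so all cores run at 100%.
    with Pool(processes=os.cpu_count()) as pool:
        return pool.map(make_thumbnail, paths)
```

The key design point is the unit of work: each process handles whole images rather than individual thumbnails, so the base image is only decoded once per worker.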

I figured that reworking the original project (Prosopopée) would be too much work compared to the likelihood of the full rework being accepted by the original maintainer. I thus started a "fork" from scratch, reusing only the templates. One proof of concept later, I shared it with the maintainer of Prosopopée and explained all the challenges I had faced along the way (who would have thought that the GIL was SO hard to side-step while keeping one's sanity?). They were interested, so I started to work on a proper fork with proper commits so that I could send a pull request to the original project and see where it'd go from there. Many months of work and lots of headaches later (supporting multiple versions of Pillow turned out to be painful, with a handful of quirks and hacks to implement), the fork was in good enough shape to be contributed back to the original project... but wait... with so many (big and drastic) changes, one needs benchmarks to highlight how much the situation improves!

While setting up multiple computers of different architectures (Aarch64, x86_64) and different levels of power (absurdly slow to pretty fast), I was surprised to see that some computers only improved build speed by about 50%. I discovered that for x86_64 CPUs with SIMD support, a fork of Pillow exists: Pillow-SIMD, which claims to be much faster than the original Pillow. Tested and confirmed: much faster on computers supporting the SIMD extensions of the Intel x86_64 instruction set. However, Pillow-SIMD is not compatible with the Aarch64 instruction set (well... with anything other than Intel's x86_64, actually). And that is unfortunately not something I want to support officially, since I want to be able to use the software on my Chromebook and have people use it on a Raspberry Pi (a friend generates and hosts his photo blog on one, so I "had to" support it officially :) ). But it's good for benchmarking nonetheless! I also stumbled upon an explanation as to why it's hard to contribute Pillow-SIMD back to Pillow, which helped me understand the packaging world of Linux distributions a bit more.

Back to my benchmarking :) Since I don't like having multiple sources for software packages, I usually install them from the distribution package manager (dnf on Fedora). However, since I wanted to support multiple versions of Pillow (some more recent than the one available in the distribution's official package repositories), I had to use pip. After benchmarking for a while, I discovered that my numbers for pip-installed versions were worse (by a non-negligible factor) than for the version that came with my distribution. Then I did the unthinkable: I tested the exact same version of Pillow, once from pip, once from my distribution. And the one from my distribution was almost twice as fast as pip's. For. The. Same. Version. After some digging, I saw that pip's Pillow was not using my distribution's JPEG library - libjpeg-turbo - but its own - the original, and slower, libjpeg. Stay til the end for the explanation :) Leave a like and subs... ah no, not YouTube.

I also discovered that one can force pip to build Python packages from source by using python3 -m pip install --no-binary :all: pillow (after making sure the Pillow package was entirely removed from my system). And with that, my distribution's JPEG library (libjpeg-turbo) was used by Pillow and the performance was similar. Phew.

Time for some (year-old) benchmarks. For 31 galleries and ~1400 photos:

Computer                                                               | GraphicsMagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3
Intel Q6600 (4c/4t @2.4GHz), 4GB RAM (Fedora Desktop 33)               | 1:37:13.06     | 26:57.71     | 17:43.66           | N/A
Intel Atom N2800 (2c/4t @1.86GHz), 2GB RAM (Fedora Server 33)          | 5:35:32.57     | 1:44:21.93   | 1:16:32.42         | N/A
Intel Celeron G1610T (2c/2t @2.3GHz), 4GB RAM (Fedora Server 33)       | 1:42:33.79     | 46:10.00     | 26:10.30           | 17:30.49
Intel Core i7-8700 (6c/12t @3.2GHz), 32GB RAM (Ubuntu Desktop 20.04.2) | 33:01.63       | 6:00.16      | 3:40.09            | 2:16.03
RaspberryPi 4, 4GB RAM (Ubuntu Server 20.10)                           | 3:44:57.00     | 44:43.67     | 33:29.86           | N/A

Seems like the months of hard work proved useful after all!

I sent the Pull Request and called it a day.

Fast forward a few months, the maintainer had merged some other pull requests of mine but didn't take the time to review this (big) pull request. So after some careful thinking, I decided to start my own fork, récitale.

And the second round of madness started. I now have a pip package on PyPI and wanted to create a container image for the project. Since I started using container images, I've always tried to use Alpine-based ones, as they are more lightweight than others and apparently also follow some decent security practices. I shall therefore create an Alpine-based container image for my project. I tried for hours and hours, and pip's Pillow would always get compiled from source instead of using the prebuilt version (aka wheels). Some evenings spent in the matrix, and here's my summary of why that is:

Probably most of us have only ever developed pure Python scripts: ones that only need a Python interpreter to run, and that's it. Another kind of Python software exists though: Python extension modules. These are actually coded in C (or C++) against the Python API and can be imported and used like regular Python modules in your Python scripts. Since C is compiled and not interpreted like Python, Python extension modules need to be compiled in order to be usable. Pillow actually mostly contains and makes use of Python extension modules, therefore it needs to be compiled. The reason for such Python extension modules is that some code is much faster when coded in a low-level language like C or C++ than in Python. (As a side note, while there are multiple Python interpreters written in different languages, CPython is the most widely used and is coded in C, as its name suggests.)
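You can see the difference from the interpreter itself: CPython keeps separate lists of filename suffixes for source modules and for compiled extension modules.

```python
import importlib.machinery

# Pure Python source modules use these suffixes:
print(importlib.machinery.SOURCE_SUFFIXES)  # ['.py']

# Compiled extension modules are shared libraries with platform-specific
# suffixes, e.g. '.cpython-310-x86_64-linux-gnu.so' on a Linux build:
print(importlib.machinery.EXTENSION_SUFFIXES)
```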

Since having users compile source code before being able to use software is not the best adoption strategy, there needs to be a way to share prebuilt Python extension modules. Prebuilt Python extensions are compiled and shared as shared libraries (commonly .so files on UNIX systems). The compilation and packaging are handled by wheels, so we don't have to worry about that. However, shared libraries almost always depend on (link against) other shared libraries, at the very least the standard C library (aka libc). So, the maintainer of a Python extension module compiles it into a shared library, packages it as a wheel, and publishes it on some Python package index such as PyPI. Here comes the first problem: the shared libraries against which the Python extension module was linked may not be the same as the ones installed on user computers. This could result in the inability to run the Python extension module anywhere other than on the maintainer's computer, which kind of defeats the purpose of sharing it.

Instead, the Python community decided to draft a contract that each maintainer of a Python extension module should fulfil in order to share it publicly. This contract is defined in PEP-0513. Wheel packages for Linux systems with the manylinux1 tag expect a given set of system libraries to be present on the user's computer, each with a specific major version, and guarantee that they work in that environment. This is great since there's no need to ship those system libraries with the Python extension module on PyPI: they have to already be on the system for the Python package manager (e.g. pip) to fetch the prebuilt version of the module.

The not-so-nice thing is that this environment will inevitably get outdated over time, since the source code of those system libraries evolves too and will eventually introduce some backwards incompatibility. Meaning prebuilt modules would only be available for rather old systems. That's where PEP-0599 comes into play, with the manylinux2014 tag and an updated contract for the set of system libraries installed on user computers. (Additionally, this PEP brings support for non-Intel architectures, such as ARM or PowerPC.) So now maintainers need to compile two different wheel packages, one fulfilling the manylinux1 contract and another for manylinux2014's. And multiply this by the number of architectures they want to support.

This is also where another issue arises: the PEP "contracts" need to be constantly updated to match what Linux distributions actually ship. This is tedious for the Python community, so they came up with yet another contract: PEP-0600. This PEP defines new tags, each one targeting a specific GNU libc (aka glibc) version (major and minor being used to discriminate between versions) and a given CPU architecture. Therefore, any shared library that is not part of the glibc is not part of the new manylinux "contract".
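Since the PEP-0600 tags are keyed on the glibc version (e.g. manylinux_2_17 wheels require glibc 2.17 or newer), you can check what your own system offers straight from the standard library. A quick sketch:

```python
import platform

# On glibc-based systems this reports e.g. ('glibc', '2.35'), meaning
# wheels tagged up to manylinux_2_35 for this CPU can be installed.
# On musl-based systems such as Alpine it returns ('', '').
lib, version = platform.libc_ver()
print(lib, version)
```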

All that being said, the contract only ever mentions a very small set of system libraries, and it is very likely that some Python extension modules link against other shared libraries. Such is the case for Pillow. After installing Pillow with pip, one can find the Python extension module shared libraries in the ~/.local/lib/python3.7/site-packages/PIL/ directory. One can discover which shared libraries they are linked against by running the following command:

$ ldd ~/.local/lib/python3.7/site-packages/PIL/_imaging.cpython-37m-aarch64-linux-gnu.so
        linux-vdso.so.1 (0x000000766a429000)
        libjpeg-35e8c64c.so.62.3.0 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libjpeg-35e8c64c.so.62.3.0 (0x000000766a277000)
        libopenjp2-ae40752c.so.2.4.0 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libopenjp2-ae40752c.so.2.4.0 (0x000000766a1c4000)
        libz-21b81fdb.so.1.2.11 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libz-21b81fdb.so.1.2.11 (0x000000766a183000)
        libtiff-e22335e6.so.5.7.0 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libtiff-e22335e6.so.5.7.0 (0x000000766a081000)
        libxcb-be71eb15.so.1.1.0 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libxcb-be71eb15.so.1.1.0 (0x000000766a00c000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007669fc9000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007669e57000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007669d9a000)
        liblzma-4da4ab69.so.5.2.5 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/liblzma-4da4ab69.so.5.2.5 (0x0000007669d39000)
        libXau-21870672.so.6.0.0 => /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libXau-21870672.so.6.0.0 (0x0000007669d08000)
        /lib/ld-linux-aarch64.so.1 (0x000000766a3fb000)

Here you can see that this _imaging.so library links against libjpeg.so.62.3.0 from /home/qsdevices/.local/lib/python3.7/site-packages/PIL/../Pillow.libs/libjpeg-35e8c64c.so.62.3.0 and not my system's. And this is where I discovered that the shared libraries that aren't part of the manylinux contract are actually bundled and installed with the wheel package. This is handled automatically by auditwheel repair.

So now you understand why, with the exact same Pillow version installed from pip or from my distribution's package manager, the benchmarks were so different: prebuilt Pillow ships with the original libjpeg, while the one from my distribution links against libjpeg-turbo, and is thus much faster. This also explains why recompiling Pillow from source with pip instead of taking the wheel package made it perform so similarly to my distribution's: being built locally, Pillow would find libjpeg-turbo on my system instead of libjpeg and use the former. As a note, Pillow 9.0.0 and later wheels are now built against libjpeg-turbo. I did a very quick test on my Intel Q6600-based system from above: Pillow 8.4.0 takes around 29 minutes for the current state of the blog against 17 minutes for Pillow 9.0.1. No need to compile Pillow from source anymore to get better performance! (Though Pillow-SIMD is likely still more performant on computers that support it.)

If you paid attention earlier, I stated that the manylinux tag is a contract for packages linking against the glibc. It happens that Alpine Linux does not use glibc but rather musl. Therefore, when pip tried to find a wheel package that could fulfil a contract with the musl libc, it couldn't find any (because none existed). This means that pip could only fetch the source version of the Python extension module and had to compile it. This is why using Alpine containers for Python packages was so discouraged. However, that is now history because there's a new PEP-0656 which introduces a contract for musl-based systems with the musllinux tag. Pillow still does not support it though, but one can only dream it'll be supported soon enough :)

I'm happy to have decided to work on my récitale fork, for it taught me a lot about Python and packaging :)

Now let's see how long I keep torturing myself with multiprocessing in Python instead of reimplementing it in a more adapted language (and probably much faster too).

Install Fedora on headless remote servers

Context

I run this website on a dedicated server hosted by OVH, from their Kimsufi product range. I've had this server for a long time now, and even though they publicly advertise that one can install Fedora (and many other distributions) on that product line, they cannot (or don't want to) give me the option to do it on my dedicated server. Therefore, I had to come up with a way to install Fedora without OVH giving me the option to do so.

I did this 2 years ago; it was painful, but I thought it was a one-time thing, so I didn't document the process. Fast forward to about a month ago: I started the Fedora 34 upgrade. Once the packages are downloaded, Fedora reboots into a very simple system where the old package versions are replaced by the new ones. In that scenario, obviously no network is needed, so the server does not respond to pings. Turns out I had monitoring of my server enabled, and the technician, seeing a screen full of logs but no ping, hard rebooted my server while it was upgrading. I tried to recover it for a day, gave up, and went through a day and a half of pain to reinstall Fedora. The hard thing with those servers is that there's no console access, only SSH, so if your distribution does not set up the network or does not boot, you have no indication of what went wrong. Now that I've struggled twice, it's time for me to document the process so that next time my server is messed up or I change servers, I can reinstall Fedora in the blink of an eye.

Steps

  • Install an officially supported distribution from your cloud provider's dashboard. From my Kimsufi dashboard, I selected CentOS 7 because it's also made by Red Hat, but any distribution that uses GRUB2 as its bootloader will do just fine.
  • SSH to your new server OS,
  • Download the PXE boot images from Fedora. For release 34, they are available here. I installed wget by running yum install wget and then ran:
wget https://mirror.karneval.cz/pub/linux/fedora/linux/releases/34/Server/x86_64/os/images/pxeboot/initrd.img
wget https://mirror.karneval.cz/pub/linux/fedora/linux/releases/34/Server/x86_64/os/images/pxeboot/vmlinuz
  • Put both files in the /boot directory. I renamed the files to more or less match the naming convention in my /boot directory: vmlinuz became vmlinuz-fc34 and initrd.img became initrd-fc34.img.
  • Check whether the /boot directory is on its own partition by running: df /boot. If the Mounted on column shows /boot, it is on its own partition, and no /boot prefix is needed in the next steps. If it shows /, then you need to add the /boot prefix in the next steps. On my CentOS 7 installation, /boot is mounted on /, so I added the /boot prefix.
  • In order to have access to the Fedora installer that starts with the PXE boot image on a headless server, you need to access it via VNC. This can be done by adding inst.vnc and inst.vncpassword=<password> to the kernel command line. <password> should be 7 characters long. Lengths of 6 and 8 characters are supposedly supported too, but they didn't work for me.
  • Edit your GRUB2 configuration file to add a GRUB entry for this kernel and initrd images. On CentOS 7, you just need to add your entry to /etc/grub.d/40_custom file since all files in /etc/grub.d/ are appended to the final configuration file. The file should look like this:
menuentry 'FedoraNetInstall' {
   load_video
   set gfxpayload=keep
   insmod gzio
   insmod part_msdos
   insmod ext2
   set root='hd0,msdos1'
   linux16 /boot/vmlinuz-fc34 ip=dhcp inst.repo=https://download.fedoraproject.org/pub/fedora/linux/releases/34/Server/x86_64/os inst.vnc inst.vncpassword=test123
   initrd16 /boot/initrd-fc34.img
}

N.B.: on Debian 9 and Fedora 34, the linux16 and initrd16 commands are respectively named linux and initrd, but the content stays the same (+- the /boot prefix).

N.B.2: Check that the settings before linux16 are similar to the ones you have in entries listed in /boot/grub2/grub.cfg, specifically the set root one.

N.B.3: Quadruple check that there are no typos anywhere in this file (I lost a few hours because I put /boot/initramfs-fc34.img instead of the above after initrd16).

  • Make a backup of /boot/grub2/grub.cfg: cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
  • Add your menuentry to grub.cfg by running: grub2-mkconfig -o /boot/grub2/grub.cfg
  • In order to boot into your new menuentry next time, while making sure that if it fails to boot the next reboot falls back to your current OS, run grub2-set-default 0 and then grub2-reboot FedoraNetInstall.
  • Reboot your machine.
  • Check there's a ping after a few minutes (servers take a much longer time than consumer desktops or laptops to boot so be patient :) ).
  • Install a VNC client on your computer. I used TigerVNC on Fedora, which I installed with dnf install -y tigervnc. From the command line, run vncviewer <remote server IP>:<vnc port>. Or from the GUI, just start TigerVNC and configure it to connect to your server. By default, the VNC port is 5901; one can use the equivalent display number :1 instead.
  • Configure your new Fedora installation and wait for it to reboot.
  • Profit.


Set up your Canon PIXMA MP495/499 to use WiFi

Even if I doubt you'd still be able to buy this Canon printer/scanner now (I bought it 6 years ago), maybe this tutorial can help you set up your MP495/499 to connect to your personal WiFi and get the scanner and printer working wirelessly on Linux. To be fully honest with you, I'm only writing this so I have a straightforward tutorial next time I need to set it up, since I've already set it up 5 times and I always spend too much time finding out how to do it.

Configure your printer's WiFi settings

The MP495's WiFi settings are configured from a webpage served by the printer itself. For that, you need to access it wirelessly (don't you see the irony here?). The MP495 expects a wireless network with the following "features":

  • SSID: BJNPSETUP
  • No password
  • DHCP server activated

Once such a network is available, the printer will automatically connect to it as soon as its WiFi has been activated. To do so, power up your MP495 and wait for it to initialize.

Once it's done its whole init process, press the Maintenance button (A) until the 7-segment display (B) shows something looking like the letter G. Then press the Color button (C). Wait a few seconds and the WiFi logo on the front panel should light up.

Settings panel

Find out the IP address of your printer by going to the webpage of your Access Point, or connect your computer to the BJNPSETUP network and run nmap -sP 192.168.1.0/24 (192.168.1.0/24 being the network you are on when connected to BJNPSETUP). This should return three IPs: the AP's, yours and your printer's.

Enter the IP address of your printer in your web browser. You'll be greeted by a page, you'll click Advanced, you'll note the Network Printer Name, and then click Network Settings.

Select Use wireless LAN and enter the SSID of the network you want your printer to connect to. Click on Modify next to the Encryption Method label. In the opened page, set up the settings of your WiFi network.

Then click save. Of course, you'll lose the connection to your printer since it is now expecting another WiFi network. So go back to your AP and set it back up to what it was before you modified it.

Now, you'll certainly want your printer to keep the same IP address forever so you don't have to reconfigure the printing and scanning tools on your computer. Either do that in the Network Settings of your printer, or on your AP (if it supports it) by giving the printer a static DHCP lease. I recommend the latter.

Check on your AP or with your computer that the printer is connected on your WiFi network (and get its IP address thanks to the Network Printer Name you noted earlier).

Configure printer on Linux

Then you can configure your printer with the tools available in your distro. On Xubuntu, open system-config-printer and click on Add and select Find Network Printer in the Network Printer dropdown menu. Enter the IP address of your printer and once it's been found, click Forward.

It'll first look for drivers and then ask you which one you want. Select Canon and then either PIXMA MP495 or PIXMA MP499, depending on which one you have. Print a test page to make sure your printer is correctly configured.

Configure scanner on Linux

XSane and simple-scan are currently broken on Xubuntu for the MP495 (16.10 at the time of writing), so most people will tell you to install the driver from Canon. I had a really hard time installing it, and I generally avoid using vendors' drivers. So after searching a bit, I found out that XSane's support for the MP495 is only broken when the libsane library's version is 1.0.25 (which it was on my system). After adding a repo and updating libsane, I could scan over WiFi:

sudo add-apt-repository ppa:rolfbensch/sane-git
sudo apt-get update
sudo apt-get upgrade

Factory reset the printer

If for any reason you misconfigured your printer, factory reset it by pressing the Maintenance button until you see the small letter t on the 7-segment display and then press on the Color button.

Disable WiFi on MP495

Press the Maintenance button until you see the letter G and then press the Black button (on the left of the Color button).