JupyterHub and Linux PAM

February 2022 (republished in March 2026)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This post was written in 2022 and has not been updated since (apart from fixing some typos and broken links).

JupyterHub is a tool for starting and managing JupyterLab (and Jupyter Notebook) sessions in a multi-user environment via a web browser. As such, it provides a login facility to a (Linux) server, allowing users to work on the server without logging in via SSH. From the admin’s point of view, it is desirable to configure both login processes (via JupyterHub and via SSH, and maybe also local logins) in a unified manner.

Linux PAM (Pluggable Authentication Modules) is the tool of choice for managing user authentication and doing stuff at login and logout (mounting network shares, for instance). It’s the standard tool in almost all Linux distributions.

Although JupyterHub ships with native PAM support, the web is full of discussions on how to get JupyterHub/PAM working in the intended way. Taking the time to dig into this reveals that PAM support in JupyterHub (version <= 2.1.1) is essentially broken (by design) and hard to fix. Some GitHub issues:

In this blog post I describe the details of the bug and show how to cope with the situation until JupyterHub’s PAM support gets fixed some day. JupyterHub developers are aware of the bug and are discussing a path forward. I also tried to fix it myself, but ran into another bug (in pam_mount), which presumably won’t get fixed. The situation is highly nontrivial and calls for explanation and documentation…

Problem description

In small lab environments, using PAM authentication for JupyterHub is a straightforward solution to many authentication and resource allocation problems. The corresponding authenticator class is called PAMAuthenticator. It’s JupyterHub’s default authenticator. By default, PAM session handling is enabled in addition to PAM authentication. It can be disabled in JupyterHub’s config file via

c.PAMAuthenticator.open_sessions = False

Note that the default will change to False soon as a result of discussing the PAM bug details in the past weeks, see pull request 3787.

With PAM session handling enabled, logging in to JupyterHub should initiate typical login actions like mounting the user’s network shares. But this does not happen due to permission issues. Typical errors from the logs are:

Nonetheless, JupyterHub will start JupyterLab and the user is able to work in JupyterLab. But network shares won’t be shown and resource management (limits on memory or CPU usage, for instance) won’t work as intended. In addition, user sessions won’t be managed by systemd, leaving the system’s process management in an undesired state.

How PAM works

Understanding JupyterHub’s PAM session handling problem requires advanced PAM knowledge.

PAM transactions

An application (here JupyterHub or JupyterLab) which wants to use PAM functionality has to start a PAM transaction by calling libpam’s pam_start function. When all work is done, a call to pam_end ends the PAM transaction.

A PAM transaction consists of several optional stages:

The most important feature of PAM is that within a PAM transaction the user has to provide their password only once (authentication stage). All further password requests, for instance if network shares have to be mounted at login, are automatically answered by PAM. What to do during each stage is highly configurable and can be configured on a per-app basis. Configuration files usually reside in /etc/pam.d/. See the pam.d manual page for details.

Authentication is done by calling libpam’s pam_authenticate. A call to pam_open_session starts a PAM session, that is, performs the configured login tasks. A call to pam_close_session ends the session stage, that is, performs the logout tasks.

PAM modules

All PAM functionality is divided into PAM modules. Important PAM modules for the session stage are pam_mount (for mounting and unmounting file systems) and pam_systemd (for getting some systemd stuff running, resource limitations, for instance). Each PAM module is a shared object (shared library).

PAM configuration files determine which modules to use in which stage. Each module provides a function for each supported stage. This function is called by libpam during the corresponding stage. For example, an app’s call to pam_authenticate leads to calling each module’s pam_sm_authenticate if the module is configured to be part of the app’s authentication stage.

Example: The pam_mount module supports authentication and session stages. In the authentication stage user provided credentials are passed on to pam_mount. In the session stage pam_mount uses these credentials to mount network shares or encrypted local drives.
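The dispatch mechanism and the pam_mount example can be illustrated with a toy model in Python. This is only a sketch of the idea, not libpam’s real interface: the classes ToyPam and ToyMount, the dictionary used as a handle, and the share name are made up for illustration. Only the function names pam_start, pam_authenticate, pam_open_session, pam_sm_authenticate and pam_sm_open_session mirror real PAM names.

```python
"""Toy model of libpam dispatching stage calls to configured modules.

Everything here is hypothetical illustration code; real PAM modules are
shared objects written in C and configured in /etc/pam.d/.
"""


class ToyMount:
    """Stand-in for pam_mount: remembers credentials, mounts later."""

    def pam_sm_authenticate(self, handle, password):
        # Authentication stage: keep the password in the transaction's data.
        handle["mount_password"] = password
        return True

    def pam_sm_open_session(self, handle):
        # Session stage: reuse the stored password instead of asking again.
        password = handle.get("mount_password")
        if password is None:
            return False  # nothing stored, so the share cannot be mounted
        handle["mounted"] = ["//fileserver/" + handle["user"]]
        return True


class ToyPam:
    """Stand-in for libpam: calls each configured module per stage."""

    def __init__(self, config):
        # config maps a stage name to the modules taking part in it
        self.config = config

    def pam_start(self, user):
        return {"user": user}  # the "handle" carries per-transaction data

    def pam_authenticate(self, handle, password):
        return all(m.pam_sm_authenticate(handle, password)
                   for m in self.config.get("auth", []))

    def pam_open_session(self, handle):
        return all(m.pam_sm_open_session(handle)
                   for m in self.config.get("session", []))


mount = ToyMount()
pam = ToyPam({"auth": [mount], "session": [mount]})
handle = pam.pam_start("alice")
auth_ok = pam.pam_authenticate(handle, "secret")
session_ok = pam.pam_open_session(handle)
print(auth_ok, session_ok, handle["mounted"])
```

The point of the sketch is that the session stage only succeeds because the authentication stage ran earlier on the same handle.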

PAM handles

Starting a PAM transaction yields a PAM handle identifying the transaction. Thus, there may be multiple parallel PAM transactions, even per app. The handle is provided to each PAM module at each call to the module. PAM modules have to take care of separating data and states from different transactions.

In pam_mount this separation mechanism is buggy. Separation between different apps’ PAM transactions works, but multiple parallel PAM transactions per app result in a segmentation fault. See bug report for details on what exactly causes the segfault. Relation to the PAM session handling issue in JupyterHub will become clear below. Fixing this pam_mount bug would require major modifications to pam_mount’s C source code, because a global data structure has to be made a per-PAM-handle data structure.
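To see why a global data structure is a problem for parallel transactions, consider the following toy model. It is not pam_mount’s actual code, and the real cause of the segfault is the one described in the bug report. The sketch only shows the general class of failure: all function and variable names are invented, and where C code would typically crash on released data, Python raises an exception instead.

```python
"""Toy model of a module keeping its state in one global variable.

Hypothetical illustration; this is not pam_mount's actual code. In C,
touching released data typically crashes; here it raises an exception.
"""

_global_state = None  # shared by ALL transactions of the application


def global_open_session(handle, user):
    global _global_state
    _global_state = {"user": user, "mounts": ["/home/" + user]}


def global_close_session(handle):
    global _global_state
    mounts = _global_state["mounts"]  # fails if state was already released
    _global_state = None              # "free" the global data
    return mounts


def per_handle_open_session(handle, user):
    handle["state"] = {"user": user, "mounts": ["/home/" + user]}


def per_handle_close_session(handle):
    return handle.pop("state")["mounts"]


# Two parallel transactions inside one application.
h1, h2 = {}, {}

global_open_session(h1, "alice")
global_open_session(h2, "bob")       # silently overwrites alice's state
global_close_session(h2)
try:
    global_close_session(h1)         # state already released
    crashed = False
except TypeError:
    crashed = True

per_handle_open_session(h1, "alice")
per_handle_open_session(h2, "bob")
bob_mounts = per_handle_close_session(h2)
alice_mounts = per_handle_close_session(h1)
print(crashed, alice_mounts, bob_mounts)
```

With the state attached to the handle, both sessions can be closed independently and in any order.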

JupyterHub vs. JupyterLab

JupyterHub is a multi-user system spawning one or multiple JupyterLab instances for each user. Users may log in and log out at will while their JupyterLabs keep running. Understanding the interaction between JupyterHub, JupyterLab and PAM requires at least some basic knowledge of JupyterHub’s design.

Keeping authentication aside for a moment, users can tell JupyterHub to start a so-called single-user server, usually a JupyterLab instance. Starting multiple single-user servers per user is possible, too. The user can also tell the hub to shut down one or several of the user’s servers. Single-user servers run as separate Linux processes in user space. The hub itself is usually run by the root user.

If the user logs in to the hub, a single-user server is started if none is already running. Logging out from the hub does not stop the user’s server. The user may come back later, log in to the hub, and continue working in the already running JupyterLab. So authentication to the hub is more or less unrelated to starting and stopping single-user servers. This has to be taken into account when dealing with PAM and it makes PAM session handling rather difficult.

JupyterHub’s PAM session handling

We first discuss why JupyterHub’s PAM code is incorrect. Then we’ll have a look at possible solutions.

Broken PAM implementation

At first glance, JupyterHub’s PAM-related source code looks fine. In PAMAuthenticator.authenticate there is a call to a PAM authentication function and also to a PAM account check. In PAMAuthenticator.pre_spawn_start and PAMAuthenticator.post_spawn_stop there are calls for opening and closing a PAM session, respectively.

Calls don’t go directly to libpam, but to a wrapper Python module called pamela. Inspecting pamela’s functions shows that JupyterHub’s calls to libpam are as follows:

  1. pam_start
  2. pam_authenticate
  3. pam_end
  4. pam_start
  5. pam_acct_mgmt
  6. pam_end
  7. pam_start
  8. pam_open_session
  9. pam_end
  10. pam_start
  11. pam_close_session
  12. pam_end

For each PAM stage there is a separate PAM transaction! This explains several seemingly different JupyterHub issues (see issue list and error messages above). Although the user is authenticated successfully, this information gets lost as soon as the first transaction ends. All the other stages and their PAM modules are run as an unauthenticated user, resulting in all kinds of permission errors.
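The effect of the call sequence listed above can be reproduced with a small toy model. It is neither libpam nor pamela: the class ToyTransaction and both helper functions are hypothetical, and the sequence is abridged to the authentication and session stages. The sketch merely assumes that the result of the authentication stage is stored inside the transaction and disappears with it.

```python
"""Toy model: state stored in a PAM transaction is lost at pam_end.

Hypothetical illustration only; this is neither libpam nor pamela.
"""


class ToyTransaction:
    def __init__(self, user):
        self.user = user
        self.authenticated = False  # lives only as long as the transaction

    def authenticate(self, password):
        self.authenticated = bool(password)
        return self.authenticated

    def open_session(self):
        # Session modules need the result of the authentication stage.
        return self.authenticated


def one_transaction_per_stage(user, password):
    """Mimics a start/end pair around each individual stage."""
    t = ToyTransaction(user)       # pam_start
    t.authenticate(password)       # pam_authenticate
    del t                          # pam_end: authentication state is gone
    t = ToyTransaction(user)       # pam_start (fresh, unauthenticated)
    return t.open_session()        # pam_open_session


def single_transaction(user, password):
    """All stages share one transaction, so the state survives."""
    t = ToyTransaction(user)       # pam_start
    t.authenticate(password)       # pam_authenticate
    return t.open_session()        # pam_open_session, then pam_end


print(one_transaction_per_stage("alice", "secret"))  # False
print(single_transaction("alice", "secret"))         # True
```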

Note that the pamela module also provides functions for doing everything in one PAM transaction, but JupyterHub does not use those functions. So it’s not a pamela issue, although the recent discussion of the JupyterHub/PAM issue on GitHub took place in pamela issue 22.

Attempt 1: fix pamela usage (fails due to pam_mount bug)

A relatively simple and straightforward attempt to fix JupyterHub’s PAM session handling would be to open a PAM session as soon as the user spawns a single-user server and to close the PAM session when the server has terminated. This is what’s currently implemented in JupyterHub. But one has to do all the PAM stuff related to a single-user server in one PAM transaction.

I implemented this approach. It works as long as there is only one user with one single-user server on the hub. Starting a second server results in a segmentation fault. Tracking it down, the segfault is caused by pam_mount while closing a PAM session. As described above, pam_mount does not support parallel PAM sessions within one application (JupyterHub). This has to be considered a bug, because PAM documentation explicitly states that “The transaction state is contained entirely within the structure identified by this handle, so it is possible to have multiple transactions in parallel.” PAM development seems to be rather inactive, so presumably the bug won’t get fixed (and fixing it is not trivial).

Of course, one could abstain from pam_mount. Then PAM session handling in JupyterHub would work with the described fix. But for most admins the reason to get JupyterHub with PAM working is to mount network shares at login to JupyterHub. So having pam_mount in the PAM stack is essential.

Attempt 2: open PAM session in single-user server (fails due to libpam implementation details)

In principle, the single-user server (JupyterLab) could do PAM session handling. But opening a PAM session has to be done within the same PAM transaction as used for authentication. Authentication is done by JupyterHub, so JupyterHub has to pass on the PAM transaction handle (or PAM handle for short) to the single-user server.

The problem is that PAM handles are not serializable. PAM handles are pointers to somewhere in memory, and passing such pointers to other processes directly should segfault. Copying the information pointed to by the PAM handle would require relying on implementation details of libpam. The layout of the data structure is not an API feature of libpam and may change from version to version.
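Python’s ctypes module gives a feeling for why such a pointer cannot simply be handed to another process. The following sketch does not call libpam at all. It uses a made-up address as a stand-in for the pointer returned by pam_start and shows that the standard pickle module refuses to serialize it.

```python
import ctypes
import pickle

# A made-up address standing in for a pam_handle_t* obtained from pam_start.
fake_handle = ctypes.c_void_p(0x7F001000)

try:
    pickle.dumps(fake_handle)
    picklable = True
except ValueError:
    # ctypes refuses to pickle objects that are or contain pointers
    picklable = False

# The raw number can be sent anywhere, but it is only meaningful inside the
# address space of the process that called pam_start.
address_as_int = fake_handle.value
print(picklable, hex(address_as_int))
```

Sending the bare integer to the single-user server would be possible, but in that process the address refers to unrelated memory or to nothing at all.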

Note that this approach would also contradict JupyterHub’s design. JupyterHub should do all the authentication and user management stuff, while the single-user server does not have to care about such things. In particular, JupyterHub’s single-user server should be as similar as possible to usual Jupyter servers running without the hub.

Attempt 3: start PAM transaction in single-user server (fails due to security concerns and complexity)

Starting the PAM transaction in the single-user server would be a clean solution. But then JupyterHub would have to pass user credentials to the single-user server, because without authentication in the same PAM transaction pam_mount cannot mount network shares. Without obtaining user credentials from the hub, the single-user server has to ask the user for the password itself. So the user has to type their password several times (for the hub and for each spawned server).

Passing credentials to the single-user server in clear text is not a good idea considering security implications. So communication has to be encrypted, but there is no standard encrypted communication channel between hub and servers. Usually, information is passed via environment variables. This path is very complex to implement and will remain dubious regarding security.

Again, this approach contradicts JupyterHub’s design (cf. above).

Attempt 4: use intermediate processes (maybe successful, but not yet implemented)

The idea described in a comment to pamela issue 22 and previously also outlined in a comment to a JupyterHub pull request could be a successful approach to getting PAM sessions to work in JupyterHub. But the implementation is complex, too complex for me as an average JupyterHub admin. There is a chance that things get fixed this year by the JupyterHub team, now that there is much more light on the problem’s details.

JupyterHub without PAM sessions

Until PAM session handling in JupyterHub gets fixed, one has to live without PAM sessions. The first thing to do is to put

c.PAMAuthenticator.open_sessions = False

into JupyterHub’s configuration file.

Mounting file systems at login is hard to do without PAM. It seems that systemd also has mounting capabilities, but using them would require major changes to the system configuration. A simple solution, which requires no configuration, is to tell hub users to start a terminal in JupyterLab and initiate an SSH connection via ssh localhost if they want to see their network shares.

For getting resource limiting without PAM one can use JupyterHub’s SystemdSpawner class instead of the default LocalProcessSpawner. This requires installation of systemdspawner, the line

c.JupyterHub.spawner_class = 'systemdspawner.SystemdSpawner'

in JupyterHub’s config file, and maybe some additional spawner configuration lines.
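Put together, the relevant part of the config file might look like the following sketch. The two limit values are assumptions chosen for illustration and have to be adapted to the server at hand. mem_limit and cpu_limit are the spawner options documented by systemdspawner for this purpose.

```python
# jupyterhub_config.py (fragment)
c = get_config()  # noqa: F821, provided by JupyterHub when loading the file

# Live without PAM sessions until session handling is fixed.
c.PAMAuthenticator.open_sessions = False

# Let systemd manage the single-user servers (requires systemdspawner).
c.JupyterHub.spawner_class = 'systemdspawner.SystemdSpawner'

# Example limits; the concrete values are assumptions, adjust to your server.
c.SystemdSpawner.mem_limit = '4G'   # memory limit per single-user server
c.SystemdSpawner.cpu_limit = 2.0    # at most two CPU cores per server
```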