One of the wonderful things about Python is the ease with which you can start writing a script - just drop some code into a .py file, and run python my_file.py. Similarly it’s easy to get started with modularity: split my_file.py into my_app.py and my_lib.py, and you can import my_lib from my_app.py and start organizing your code into modules.
However, the details of the machinery that makes this work have some surprising, and sometimes very security-critical, consequences: the more convenient it is for you to execute code from different locations, the more opportunities an attacker has to execute it as well...
Python needs a safe space to load code from
Here are three critical assumptions embedded in Python’s security model:
- Every entry on sys.path is assumed to be a secure location from which it is safe to execute arbitrary code.
- The directory where the “main script” is located is always on sys.path.
- When invoking python directly, the current directory is treated as the “main script” location, even when passing the -c or -m options.
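To make the second and third assumptions concrete, here is a minimal sketch that launches a child interpreter three ways and reports the first entry of its sys.path. The helper name first_path_entries, the file name report_path.py and the two directory names are made up for this illustration, and the sketch assumes an ordinary CPython 3 started without any isolation options.

```python
import os
import subprocess
import sys
import tempfile

# Child program: print the first sys.path entry of the interpreter running it.
REPORT = "import sys\nprint(sys.path[0])\n"


def first_path_entries():
    """Return sys.path[0] as seen by a script, by -c, and by -m."""
    # Drop variables that change how sys.path is built, so that only the
    # invocation style differs between the three runs.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        script_dir = os.path.join(tmp, "script_dir")  # hypothetical name
        work_dir = os.path.join(tmp, "work_dir")      # hypothetical name
        for directory in (script_dir, work_dir):
            os.mkdir(directory)
            with open(os.path.join(directory, "report_path.py"), "w") as f:
                f.write(REPORT)

        def run(*args):
            # Every child runs with work_dir as its current directory.
            done = subprocess.run([sys.executable, *args], cwd=work_dir,
                                  env=env, capture_output=True, text=True,
                                  check=True)
            return done.stdout.rstrip("\n")

        return {
            "script_dir": script_dir,
            "work_dir": work_dir,
            "script": run(os.path.join(script_dir, "report_path.py")),
            "-c": run("-c", REPORT),
            "-m": run("-m", "report_path"),
        }


if __name__ == "__main__":
    for key, value in first_path_entries().items():
        print(f"{key:>10}: {value!r}")
```

Run as a script, the child reports the script's directory. With -c it reports an empty string, which the import system treats as the current directory, and with -m it reports the current directory itself.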
If you’re running a Python application that’s been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your sys.path (by default) is the location where the main executable, or script, is installed.
For example, if you have pip installed in /usr/bin, and you run /usr/bin/pip, then only /usr/bin will be added to sys.path by this feature. Anything that can write files to /usr/bin can already make you, or your system, run stuff, so it’s a pretty safe place. (Consider what would happen if your ls executable got replaced with something nasty.)
However, one emerging convention is to
prefer calling /path/to/python -m pip
in order to avoid the complexities of
setting up $PATH
properly, and to avoid dealing with divergent documentation
of how scripts are installed on Windows (usually as .exe
files these days,
rather than .py
files).
This is fine — as long as you trust that you’re the only one putting files into the places you can import from — including your working directory.
Your “Downloads” folder isn’t safe
As the category of attacks with the name “DLL Planting” indicates, there are many ways that browsers (and sometimes other software) can be tricked into putting files with arbitrary filenames into the Downloads folder, without user interaction.
Browsers are starting to take this class of vulnerability more seriously, and are adding various mitigations to avoid allowing sites to surreptitiously drop files in your downloads folder when you visit them.[1]
Even with mitigations though, it will be hard to stamp this out entirely: for example, the Content-Disposition HTTP header’s filename* parameter exists entirely to allow the site to choose the filename that it downloads to.
Composing the attack
You’ve made a habit of python -m pip to install stuff. You download a Python package from a totally trustworthy website that, for whatever reason, has a Python wheel by direct download instead of on PyPI. Maybe it’s internal, maybe it’s a pre-release; whatever. So you download totally-legit-package.whl, and then, from inside your downloads folder, you run python -m pip install on it.
This seems like a reasonable thing to do, but unbeknownst to you, two weeks ago,
a completely different site you visited had some XSS JavaScript on it that
downloaded a pip.py
with some malware in it into your downloads folder.
Boom.
Demonstrating it
Here’s a quick demonstration of the attack:
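The scenario can be reproduced safely with a throwaway directory standing in for the downloads folder. This is a sketch under stated assumptions, not a copy of any original listing: the function name run_pip_from_planted_dir and the marker text are invented here, and the planted pip.py only prints a line instead of doing anything harmful.

```python
import os
import subprocess
import sys
import tempfile

# Harmless stand-in for whatever an attacker's pip.py would really do.
MARKER = "planted pip.py ran instead of the real pip"


def run_pip_from_planted_dir():
    """Run `python -m pip install ...` in a directory containing pip.py."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    with tempfile.TemporaryDirectory() as downloads:
        # The "drive-by download": a file named pip.py in the folder.
        with open(os.path.join(downloads, "pip.py"), "w") as f:
            f.write(f"print({MARKER!r})\n")
        # With -m, the current directory comes first on sys.path, so the
        # module named "pip" resolves to the planted file.
        done = subprocess.run(
            [sys.executable, "-m", "pip", "install",
             "./totally-legit-package.whl"],
            cwd=downloads, env=env, capture_output=True, text=True)
        return done.stdout


if __name__ == "__main__":
    print(run_pip_from_planted_dir(), end="")
```

The wheel file never needs to exist: the real pip is never reached, because the planted module is found first.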
PYTHONPATH surprises
Just a few paragraphs ago, I said:
If you’re running a Python application that’s been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your
sys.path
(by default) is the location where the main executable, or script, is installed.
So what is that parenthetical “by default” doing there? What other directories might be added?
Any entries on your $PYTHONPATH environment variable. You wouldn’t put your current directory on $PYTHONPATH, would you?
Unfortunately, there’s one common way that you might have done so by accident.
Let’s simulate a “vulnerable” Python application:
Make 2 directories: install_dir and attacker_dir. Drop the application into install_dir as tool.py. Then, cd attacker_dir and put our sophisticated malware there, as a module with the name that tool.py tries to import.
Finally, let’s run it from attacker_dir, giving Python the path to tool.py in install_dir.
So far, so good.
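Here is one way to rebuild that experiment in a temporary directory. The contents of tool.py, the module name optional_extra and both printed messages are assumptions made for this sketch; all that the text above fixes is that tool.py lives in install_dir and that the planted module sits in attacker_dir, which is the current directory.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical application: it tries an optional import and shrugs if the
# module is missing.
TOOL_SOURCE = """\
try:
    import optional_extra
except ImportError:
    print("extra not found, that's fine")
"""

# Harmless stand-in for the attacker's module.
PLANTED_SOURCE = 'print("planted module was imported")\n'


def run_tool_from_attacker_dir():
    """Run install_dir/tool.py with attacker_dir as the current directory."""
    # PYTHONPATH is removed entirely, i.e. it is unset for the child.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    with tempfile.TemporaryDirectory() as tmp:
        install_dir = os.path.join(tmp, "install_dir")
        attacker_dir = os.path.join(tmp, "attacker_dir")
        os.mkdir(install_dir)
        os.mkdir(attacker_dir)
        with open(os.path.join(install_dir, "tool.py"), "w") as f:
            f.write(TOOL_SOURCE)
        with open(os.path.join(attacker_dir, "optional_extra.py"), "w") as f:
            f.write(PLANTED_SOURCE)
        done = subprocess.run(
            [sys.executable, os.path.join(install_dir, "tool.py")],
            cwd=attacker_dir, env=env, capture_output=True, text=True)
        return done.stdout.strip()


if __name__ == "__main__":
    print(run_tool_from_attacker_dir())
```

Because the script is run by path, only install_dir is added to sys.path, so the import fails and the planted module is left alone.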
But, here’s the common mistake. Most places that still recommend PYTHONPATH recommend adding things to it by joining the new directory and the existing value of $PYTHONPATH with a colon.
Intuitively, this makes sense; if you’re adding project X to your $PYTHONPATH, maybe project Y had already added something, maybe not; you never want to blow it away and replace what other parts of your shell startup might have done with it, especially if you’re writing documentation that lots of different people will use.
But this idiom has a critical flaw: the first time it’s invoked, if $PYTHONPATH was previously either empty or un-set, this then includes an empty string, which resolves to the current directory. Let’s try it.
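The flaw is easy to see by mimicking the shell's string expansion in Python. This is only a model of the idiom, not the shell itself; the function names and the path /opt/project_x are hypothetical.

```python
def naive_prepend(new_entry, current):
    """Mimic the shell expansion of "new_entry:$PYTHONPATH".

    `current` is None when the variable is unset. The shell expands an
    unset variable and an empty one to the same thing: nothing.
    """
    return f"{new_entry}:{current or ''}"


def split_entries(value):
    """Split a colon-separated path list into its individual entries."""
    return value.split(":")


if __name__ == "__main__":
    first_time = naive_prepend("/opt/project_x", None)
    print(repr(first_time))           # note the trailing colon
    print(split_entries(first_time))  # the last entry is an empty string
```

When the variable already has a value, the result is a clean two-entry list; the stray empty entry only appears on the first use, which is what makes the mistake easy to miss.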
Oh no! Well, just to be safe, let’s empty out $PYTHONPATH and try it again.
Still not safe!
What’s happening here is that if PYTHONPATH is empty, that is not the same thing as it being unset. From within Python, this is the difference between os.environ.get("PYTHONPATH") == "" and os.environ.get("PYTHONPATH") == None.
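A small sketch of that distinction, using a child interpreter so that the parent's own environment is left untouched. The helper name pythonpath_seen_by_child is made up for this example. It only shows the environment-level difference; how a given interpreter version turns the value into sys.path entries is not exercised here.

```python
import os
import subprocess
import sys

# Child program: show exactly what os.environ.get returns.
CHILD = 'import os; print(repr(os.environ.get("PYTHONPATH")))'


def pythonpath_seen_by_child(value):
    """Start a child with PYTHONPATH set to `value`, or unset if None."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if value is not None:
        env["PYTHONPATH"] = value
    done = subprocess.run([sys.executable, "-c", CHILD], env=env,
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()


if __name__ == "__main__":
    print("empty:", pythonpath_seen_by_child(""))
    print("unset:", pythonpath_seen_by_child(None))
```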
If you want to be sure you’ve cleared $PYTHONPATH from a shell (or somewhere in a shell startup), you need to use the unset command (unset PYTHONPATH), rather than assigning it an empty value.
Setting PYTHONPATH
used to be the most common way to set up a Python
development environment; hopefully it’s mostly fallen out of favor, with
virtualenvs serving this need better. If you’ve got an old shell configuration
that still sets a $PYTHONPATH
that you don’t need any more, this is a good
opportunity to go ahead and delete it.
However, if you do need an idiom for “appending to” PYTHONPATH in a shell startup, use a technique that only adds the separating colon when the variable already has a value. In both bash and zsh, this leaves no extra colons or blank entries on your $PYTHONPATH variable.
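The rule behind a safe append is "add the separator only when there is already something to separate from". Here is that rule as a Python sketch; the function name and the entry names are hypothetical. In POSIX shells, the parameter expansion `${PYTHONPATH:+${PYTHONPATH}:}` placed in front of the new entry expresses the same rule, though that is offered as one suitable form rather than as the exact listing this article originally showed.

```python
import os


def append_entry(current, new_entry, sep=os.pathsep):
    """Append new_entry to a path list without creating empty entries.

    `current` may be None (unset) or "" (empty); both mean there is
    nothing to keep, so no separator is added.
    """
    if current:
        return f"{current}{sep}{new_entry}"
    return new_entry


if __name__ == "__main__":
    value = None
    for entry in ("new_entry_1", "new_entry_2"):
        value = append_entry(value, entry, sep=":")
    print(value)  # prints new_entry_1:new_entry_2
```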
Finally: if you’re still using $PYTHONPATH, be sure to always use absolute paths!
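If you want to check an existing value against both pieces of advice, a few lines are enough. This is a sketch of an audit helper, with the name risky_entries and the example paths invented here; it flags empty entries and relative paths and does nothing else.

```python
import os


def risky_entries(value, sep=os.pathsep):
    """Return the entries of a PYTHONPATH-style string that are empty or
    relative. `value` is None when the variable is unset."""
    if value is None:
        return []
    return [entry for entry in value.split(sep)
            if entry == "" or not os.path.isabs(entry)]


if __name__ == "__main__":
    problems = risky_entries(os.environ.get("PYTHONPATH"))
    if problems:
        print("Entries worth a second look:", problems)
    else:
        print("No empty or relative entries found.")
```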
Related risks
There are a bunch of variant unsafe behaviors related to inspecting files in your Downloads folder by doing anything interactive with Python. Other risky activities:
- Running python ~/Downloads/anything.py (even if anything.py is itself safe) from anywhere, as it will add your downloads folder to sys.path by virtue of anything.py’s location.
- Jupyter Notebook puts the directory that the notebook is in onto sys.path, just like Python puts the script directory there. So jupyter notebook ~/Downloads/anything.ipynb is just as dangerous as python ~/Downloads/anything.py.
Get those scripts and notebooks out of your downloads folder before you run ’em!
But cd Downloads and then doing anything interactive remains a problem too:
- Running a python -c command that includes an import statement while in your ~/Downloads folder
- Running python interactively and importing anything while in your ~/Downloads folder
Remember that ~/Downloads/ isn’t special; it’s just one place where unexpected files with attacker-chosen filenames might sneak in. Be on the lookout for other locations where this is true. For example, if you’re administering a server where the public can upload files, make extra sure that neither your application nor any administrator who might run python ever does cd public_uploads.
Maybe consider changing the code that handles uploads to mangle file names to
put a .uploaded
at the end, avoiding the risk of a .py
file getting
uploaded and executed accidentally.
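A minimal sketch of that renaming step, assuming the upload handler receives the client's chosen file name as a string. The function name and the fallback name are hypothetical, and a real upload handler would need more checks than this.

```python
import os

SUFFIX = ".uploaded"


def mangle_upload_name(client_name):
    """Return a storage name that can never end in .py.

    Only the last path component is kept, so the client cannot choose a
    directory, and the suffix is always appended.
    """
    # Treat backslashes as separators too, in case the client sent a
    # Windows-style path.
    base = os.path.basename(client_name.replace("\\", "/"))
    if not base:
        base = "unnamed"  # hypothetical fallback for names like "dir/"
    return base + SUFFIX


if __name__ == "__main__":
    for name in ("pip.py", "../../evil/pip.py", "report.pdf"):
        print(name, "->", mangle_upload_name(name))
```

Since the stored name no longer ends in .py, an import statement run from the uploads directory will not pick the file up as a module.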
Mitigations
If you have tools written in Python that you want to use while in your downloads folder, make a habit of typing the path to the script (/path/to/venv/bin/pip) rather than the module (/path/to/venv/bin/python -m pip).
In general, just avoid ever having ~/Downloads
as your current working
directory, and move any software you want to use to a more appropriate location
before launching it.
It’s important to understand where Python gets the code that it’s going to be executing. Giving someone the ability to execute even one line of arbitrary Python is equivalent to giving them full control over your computer!
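One way to see where a given import would come from, without actually importing it, is to ask the import system for the module's spec. This sketch uses importlib.util.find_spec from the standard library; the helper names module_origin and comes_from are made up here, and the approach is meant for top-level module names.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level import of `name` would load, or None
    if no such module can be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


def comes_from(name, directory):
    """Report whether `name` would be loaded from inside `directory`."""
    origin = module_origin(name)
    # Built-in and frozen modules have an origin that is not a real file.
    if origin is None or not os.path.isfile(origin):
        return False
    origin = os.path.realpath(origin)
    directory = os.path.realpath(directory)
    return origin.startswith(os.path.join(directory, ""))


if __name__ == "__main__":
    print("json would be loaded from:", module_origin("json"))
    print("from the current directory?", comes_from("json", os.getcwd()))
```

Checking a name like pip this way from inside a suspect directory shows whether a local file would win over the installed package.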
Why I wrote this article
When writing a “tips and tricks” article like this about security, it’s very easy to imply that I, the author, am very clever for knowing this weird bunch of trivia, and the only way for you, the reader, to stay safe, is to memorize a huge pile of equally esoteric stuff and constantly be thinking about it. Indeed, a previous draft of this post inadvertently did just that. But that’s a really terrible idea and not one that I want to have any part in propagating.
So if I’m not trying to say that, then why post about it? I’ll explain.
Over many years of using Python, I’ve infrequently, but regularly, seen users confused about the locations that Python loads code from. One variety of this confusion is when people put their first program that uses Twisted into a file called twisted.py. That shadows the import of the library, breaking everything. Another manifestation of this confusion is a slow trickle of confused security reports where a researcher drops a module into a location where Python is documented to load code from (like the current directory in the scenarios described above) and then loads it, thinking that this reflects an exploit because it’s executing arbitrary code.
Any confusion like this — even if the system in question is “behaving as intended”, and can’t readily be changed — is a vulnerability that an attacker can exploit.
System administrators and developers are high-value targets in the world of cybercrime. If you hack a user, you get that user’s data; but if you hack an admin or a dev, and you do it right, you could get access to thousands of users whose systems are under the administrator’s control or even millions of users who use the developers’ software.
Therefore, while “just be more careful all the time” is not a sustainable recipe for safety, to some extent, those of us acting on our users’ behalf do have a greater obligation to be more careful. At least, we should be informed about the behavior of our tools. Developer tools, like Python, are inevitably power tools which may require more care and precision than the average application.
Nothing I’ve described above is a “bug” or an “exploit”, exactly; I don’t think that the developers of Python or Jupyter have done anything wrong; the system works the way it’s designed and the way it’s designed makes sense. I personally do not have any great ideas for how things could be changed without removing a ton of power from Python.
One of my favorite safety inventions is the SawStop. Nothing was wrong with the way table saws worked before its invention; they were extremely dangerous tools that performed an important industrial function. A lot of very useful and important things were made with table saws. Yet, it was also true that table saws were responsible for a disproportionate share of wood-shop accidents, and, in particular, lost fingers. Despite plenty of care taken by experienced and safety-conscious carpenters, the SawStop still saves many fingers every year.
So by highlighting this potential danger I also hope to provoke some thinking among some enterprising security engineers out there. What might be the SawStop of arbitrary code execution for interactive interpreters? What invention might be able to prevent some of the scenarios I describe above without significantly diminishing the power of tools like Python?
Stay safe out there, friends.
Acknowledgments
Thanks very much to Paul Ganssle, Nathaniel J. Smith, Itamar Turner-Trauring and Nelson Elhage for substantial feedback on earlier drafts of this post.
Any errors remain my own.
[1] Restricting which sites can drive-by drop files into your downloads folder is a great security feature, except the main consequence of adding it is that everybody seems to be annoyed by it, not understand it, and want to turn it off.