Careful With That PyPI

PyPI credentials are important. Here are some tips for securing them a little better.

Too Many Secrets

A wise man named Diogo once said, “you shouldn’t use ENV variables for secret data”. In large part, he was right, for all the reasons he gives (and you should read them). Filesystem locations are usually a better operating-system interface for communicating secrets than environment variables; fewer things can intercept an open() than can read your process’s command-line or calling environment.
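Every child process you spawn inherits the whole environment (and on Linux, any process running as your user can read it out of /proc/<pid>/environ), which is easy to see with a small, hypothetical Python demonstration; FAKE_SECRET here is just a stand-in, not a variable any real tool reads:

import os
import subprocess
import sys

# Pretend credential; no real tool reads this name.
os.environ["FAKE_SECRET"] = "hunter2"

# Every child process inherits the entire environment; no open() required.
subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['FAKE_SECRET'])"]
)  # prints: hunter2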

One might say that files are “more secure” than environment variables. To his credit, Diogo doesn’t say that, for good reason: one shouldn’t refer to the superiority of such a mechanism as being “more secure” in general, but rather, as better for a specific reason in some specific circumstance.

Supplying your PyPI password to tools you run on your personal machine is a very different case than providing a cryptographic key to a containerized application in a remote datacenter. In this case, based on the constraints of the software presently available, I believe an environment variable provides better security, if you use it correctly.

Popping A Shell By Any Other Name

If you upload packages to the Python Package Index, and people use those packages, your PyPI password is an extremely high-privilege credential: effectively, it grants a time-delayed arbitrary code execution privilege on all of the systems where anyone might pip install your packages.
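To make that concrete: for a source distribution, setup.py is just a Python program that runs on whatever machine builds or installs the package, so anyone who can upload under your name gets to run code like this hypothetical example (the package name is made up) on every machine that later does a pip install from source:

# setup.py: runs as ordinary Python whenever this sdist is built or installed.
from setuptools import setup

print("this line runs on every machine that installs the source distribution")

setup(name="yourpackage", version="0.0.1")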

Unfortunately, the suggested mechanism to manage this crucial, potentially world-destroying credential is to just stick it in an unencrypted file.

The authors of this documentation know this is a problem; the authors of the tooling know too (and, given that these tools are all open source and we all could have fixed them to be better about this, we should all feel bad).

Leaving the secret lying around on the filesystem is a form of ambient authority; a permission you always have, but only sometimes want. One of the worst things about this is that you can easily forget it’s there if you don’t use these credentials very often.

The keyring is a much better place, but even it can be a slightly scary place to put such a thing, because it’s still easy to end up in a state where some random command could upload a PyPI release without prompting you. PyPI is forever, so we want to measure twice and cut once.
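If you do go the keyring route, the Python keyring package is the usual way in. Here is a minimal sketch; using the repository URL as the service name is an assumption about how your upload tooling keys its lookup, so check your tool’s documentation before relying on it:

import getpass
import keyring

# Store the password in the OS keychain instead of a plaintext config file.
# The service name below is an assumption; adjust it to whatever your
# upload tool actually looks up.
keyring.set_password(
    "https://upload.pypi.org/legacy/",
    "your-username",
    getpass.getpass("PyPI password: "),
)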

Luckily, even more secure places exist: password managers. Both https://1password.com and https://www.lastpass.com offer command-line interfaces that integrate nicely with this workflow. If you use 1Password, you’ll really want https://stedolan.github.io/jq/ (apt-get install jq, brew install jq) to slice & dice its JSON output.

The way that I manage my PyPI credentials is that I never put them on my filesystem, or even into my keyring; instead, I leave them in my password manager, and very briefly toss them into the tools that need them via an environment variable.

First, I have the following shell function, to prevent any mistakes:

function twine () {
    echo "Use dev.twine or prod.twine depending on where you want to upload.";
    return 1;
}

For dev.twine, I configure twine to always only talk to my local DevPI instance:

function dev.twine () {
    env TWINE_USERNAME=root \
        TWINE_PASSWORD= \
        TWINE_REPOSITORY_URL=http://127.0.0.1:3141/root/plus/ \
        twine "$@";
}

This way I can debug Twine, my setup.py, and various test-upload things without ever needing real credentials at all.

But, OK. Eventually, I need to actually get the credentials and do the thing. How does that work?

1Password

1Password’s command line is a little tricky to log in to (you have to eval its output; it’s not just a command), so here’s a handy shell function that will do it.

function opme () {
    # Log this shell in to 1password.
    if ! env | grep -q OP_SESSION; then
        eval "$(op signin "$(jq -r '.latest_signin' ~/.op/config)")";
    fi;
}

Then, I have this little helper for slicing out a particular field from the OP JSON structure:

function _op_field () {
    jq -r '.details.fields[] | select(.name == "'"${1}"'") | .value';
}

And finally, I use this to grab the item I want (named, memorably enough, “PyPI”) and invoke Twine:

function prod.twine () {
    opme;
    local pypi_item="$(op get item PyPI)";
    env TWINE_USERNAME="$(echo "${pypi_item}" | _op_field username)" \
        TWINE_PASSWORD="$(echo "${pypi_item}" | _op_field password)" \
        twine "$@";
}

LastPass

For LastPass, you can just log in (for all shells; it’s a little less secure) via lpass login. If you’ve logged in before, you often don’t even have to do that; it will just prompt you when you run a command that requires you to be logged in. So we don’t need the preamble that 1Password’s command line did.

Its version of prod.twine looks quite similar, but its plaintext output obviates the need for jq:

function prod.twine () {
    env TWINE_USERNAME="$(lpass show PyPI --username)" \
        TWINE_PASSWORD="$(lpass show PyPI --password)" \
        twine "$@";
}

In Conclusion

“Keep secrets out of your environment” is generally a good idea, and you should always do it when you can. But, better a moment in your process environment than an eternity on your filesystem. Environment-based configuration can be a very useful stopgap for limiting the lifetimes of credentials when your tools don’t support more sophisticated approaches to secret storage.¹

Post Script

If you are interested in secure secret storage, my micro-project secretly might be of interest. Right now it doesn’t do a whole lot; it’s just a small wrapper around the excellent keyring module and the pinentry / pinentry-mac password prompt tools. secretly presents an interface both for prompting users for their credentials without requiring the command line or env vars, and for saving them away in the keychain, for tools that need to pull in an API key and don’t want to make the user manually edit a config file first.
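For a sense of the pattern it wraps, here is a sketch built directly on the keyring module (this is not secretly’s actual API, and it uses getpass rather than pinentry for the prompt): look the credential up in the keychain, prompt for it off the command line if it isn’t there yet, and save it for next time:

import getpass
import keyring

def get_api_key(service: str, username: str) -> str:
    # Try the OS keychain first.
    secret = keyring.get_password(service, username)
    if secret is None:
        # Prompt without echoing, and without the secret ever touching
        # argv or the process environment.
        secret = getpass.getpass(f"API key for {service}: ")
        keyring.set_password(service, username, secret)
    return secret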


  1. Really, PyPI should have API keys that last for some short amount of time, that automatically expire so you don’t have to freak out if you gave somebody a 5-year-old laptop and forgot to wipe it first. But again, if I wanted that so bad, I should have implemented it myself... 

Photo Flow

How do you edit and share photos when more than one person is involved?

Hello, the Internet. If you don’t mind, I’d like to ask you a question about photographs.

My spouse and I both take pictures. We both anticipate taking more pictures in the near future. No reason, just a total coincidence.

We both have iPhones, and we both have medium-nice cameras that are still nicer than iPhones. We would both like to curate and touch up these photos and actually do something with them; ideally we would do this curation collaboratively, whenever either of us has time.

This means that there are three things we want to preserve:

  1. The raw, untouched photographs, in their original resolution,
  2. The edits that have been made to them, and
  3. The “workflow” categorization that has been done to them (minimally, “this photo has not been looked at”, “this photo has been looked at and it’s not good enough to bother sharing”, “this photo has been looked at and it’s good enough to be shared if it’s touched up”, and “this has been/should be shared in its current state”). Generally speaking this is a “which album is it in” categorization.

I like Photos. I have a huge photo library with years of various annotations in it, including faces (Photos is the only tool I know of that lets you do offline facial recognition, so you can automatically identify pictures of your relatives without giving the police state a database to do the same thing).

However, iCloud Photo Sharing has a pretty major issue; it downscales photographs to “up to 2048 pixels on the long edge”, which is far smaller even than the 12 megapixels that the iPhone 7 sports; more importantly it’s lower resolution than our television, so the image degradation is visible. This is fine for sharing a pic or two on a small phone screen, but not good for a long-term archival solution.

To complicate matters, we also already have an enormous pile of disks in a home server that I have put way too much energy into making robust; a similarly-sized volume of storage would cost about $1300 a year with iCloud (and would not fit onto one account, anyway). I’m not totally averse to paying for some service if it’s turnkey, but something that uses our existing pile of storage would definitely get bonus points.

Right now, my plan is to dump all of our photos into a shared photo library on a network drive, only ever edit them at home, try to communicate carefully about when one or the other of us is editing it so we don’t run into weird filesystem concurrency issues, and hope for the best. This does not strike me as a maximally robust solution. Among other problems, it means the library isn’t accessible to our mobile devices. But I can’t think of anything better.

Can you? Email me. If I get a really great answer I’ll post it in a followup.

Beyond ThunderDock

I Plugged Some Stuff Into A Thunderbolt Dock. You Won’t Believe What Happens Next

This weekend I found myself pleased to receive a Kensington SD5000T Thunderbolt 3 Docking Station³.

Some of its functionality was a bit of a weird surprise.

The Setup

Due to my ... accretive history with computer purchases, I have 3 things on my desk at home: a USB-C MacBook Pro, a 27" Thunderbolt iMac, and an older 27" Dell display, which is old enough at this point that I can’t link it to you. Please do not take this to be some kind of totally sweet setup. It would just be somewhat pointlessly expensive to replace this jumble with something nicer³. I purchased the dock because I want to have one cable to connect me to power & both displays.

For those not familiar, iMacs of a certain vintage¹ can be jury-rigged to behave as Thunderbolt displays with limited functionality (no access from the guest system to the iMac’s ethernet port, for example), using Target Display Mode, which extends their useful lifespan somewhat. (This machine is still, relatively speaking, a powerhouse, so it’s not quite dead yet; but it’s nice to be able to swap in my laptop and use the big screen.)

On the back of the Thunderbolt dock, there are 2 Thunderbolt 3 ports. I plugged the first one into a Thunderbolt 3 to Thunderbolt 2 adapter which connects to the back of the iMac, and the second one into the MacBook directly. The Dell display plugs into the DisplayPort; I connected my network to the Ethernet port of the dock. My mouse, keyboard, and iPhone were plugged into the USB ports on the dock.

The Problem

I set it up and at first it seemed to be delivering on the “one cable” promise of Thunderbolt 3. But then I switched WiFi off to test the speed of the wired network and was surprised to see that the laptop didn’t see the dock’s ethernet port at all. Flipping WiFi back on, I looked over at my router’s control panel and noticed that a new device (with the expected manufacturer) was on my network. nmap seemed to indicate that it was... running exactly the network services I expected to see on my iMac. VNCing into the iMac to see what was going on, I popped open the Network system preference pane, and right there, alongside all the other devices, was the Thunderbolt dock’s ethernet device.

The Punch Line

Despite the miasma of confusion surrounding USB-C and Thunderbolt 3², the surprise here is that apparently Thunderbolt is Thunderbolt, and (for this device at least) Thunderbolt devices connected across the same bus can happily drive whatever they’re plugged in to. The Thunderbolt 2 to 3 adapter isn’t just a fancy way of plugging in hard drives and displays with the older connector; as far as I can tell all the functionality of the Thunderbolt interface remains intact as both “host” and “guest”. It’s like having an ethernet switch for your PCI bus.

What this meant is that when I unplugged everything and then carefully plugged in the iMac before the MacBook, it happily lit up the Dell display, and connected to all the USB devices plugged into the USB hub. When I plugged the laptop in, it happily started charging, but since it didn’t “own” the other devices, nothing else connected to it.

Conclusion

This dock works a little bit too well; when I “dock” now I have to carefully plug in the laptop first, give it a moment to grab all the devices so that it “owns” them, then plug in the iMac, then use this handy app to tell the iMac to enter Target Display Mode.

On the other hand, this does also mean that I can quickly toggle between “everything is plugged in to the iMac” and “everything is plugged in to the MacBook” just by disconnecting and reconnecting a single cable, which is pretty neat.


  1. Sadly, not the most recent fancy 5K ones. 

  2. which are, simultaneously, both the same thing and not the same thing. 

  3. Paid links. See disclosures

The Sororicide Antipattern

Don’t murder your parents or your siblings to get their attributes.

“Composition is better than inheritance.” This is a true statement. “Inheritance is bad.” Also true. I’m a well-known compositional extremist. There’s a great talk you can watch if I haven’t talked your ear off about it already.

Which is why I was extremely surprised in a recent conversation when my interlocutor said that while inheritance might be bad, composition is worse. Once I understood what they meant by “composition”, I was even more surprised to find that I agreed with this assertion.

Although inheritance is bad, it’s very important to understand why. In a high-level language like Python, with first-class runtime datatypes (i.e., user-defined classes that are themselves objects), the computational difference between what we call “composition” and what we call “inheritance” is a matter of where we put a pointer: is it on a type or on an instance? The important distinction has to do with human factors.
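A tiny, made-up example of that “where does the pointer live” distinction (Engine and the two Car classes are purely illustrative):

class Engine(object):
    def start(self):
        return "vroom"

class InheritingCar(Engine):
    # The pointer to the reused code lives on the type, via the class's MRO.
    pass

class ComposingCar(object):
    def __init__(self):
        # The pointer to the reused code lives on each instance.
        self._engine = Engine()

    def start(self):
        return self._engine.start()

assert InheritingCar().start() == ComposingCar().start() == "vroom"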

First, a brief parable about real-life inheritance.


You find yourself in conversation with an indolent heiress-in-waiting. She complains of her boredom whiling away the time until the dowager countess finally leaves her her fortune.

“Inheritance is bad”, you opine. “It’s better to make your own way in life”.

“By George, you’re right!” she exclaims. You weren’t expecting such an enthusiastic reversal.

“Well,” you sputter, “glad to see you are turning over a new leaf”.

She crosses the room to open a sturdy mahogany armoire, and draws forth a belt holstering a pistol and a menacing-looking sabre.

“Auntie has only the dwindling remnants of a legacy fortune. The real money has always been with my sister’s manufacturing concern. Why passively wait for Auntie to die, when I can murder my dear sister now, and take what is rightfully mine!”

Cinching the belt around her waist, she strides from the room animated and full of purpose, no longer indolent or in-waiting, but you feel less than satisfied with your advice.

It is, after all, important to understand what the problem with inheritance is.


The primary reason inheritance is bad is confusion between namespaces.

The most important role of code organization (division of code into files, modules, packages, subroutines, data structures, etc) is division of responsibility. In other words, Conway’s Law isn’t just an unfortunate accident of budgeting, but a fundamental property of software design.

For example, if we have a function called multiply(a, b), its presence in our codebase suggests that if someone wants to multiply two numbers together, it is multiply’s responsibility to know how to do so. If there’s a problem with multiplication, it’s the maintainers of multiply who need to go fix it.

And, with this responsibility comes authority over a specific scope within the code. So if we were to look at an implementation of multiply:

def multiply(a, b):
    product = a * b
    return product

The maintainers of multiply get to decide what product means in the context of their function. It’s possible, in Python, for some other function to reach into multiply with frame objects and mangle the meaning of product between its assignment and return, but it’s generally understood that it’s none of your business what product is, and if you touch it, all bets are off about the correctness of multiply. More importantly, if the maintainers of multiply wanted to bind other names, or change around existing names, like so, in a subsequent version:

def multiply(a, b):
    factor1 = a
    factor2 = b
    result = a * b
    return result

It is the maintainer of multiply’s job, not the caller of multiply, to make those decisions.

The same programmer may, at different times, be both a caller and a maintainer of multiply. However, they have to know which hat they’re wearing at any given time, so that they can know which stuff they’re still responsible for when they hand over multiply to be maintained by a different team.

It’s important to be able to forget about the internals of the local variables in the functions you call. Otherwise, abstractions give us no power: if you have to know the internals of everything you’re using, you can never build much beyond what’s already there, because you’ll be spending all your time trying to understand all the layers below it.

Classes complicate this process of forgetting somewhat. Properties of class instances “stick out”, and are visible to the callers. This can be powerful — and can be a great way to represent shared data structures — but this is exactly why we have the ._ convention in Python: if something starts with an underscore, and it’s not in a namespace you own, you shouldn’t mess with it. So: other._foo is not for you to touch, unless you’re maintaining type(other). self._foo is where you should put your own private state.

So if we have a class like this:

class A(object):
    def __init__(self):
        self._note = "a note"

we all know that A()._note is off-limits.

But then what happens here?

class B(A):
    def __init__(self):
        super().__init__()
        self._note = "private state for B()"

B()._note is also off-limits for everyone but B, except... as it turns out, B doesn’t really own the namespace of self here, so it’s clashing with what A wants _note to mean. Even if, right now, we were to change it to _note2, the maintainer of A could, in any future release of A, add a new _note2 variable which conflicts with something B is using. A’s maintainers (rightfully) think they own self, B’s maintainers (reasonably) think that they do. This could continue all the way until we get to _note7, at which point it would explode violently.
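Running the two classes above makes the collision visible:

b = B()
# B's __init__ ran last, so A's idea of what _note means is silently gone.
print(b._note)  # prints: private state for B()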


So that’s why inheritance is bad. It’s a bad way for two layers of a system to communicate because it leaves each layer nowhere to put its internal state that the other doesn’t need to know about. So what could be worse?

Let’s say we’ve convinced our junior programmer who wrote A that inheritance is a bad interface, and they should instead use the panacea that cures all inherited ills, composition. Great! Let’s just write a B that composes in an A in a nice clean way, instead of doing any gross inheritance:

class Bprime(object):
    def __init__(self, a):
        for var in dir(a):
            setattr(self, var, getattr(a, var))

Uh oh. Looks like composition is worse than inheritance.


Let’s enumerate some of the issues with this “solution” to the problem of inheritance:

  • How do we know what attributes Bprime has?
  • How do we even know what type a is?
  • How is anyone ever going to grep for relevant methods in this code and have them come up in the right place?

We briefly reclaimed self for Bprime by removing the inheritance from A, but what Bprime does in __init__ to replace it is much worse. At least with normal, “vertical” inheritance, IDEs and code inspection tools can have some idea where your parents are and what methods they declare. We have to look aside to know what’s there, but at least it’s clear from the code’s structure where exactly we have to look aside to.

When faced with a class like Bprime though, what does one do? It’s just shredding apart some apparently totally unrelated object; there’s nearly no way for tooling to inspect this code to the point that it knows where self.<something> comes from in a method defined on Bprime.

The goal of replacing inheritance with composition is to make it clear and easy to understand what code owns each attribute on self. Sometimes that clarity comes at the expense of a few extra keystrokes; an __init__ that copies over a few specific attributes, or a method that does nothing but forward a message, like def something(self): return self.other.something().
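Applied to the example above, a sketch of what the explicit version might look like (Bcomposed and something are made-up names; the point is that every exposed attribute has exactly one obvious, greppable definition):

class Bcomposed(object):
    def __init__(self, other):
        # self is entirely ours; the composed-in object keeps its own namespace.
        self._other = other
        self._note = "private state for Bcomposed()"

    def something(self):
        # Explicit forwarding: one line per method we actually mean to expose.
        return self._other.something()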

Automatic composition is just lateral inheritance. Magically auto-proxying all methods¹, or auto-copying all attributes, saves a few keystrokes at the time some new code is created at the expense of hours of debugging when it is being maintained. If readability counts, we should never privilege the writer over the reader.


  1. It is left as an exercise for the reader why proxyForInterface is still a reasonably okay idea even in the face of this criticism.² 

  2. Although ironically it probably shouldn’t use inheritance as its interface. 

So You Want To Web A Twisted

I’m live-streaming a webinar on Twisted service architecture.

As a rehearsal for our upcoming tutorial at PyCon, Creating And Consuming Modern Web Services with Twisted, Moshe Zadka and I are doing a LIVE STREAM WEBINAR. You know, like the kids do, with the video games and such.

As the webinar gods demand, there is an event site for it, and there will be a live stream.

This is a practice run, so expect “alpha” quality content. There will be an IRC channel for audience participation, and the price of admission is good feedback.

See you there!