Separate your Fakes and your Inspectors

Test fakes have two pieces.

When you are writing unit tests, you will commonly need to write stand-in implementations of your dependencies, so that you can test code which does external communication or otherwise manipulates state that you can’t inspect. In other words, test fakes. However, a “test fake” is just one half of the component that you’re building: you’re also generally building a test inspector.

As an example, let’s consider the case of this record-writing interface that we may need to interact with.

class RecordWriter(object):
    def write_record(self, record):
        "..."

    def close(self):
        "..."

This is a pretty simple interface; it can write out a record, and it can be closed.

Faking it out is similarly easy:

class FakeRecordWriter(object):
    def write_record(self, record):
        pass

    def close(self):
        pass

But this fake record writer isn’t very useful. It’s a simple stub: if our application writes any interesting records out, we won’t know about it, and if it closes the record writer, we won’t know that either.

The conventional way to correct this problem, of course, is to start tracking some state, so we can assert about it:

class FakeRecordWriter(object):
    def __init__(self):
        self.records = []
        self.closed = False

    def write_record(self, record):
        if self.closed:
            raise IOError("cannot write; writer is closed")
        self.records.append(record)

    def close(self):
        if self.closed:
            raise IOError("cannot close; writer is closed")
        self.closed = True

This is a very common pattern in test code. However, it’s an antipattern.

We have exposed two additional, apparently public attributes to application code: .records and .closed. Our original RecordWriter interface didn’t have either of those. Since these attributes are public, someone working on the application code could easily access them by accident. Although it’s unlikely that an application author would think they could read records back from a record writer by accessing .records, it’s quite plausible that they might add a check of .closed before calling .close(), to make sure they won’t get an exception. Such a mistake might happen simply because their IDE auto-suggested the completion.
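
To make that risk concrete, here is a hypothetical bit of application code (flush_and_stop is invented for this example) which “works” against the fake above but would break against a real RecordWriter:

def flush_and_stop(writer):
    # This check only succeeds because FakeRecordWriter happens to expose a
    # .closed attribute; a real RecordWriter has no such attribute, so this
    # line would raise AttributeError in production.
    if not writer.closed:
        writer.close()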

The resolution for this antipattern is to have a separate “fake” object, exposing only the public attributes that are also on the object being faked, and an “inspector” object, which exposes only the functionality useful to the test.

class WriterState(object):
    def __init__(self):
        self.records = []
        self.closed = False

    def raise_if_closed(self):
        if self.closed:
            raise ValueError("already closed")


class _FakeRecordWriter(object):
    def __init__(self, writer_state):
        self._state = writer_state

    def write_record(self, record):
        self._state.raise_if_closed()
        self._state.records.append(record)

    def close(self):
        self._state.raise_if_closed()
        self._state.closed = True


def create_fake_writer():
    state = WriterState()
    return state, _FakeRecordWriter(state)

In this refactored example, we now have a top-level entry point of create_fake_writer, which always creates a pair of WriterState and thing-which-is-like-a-RecordWriter. The type of _FakeRecordWriter can now be private, because it’s no longer interesting on its own; it exposes nothing beyond the methods it’s trying to fake.
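
For illustration, a test built on this pattern might look something like the following sketch, where write_greeting stands in for whatever application code is under test:

def write_greeting(writer):
    # Hypothetical application code under test: it only ever sees the
    # RecordWriter-shaped half of the fake.
    writer.write_record("hello")
    writer.close()


def test_write_greeting():
    state, writer = create_fake_writer()
    write_greeting(writer)
    # The test inspects only the WriterState half.
    assert state.records == ["hello"]
    assert state.closed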

Whenever you’re writing test fakes, consider writing them like this, so that you can hand application code the application-facing half of your fake, hand test code the test-facing half, and never get the two mixed up.

Security as Stencil

If you’re writing a “secure” email program, it needs to be a good email program.


On the Internet, it’s important to secure all of your communications.

There are a number of applications which purport to give you “secure chat”, “secure email”, or “secure phone calls”.

The problem with these applications is that they advertise their presence. Since “insecure chat”, “insecure email” and “insecure phone calls” all have a particular, well-known signature, traffic that deviates from that baseline stands out, and an interested observer may easily detect your supposedly “secure” communication. Not only that, but the places that you go to obtain these tools are suspicious in their own right: in order to visit Whisper Systems, you have to be looking for “secure” communications.

This allows the adversary to use “security” technologies such as encryption as a sort of stencil, to outline and highlight the communication that they really want to be capturing. In the case of the NSA, this dumps anyone who would like to have a serious private conversation with a friend into the same bucket, from the perspective of the authorities, as a conspiracy of psychopaths trying to commit mass murder.

The Snowden documents already demonstrate that the NSA does exactly this; if you send a normal email, they will probably lose interest and ignore it after a little while, whereas if you send a “secure” email, they will store it forever and keep trying to crack it to see what you’re hiding.

If you’re running a supposedly innocuous online service or writing a supposedly harmless application, the hassle associated with setting up TLS certificates and encryption keys may seem like a pointless distraction. It isn’t.

For one thing, if you have anywhere that user-created content enters your service, you don’t know what your users are going to use it to communicate. Maybe you’re just writing an online game, but users will use your game for something as personal as courtship. Can we agree that the state security services shouldn’t be involved in that? Even if you were specifically writing an app for dating, you might not anticipate that the police will use it to show up and arrest your users so that they will be savagely beaten in jail.

The technology problems that “secure” services are working on are all important. But we can’t simply develop a good “secure” technology, consider it a niche product, and leave it at that. Those of us who are software development professionals need to build security into every product, because users expect it. Users expect it because we are, in a million implicit ways, telling them that they have it. If we put a “share with your friend!” button into a user interface, that’s a claim: we’re claiming that the data the user indicates is being shared only with their friend. Would we want to put in a button that says “share with your friend, and with us, and with the state security apparatus, and with any criminal who can break in and steal our database!”? Obviously not. So let’s stop making the “share with your friend!” button actually do that.

Those of us who understand the importance of security and are in the business of creating secure software must, therefore, take on the Sisyphean task of not only creating good security, but of competing with the insecure software on its own turf, so that people actually use it. “Slightly worse to use than your regular email program, but secure” is not good enough. (Not to mention the fact that existing security solutions are more than “slightly” worse to use.) Secure stuff has to be as good as or better than its insecure competitors.

I know that this is a monumental undertaking. I have personally tried and failed to do something like this more than once. As Rabbi Tarfon put it, though:

It is not incumbent upon you to complete the work, but neither are you at liberty to desist from it.

Public or Private?

To make data public or not to make data public, that is the question.

If I am creating a new feature in library code, I have two choices with the implementation details: I can make them public - that is, exposed to application code - or I can make them private - that is, for use only within the library.
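
In Python, for example, that choice is often expressed as nothing more than a naming convention. Here is a contrived sketch (the cache attribute is purely illustrative) showing the same implementation detail exposed both ways:

class PublicParser(object):
    def __init__(self):
        # Public: applications may read, replace, or depend on this,
        # so I have to keep supporting it in its current form.
        self.cache = {}


class PrivateParser(object):
    def __init__(self):
        # Private by convention: the leading underscore signals that
        # this may change or disappear in any release.
        self._cache = {}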


If I make them public, then the structure of my library is very clear to its clients. Testing is easy, because the public structures may be manipulated and replaced as the tests dictate. Inspection is easy, because all the data is exposed for clients to manipulate. Developers are happy when they can manipulate and test things easily. If I select "public" as the general rule, then developers using my library will be happy, because they'll be able to inspect and test everything quite easily, whether or not I specifically designed in support for it.

However, now that they're public, I have to support them in their current form into the foreseeable future. Since I tend to maintain the libraries I work on, and maintenance necessarily means change, a large public API surface means a lot of ongoing changes to exposed functionality, which means a constant stream of deprecation warnings and deprecated feature removals. Without private implementation details, there's no axis on which I can change my software without deprecating older versions. Developers hate keeping up with deprecation warnings or having their applications break when a new version of a library comes out, so if I adopt "public" as the general rule, developers will be unhappy.


If I make them private, then the structure of my library is a lot easier for developers to understand, because the API surface is much smaller, and exposes only the minimum necessary to accomplish their tasks. Because the implementation details are private, when I maintain the library, I can add functionality "for free" and make internal changes without requiring any additional work from developers. Developers like it when you don't waste their time with trivia and make it easier to get to what they want right away, and they love getting stuff "for free", so if I adopt "private" as the general rule, developers will be happy.

However, now that they're private, there's potentially no way to access that functionality for unforeseen use-cases, and testing and inspection may be difficult unless the functionality in question was designed with an above-average level of care. Since most functionality is, by definition, designed with an average level of care, that means that there will inevitably be gaps in these meta-level tools until the functionality has already been in use for a while, which means that developers will need to report bugs and wait for new releases. Developers don't like waiting for the next release cycle to get access to functionality that they need to get work done right now, so if I adopt "private" as the general rule, developers will be unhappy.

Hmm.