Linux has two special files that provide a stream of random bytes: /dev/random and /dev/urandom. The two streams work identically, except for one detail: /dev/random tracks how much entropy (i.e. "randomness") has been "used up", and blocks if there isn't enough available.

So, you'd better use /dev/random, right? Nope. This design fundamentally misinterprets how entropy works: entropy doesn't get "used up". The blocking behavior of /dev/random doesn't actually have any value (except to attackers, who now have a new way to make your server hang). /dev/urandom is the right choice, even for generating encryption keys.
A brief exploration of your typical crypto RNG
This seems counter-intuitive, so let’s look at how a typical crypto-friendly PRNG works. To keep things brief, I’ll omit a few specifics that are unimportant to our discussion here (if you’re planning to implement a PRNG, make sure to look them up!).
By default, computer programs are deterministic – if run in the same way with the same inputs, they will produce the same outputs. Normally, this is actually quite helpful, but a random number generator that outputs the same number every time wouldn't be particularly useful. RNGs solve this by mixing together one or more sources of (hopefully) true randomness to produce a seed. These sources of randomness can include the precise timings of keystrokes, thermal noise, or whatever is available.
The RNG uses the seed to produce an arbitrary amount of random bits on demand. Often, this process is built around a block cipher like AES. While producing output, the RNG keeps track of two values: an encryption key (generated from the seed) and a counter. Each block of output is the current value of the counter encrypted using the block cipher and key. The output is different for each block, because even a small change in a block cipher's input completely changes its output.
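As a rough sketch of this key-plus-counter structure, here is a toy version in Python. The standard library has no AES, so SHA-256 stands in for the block cipher; the class name and seed are invented for illustration and this is not a real RNG you should use.

```python
import hashlib

class CounterModeRNG:
    """Toy counter-mode PRNG: a key derived from a seed, plus one
    hash of (key, counter) per output block. Illustration only."""

    def __init__(self, seed: bytes):
        # Derive a fixed-size key from the seed material.
        self.key = hashlib.sha256(seed).digest()
        self.counter = 0

    def next_block(self) -> bytes:
        # "Encrypt" the counter under the key; a real RNG would use
        # a block cipher such as AES here instead of a hash.
        block = hashlib.sha256(
            self.key + self.counter.to_bytes(16, "big")
        ).digest()
        self.counter += 1
        return block

rng = CounterModeRNG(b"demo seed, not actually random")
first, second = rng.next_block(), rng.next_block()
assert first != second  # the counter changed, so the block changed completely
```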
The RNG will continue to generate blocks of output for as long as needed. But is this secure?
- If an attacker could somehow use the RNG output to derive the key, that would be quite bad. But that would imply that they were able to crack AES encryption, which, after over 15 years of analysis from cryptographers, seems unlikely to happen in the near future.
- If an attacker could somehow use earlier output blocks to predict later blocks, that would also be bad. Fortunately, block ciphers like AES are again designed to avoid even subtle flaws like this one (specifically, this would make the block cipher distinguishable from a pseudorandom permutation).
OK, so this seems safe, at least superficially (and more rigorous analysis also gives encouraging results). This still doesn’t answer our original question, though: does the RNG become less secure as it gets used? Does it need to be reseeded periodically?
Crypto is for today, but entropy is forever
Let’s talk about entropy. Here’s a string:
AOlOSlGeMEDjwhEbL8xCMaHctgxDK7w1iQ8d8cOCtUJjdL70P8
This string is not random, and it does not have any entropy; it's just a string. That said, this string was generated randomly: it was selected uniformly at random from the set of 50-character strings that use only the characters A-Z, a-z, and 0-9. The process that generated this string has about 298 bits of entropy.
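To check that figure: each of the 50 characters is drawn uniformly from 62 possibilities, so the entropy of the generating process is 50 × log2(62). A quick arithmetic check in Python (no real API assumed, just the math):

```python
import math

# 50 characters, each uniform over 62 symbols (A-Z, a-z, 0-9):
entropy_bits = 50 * math.log2(62)
print(round(entropy_bits, 1))  # roughly 298 bits
```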
This seems pedantic, but let’s look at another example. Suppose we have a (terrible) password generator that randomly outputs one of 8 different passwords:
2TciUbl2GlgAbITtgwoKEP4XXA2nhLpgp7G3vcw6LI4DeHvzsl
AOlOSlGeMEDjwhEbL8xCMaHctgxDK7w1iQ8d8cOCtUJjdL70P8
LlQpXr0oVpc3hashzndHlyVE6bGwRK14CZHmunXISCVoSx7Ald
m0Uh5jy4Je27UAPaQXfSoGFDVuPMoVDyNmvMBvZ491kPHJJrqB
461rFqv4e1bqdbgPBMSknQueHt0xb607TiWSM2H2IklMIxcDl4
8xK032c9JUUFmINW7zcvRRMksfMTQpTZgKKhYXctULOxoPTq0G
imEUHfMrFOFcv91iJlETTalPtP5qVSu8swFstN0vrNLFTtWj09
66iqd83nxmCYhLD4NsaRLA5iwSkl7WvL1XlSzhHWUMfieFaN5z
Let's suppose it outputs the second option, AOlOSlGeME.... Since there are only 8 options, our password was generated with only 3 bits of entropy (8 = 2^3). Strangely enough, this is the same string from our last example, which had 298 bits of entropy. What happened?
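The same arithmetic as before, applied to the password generator: the process chooses uniformly among 8 fixed outputs, so its entropy is log2(8) = 3 bits, no matter how long the chosen string happens to be.

```python
import math

# Uniform choice among 8 fixed passwords:
print(math.log2(8))  # 3.0 bits, regardless of the 50-character length
```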
Entropy is a property of the process through which data is generated, not of data itself. This is a subtle but important distinction that affects many aspects of information security involving randomness.
Thinking back to our RNG, we know that the seed was generated through a process involving plenty of entropy. The algorithm that derives the encryption key from the seed is designed to preserve as much entropy as possible (limited only by the key size). The same story goes for the block cipher. Putting it all together, it's straightforward to conclude that a block of output would be generated with plenty of entropy, limited only by the original seeding process, the encryption key length, and the block cipher's output size. And you can make the same argument for every one of the RNG's output blocks.
Infinite entropy?
But hold on, doesn’t this mean that an RNG’s output can have more entropy than it was originally seeded with?
This is where things get tricky. Let’s suppose our RNG is seeded with 256 bits of entropy. This RNG can produce as many 256-bit outputs as we want, and every one of these outputs will have been created with 256 bits of entropy. But if we were to assemble these outputs into one large string, the process that created that larger output would involve only 256 bits of entropy.
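One way to see the limit: the seed fully determines every output block, so an attacker who guesses the seed reproduces the entire stream, however long it is. The sketch below uses SHA-256 as a stand-in for a block cipher (the function name and seed are invented for illustration):

```python
import hashlib

def output_stream(seed: bytes, n_blocks: int) -> bytes:
    # Counter-mode sketch with SHA-256 standing in for a block cipher.
    key = hashlib.sha256(seed).digest()
    return b"".join(
        hashlib.sha256(key + i.to_bytes(16, "big")).digest()
        for i in range(n_blocks)
    )

# The seed determines everything: the same seed yields the same
# arbitrarily long output, so the concatenated output carries no
# more entropy than the seeding process that produced the seed.
assert output_stream(b"demo seed", 1000) == output_stream(b"demo seed", 1000)
```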
And that's where the pedantry pays off: entropy is a property of the process that generates the output, not of the output itself.
Another thought experiment: suppose we had two separate RNGs, seeded separately from independent sources, each with 100 bits of entropy. If we combined outputs from both RNGs, the most entropy we could possibly hope to get is 2 x 100 bits = 200 bits.
In practice, there’s only so much entropy needed for cryptographic operations to be secure. Once seeded with 256 bits of entropy, the random number generator is no longer the weakest part of the system.
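In practice, that means reading the kernel's CSPRNG rather than building your own. In Python, for instance, os.urandom and the secrets module do exactly this (on Linux, via the getrandom system call or /dev/urandom):

```python
import os
import secrets

# 32 bytes = 256 bits of output from the kernel CSPRNG,
# suitable for e.g. an AES-256 key:
key = os.urandom(32)

# A random URL-safe string, handy for session tokens:
token = secrets.token_urlsafe(16)

assert len(key) == 32
```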