
And then your compiler will optimize it to this:

    int token_length = strlen(token);
    
    // Just get this out of the way now; no need to dawdle
    // (because the length of a token is generally known)
    if (token_length != strlen(input)) {
        return false;
    }
    
    // No need to worry about mismatched buffer lengths at this point
    for(int i=0; i<token_length; i++) {
        if (token[i] != input[i]) {
            return false;
        }
    }
    return true;
And all of a sudden the timing attack is back.

You cannot write a constant-time function in portable C / C++, and trying to do so is a fool's game.



I was willing to concede your point, but I tested first. Using clang to compile, I did some profiling on my function optimized to various levels (-O0...-O3, -Os, -Ofast) with various inputs. The variation in timing is noise (I get the same variation on the same test inputs in different runs of the program.) I'd be curious to know whether you can provide a combination of compiler and compiler options that do indeed produce noticeable variations in timing as a function of the inputs.

"You cannot write a constant-time function in portable C / C++..." perhaps you mean that we can't outsmart compiler optimizations. Without optimizations ("Debug" configuration), the code runs exactly as written.

Ultimately, I'm going to prefer code with obvious intent regardless of how the compiler will optimize the thing. We'll just have to find a way around this timing attack business.


I would be very surprised if a compiler on a modern high-performance processor actually did this optimization - as deep pipelines and speculative execution mean that the additional conditional branch per loop may hurt more than it helps.

That being said, on a simple architecture - namely a shallow pipeline - this sort of thing is an "obvious" optimization.

And w.r.t. "without optimizations" - there is no such thing at the C / C++ level. On some computer architectures, optimizations pretty much have to be done (many architectures with compile-time scheduling, for instance).

For instance, if you have a dataflow machine it may end up running this in non-constant time. And I'll note that modern x86 processors are looking more and more like dataflow machines.


Why can't you just set a bit for each non-match and then bitwise-or them all together, returning false if non-zero? I don't think a compiler can optimize that away.


It's the exact same optimization either way.

One relies on the fact that adding a positive number to a number > 0 can never make it 0 again. The other relies on the fact that ORing a bit into a non-zero number can never make it zero again.


I don't understand. I thought the timing attack was because the logic short circuits when it finds a non-match. How does the compiler short circuit the bitwise-or?

At first I was thinking you could write the bits out to a separate word, but I think this suffices:

  uint result = 0;
  for (i = 0; i < length; i++)
    result |= notmatch (i);
  return result == 0;
You would need value range analysis for the use of result in addition to the insertion of an extra test and branch to short circuit the loop, plus a model of what |= does to values. Do any compilers actually do this?

But, actually I'm kind of confused about your first point. What optimizations did your compiler use to eliminate the match_count++ and hoist the return into the loop?


As I've said elsewhere, current compilers are unlikely to do so. Current processors have too deep a pipeline to make it worth it.

But on an architecture with a short pipeline it can be worth it for a compiler to do so.

And it's simple, in theory if nothing else. Note that once result has a bit set, result cannot ever return to zero. Hence, if result has a bit set, one can return false immediately. And all of a sudden you lose the constant-time part.

As for how a current compiler could do so? Look at what happens if/when the loop is unrolled.

That current compilers don't tend to do this sort of optimization makes it only more dangerous - because people do not go back and re-examine old code nearly as often as they should. Much less check the assembly every time they update the compiler.


Let's assume you're right. Why can't you choose characters to compare randomly, until you've compared them all?


You can still do a timing attack against that. Suppose the password is "password". You do a timing attack to find the correct length, then you start trying "aaaaaaaa", "bbbbbbbb", "cccccccc", ... The ones that take less time on average are the ones that contain a letter from the password. You can go from there. Ever played mastermind?

It's a mitigation but it doesn't prevent the attack.

Not to mention: how exactly will you check that you've compared them all?


Ah, mastermind. I think you mean the ones that take more time on average contain a letter from the password, not less. Anyway, good point.

I was thinking you could maintain an array of flags to indicate whether you've compared a certain position before, and a count of all the compared positions so far.

Alright, here are my other ideas:

1) Properly chosen 8-character passwords are pretty strong, right? So why not copy all of the 8-bit chars into a u64 and compare that directly? You can treat longer passwords as a series of 8-char passwords. Assumes a machine that won't short circuit on u64_a == u64_b. Less effective for 32-bit. The compiler won't undo this since it's an optimization (1x aligned 64-bit compare is more efficient than 8x 8-bit compares, seven of which are unaligned).

2) Introduce a random delay after the byte-wise comparison is done that is up to 10x the length of the comparison. The comparison variance gets lost in delay variance. I know, a mitigation, but it's effective. Combine with random selection of characters for more effectiveness.

3) Use a perfect hash of the password. You don't need to compare keys after a perfect hash.

Thanks for humoring me.


Whoops, yep, that is incorrect. Should be more time, as you said.

1) That causes massive problems for people (like me) who use passphrases. (I use diceware passwords for anything I don't use a password manager for. So things like "corn umpire khaki dow heave hiatt sis steal". That's ~103 bits of entropy (massive overkill for most things), and yet is a whole lot easier to remember than, say, "FlyaJdqJW6kvyUQeE" - which is ~101 bits of entropy (17 random upper/lower/digit characters).)

It also assumes that the machine doesn't short-circuit, as you say.

And, again, you're assuming that the compiler doesn't undo it. Just because it won't be undone by optimizations on current machines doesn't mean it won't be undone by optimizations on future ones. On a RISC architecture, for instance, a 64-bit compare may not even exist - it may be implemented by 8-bit compares. Or 16, or 32. Whatever.

2) Ever heard of the German tank problem? That will not drown out the comparison, not at all. I can defeat your example (10x noise as comparison time) with >90% accuracy with 100 samples. (Just take samples, and take the mean of the sample minimum and maximum.)

3) Without leaking other people's passwords every time you generate a password for someone?

I find this sort of conversation intriguing. I want things to be more secure, I just don't know of a good way to do so.


A perfect hash is a function in the mathematical sense in that there is a unique output for each input. So, hash the stored password ahead of time, and hash the challenge password after receiving it, and then compare the two hashes. You don't need collision detection, since a match can only come from two identical keys. You don't generate passwords for people, although you could, as long as you don't reveal your hash function. There are useless perfect hashes, like "password + 1", and there are much better ones that produce near-random output.


I know what a perfect hash is.

But a) pigeonhole principle (you need an output at least as long as the longest string you'll encounter), and b) I was under the impression that perfect hash schemes require you to know the set of strings encountered ahead-of-time.

Could you give me an example of a perfect hash function that is suitable and secure enough for password use? I do not know of any.


You're right, there's a hard limit on password length. That might be fatal if you're using diceware, I don't know if you could accommodate passwords that big. As for the set of strings, just assume it's every string possible, so 256^8 strings for an 8 byte password. That's only 16384 petabytes.

All of the perfect hash functions I've found require knowing all of the keys ahead of time. You might not be able to fit 16384 PB on your hard drive or in memory. (Hey, I don't know. You could work at LLNL.) But I think that if you knew you might use any key in the space, and you didn't insist on a minimal perfect hash, i.e. one where there is a 1:1 mapping from hashes back to keys, that you could write such a function without having all of the keys.

If you did this, you'd also need to prove you were generating a unique and effectively random hash. If I was a crypto academic, that might be an interesting line of research. I'd be kind of surprised if nobody ever tried this though, and there's probably a decent amount of literature to pore over. I guess most people use perfect hashes for hashtables and not as a defense against timing attacks.


Congratulations. You've just described the standard method of password hashing. (Well, you've missed the common extensions of salt / pepper, but meh.)

If you pick a cryptographically secure hash that's long enough, you don't need to worry about collisions. The birthday problem dictates that you start getting collisions after ~sqrt(D) random entries. So if you pick a 128+ bit hash you should be fine on that front (128 bits -> ~2^64 entries before a collision on average, which shouldn't be a problem.)

However, as I've mentioned elsewhere, all this does is punt the constant-time problem off to the hash function.


Wow, maybe I have a future in cryptography!

Anyway, why exactly isn't the hash function constant-time? I don't understand this, the hashes I've played with for hashtables are just a bunch of bit shifts. Is it only message length?


Maybe you do...

The hashes generally used for things like hash tables don't need to be cryptographically secure, and as such can be substantially simpler. When you start getting into cryptographically secure hashes things get complex enough that you start ending up with timing variations if you're not very careful. Especially once you start talking about table lookups / etc (although that's more common with encryption algorithms than straight hash algorithms).

And the compiler - with C and C++ at least - is free to optimize in ways that introduce timing variations. Which is what sparked this conversation in the first place. So even if your source code doesn't take data-dependent time, the compiler may make the compiled output take data-dependent time. And there's no way around that in pure C / C++.


Ok, fine, so there are no real-time, optimization, or scheduling guarantees in C / C++ and it's better to be safe than sorry. But why does it actually matter if either the hash function or hash value comparison takes data-dependent time? How can you use that information to recover the password?


Let's say the hash value comparison short-circuits by hex digit:

    char entered_hash[] = ...;
    char password_hash[] = ...;
    for (int i = 0; i < entered_hash_length; i++) {
        if (entered_hash[i] != password_hash[i])
            return false;
    }
    return true;
This is a modification of a dictionary attack. As such, it assumes that the person has used a known password.

I take a standard password dictionary and hash everything. I arrange it into a Trie or somesuch by the hash.

I start trying passwords. Specifically: I try passwords where the first character of the hash is different. So, for instance, with SHA256, I try:

     12345 5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5
     abc123 6ca13d52ca70c883e0f0bb101e425a89e8624de51db2d2392593af6a84118090
     computer aa97302150fce811425cd84537028a5afbe37e3f1362ad45a51d467e17afdc9c
     123456 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
     1234 03ac674216f3e15c761ee1a5e255f067953623c8b388b4459e13f978d7c846f4
     a1b2c3 4f32044a655f32e8528edea64dbfd11cba810b8790e6e6e23d28ad3a75980734
     xxx cd2eb0837c9b4c962c22d2ff8b5441b7b45805887f051d39bf133b583baf6860
     test 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
     carmen f3c2ce176290b0c384cb4881eb714f2db58f630c33863d91c9bedf58d36007db
     mickey 33c614ca3cf78827a85dc0d8d06bfcf8c4d923fd23c813acd50b80ed2d4d4fb3
     secret 2bb80d537b1da3e38bd30361aa855686bde0eacd7162fef6a25fe97bf527a25b
     summer e83664255c6963e962bb20f9fcfaad1b570ddf5da69f5444ed37e5260f3ef689
     ranger dbc4a04327176e6577b4da46df04564150053960eba5d89587dad1f76a818d80
     letmein 1c8bfe8f801d79745c4631d09fff36c82aa37fc4cce4fc946683d7b336b63032
     mindy 7376c22801fbd6e01009830a70028820f280958165e6d3c2bac9683dab28feb7
     bear bc98bb50e8094b2ac3ceb90ba2512587c0513cd294a07efcfdcf467198da6266
One of these should take slightly more time than the others. Let's say it's 'carmen'. I now know that the first character of the password hash is 'f'. I then try passwords where the first character of the password hash is 'f' and the second is different. Repeat until I have the entire password.

Note that this can also turn an online brute-force attack into an offline brute-force attack with a small online component. (The only difference being instead of a password dictionary, you brute-force for hashes with the known bits.)

Note that a salt - if the salt is never leaked - prevents this attack.

(Except that if you have an account yourself you can do a timing attack against your own account to try to figure out the salting scheme!)

--------

If the password hash itself takes data-dependent time... There are plenty of ways to go from that to breaking passwords. I could elaborate if you wish.


Hey this is pretty interesting. Thanks for taking the time to explain. My first reaction was that you have to know the hash function for this attack to work, but I guess the salt serves the same purpose as hiding the hash function, and is more practical since then you can use a generic hash. Presumably someone figured out a way to prevent timing attacks against your own account - what is it?

Sure, assume that the hash takes data-dependent time, but that's the only vulnerability. I can see that this might reveal password length, but not easily beyond that. How does it work? Does SHA256 with salt take constant time?

In general, what is the most secure password scheme if you're looking to prevent timing attacks? Do you need real-time guarantees? You have enough material for a nice "evolution of secure password authentication" article here, if you ever wanted to write it up.


> Presumably someone figured out a way to prevent timing attacks against your own account - what is it?

As long as the salt has sufficient entropy, this isn't a problem. So instead of generating the salt from the username / etc, you use a random salt and store it.

> How does it work?

Effectively, by leaking internal state of the cypher / hash through timing variations. The most common ways of doing so are either through data-dependent table lookups or through operations that may be microcoded or otherwise take variable time (multiplications, rotations sometimes, divisions, modulo, squaring).

See, for instance, http://cr.yp.to/streamciphers/leaks.html - although note that this is dealing with stream cyphers not hash functions. In particular, the linked paper describes an effective timing attack against AES by taking advantage of processor cache effects on a data-dependent table lookup. I'd give it a read - it's quite interesting.

SHA256 is pretty good w.r.t. being close to constant time for a fixed input length. Not perfect, but meh. It uses bitwise XOR and AND, rotates, shifts, and addition, none of which are generally variable-time (although note that on some platforms rotates are variable-time).

For most things, SHA2, by which I mean the newer variants, works reasonably well. In particular, you store SHA2(password xor random_salt xor pepper) and discard the original password. Store the salt in the database with the user, and if you want to get really fancy you embed a single random value ("pepper") in the application code - this is a kind of last-ditch effort to hopefully prevent user passwords from being leaked even if the database is leaked.

If you really want to get fancy, you also embed a couple "canary" accounts (random made-up accounts) in your database, with a passive device on the network that screams bloody murder if any of those accounts ever get successfully logged in to.

However, that being said, that doesn't really answer your question.

A couple things:

1) You don't want to use SHA, generally. You want to use a proper KDF (key derivation function). Why? Standard hash functions are designed to be fast and not take much memory. Which means it is a whole lot easier to crack.

2) If any of the user authentication is in a language that you cannot disable optimizations in, and you cannot look at the assembly code coming out, you cannot prevent timing attacks. You may be able to mitigate them, but that is all.

3) In general, if a KDF / hash function doesn't use any data-dependent table lookups / pointer chasing / control flow, and doesn't do any complex functions (multiplication, etc), it's probably relatively safe from timing attacks. Otherwise... Not so much.

4) Theoretically, you need cooperation of the kernel, etc, to truly be safe from timing attacks. Among other things, context switches could leak information. In actuality... It's definitely possible but I don't know of anyone who has made it work.


This stuff is so interesting. D.J. Bernstein is certainly prolific! It's cool how processor-specific that cache-based attack is. It's weird, I have a compilers background but (evidently) I've never even thought about timing attacks. There isn't a lot of overlap between the PL and crypto communities I guess. I know we used to use asm volatile with gcc but apparently that isn't even a guaranteed scheduling barrier anymore.

Anyway, my curiosity is satisfied for now, but thanks again for sharing, and keep posting about this stuff.


Yes you can - take the hash of the input and token, and compare those. Now you're trying to cause a sha256 collision. Good luck!


You still have to perform the byte-by-byte comparison, and this is where TheLoneWolfling alleges the optimizations kick in and give away secrets. However, from my testing, the variations in timing are unpredictable (I get variations in compare speed across different runs of the same program with the same input) and small, and would be swallowed up by variations in network latency.


You could probably do that relatively safely with bit operations, but I expect to be proved wrong.

Something like this should work:

    /* Swap char for uint<register_size> for speed */
    #define CMP_TYPE char

    /* Assuming char* sha256(char* input); */
    CMP_TYPE* input_hash=(CMP_TYPE*) sha256(input);
    CMP_TYPE* token_hash=(CMP_TYPE*) sha256(token); /* Precomputed? */
    /* 256-bit hash: max_pos=32 when CMP_TYPE is char */
    int max_pos=(256/8)/sizeof(CMP_TYPE);

    CMP_TYPE cmp=0;
    for (int i=0;i<max_pos;i++) {
        cmp = cmp | (token_hash[i] ^ input_hash[i]);
    }
    return (cmp == 0); /* true iff the hashes match */
The reason for the CMP_TYPE #define is so that you can optimise the comparison by replacing char with uint32 or uint64.


Again, the compiler is free to optimize that to data-dependent time. It's relatively unlikely to do so on current machines (the time saved is unlikely to be worth the potential of a pipeline stall), but it's free to do so.

For instance, it's legal for the compiler to insert a strcmp fallthrough (`if input == token: return true`, or rather the strcmp equivalent) at the start, I'm pretty sure.


I have a bunch of other replies in this thread that touch on this, but suffice to say it's not just the comparison that matters.

For one thing, you have to assume that the SHA function is data-independent time (which, again, good luck doing in C / C++).

For another thing, noise in timing attacks doesn't prevent them. Even at levels of noise that seemingly obscure everything. And it's a very bad thing to rely on network latency being unpredictable enough.


Since an optimizing compiler is free to go to town on functions such as string comparison, are these functions typically written in assembler per-architecture? I.e. one for x86 and x86-64, another for MIPS, RISC, etc?


Except that that still leaves you open to a dictionary attack.

Or rather, it accelerates a dictionary attack from O(sizeof(dictionary)) tries to O(length(hash)) tries.

(Let's suppose you are comparing by hex digits. You try 16 dictionary "words" whose hashes start with 0...f. One of those should take slightly longer. You then try 16 dictionary "words" whose hashes start with the digit found previously with different second digits. (Skip any digits that you don't have a word for.) One of those should take slightly longer... Repeat until found.)

This can be mitigated with a salt - if and only if you manage to prevent the user from figuring out how your salting scheme works.


a) You're just punting the timing attack off to the SHA algorithm.

b) There is no SHA algorithm in the C / C++ standard library (that I know of), much less a guaranteed constant-time (or rather, data-independent) one.

c) The compiler is well within its rights to insert "busy_loop_for_ms(input_char);" anywhere it wishes. It is unlikely to do so, but it is allowed to. As I said: C and C++ don't have any notion of constant or variable time.


What if you mark token, input, and match_count as volatile?


In practice, it'd probably work.

In theory? The compiler is still free to add data-dependent delays.

For an "evil" example, a compiler could do this:

    ...

    for(int i=0; i<token_length; i++) {
        char temp = input[i];
        if (token[i] == temp) {
            match_count++;
        }
        wait_for_ns(temp);
    }
It wouldn't make sense for a compiler to do so - but it is allowed.


That's a good point. However, even if you wrote the routine in assembly, the CPU could pull a similar trick. At some point you have to give up and trust the tool not to do something too catastrophically stupid. Whether that point is at the CPU or the C compiler (with sufficiently paranoid C code) or elsewhere, I'm not sure.


Yep. A classic example is multiplication - which isn't constant-time on modern machines but was on many older ones.

Currently, CMOV is looking like it could become the same thing. It currently takes data-independent time, but it is not guaranteed to, and very well may not in the future.

The distinction is that hardware tends to change slower than software. You generally have multiple compiler versions per hardware version (possible exception of embedded processors, but even then...).



