A fifteen year old TCP bug?

pilom · on March 24, 2011

This is true hacking news! Discusses a possible new bug in TCP, teaches how TCP works, has links to useful and relevant books on the subject, AND includes remarks about how difficult it is for a newbie to actually make changes to open source software and not get yelled at. I love it!

mrspeaker · on March 24, 2011

I agree - but it's funny that there's only a handful of comments. You just can't have an uninformed rant with such technical correctness (the best kind of correctness)!

rwmj · on March 24, 2011

.. and comes from a cool hackerish domain (42.org)

feintruled · on March 24, 2011

The response to the bug report looks depressingly typical. Rejects the working fix with a wall of text speculation on numerous other possibly better fixes (without deigning to actually choose one). Nirvana fallacy in action!

stcredzero · on March 24, 2011

I know of one commercial Smalltalk UI bug that persisted 12 years -- being reported all the while. To be fair, it was a very tricky low-level race condition, very hard to reproduce, though very serious. (Unhandled exception in the bowels of the UI library. Boom! Application goes down.) Still, the attitude of the vendor was just unbelievable from the POV of the customer. After dozens of reports, hundreds of messages, numerous pieces of documentation, it still took 12 years for engineers to even start thinking it was something besides user error -- even though multiple customers were reporting it. (I know because I worked for 3 of them!) There is a huge perceptual wall there. I know because I used to work for the vendor. I know how apparent this bug is at a production shop and how opaque it appears from inside the vendor's camp. (And despite my being from inside, I still got the "user error" chant!)

EDIT: Oh, and I know of another UI bug that's been in their system for about 8 years. It's a Smalltalk newbie classic -- shoving non-identity keys into an IdentityDictionary. I could describe what it is to a Smalltalker in 2 sentences, and they could then find it and fix it. This vendor seems to have the same attitude about this bug, so I've already learned my lesson. They can keep their damn bug!

cletus · on March 24, 2011

1-2 years ago I had the exact same thing with a PHP bug (I know, PHP bugs... shocking!), specifically with mysqli. It would crash on LONGTEXT columns. Not reliably. Different people reported it in different forms over 2-3 years previous. all of them getting automated responses ("Please provide...") followed by ("Closed due to no activity for 7 days...") with the odd dismissive comment by a committer.

It's an incredibly frustrating experience.

sedachv · on March 24, 2011

This is why forks happen (I've forked two projects because of this, three if you count one I've deployed but haven't publicly released).

SwellJoe · on March 24, 2011

If you can fork it, you can include a patch in your bug report. Sometimes, that's all it takes to make things happen.

The most infuriating situations for me are when I submit a working patch, and it is ignored. This is, thankfully, very rare. In some cases, the patch leads to a better fix being written by the maintainer or someone else (an example of this for me was when I needed yum to support authenticated repositories; it didn't, so I patched it, posted the patch to the mailing list, and soon after one of the members of the team rewrote it to be more robust and have nicer configuration syntax within a week).

albertzeyer · on March 25, 2011

Sometimes it takes several weeks until the first respOnse to your patch and even longer to get it accepted (or rejected). But if you rely on that patch for your own stuff and you already know a few other things which need some work, your only real option is to fork.

acdha · on March 25, 2011

That's why I stopped using PHP outright as well - after several inarguable bugs were just punted on, it became apparent that the core developers did not care and were not seriously interested in outside help. Even bow in the post-DVCS world, nothing kills a project faster than someone realizing that they're going to have to fork the whole thing if they want it to work.

cube13 · on March 24, 2011

It's a working fix, but isn't the proper fix. The real issue is the size difference of the long type for 32 and 64-bit architectures. Bruce Evans doesn't explicitly mention this fact, but it's the core of his reasoning.

Since long is 64-bits on 64-bit architecture, and 32 on 32-bit architecture. This is the reason that 0xffffffff is showing up as a non-negative number on the 64-bit machine, but shows up as negative on the 32-bit.

Changing the type to int(which is 32 bits long on both x86 and x64), while it does break 16-bit systems(which don't exist anymore), fixes this completely by removing the x86 and x64 behavior differences with the long type.

The two style issues that he mentions are easily fixed by moving the variable declaration to the top of the function and initializing it there. However, these may be forced by the function structure due to the gotos present in the code...

peterbotond · on March 24, 2011

a better fix to use the lmin macro. libkern.h.

http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/sys...

cube13 · on March 24, 2011

Not really. The line is this: long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) - (tp->rcv_adv - tp->rcv_nxt);

So it's really: long = uint(long, (long)uint32) - (uint32-uint32)

Here's the problem on x64: you're converting a 64-bit long to a uint, then doing a subtraction with another uint, then placing that result into a 64-bit long. Since the compiler is just doing an assignment rather than a sign extension, which is why there is a large positive number rather than -1.

Changing it to use the lmin macro would make it: long = long(long, (long)uint32) - (uint32-uint32)

This still has the underlying issue(using 32-bit values in 64-bit buckets), which should work out fine, but may have issues down the road.

It makes more sense to change all the long types to int32/uint32 types rather than just cast longs everywhere. If recwin and adv were changed to int32, it would be: int32 = uint32(int32, uint32) - (uint32 - uint32)

While this potentially has issues if the uints are between 0x80000000 and 0xfffffff, it's a safer solution than using longs.

EDIT: added some explanation

peterbotond · on March 25, 2011

the lmin[1] macro will convert the uints to a long int and then do the arithmetic.

[1]static __inline long lmin(long a, long b) { return (a < b ? a : b); }

[2]static __inline u_int min(u_int a, u_int b) { return (a < b ? a : b); }

Edit: 6.3.1.8 Usualarithmetic conversions in the c-99 standard.

btmorex · on March 24, 2011

You have to do this if you want your codebase to get better over time rather than worse over time.

jongraehl · on March 24, 2011

What he's doing seems useful to the project. There's no better time to get it right. I'm just surprised he's willing to expend so much effort communicating instead of just fixing the patch.

I noticed that C programmers tend to use macros for things where (possibly non-exported) inline functions would make more sense. Why is that? Are they in the habit of building the OS with all optimizations off? Or is it that they're being used as poor man's generic function?

cube13 · on March 24, 2011

The inline keyword is best thought of as a hint to the compiler, not a command. The compiler is free to ignore the meatbag telling it to inline functions if it chooses to.

Macros are substituted in before the compiler, so they are always inlined.

EDIT: Hint, not suggestion.

Someone · on March 25, 2011

That is true, but the idea that programmers can use macros to force the compiler to emit optimal code is wrong, too. In the early days of C, that was (almost) true, but those days are over.

In theory, a compiler could uninline common code blocks, including macro calls, into functions to decrease object code size and/or working set size, thus speeding up the program (example: functions f and g with inlined function h each take 2 cache lines; without inlining, each of f, g and h fit a single cache line)

In practice, using an inline function will give the compiler the opportunity to weigh different objectives (code size, execution speed, debuggability, etc) against each other, and do the better thing.

wulczer · on March 24, 2011

You can't always reply on the compiler inlining your function, and AFAIK there's no portable way of forcing inlining.

abecedarius · on March 25, 2011

I think it's mostly inertia and culture. Inline functions weren't in the standard till C99.

runjake · on March 24, 2011

The response to the bug report was by Bruce Evans, who is listed as the "Style Police-Meister" for FreeBSD. Apparently his job is to enforce standards & code style. Seems like he was doing his job.

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/committe...

Edit: edited for clarity. Thanks, pinko!

pinko · on March 24, 2011

I believe the comment above was meant as a response to the "The response to the bug report looks depressingly typical" comment elsewhere in this thread.

It took me a minute to sort that out ("hmm, why is he referencing Bruce Evans?"), so I thought I'd mention it for anyone else trying to follow.

direxorg · on March 24, 2011

In 2002 we did custom patch for an energy company which had hundreds of outdated remote RS232 terminals hooked up via wireless links to the central station for control and monitoring. Their goal was to encrypt transmitted messages so it will not be intercepted and messed with during wireless transmission. Solution was Linux boxes on both sides that encrypts communication using OpenSSL... The problem was the terminal do not want to talk to Linux over crossover Ethernet because of.... you guessed... bug in TCP... To solve that we had to make patch for Linux kernel. and let me tell you that code in 2.4 kernel was very ugly with extremely funny comments :-)

My companion since than developing drivers and "he feels that he is doing something important rather than boring UI".

but all he is doing is mostly his own projects and drivers since updating open source IS a pain in the neck.

I guess problem in collaborative work is the reason why people do open source vs something that have to be supported. What do you think?

rboyd · on March 24, 2011

The only reply this PR got, was from Bruce Evans who critiqued my use of a simple (long) cast, which appears to have derailed this PR, sticking it in the usual never getting fixed limbo where unfortunately most of my PR's appear to end up.

Looks to me like Bruce gave you some valuable advice. You spent more time complaining about the handling of your PR and documenting the issue on your blog than it would have taken you to fix your patch.

HenryR · on March 24, 2011

Is Stevens vol. 2 in the public domain now? If not, that's pretty poor form, linking to a scanned pdf of the book.

uxp · on March 24, 2011

No, actually it's not public domain. The 19th edition was printed and released in 2005. Pearson looks like it actively tries to protect it's copyrights to the series as well: http://www.foo.be/docs/TCPIP-Illustrated-1/

derrickpetzold · on March 25, 2011

Its not because if you are a FreeBSD developer it is already assumed that you own every copy of Steven's books next to the shrine of him in your basement. So linking to a digital scan is not infringement its just convenience.

doki_pen · on March 24, 2011

Wouldn't that be fair use? Do you have issue with the content being posted, or the fact that it's scanned?

estel · on March 24, 2011

Quoting parts of the book to support some points made in the article would fall under fair use, but linking to a scan of the entire book would not.

btilly · on March 24, 2011

Hosting a scan is not fair use, but linking to a copyright violation is not generally a copyright violation. Though a few courts have ruled that it can be if done for the specific purpose of knowingly disseminating illegal material. See http://www.chillingeffects.org/linking/faq.cgi#QID152 for more.

estel · on March 24, 2011

That's true, but the original point was 'poor form', which this likely falls under also.

ZoFreX · on March 24, 2011

It may not be illegal, but it's incredibly poor form.

pavel_lishin · on March 24, 2011

> As I had virtually no understanding of the TCP code, I liberally sprinkled it with printf()s

And people say it's a stupid way to debug!

SwellJoe · on March 24, 2011

What people say that?

pavel_lishin · on March 25, 2011

Pick any "how do you debug?" submission anywhere, and you'll see a lot of people claiming that using printf, etc, is retarded in the age of good debuggers.

Maybe they're just a noisy minority.

SoftwareMaven · on March 25, 2011

Using printf to debug when you could use a good debugger is...well, I wouldn't say stupid, just highly unproductive.

The problem is there are a lot of problems that aren't debugger friendly, especially if you are new to a particular domain. The kernel, timing-related problems, remote systems, production systems (you have intelligent logging, right?), etc., all have extremely valid reasons for using printf debugging.

kahawe · on March 25, 2011

One thing I am missing or haven't found in some of the debuggers I have used (mostly Java) is the feature to just print out the code execution and return values for a certain part of or the whole code.

Those prints or printfs are great to get a quick overview of what's going on, if you ask me... instead of stepping through the whole thing.

branden · on March 25, 2011

printf debugging is too simple and intuitive to justify an entire blog post. ;)

barrkel · on March 24, 2011

So much of this is caused by unsigned types. They are evil; avoid them wherever you can.

1amzave · on March 24, 2011

Care to elaborate? Unlike signed ones, unsigned integral types at least have well-defined behavior on shifting and overflow. (I'm speaking in terms C specifically here, of course.)

cpeterso · on March 24, 2011

Signed ints are easier to range check at runtime. Given an unsigned int, it's difficult to detect an invalid result from combining or comparing signed and unsigned ints.

Google's C++ Style Guide discourages using unsigned ints to represent nonnegative numbers (like sizes or counts). It recommends using runtime checks or assertions instead.

http://google-styleguide.googlecode.com/svn/trunk/cppguide.x...

Unsigned ints make sense for bit twiddling, but you should probably use a fixed-size uint32_t or uint64_t to ensure the results are consistent across various architectures.

bodyfour · on March 25, 2011

The "always use signed" rule is a source of endless debate in C circles. I personally like almost everything in the Google C++ Style Guide, but this is one place where I think they got it wrong.

The problem is that the riskiest place for a signed/unsigned mismatch is when calling an unsigned API with a signed value. Simply deciding to not use unsigned at all doesn't fix this because ANSI C and STL use unsigned types throughout (f.e. memcpy)

  if (size <= 10) {
    // Yay, I have plenty of space
    memcpy(buffer, src, size);
  }

The code looks fine, but if "size" is an int with the value -1 there's a hard-to-spot bug. Plenty of security holes have been caused by just this sort of mistake. If you don't fight against the types that libc uses you don't have this problem.

There will still be spots where you'll need to compare signed and unsigned values, but the compiler will warn you about these. You'll have to cast one side or the other but that's a GOOD thing. Since neither a signed-compare nor an unsigned-compare is always what you want you want to be explicit about it.

There are other advantages to using unsigned types. For instance, it gives an explicit hint to the person reading the code about the range of the value. I think this makes interfaces clearer. For instance if you see a function signature of "void foo(const uint8_t *, size_t)" you'll immediately guess that you're dealing with a memory buffer and its explicit size without even seeing the names of the parameters.

Actually, if I had my way "int" would default to being unsigned and you'd have to specifically request "signed" if that's what you want. I find that I probably use unsigned types 5x as often as signed ones.

barrkel · on March 26, 2011

"There are other advantages to using unsigned types. For instance, it gives an explicit hint to the person reading the code about the range of the value."

This is, without doubt, the worst reason for using unsigned types, and it's the primary reason (IMHO) for the flaws in the C API that force you to use unsigned types unnecessarily. Unsigned types are not a documentation feature, and they are not merely an advert for an invariant; they are opting in to a subtly different arithmetic that most people are surprised by. It would be better to have a range-checked types, like Pascal, than to infect the program with unsigned arithmetic.

I find that most programs deal with values for their integer types with an absolute value of under 1000; about the only excuse for using an unsigned type, IMO, is when you must have access to that highest bit in a defined way (for safe shifting and bit-twiddling).

bodyfour · on March 26, 2011

> they are opting in to a subtly different arithmetic that most people are surprised by

I think that's a "citation needed" moment there. It's true that any native integer type will strange if you go outside of its defined range. The only way to avoid that is to use a language that automatically converts to bignums behind the scene (Common Lisp, etc)

What I don't agree with is that this is something that "most people are surprised by" If anything, the word "unsigned" is a pretty good hint about what behavior you'll get.

And even when you play fast-and-loose with the rules, it usually turns out ok:

   unsigned a, b, c, d;
   a = b + (c - d);

even if d > c, this will do the expected thing on any 2's compliment architecture. Now, this will break if a and b were instead "unsigned long long". I think that case is fairly rare -- it's not a mistake I've seen commonly in real life (especially compared to the dangerous "botched range-check of a signed value" error)

But you are correct that it's not "merely an advert for an invariant" -- it's advertising that the compiler actually reads. It gives you better warnings (I've had plenty of bugs prevented by "comparison of signed and unsigned" warnings) It also allows the compiler to optimize better in some cases: compare the output of "foo % 16" with foo as signed and unsigned.

> It would be better to have a range-checked types, like Pascal

Adding runtime checks to arithmetic is the type of costs that are never going to be in C. This is no different than saying "C should have garbage collection" or "C should have RTTI". They're perfectly valid things to want in a language, but they're anathema to the niche that C holds in the modern world. With C I want "a + b" to compile down to one instruction -- no surprises.

And even if you DID do a range-check, what do you do if it fails? 1. Throw an exception? Sounds logical... oh wait, this is C there's no such thing as an exception 2. Clamp the value? Now you have behavior that is just as bizarre as an integer overflow 3. Crash? Not very friendly.. 4. Have a user-definable callback (i.e. like a signal) What is the chance that the programmer will be able to make meaningful recovery though?

There are, however, some additions to the C99 type system that I think would be useful.. for example C++11's strongly typed enum's are a good idea.

> I find that most programs deal with values for their integer types with an absolute value of under 1000

I find that most programs deal with values greater-than-or-equal-to zero.

barrkel · on March 27, 2011

I find that most programs deal with values greater-than-or-equal-to zero.

-1 is very frequently used as a sentinel value. For example, counting backwards through the elements of some container:

    for (i = count - 1; i >= 0; --i)
        /* body */;

I've had plenty of bugs prevented by "comparison of signed and unsigned" warnings

You wouldn't have had these warnings, much less needed to pay attention to them, if you hadn't had to use unsigned types in the first place.

This conversation is much like those around GC. It's impossible to convince people labouring under tyranny they've learned to love without them experiencing a free life first. You just can't communicate it with words.

bodyfour · on March 27, 2011

> -1 is very frequently used as a sentinel value.

I actually think these sentinels work quite nicely with unsigned.

   unsigned element_num;
   static const unsigned NO_ELEMENT = (unsigned) -1;

Yes, you need a cast, but it's just one place. From then on you can use a nice constant for your sentinel.

Also, your sentinel is more range-check safe then it would have been if it were an int (the classic "if (x < sizeof(arr)) arr[x] = 1;" issue again)

> For example, counting backwards through the elements of some container:

Ok, that is a fair point. Tat type of loop is easy to mess up with an unsigned type. Worse, gcc will only warn you if you compile with -Wextra which not everybody does.

Actually implementing the loop in a safe manner isn't really that hard though, of course:

   if (count > 0) {
      unsigned i = count - 1;
      do {
         /* body */
      } while (i-- != 0);
   }

Couple extra lines, true. I don't think it loses much clarity.

> For example, counting backwards through the elements of some container:

You missed my point. I mean warnings caused by "oops, that's not the variable I meant to compare with there"

It's a similar story with "const". One of the great side-benefits of using "const" consistently is that suddenly you find that the compiler starts catching more of your dumb mistakes ("oh I thought this was called like func(dest,src) but it's actually func(src,dest)" The moral is that "more info to the compiler" translates to "compiler notices a larger percentage of your dumb mistakes"

> It's impossible to convince people labouring under tyranny they've learned to love without them experiencing a free life first.

Melodramatic much?

adrianN · on March 24, 2011

Oh man, so I've been doing in wrong all the time. I always thought it would be a good idea to use the type system to its full capabilities and complained about the compiler for not adequately slapping my wrist when I obviously assign negative numbers to unsigned variables (a sign analysis is pretty simple to implement!).

barrkel · on March 26, 2011

Yes, you have been doing it wrong all this time, and I'm fairly confident in this. Using the type system to its full capabilities is not in itself a natural good. Sign analysis doesn't help you with the problem, because the problem is that using unsigned types means opting into a different arithmetic, an arithmetic which is unnatural to most humans.

seabee · on March 24, 2011

> it's difficult to detect an invalid result from combining or comparing signed and unsigned ints

Isn't this why you should compile with all warnings on?

andrewf · on March 24, 2011

I'd wager that 90%+ of the time, people fix "comparison between signed and unsigned values" warnings by casting one side of the expression.

But if you really want to eliminate the potential for a bug from this warning, you have to go back through and tweak/check the values you're testing, all the way back to their source, fixing signedness along the way. At this point you may as well have settled on a default to begin with.

The real pain comes when you have to interface with external code. Even in the standard library, you'll find size_t (eg fread(3)) and ssize_t (eg read(2)). You're going to have a mismatch with one or the other.

barrkel · on March 26, 2011

I care less about C's specific behaviour on shifting and overflow (both of which are pretty rare), and more about the fact that unsigned integers use a different arithmetic to the signed integers most people are familiar with. In particular, subtraction doesn't mean what you think it does. At 0 in unsigned arithmetic, there's a gaping cliff you can fall off of where you wrap around the other side, while at 0 in signed arithmetic, you're well away from that cliff and are highly unlikely to get anywhere near to it. Writing a program using many unsigned numbers means playing on the edge of a cliff.

__david__ · on March 24, 2011

I wouldn't say they are evil. In fact, both signed and unsigned are the same--the only difference is the "pain point" (the place where you subtract 1 and your world breaks) is in a different spot. 0 for unsigned, INT_MIN for signed. Both are perfectly fine as long as you stay in their good range.

barrkel · on March 26, 2011

Yes - but 0 is much closer to the range most people put in their integer values than INT_MIN. The cliff you fall of off is far closer with unsigned integers.

a1k0n · on March 24, 2011

A coworker of mine was just bit badly by Java's insistence that unsigned types are so evil that the language shouldn't have them. He calculated a 32-bit hash, but since the ints are all signed, he took the absolute value before the modulo with the hash table size. That's all well and good, but abs(-2147483648) is still -2147483648 in 32-bit two's complement arithmetic.

I'm sure I don't need to point out that this particular problem had nothing to do with unsigned types (they were signed!). A better rule of thumb is: never use "long" in C/C++ unless you really don't care whether it's 32 or 64 bits.

ig1 · on March 24, 2011

I only had a quick skim through the article (need to be off to the London HN meetup shortly!), but couldn't this be used to mount a DOS attack sucking up the number of available sockets on a server?

pmjordan · on March 24, 2011

Maybe, if you could trick the server (64-bit FreeBSD) into connecting to sockets open on 32-bit FreeBSD machines. I can't think of any common services that would be susceptible to this (they would normally be susceptible to being tricked into opening other kinds of long-standing connections, too, which is just as good for DoS).

zwp · on March 24, 2011

> if you could trick the server (64-bit FreeBSD) into connecting to sockets

Proxies, SMTP gateways, FTP servers (active mode), ...

delinka · on March 25, 2011

"...on 32-bit [systems]" is the operative part of his statement. I'm pretty sure 'tricking' a client into opening connections to a server is trivial regardless of TCP bugs.

pmjordan · on March 24, 2011

Proxies usually have connection limits. FTP might work.