The Limitations of the Ethernet CRC and TCP/IP Checksums for Error Detection

kjetijor · on July 29, 2015

Reminds me of a random Google SRE slide deck I stumbled across a couple of years ago. The switch itself corrupts the payload; dense enough to pass the TCP checksum in use.

http://www.catonmat.net/blog/wp-content/uploads/2008/11/that...

kev009 · on July 29, 2015

This. I work at a CDN and there's a particular switch model, of which we have a fair number, that does this. It was incredibly hard for the network team and vendor to track down. It's because one of the internal data paths was not error checked (i.e. a hardware design and layout problem). We're somewhat working around it with application layer stuff, but are moving off that switch and vendor as fast as capex allows. It wasn't a bottom of the barrel vendor either.

mingus68040 · on July 29, 2015

Please name and shame this vendor so that others may benefit from avoiding them.

jlgaddis · on July 29, 2015

Brocade?

kev009 · on July 30, 2015

codezero · on July 29, 2015

that's pretty interesting – do you know if there's a more thorough writeup about that issue?

rurban · on July 29, 2015

Super old article from 2008.

Nowadays people are switching from TCP to unprotected UDP to avoid the costly ACK dance, and not the other way round.

He also fails to mention how trivial CRC is to reverse (e.g. http://www.woodmann.com/fravia/crctut1.htm), and how good CRC32-C actually is at detecting random bitflips. Much better than all other fast hash functions, and on par with most slow and secure hash functions.
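Both halves of that claim can be tried out with a rough sketch. It assumes Python's `zlib.crc32`, which is the IEEE CRC-32 used by Ethernet and zip rather than CRC32-C, so it only illustrates the general behaviour of a 32-bit CRC. The helper names are made up for this example. The first function counts single-bit flips that go unnoticed; the second checks the linearity that makes a CRC easy to forge on purpose.

```python
import zlib


def undetected_single_bit_flips(payload: bytes) -> int:
    """Count single-bit flips of payload that leave the CRC-32 unchanged."""
    original = zlib.crc32(payload)
    missed = 0
    for bit in range(len(payload) * 8):
        corrupted = bytearray(payload)
        corrupted[bit // 8] ^= 1 << (bit % 8)  # flip exactly one bit
        if zlib.crc32(bytes(corrupted)) == original:
            missed += 1
    return missed


def crc_is_affine(a: bytes, b: bytes) -> bool:
    """Check crc(a XOR b) == crc(a) XOR crc(b) XOR crc(zeros) for equal lengths."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    xored = bytes(x ^ y for x, y in zip(a, b))
    zeros = bytes(len(a))
    return zlib.crc32(xored) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zeros)


payload = b"The quick brown fox jumps over the lazy dog" * 4
print("undetected single-bit flips:", undetected_single_bit_flips(payload))
print("affine over XOR:", crc_is_affine(b"attack at dawn!!", b"retreat at noon!"))
```

The second property is why a CRC is fine against random noise but useless against an adversary: anyone can compute how a chosen modification changes the CRC and patch it up.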

netheril96 · on July 29, 2015

When more and more transports employ end-to-end authenticated encryption like TLS and SSH, this will be a non-issue. A message authentication code will resist even malicious tampering, let alone accidental corruption.
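A minimal sketch of the idea, assuming HMAC-SHA256 from the Python standard library as a stand-in for whatever MAC or AEAD mode TLS or SSH actually negotiates. The `seal` and `open_sealed` helpers are hypothetical names for this example, and there is no encryption here, only the integrity check.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Verify the trailing tag and return the message, or raise ValueError."""
    if len(sealed) < TAG_LEN:
        raise ValueError("input shorter than the tag")
    message, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication tag mismatch")
    return message


key = os.urandom(32)
sealed = seal(key, b"segment payload")

corrupted = bytearray(sealed)
corrupted[3] ^= 0x01  # simulate one bit flipped in transit

try:
    open_sealed(key, bytes(corrupted))
except ValueError as err:
    print("corruption rejected:", err)
```

Unlike a CRC or the TCP checksum, producing a matching tag for altered data requires the key, so the same check covers both accidental and deliberate changes.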

dvanduzer · on July 29, 2015

You are paraphrasing the third line of the article.

thelema314 · on July 29, 2015

The suggestion at the end to use zip compression to protect your files seems funny; when we're talking about extremely rare errors, CRC32 seems insufficient to really protect.

admiun · on July 29, 2015

Interestingly IPv6 removed the checksum from its packet header [1], delegating this work to the higher protocols such as TCP and UDP.[2] So I guess that raises the chance of an invalid IP header getting through. Source and target address will probably result in dropped packets but I wonder what happens if one of the other fields is corrupted, say traffic class.

[1] https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header

[2] https://en.wikipedia.org/wiki/IPv6#Simplified_processing_by_...
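To make the traffic class question concrete, here is a rough sketch that parses the 40-byte IPv6 fixed header and flips one bit in the traffic class. The parser and the sample header are made up for this example. Nothing in the fixed header itself flags the change, and as far as I know the TCP/UDP pseudo-header only covers the addresses, upper-layer length and next header, not traffic class, flow label or hop limit.

```python
import struct


def parse_ipv6_fixed_header(header: bytes) -> dict:
    """Parse the 40-byte IPv6 fixed header; note there is no checksum field."""
    if len(header) < 40:
        raise ValueError("IPv6 fixed header is 40 bytes")
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", header[:8]
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": header[8:24],
        "destination": header[24:40],
    }


# Sample header: version 6, traffic class 0, flow label 0,
# 20-byte payload, next header 6 (TCP), hop limit 64, ::1 -> ::1.
loopback = bytes(15) + b"\x01"
original = struct.pack("!IHBB", 6 << 28, 20, 6, 64) + loopback + loopback

corrupted = bytearray(original)
corrupted[1] ^= 0x10  # flips the lowest bit of the traffic class

before = parse_ipv6_fixed_header(original)
after = parse_ipv6_fixed_header(bytes(corrupted))
print("traffic class before:", before["traffic_class"])
print("traffic class after: ", after["traffic_class"])
```

The corrupted header still parses as a perfectly valid IPv6 header; only the traffic class differs.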

X-Istence · on July 29, 2015

If traffic class is corrupted it will simply be QoS'ed wrong. No biggie.

teambob · on July 29, 2015

Would the increasing use of SSL mostly solve this problem as a side-effect?

How likely is it that a secure connection will be corrupted without being noticed?

guan · on July 29, 2015

These days many SSL connections use authenticated modes of operation such as CCM or GCM that explicitly guarantee that data is transferred correctly.

therealmarv · on July 29, 2015

Here is the chance of that case: "checksum will fail to detect errors for roughly 1 in 16 million to 10 billion packets". source: http://dl.acm.org/citation.cfm?id=347561&dl=GUIDE&coll=GUIDE
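For a feel of why the TCP checksum misses some errors at all, here is a small sketch of the 16-bit ones' complement sum it is built on (the RFC 1071 style Internet checksum). It leaves out the pseudo-header and the sample bytes are arbitrary, so treat it as an illustration only. Because the sum is order-independent, swapped 16-bit words or two errors that cancel out go undetected.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78\x9a\xbc"
reordered = b"\x56\x78\x12\x34\x9a\xbc"    # first two 16-bit words swapped
compensated = b"\x12\x35\x56\x77\x9a\xbc"  # one word +1, another word -1
single_flip = b"\x12\x34\x56\x79\x9a\xbc"  # one bit flipped

for label, data in [
    ("original", original),
    ("reordered", reordered),
    ("compensated", compensated),
    ("single_flip", single_flip),
]:
    print(f"{label:12} {internet_checksum(data):#06x}")
```

The first three lines print the same value even though the data differs; only the single bit flip changes the checksum.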

fabioyy · on July 29, 2015

Ethernet at >= 1 Gb/s has workarounds for error detection

https://en.wikipedia.org/wiki/Jumbo_frame#Error_detection

wmf · on July 29, 2015

I think that link says that SCTP (which no one uses) has better error detection, not Ethernet itself.

dkd · on Aug 3, 2015

All telecom networks (SIGTRAN) use SCTP.

detaro · on July 29, 2015

The "workaround" has nothing to do with Ethernet. Because Ethernet isn't reliable enough at detecting errors, higher-level protocols now use better error detection mechanisms of their own.

kevin_thibedeau · on July 29, 2015

The error estimate that "between 1 in 16 million and 1 in 10 billion TCP segments will have corrupt data and a correct TCP checksum" is from "Performance of Checksums and CRCs over Real Data" [Stone and Partridge] which only analyzed a particular type of framing error over ATM. More modern transports should be immune to this form of error because the line encodings used make it nearly impossible to start a packet in the wrong place or include a fragment of the following packet.