ebxml-msg message

Subject: Re: T2 Retry with Delivery Receipt
From: Dan Weinreb <dlw@exceloncorp.com>
To: mwsachs@us.ibm.com
Date: Thu, 20 Sep 2001 22:19:10 -0400 (EDT)

When we first started talking about data corruption, my immediate
reaction was that we don't have to worry about that, because the lower
layers take care of it.  And in my own experience, as far as I know,
I have not been seeing data corrupted in my http'ing and ftp'ing.

However, in light of the material that David Fischer has pointed out,
I'm not so sure.

From "When The CRC and TCP Checksum Disagree" by Stone and Partridge (2000):

   Between one packet in 10 billion and one packet in a few million will have
   an error that goes undetected.  The exact range depends on the type of
   data being transferred and the path being traversed.

   Our conclusion is that vital applications should strongly consider
   augmenting the TCP checksum with an application sum.

Exploring, I found an earlier paper called "Performance of Checksums
and CRCs over Real Data" by Stone, Greenwald, Partridge, and Hughes
(1998), that investigated a specific subset of this problem.
Co-author Mike Greenwald is an ex-colleague of mine, and I sent him
mail asking what he thinks all of this really means for us.  (I was
going to say "what this adds up to", but fortunately I stopped myself
in time.)  (I presume that co-author Jim Hughes, then of Storage
Technology, is not the same person as Jim Hughes of HP Cupertino, or
he would have said something by now?)  Summarizing my discussion
with Mike Greenwald:

He says: "In general, I would strongly urge that there be some form of
(end-to-end) data integrity checking above TCP."

Problems with the TCP checksum include:

-- It's only a 16-bit checksum, so the possibility of an error that leaves
   the checksum unchanged, while small, is not zero, and must be considered
   in light of the large amount of data that might be transferred over ebXML MS.
   The TCP checksum will fail to detect an error -- in the best case --
   once in every 64K (65,536) errors.  If you're in a situation where you
   are sending 1 million packets per minute, and one in a million packets
   has an error, then you see one errored packet per minute, and TCP will
   allow a corrupted packet through roughly once every thousand hours.
   That's best case.
-- The basic theory behind checksums assumes that the data being transferred is
   random, but in fact real data sometimes has patterns that make the checksum
   less effective than it would otherwise be;
-- The basic theory behind checksums assumes that corruption is random, but
   real failure modes observed in the Internet are not random, and some of
   them are particularly good at causing problems that are not detected
   by TCP checksums.
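
As a rough illustration of the first and third points, here is a minimal
Python sketch.  It is not taken from either paper; the function name
internet_checksum and the sample byte strings are made up for the example.
It redoes the "thousand hours" arithmetic using the figures quoted above,
and then shows one kind of non-random corruption (two 16-bit words swapped)
that a 16-bit one's-complement sum of the sort TCP uses cannot see.

```python
from fractions import Fraction

# Best-case arithmetic, using the figures quoted in the message.
packets_per_minute = 1_000_000
packet_error_rate = Fraction(1, 1_000_000)   # one packet in a million is bad
checksum_miss_rate = Fraction(1, 65536)      # 16-bit checksum, best case

undetected_per_minute = packets_per_minute * packet_error_rate * checksum_miss_rate
hours_between_undetected = 1 / undetected_per_minute / 60
print(float(hours_between_undetected))       # a little over a thousand hours


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (illustrative helper)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"AABBCCDD"
swapped = b"CCBBAADD"    # first and third 16-bit words exchanged
bit_error = b"AABBCCDE"  # last byte changed by one bit

# Addition is commutative, so reordering whole words leaves the sum unchanged.
print(internet_checksum(original) == internet_checksum(swapped))    # same checksum
print(internet_checksum(original) == internet_checksum(bit_error))  # differs
```

The word swap is only one example of a structured error; it is chosen here
because it is easy to see why the sum cannot detect it.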

Problems with error-checking at the link level:

-- TCP does not always run on top of link-level protocols that have 32-bit CRCs.
-- Even when it does, corruption occurs in places other than the wire, such
   as in network interface cards with DMA, and also in network software.
   Michael has seen TCP let corrupted packets through when there were bugs
   (hardware, OS) that let the corruption occur before computing the CRC.

But there's also good news:

-- TCP *usually* runs on top of a link-layer CRC.  Your error rate will
   vary depending on the network path you actually use.
-- The TCP checksum works better on random data.  If data has been
   compressed and/or encrypted, it's effectively random.
-- If you do a cryptographically secure hash on the message and it's checked
   at the other end, that gives you end-to-end checking for random message
   corruption as well.  If the sender digitally signs the message and the
   receiver checks the digital signature, that has the same benefit.

In light of all this, we might consider some of:

-- Require digital signatures on all messages, always
-- Recommend digital signatures wherever possible

The problem with requiring digital signatures is that in some cases a
pair of business partners might not want to take the time, trouble, or
expense of generating keys, storing keys, getting signed certificates,
rolling over the certificates when necessary, and all that.  So it
would be nice if there were a mechanism for simply computing a
cryptographically-secure hash code over the message and including that
in the message.
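
A minimal sketch of what such a mechanism could look like, in Python.
Everything here is an assumption made for illustration: the helper names
attach_digest and verify_digest, the dictionary used as a stand-in for a
message, and the choice of SHA-256 (the message above does not name a hash
algorithm).  It is not ebXML MS or XMLDSIG syntax.

```python
import hashlib


def attach_digest(payload: bytes) -> dict:
    """Sender side: bundle the payload with a hash computed over it."""
    return {
        "payload": payload,
        "digest": hashlib.sha256(payload).hexdigest(),
    }


def verify_digest(message: dict) -> bool:
    """Receiver side: recompute the hash and compare it with the one sent."""
    recomputed = hashlib.sha256(message["payload"]).hexdigest()
    return recomputed == message["digest"]


sent = attach_digest(b"<PurchaseOrder>...</PurchaseOrder>")
print(verify_digest(sent))            # intact message passes the check

# Simulate corruption in transit: one byte of the payload changes.
damaged = dict(sent, payload=b"<PurchaseOrder>..,</PurchaseOrder>")
print(verify_digest(damaged))         # corrupted message fails the check
```

An unkeyed digest like this catches accidental corruption end to end, but
unlike a digital signature it does nothing against deliberate tampering,
since anyone who alters the payload can recompute the digest as well.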

I have not yet managed to fully comprehend the XMLDSIG spec.  There
isn't, by any chance, a way to use XMLDSIG solely for creating and
transmitting a message digest, without any digital signature?  If
there were, we could

-- Require message digests on those messages that are unsigned
-- Recommend message digests, ditto, ditto.

Follow-Ups:
- Re: T2 Retry with Delivery Receipt
  - From: christopher ferris <chris.ferris@Sun.COM>
- Re: T2 Retry with Delivery Receipt
  - From: Rich Salz <rsalz@zolera.com>
References:
- RE: T2 Retry with Delivery Receipt
  - From: Martin W Sachs <mwsachs@us.ibm.com>