Subject: Re: T2 Retry with Delivery Receipt
When we first started talking about data corruption, my immediate reaction was that we don't have to worry about that, because the lower layers take care of it. And in my own experience, as far as I know, I have not been seeing data corrupted in my http'ing and ftp'ing. However, in light of the material that David Fischer has pointed out, I'm not so sure.

From "When The CRC and TCP Checksum Disagree" by Stone and Partridge (2000):

    Between one packet in 10 billion and one packet in a few million will have an error that goes undetected. The exact range depends on the type of data being transferred and the path being traversed. Our conclusion is that vital applications should strongly consider augmenting the TCP checksum with an application sum.

Exploring, I found an earlier paper called "Performance of Checksums and CRCs over Real Data" by Stone, Greenwald, Partridge, and Hughes (1998), that investigated a specific subset of this problem. Co-author Mike Greenwald is an ex-colleague of mine, and I sent him mail asking what he thinks all of this really means for us. (I was going to say "what this adds up to", but fortunately I stopped myself in time.) (I presume that co-author Jim Hughes, then of Storage Technology, is not the same person as Jim Hughes of HP Cupertino, or he would have said something by now?)

Summarizing my discussion with Mike Greenwald:

He says: "In general, I would strongly urge that there be some form of (end-to-end) data integrity checking above TCP."

Problems with the TCP checksum include:

-- It's only a 16-bit checksum, so the possibility of an error that leaves the checksum unchanged, while small, is not zero, and must be considered in light of the large amount of data that might be transferred over ebXML MS; The TCP checksum will fail to detect an error -- in the best case -- once out of every 64K errors.
If you're in a situation where you are sending 1 million packets per minute, and one in a million packets has an error, then TCP will allow a corrupted packet through about once every thousand hours (one error per minute, one in 64K of them undetected). That's best case.

-- The basic theory behind checksums assumes that the data being transferred is random, but in fact real data sometimes has patterns that make the checksum less effective than it would otherwise be;

-- The basic theory behind checksums assumes that corruption is random, but real failure modes observed in the Internet are not random, and some of them are particularly good at causing problems that are not detected by TCP checksums.

Problems with error-checking at the link level:

-- TCP does not always run on top of link-level protocols that have 32-bit CRCs.

-- Even when it does, corruption occurs in places other than the wire, such as in network interface cards with DMA, and also in network software. Michael has seen TCP let corrupted packets through when there were bugs (hardware, OS) that let the corruption occur before computing the CRC.

But there's also good news:

-- TCP *usually* runs on top of a link-layer CRC. Your error rate will vary depending on the network path you actually use.

-- TCP checksum works better on random data. If data has been compressed and/or encrypted, it is effectively random.

-- If you do a cryptographically secure hash on the message and it's checked at the other end, that gives you end-to-end checking for random message corruption as well. If the sender digitally signs the message and the receiver checks the digital signature, that has the same benefit.
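
To make the points above a little more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the function names, the sample bytes, and the choice of SHA-256 are mine, not anything from the papers or from ebXML MS. It reproduces the "once every thousand hours" arithmetic, and it shows one kind of non-random corruption (two 16-bit words swapped) that a ones' complement sum of the kind TCP uses cannot see, while a cryptographic hash does.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def hours_between_undetected(packets_per_minute, packet_error_rate,
                             miss_rate=1 / 65536):
    """Best-case mean time between undetected errors, in hours.

    miss_rate is the assumed fraction of errors a 16-bit checksum misses.
    """
    undetected_per_minute = packets_per_minute * packet_error_rate * miss_rate
    return 1 / undetected_per_minute / 60


# Hypothetical payload: swap the first and third 16-bit words.
original = b"AABBCCDD"
corrupted = b"CCBBAADD"

same_checksum = internet_checksum(original) == internet_checksum(corrupted)
same_hash = hashlib.sha256(original).digest() == hashlib.sha256(corrupted).digest()
hours = hours_between_undetected(1_000_000, 1e-6)

print(same_checksum, same_hash, round(hours))
```

Because addition is commutative, reordering 16-bit words never changes the checksum, which is one reason real-world failure modes can slip through far more often than the 1-in-64K best case suggests.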
In light of all this, we might consider some of:

-- Require digital signatures on all messages, always

-- Recommend digital signatures wherever possible

The problem with requiring digital signatures is that in some cases a pair of business partners might not want to take the time, trouble, or expense of generating keys, storing keys, getting signed certificates, rolling over the certificates when necessary, and all that.

So it would be nice if there were a mechanism for simply computing a cryptographically-secure hash code over the message and including that in the message.

I have not yet managed to fully comprehend the XMLDSIG spec. There isn't, by any chance, a way to use XMLDSIG solely for creating and transmitting a message digest, without any digital signature? If there were, we could

-- Require message digests on those messages that are unsigned

-- Recommend message digests on those messages that are unsigned
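
As a sketch of what the unsigned message digest idea might look like in practice, here is a small Python example. Everything in it is an assumption for illustration: the envelope fields, the function names, the sample payload, and the use of SHA-256 are hypothetical, and this is neither XMLDSIG nor any ebXML MS structure.

```python
import hashlib


def attach_digest(payload: bytes) -> dict:
    """Sender side: wrap the payload with a digest (hypothetical envelope)."""
    return {
        "digest_alg": "sha256",
        "digest": hashlib.sha256(payload).hexdigest(),
        "payload": payload,
    }


def verify_digest(message: dict) -> bool:
    """Receiver side: recompute the digest and compare it with the one sent."""
    if message.get("digest_alg") != "sha256":
        return False  # unknown algorithm, treat as a failed check
    recomputed = hashlib.sha256(message["payload"]).hexdigest()
    return recomputed == message["digest"]


sent = attach_digest(b"<PurchaseOrder>42 widgets</PurchaseOrder>")

# Simulate corruption somewhere between sender and receiver.
damaged = dict(sent, payload=b"<PurchaseOrder>43 widgets</PurchaseOrder>")

print(verify_digest(sent), verify_digest(damaged))
```

Note that a bare digest like this only guards against accidental corruption. Anyone who can alter the message can also recompute the digest, so it is not a substitute for a digital signature where tampering is a concern.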