How to refer to the proper CRC32 algorithm for the track changes module?

Hi everyone,

As briefly discussed in today's call, I would like to hear your opinions on how best to refer to the CRC32 implementation that I proposed for the Track Changes module, which Ryan is preparing.

The problem with CRCs is that there are many algorithms. Even for CRC32, three variants with different polynomials are still in use, so one needs to refer to the right one. It also needs to be stated which init and exit operations are applied to the calculated CRC value, because these are not determined by the polynomial.

Just so everyone understands: the bulk of CRC32 implementations (in ZIP, Ethernet, ...) use the same algorithm and values, but there are others. And if you look, for example, at the English Wikipedia page about CRCs (http://en.wikipedia.org/wiki/Cyclic_redundancy_check), you get the impression that you cannot easily tell which one to pick (an impression that is not correct in practice).

The initial idea was to point to an existing standard. But that is difficult, because people commonly refer to RFC 1952 (http://www.ietf.org/rfc/rfc1952.txt), which contains only a C code sample of an implementation, in an appendix. For a formal definition it refers to ISO 3309 (which is not publicly available, so I could not check the contents) and to ITU-T V.42. The latter is available, but it covers far more than CRC32 generation. It explains the CRC32 generation correctly at the mathematical level, but probably not in a way that would help the average implementer.

So the idea was to come up with pseudo code to make sure the right algorithm and init and exit values are understood.

Is this something you would deem suitable for a standards document like the XLIFF draft? Or should we only refer to the mathematical foundations, by pointing to ITU-T V.42 (http://www.itu.int/rec/T-REC-V.42/en)?

My proposed pseudo code looks like this:

Pseudo code to generate the CRC32 bit by bit looks as follows (the least significant bit is assumed to be the rightmost, and the bits of each input octet are processed starting with the least significant bit):

    crc = 0xFFFFFFFF
    for each bit in input do
        if rightmost bit in crc <> current bit in input
            crc = right shift(crc) XOR 0xEDB88320
        else
            crc = right shift(crc)
        end if
    end for
    crc = crc XOR 0xFFFFFFFF
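
For anyone who wants to check the bit-by-bit pseudo code against a known implementation, here is a minimal Python sketch. The function name crc32_bitwise is made up for this illustration, and the sketch assumes that the bits of each octet are consumed starting with the least significant one. Python's zlib.crc32 serves only as a reference, since zlib uses the same CRC32 as ZIP and gzip.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC32 (hypothetical helper mirroring the pseudo code)."""
    crc = 0xFFFFFFFF                     # init value
    for octet in data:
        for i in range(8):               # assumption: least significant bit first
            bit = (octet >> i) & 1
            if (crc & 1) != bit:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF              # exit operation


sample = b"123456789"
print(hex(crc32_bitwise(sample)))                  # 0xcbf43926
print(crc32_bitwise(sample) == zlib.crc32(sample))  # True
```

0xCBF43926 is the commonly quoted check value of this CRC32 variant for the ASCII string "123456789", so it may be a useful test vector to quote next to the pseudo code.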

Pseudo code to generate the CRC32 in the typical table driven approach looks as follows:

Table generation:

    for n = 0 to 255 do
        crc = n
        for i = 0 to 7 do
            if (crc AND 1) <> 0
                crc = right shift(crc) XOR 0xEDB88320
            else
                crc = right shift(crc)
            end if
        end for
        CRC_TABLE[n] = crc
    end for

CRC calculation:

    crc = 0xFFFFFFFF
    for n = 0 to (octet count of input data)-1 do
        crc = CRC_TABLE[ (crc XOR input[n]) AND 0xFF ] XOR right shift(crc, 8)
    end for
    crc = crc XOR 0xFFFFFFFF
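
The table driven variant can be sketched in Python in the same way. The names make_crc_table and crc32_table are hypothetical and chosen only for this example. zlib.crc32 is again used purely as a reference to compare against, not as part of the proposal.

```python
import zlib


def make_crc_table() -> list:
    """Build the 256-entry lookup table (hypothetical helper name)."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
        table.append(crc)
    return table


CRC_TABLE = make_crc_table()


def crc32_table(data: bytes) -> int:
    """Table driven CRC32, one table lookup per input octet."""
    crc = 0xFFFFFFFF                     # init value
    for octet in data:
        crc = CRC_TABLE[(crc ^ octet) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF              # exit operation


sample = b"123456789"
print(hex(CRC_TABLE[1]))                           # 0x77073096
print(crc32_table(sample) == zlib.crc32(sample))   # True
```

The table only needs to be generated once, which is why implementations usually prefer this form over the bit-by-bit loop.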

0xEDB88320 is the bit-reversed representation of the coefficients of the polynomial x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 (the x^32 term is implicit and not stored in the 32-bit constant).
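
The relation between the polynomial and the constant can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up, and the x^32 term is left out on the assumption that it is implicit in a 32-bit constant.

```python
# Exponents of the polynomial terms, without the implicit x^32 term.
EXPONENTS = [26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]

# Normal representation: the coefficient of x^e is stored in bit e.
normal = sum(1 << e for e in EXPONENTS)

# Reversed representation: the coefficient of x^e is stored in bit 31 - e.
reversed_poly = sum(1 << (31 - e) for e in EXPONENTS)

print(hex(normal))         # 0x4c11db7
print(hex(reversed_poly))  # 0xedb88320
```

The normal form 0x04C11DB7 is the value that usually appears in descriptions that shift left instead of right, which is one reason why the same CRC32 can look like two different algorithms.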

Thanks for any comments!

(And I hope the code survives the list re-send reasonably well formatted.)

Joachim

________________________________
Joachim Schurig
Senior Technical Director,

Lionbridge Fellow

Lionbridge

1240 Route des Dolines

06560 Sophia Antipolis

France
