# Checksum vs hashbytes issue

### Question

• We have to use CHECKSUM in one of our processes, but when we do, it generates duplicate results. According to various forums and blogs, the probability of a duplicate is 1 in a billion. Because of the duplicates, we now have to use HASHBYTES in a specific format to avoid duplicate records. As we understand it, CHECKSUM is defined as using the MD5 algorithm, so it should not give us duplicate records.

Friday, September 30, 2011 1:27 PM

• Hi Sachin,

CHECKSUM and HASHBYTES are both hash functions: each produces a hash value as output for a given input key. When two different input keys give the same output, that is known as a collision. As far as I know, collisions can occur with both CHECKSUM and HASHBYTES, although rarely, and the probability of a collision is lower with HASHBYTES than with CHECKSUM.
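To put rough numbers on "how rare", here is a minimal Python sketch of the standard birthday approximation. It assumes the hash output is uniformly distributed, which is an idealization and not a claim about how CHECKSUM is implemented. The function name `collision_probability` is made up for this illustration.

```python
import math


def collision_probability(n_rows: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_rows distinct inputs collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes an ideal, uniformly distributed hash (an assumption, not a
    description of any particular SQL Server function).
    """
    space = 2 ** hash_bits
    exponent = n_rows * (n_rows - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for rows in (10_000, 77_163, 1_000_000):
        print(rows, "rows, 32-bit hash:", collision_probability(rows, 32))
    print("1e9 rows, 128-bit hash:", collision_probability(10**9, 128))
```

Under that assumption, a 32-bit (4-byte) value reaches about a 50% chance of at least one collision at roughly 77,000 distinct rows, while a 128-bit (16-byte) value stays negligible even at a billion rows. The "1 in a billion" figure applies at best to a single pair of inputs, not to a whole table.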

BTW, what is your intention of using Checksum? There could be other solutions than Checksum to achieve that.

HTH.

Regards,
Santosh

• Proposed as answer by Tuesday, October 4, 2011 9:52 AM
• Marked as answer by Thursday, October 6, 2011 9:30 AM
Tuesday, October 4, 2011 5:31 AM
• CHECKSUM will give you more duplicates than HASHBYTES. CHECKSUM returns an integer, which is 4 bytes, whereas HASHBYTES with MD5 returns 16 bytes, the standard MD5 digest length. Even if CHECKSUM and HASHBYTES were both using MD5, CHECKSUM would return only 1/4 of the full length of a standard MD5 result.

Think about it this way: if you use HASHBYTES to hash strings and then look only at the last 4 bytes of each returned value, you will see a lot of duplicates.
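The truncation thought experiment can be tried outside the database. The following is a sketch in Python using `hashlib.md5`; it does not reproduce SQL Server's CHECKSUM, and the helper name `count_duplicates` and the `row-N` input strings are invented for the illustration.

```python
import hashlib


def count_duplicates(n_inputs: int, keep_bytes: int) -> int:
    """Hash n_inputs distinct strings with MD5, keep only the last
    keep_bytes bytes (1 to 16) of each digest, and return how many
    inputs landed on a value that was already taken."""
    seen = set()
    for i in range(n_inputs):
        digest = hashlib.md5(f"row-{i}".encode("utf-8")).digest()
        seen.add(digest[-keep_bytes:])
    return n_inputs - len(seen)


if __name__ == "__main__":
    n = 300_000
    print("full 16 bytes:", count_duplicates(n, 16))
    print("last 4 bytes :", count_duplicates(n, 4))
    print("last 2 bytes :", count_duplicates(n, 2))
```

With the full 16 bytes, no duplicates are expected. With 4 bytes there are only 2^32 possible values, so a few hundred thousand rows will most likely already show some duplicates. With 2 bytes there are only 65,536 possible values, so duplicates are unavoidable once there are more inputs than that.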

Regards

• Marked as answer by Thursday, October 6, 2011 9:30 AM
Thursday, October 6, 2011 3:53 AM
