Checksum vs hashbytes issue

  • Question

  • We have to use CHECKSUM in one of our processes, but when we use it, it generates duplicate results. According to various forums and blogs, the probability of a duplicate is about 1 in a billion. Do we instead have to use HASHBYTES in a specific format so that it does not give us duplicate records? We also read that CHECKSUM uses the MD5 algorithm, so we expected it not to give us duplicate records.

    Can you please help us with this?

    Friday, September 30, 2011 1:27 PM

Answers

  • Hi Sachin,

    CHECKSUM and HASHBYTES are hash functions: each takes an input key and returns a hash key. When two different input keys give the same output, that is known as a collision. As far as I know, collisions can occur with both CHECKSUM and HASHBYTES, although they are rare, and the probability of one is lower with HASHBYTES than with CHECKSUM.

    BTW, what is your intention in using CHECKSUM? There could be other solutions than CHECKSUM to achieve that.

    HTH.

    Regards,
    Santosh


    It feels great if you give us points for helpful post. :)
    • Proposed as answer by Jerry Nee Tuesday, October 4, 2011 9:52 AM
    • Marked as answer by Jerry Nee Thursday, October 6, 2011 9:30 AM
    Tuesday, October 4, 2011 5:31 AM
  • CHECKSUM will give you more duplicates than HASHBYTES. CHECKSUM returns an integer, which is 4 bytes, whereas HASHBYTES with MD5 returns 16 bytes, the standard MD5 digest length. Even if CHECKSUM and HASHBYTES were both using MD5, CHECKSUM would return only 1/4 of the full length of a standard MD5 result.

    Think about it this way: if you use HASHBYTES to hash strings and look only at the last 4 bytes of each returned value, you will see a lot of duplicates.


    Regards

    John Huang, MCM-SQL, http://www.sqlnotes.info

    • Marked as answer by Jerry Nee Thursday, October 6, 2011 9:30 AM
    Thursday, October 6, 2011 3:53 AM
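
The size argument in the answers above can be checked with a rough sketch. The Python below is only an illustration, not SQL Server's actual CHECKSUM algorithm: it assumes an ideal, evenly distributed hash for the probability estimate (the standard birthday approximation), and it imitates the "last 4 bytes of MD5" thought experiment with `hashlib.md5`. The function names and the `row-N` sample strings are made up for this example.

```python
import hashlib
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision).

    Assumes an ideal hash that spreads inputs evenly over 2**hash_bits
    values. A real function may do worse than this estimate.
    """
    exponent = n_items * (n_items - 1) / (2.0 * 2.0 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


def count_duplicates(n_items: int, keep_bytes: int) -> int:
    """Hash n_items distinct strings with MD5, keep only the last
    keep_bytes bytes of each digest, and count how many inputs land on
    a value that an earlier input already produced."""
    if not 1 <= keep_bytes <= 16:
        raise ValueError("keep_bytes must be between 1 and 16")
    seen = set()
    duplicates = 0
    for i in range(n_items):
        # "row-N" is a hypothetical stand-in for real key values.
        digest = hashlib.md5(f"row-{i}".encode("utf-8")).digest()
        tail = digest[-keep_bytes:]
        if tail in seen:
            duplicates += 1
        else:
            seen.add(tail)
    return duplicates


if __name__ == "__main__":
    # Estimated collision chance for a 32-bit vs a 128-bit hash.
    for n in (10_000, 100_000, 1_000_000):
        print(n, collision_probability(n, 32), collision_probability(n, 128))
    # Duplicates among 200,000 strings: 4-byte tail vs full 16-byte digest.
    print(count_duplicates(200_000, 4), count_duplicates(200_000, 16))
```

Under this model a 4-byte value reaches roughly even odds of at least one collision at around 77,000 distinct inputs, while for a 16-byte value the estimate stays negligible even at a million inputs. That is consistent with seeing duplicates from a 4-byte result on a large table, whatever "1 in a billion" figure is quoted for a single pair of values.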
