• 520@kbin.social

      …what are you even talking about? A hashing algorithm takes one data input and makes one hash from said data input.

      • conciselyverbose@kbin.social

        A hash converts a large input into a small output. If a hash takes inputs of up to 128 ASCII characters and outputs 64, there are on the order of 128^64 ≈ 10^135 possible inputs for every possible output. This is completely normal and not a design flaw. It’s simple math.
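
        A back-of-the-envelope check of that count, as a quick Python sketch (assuming a 128-symbol ASCII alphabet, and inputs of every length from 0 to 128):

        ```python
        # Inputs: every ASCII string of length 0..128. Outputs: every 64-char string.
        inputs = sum(128 ** n for n in range(129))
        outputs = 128 ** 64

        # Pigeonhole counting: on average this many inputs share each output.
        avg = inputs // outputs
        print(f"inputs per output is a {len(str(avg))}-digit number")  # 135 digits, i.e. ~10^135
        ```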

        The strength of a cryptographic hash function (not the only kind of hash, or the only useful kind) is in not being predictable, not in avoiding collisions.

        • 520@kbin.social

          Your understanding is a little lacking.

          Hash algorithms don’t take an input and make it smaller. What they do is take an input, run it through a mathematical function, and output a string of fixed size, the actual size being determined by the algorithm used.
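
          For instance (a minimal illustration using SHA-256 from Python's standard hashlib; any hash algorithm shows the same behaviour):

          ```python
          import hashlib

          # Inputs of wildly different sizes...
          for data in [b"", b"hi", b"x" * 1_000_000]:
              digest = hashlib.sha256(data).hexdigest()
              # ...all yield a digest of the same fixed size: 256 bits, 64 hex chars.
              print(len(data), len(digest), digest[:16] + "...")
          ```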

          There are a few key factors people take into account while making a hashing algorithm:

          1. Collision resistance. It won’t ever be possible to make a hash completely collision-free, so designers aim to make finding collisions infeasible for the foreseeable future of technology. Many technologies we rely on, such as TLS, rely on hashes for verification purposes, so collision resistance is very important for that.

          2. Irreversibility. This is a big reason why it doesn’t simply convert a big input into a small output (the other being that hashes can actually be bigger than the input data itself). Information is lost in the hashing process, to the point where you can’t take a hash and un-hash it back into the original data.

          3. Reliability. The algorithm must produce the exact same output given the exact same input data.

          4. Predictability, like you said, but only kinda. While it’s true that an attacker must not be able to derive even part of the original data from the hash, a lot of the onus here is actually on the user not to feed predictable inputs into hashes used for secure things. As said before, a hashing algorithm must give the same output for the same input, so someone using, say, a hashed timestamp for something secure is being a moron (see the sketch after this list).
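
          On point 4, a sketch of why hashing a predictable input buys you nothing (SHA-256 via Python's hashlib; the timestamp values are made up for illustration):

          ```python
          import hashlib

          # Suppose someone "protects" a secret value by publishing only its hash,
          # but the value is just a Unix timestamp.
          secret = 1700000000
          leaked_hash = hashlib.sha256(str(secret).encode()).hexdigest()

          # The hash is deterministic and the input space is tiny, so an attacker
          # simply hashes every plausible timestamp until one matches.
          for guess in range(1699990000, 1700010000):
              if hashlib.sha256(str(guess).encode()).hexdigest() == leaked_hash:
                  print("recovered input:", guess)
                  break
          ```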

          • conciselyverbose@kbin.social

            They have a fixed-size output, yes. And that output is, in effectively every real hash, substantially smaller than the largest input the hash supports. The fact that they can also take smaller inputs increases the actual number of inputs, because those come in addition to the full-length messages. The point is that the input space is a fuckton of orders of magnitude larger than the output space, which means you’re literally, unconditionally guaranteed that collisions exist (the pigeonhole principle).
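
            To see that guarantee concretely, here's a toy sketch: truncate SHA-256 to one byte so the output space is deliberately tiny, and a collision falls out after a handful of inputs:

            ```python
            import hashlib

            def tiny_hash(data: bytes) -> int:
                # Truncated to a single byte: only 256 possible outputs.
                return hashlib.sha256(data).digest()[0]

            seen = {}
            for i in range(10_000):
                h = tiny_hash(str(i).encode())
                if h in seen:
                    print(f"collision: {seen[h]} and {i} both hash to {h}")
                    break
                seen[h] = i
            ```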

            Half your points are specific to a cryptographic hash, which isn’t the only kind of hash or the only useful kind of hash, but since that’s what you’re talking about fine.

            1. Collisions existing is normal. The best you can do is make finding a collision no easier than finding the actual input (for a password application) and make finding a collision with a modified input hard to do (for a checksum). The collisions still exist. In some applications of hashing, e.g. semantic hashing, collisions for similar inputs are actually desirable.

            2. Yes, this is the point of a hash, but it’s not hard to do.

            3. Again, same thing. Deterministic code isn’t that hard to do.

            4. Preventing predictability is the only point of a cryptographic hash (besides being deliberately heavy to resist brute force). If there are no systematic flaws that make the distribution of outputs distinguishable from randomness, your cryptographic hash is doing its job.
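
            A rough way to see "indistinguishable from randomness" in action (an informal avalanche check, not a real statistical test): flip one input bit and about half the output bits should change.

            ```python
            import hashlib

            def digest_bits(data: bytes) -> int:
                return int.from_bytes(hashlib.sha256(data).digest(), "big")

            a = b"hello world"
            b = b"hello worle"  # last byte differs from `a` by a single bit

            changed = bin(digest_bits(a) ^ digest_bits(b)).count("1")
            print(f"{changed} of 256 output bits changed")  # typically ~128
            ```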