• 520@kbin.social

    Your understanding is a little lacking.

    Hash algorithms don’t take an input and make it smaller. They take an input, run it through a mathematical function, and output a string of fixed size, with that size determined by the algorithm used.
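
    To make that concrete, here’s a minimal Python sketch (using the standard hashlib module; SHA-256 is just one example of such an algorithm): whether the input is empty, two bytes, or a megabyte, the digest is always the same length.

    ```python
    import hashlib

    # SHA-256 always emits 32 bytes (64 hex characters), regardless of input size.
    for message in [b"", b"hi", b"x" * 1_000_000]:
        digest = hashlib.sha256(message).hexdigest()
        print(f"{len(message):>7} bytes in -> {len(digest)} hex chars out: {digest[:16]}...")
    ```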

    There are a few key factors people take into account while making a hashing algorithm:

    1. collision resistance. It won’t ever be possible to make it completely resistant, so they aim to make finding collisions infeasible within the foreseeable future of technology. Many technologies we rely on, such as TLS, rely on hashes for verification purposes, so collision resistance is very important for that.

    2. irreversibility. This is a big reason why it doesn’t simply convert a big input into a small output (the other being that hashes can actually be bigger than the input data itself). Information is lost in the hashing process to the point where you can’t take a hash and unhash it into the original data.

    3. reliability. The algorithm must create the same output given the exact same data.

    4. predictability, like you said, but only kinda. While it is true that a requirement is that an attacker must not be able to derive even part of the original data, a lot of the onus here is actually on the user to not use predictable inputs when using hashes for secure things. As said before, a hashing algorithm must give the same output when given the same input, so someone using, let’s say, a hashed timestamp for something secure is being a moron (there’s a short sketch of exactly that after this list).
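
    A minimal sketch of points 3 and 4 together, assuming SHA-256 and a made-up timestamp value: because hashing is deterministic, a hash of a guessable input can be recovered by hashing candidates until one matches, no reversing required.

    ```python
    import hashlib

    # Hypothetical example: a "secret" that is just a Unix timestamp.
    secret_timestamp = 1700000123
    target = hashlib.sha256(str(secret_timestamp).encode()).hexdigest()

    # The attacker doesn't reverse the hash; they just hash every plausible
    # candidate in the time window and compare.
    for candidate in range(1700000000, 1700001000):
        if hashlib.sha256(str(candidate).encode()).hexdigest() == target:
            print("recovered input:", candidate)
            break
    ```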

    • conciselyverbose@kbin.social

      They have a fixed size output, yes. And that output is, in practically every case, substantially smaller than the inputs they support. The fact that they can also take smaller inputs only increases the actual number of inputs, because those are in addition to the full length messages. The point is that the input space is a fuckton of orders of magnitude larger than the output space, which means that, by the pigeonhole principle, you’re literally guaranteed that collisions have to exist.
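
      A rough counting sketch of that pigeonhole argument, taking SHA-256 as the example: the byte strings of length at most 64 already outnumber the 2^256 possible digests by an astronomical factor, so some of them must share a digest.

      ```python
      # Pigeonhole count: 2**256 possible SHA-256 outputs vs. every byte string
      # of length 0..64. Inputs outnumber outputs by roughly 2**256 to 1,
      # so collisions are mathematically unavoidable.
      outputs = 2 ** 256
      inputs = sum(256 ** n for n in range(65))
      print(f"outputs:           ~10^{len(str(outputs)) - 1}")
      print(f"inputs (<=64 B):   ~10^{len(str(inputs)) - 1}")
      print(f"inputs per output: ~10^{len(str(inputs // outputs)) - 1}")
      ```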

      Half your points are specific to a cryptographic hash, which isn’t the only kind of hash or the only useful kind of hash, but since that’s what you’re talking about, fine.

      1. Collisions existing is normal. You can only aim to make finding a collision no easier than finding the actual input (for a password application), and to make producing a collision with a modified input hard to do (for a checksum). The collisions still exist (see the truncated-hash sketch at the end of this comment). In some applications of hashing, eg semantic hashing, collisions for similar inputs are desirable.

      2. Yes, this is the point of a hash, but it’s not hard to do.

      3. Again, same thing. Deterministic code isn’t that hard to do.

      4. Preventing predictability is the only point of a cryptographic hash (besides being deliberately heavy to prevent brute force). If there aren’t systematic flaws that make the distribution of outputs distinguishable from randomness, your cryptographic hash is doing its job.
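
      To see point 1 in action rather than just in theory, here’s a sketch (my own illustration, not a real attack on SHA-256) that keeps only the first 3 bytes of the digest: with just 2^24 possible outputs, a birthday-style search stumbles onto a collision within a few thousand attempts. Full-length digests have collisions for exactly the same reason; the output space is just far too large to search.

      ```python
      import hashlib

      # Shrink the output space to 24 bits by truncating the digest to 3 bytes;
      # a birthday-style search then finds two different inputs with the same
      # (truncated) hash almost immediately.
      seen = {}
      i = 0
      while True:
          msg = f"message-{i}".encode()
          short = hashlib.sha256(msg).digest()[:3]
          if short in seen:
              print("collision:", seen[short], "and", msg, "->", short.hex())
              break
          seen[short] = msg
          i += 1
      ```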