• IHeartBadCode@kbin.social
      arrow-up
      21
      ·
      1 year ago

      Data science term. It means the whole job runs on the GPU. No CPU or system RAM is involved outside of the (usually Python) interface that starts the job, monitors it, and collects the result.

      ROCm is AMD’s answer to CUDA, which fills the same role for nVidia.

        • IHeartBadCode@kbin.social
          arrow-up
          5
          ·
          1 year ago

          Both are vendor-specific implementations of processing on GPUs. This is in opposition to open standards like OpenCL, which a lot of the exascale big boys out there mostly use.

          nVidia spent a lot of cash on “outreach” to get CUDA into a lot of packages in R, Python, and whatnot. That displaced a lot of OpenCL stuff. These libraries are what a lot of folks spin up on, as most of the legwork is done for them in the library. With the exascale rigs, you literally have a team that does nothing but code very specific things on the machine in front of them, so yeah, they go with the thing that is the most portable, but that doesn’t exactly yield libraries for us mere mortals to use.

          AMD has only recently had the cash to start paying folks to write libs for their stuff, so we’re starting to see it come to Python libs and whatnot. Likely, once it becomes a fight of CUDA vs. ROCm, people will start heading back over to OpenCL. The “worth it” of vendor lock-in for CUDA and ROCm will diminish more and more over time. But as it stands, with CUDA you do get a good bit of “squeezing that extra bit of steam out of your GPU” by selling your soul to nVidia.

          That last part also plays into the “why” of CUDA and ROCm. If you happen to NOT have a rig with 10,000 GPUs, then the difference between getting 98% of your GPU and 99.999% of your GPU means a lot to you. If you do have 10,000 GPUs, a 1% inefficiency is okay: with that many GPUs, the 1% loss is barely noticeable and not worth giving up OpenCL’s portability for.
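
          Rough arithmetic for that trade-off, as a sketch only: the 98% and 99.999% figures are the ones used above, `effective_gpus` is a made-up helper name, and “efficiency” is simplified here to a single utilization fraction, which real workloads won’t follow so neatly.

```python
def effective_gpus(gpu_count: int, efficiency: float) -> float:
    """Return the GPU-equivalents of useful work at a given utilization (0.0 to 1.0)."""
    return gpu_count * efficiency


# Hypothetical figures taken from the comment: a portable path at 98%
# versus a vendor-tuned path at 99.999%.
PORTABLE = 0.98
TUNED = 0.99999

for count in (1, 10_000):
    portable = effective_gpus(count, PORTABLE)
    tuned = effective_gpus(count, TUNED)
    print(f"{count:>6} GPUs: portable={portable:.3f} tuned={tuned:.3f} gap={tuned - portable:.3f}")
```

          The fraction lost is the same at both scales; the point above is about who can afford to absorb it in exchange for portability.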

    • subtext@lemmy.world
      arrow-up
      18
      ·
      1 year ago

      I think “end-to-end” refers to the “open source” part, not the GPU acceleration. I know GPUs have always been black magic to get working, so you often have to use proprietary, closed-source blobs from the manufacturer to make them work.

      The revolution this is bringing seems to be that all of that black magic can now be implemented in open-source software.

      Could be wrong though, that’s just how I interpreted the article.

      • Yup, it’s definitely about the “open-source” part. That’s in contrast with Nvidia’s ecosystem: CUDA and the drivers are proprietary, and the drivers’ EULA prohibits you from using your gaming GPU for datacenter uses.

    • woelkchen@lemmy.world
      arrow-up
      3
      ·
      1 year ago

      Any sort of computing done on the GPU. Not sure what they mean by “end-to-end”. Perhaps it means that users don’t have to mess with installers.