• @just_another_person@lemmy.world
    7 days ago

    I think you’re stuck in the traditional viewpoint of a computer being CPU+Mem+Storage. That’s fine for a single machine that a regular user would have.

    This type of memory could essentially wipe out the need for traditional deployments in datacenters: banks of this stuff on a bus serving many CPUs as clients, with no local storage needed. So just CPU+Mem, with everything loaded into a known state via network storage, and that state doesn’t go away if something loses power or crashes. It would definitely make the current idiotic use of GPUs more cost-effective and less wasteful.

    If you try to take that down to a regular user’s use case, it’s really only going to matter to developers building things for such a system, because stateful memory is such a new idea. You may just be thinking about it like a single user, which is not what it would be used for at all (at first).

    To your other question about the actual speed: current memory only needs to be that fast because of the storage involved and the shuttling of data across a bus between the three parts. Getting this new type of stateful memory to higher speeds than a current storage device would already show a performance benefit, because you’re removing one step in the total transfer path and going from three points to two. So a speed somewhere above SSD but below current DDR should still see a benefit, in theory.
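
    Roughly, the back-of-envelope version of that argument looks like this. Every number here is a made-up illustrative figure, and the persistent-memory bandwidth is pure assumption, not anything from the announcement:

```python
# Rough sketch of the "one less hop" argument, with made-up illustrative
# numbers -- none of these are measurements or vendor figures.

GB = 1e9  # bytes

working_set = 4 * GB       # state a job needs resident before it can run

ssd_bw     = 7 * GB        # assumed NVMe-class sequential bandwidth (B/s)
dram_bw    = 50 * GB       # assumed DDR5-class channel bandwidth (B/s)
new_mem_bw = 20 * GB       # assumed: slower than DRAM, faster than the SSD

# Today: a cold start means pulling the working set over the storage leg first.
cold_start_today_s = working_set / ssd_bw

# Persistent main memory: the state is still there after a crash or power
# loss, so the storage->RAM hop simply disappears on restart.
cold_start_new_s = 0.0

print(f"cold start via SSD today      : {cold_start_today_s * 1e3:.0f} ms")
print(f"cold start with persistent mem: {cold_start_new_s * 1e3:.0f} ms")
print(f"steady-state penalty vs DRAM  : {dram_bw / new_mem_bw:.1f}x slower (assumed)")
```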

    Overall, this has been the direction things have been heading for quite a while. They’ve obviously still got to get some spec sheets out to explain the performance and efficiency benefits, and it will require a complete rework of how current CPUs and bridge controllers work… it’s quite a ways off from being an everyday product.

    • @brucethemoose@lemmy.world
      7 days ago

      You are talking theoreticals.

      A big reason that supercomputers moved to a networked “commodity” hardware architecture is that it’s cost-effective.

      How would one build a giant unified pool of this memory? CXL, presumably, but what does it look like physically? Maybe you get a lot of bandwidth in parallel, but how would it be even close to the latency of the “local” DRAM buses on each node? Is that setup truly more power-efficient than banks of DRAM backed by infrequently touched flash? If your particular workload needs fast random access to memory, even at scale the only advantage seems to be some fault tolerance, at a huge speed cost; and if you just need bulk, high-latency bandwidth, flash has you covered for cheaper.
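
      To put the latency question in perspective, here is a tiny sketch of a latency-bound workload. All the figures are rough orders of magnitude I’m assuming for illustration, not benchmarks of any real CXL setup:

```python
# Ballpark latency comparison for a latency-bound (pointer-chasing) workload.
# All figures are rough orders of magnitude, assumed for illustration only.

local_dram_ns = 100        # cache miss served by local DRAM
fabric_pool_ns = 300       # assumed: CXL/fabric-attached pooled memory
flash_ns = 80_000          # assumed: NVMe flash random read

dependent_accesses = 1_000_000   # each access waits on the previous one

for name, ns in [("local DRAM", local_dram_ns),
                 ("pooled memory over a fabric", fabric_pool_ns),
                 ("flash", flash_ns)]:
    total_ms = dependent_accesses * ns / 1e6
    print(f"{name:28s}: {total_ms:10.1f} ms")
```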

      …I really like the idea of a nonvolatile, single pool backed by caches, especially at scale, but ultimately architectural decisions come down to economics.

      • @just_another_person@lemmy.world
        7 days ago

        It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered… it’s quite a reduction when you’re thinking at large scale. If AWS or Google could reduce their energy needs by 33% on anything, they’d take it in a heartbeat. That’s just assuming this would/could be used somehow as a drop-in replacement, which seems unlikely. Think of an SoC with this on board, or an APU. The premise itself reduces cost while increasing efficiency, but again, they really need to get some spec sheets out and productize it before most companies will do much more than trial runs of such things.

        • @brucethemoose@lemmy.world
          7 days ago

          “It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered”

          And here’s the kicker.

          You’re supposing it’s (given the no-refresh bonus) 1/3 as fast as DRAM, with similar latency, and cheap enough per gigabyte to replace most storage. That is a tall order, and it would be incredible if it hit all three of those. I find that highly improbable.

          Even DRAM is starting to become a bottleneck for APUs specifically, because making the bus wide is so expensive. This applies at the very top (the MI300A) and the very bottom (smartphone and laptop APUs).
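
          For a sense of why bus width dominates: peak DRAM bandwidth is roughly (bus width in bytes) × (transfer rate). A quick sketch with nominal, illustrative configurations, not exact product specs:

```python
# Peak DRAM bandwidth ~= (bus width in bytes) * transfer rate.
# Configurations below are illustrative, not exact product specs.

def peak_bw_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    return bus_width_bits / 8 * mega_transfers_per_s / 1000  # GB/s

configs = [
    ("128-bit LPDDR5X-8533 (phone/laptop APU)", 128, 8533),
    ("128-bit DDR5-6400 (dual-channel desktop)", 128, 6400),
    ("8192-bit HBM-style stack (MI300-class)", 8192, 5200),
]

for name, width, rate in configs:
    print(f"{name:42s}: {peak_bw_gb_s(width, rate):8.1f} GB/s")
```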

          Optane, for reference, was a lot slower than DRAM and a lot more expensive/less dense than flash, even with all the work Intel put into it and the buses built into then-top-end CPUs for direct access. And they thought that was pretty good. It was good enough for a niche when used in conjunction with DRAM sticks.

          • @just_another_person@lemmy.world
            7 days ago

            No, you misunderstood. A current standard computer is guaranteed to have at least three bus paths among CPU, RAM, and storage.

            The amount of energy required to communicate between all three parts varies, but you can be guaranteed that removing just one PLUS removing the capacitor requirement for the memory will reduce power consumption by 1/3 of whatever that total bus power consumption is. This is ignoring any other additional buses and doing the bare minimum math.
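
            Spelled out with placeholder numbers, the bare-minimum math being claimed looks like the sketch below. The even three-way split of bus power and the refresh figure are assumptions taken from the argument itself, not measured data:

```python
# The "bare minimum math" above, written out. The even three-way split of bus
# power and the refresh figure are placeholder assumptions, not measurements.

total_bus_power_w = 30.0   # assumed: power spent moving data CPU<->RAM<->storage
dram_refresh_w = 5.0       # assumed: power spent just keeping DRAM refreshed

per_leg_w = total_bus_power_w / 3          # one of the three bus legs
saved_w = per_leg_w + dram_refresh_w       # drop the RAM<->storage leg + refresh
baseline_w = total_bus_power_w + dram_refresh_w

print(f"saved: {saved_w:.1f} W of {baseline_w:.1f} W ({saved_w / baseline_w:.0%})")
```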

            The speed of this memory would matter less if you’re also removing the separate static storage requirement. What would matter is only the speed at which it can communicate with the CPU, so if you’re not traversing CPU>RAM>SSD and only doing CPU>DRAM+ (the new stateful memory), it’s going to be more efficient.

            • @barsoap@lemm.ee
              7 days ago

              PCIe 5.0 x16 can match DDR5’s bandwidth; that’s not the issue, the question is latency. The only reason OSs cache disk contents in memory is that SSD latency is something like at least 30x worse than DRAM’s. The data ends up going through the CPU either way; RAM can’t talk directly to the SSD. Modern mainboards are very centralised and it’s all point-to-point connections; the only bus you’ll find will be talking i2c, for temperature sensors and stuff.
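
              The rough numbers behind that, for reference. Bandwidths are nominal peaks; the latency figures are ballparks I’m assuming, not measurements:

```python
# Nominal peak bandwidths; latency figures are assumed ballparks.

pcie5_lane_gb_s = 32 / 8 * (128 / 130)   # 32 GT/s per lane, 128b/130b encoding
pcie5_x16_gb_s = 16 * pcie5_lane_gb_s    # ~63 GB/s per direction

ddr5_channel_gb_s = 6400 * 8 / 1000      # DDR5-6400, 64-bit channel
ddr5_dual_gb_s = 2 * ddr5_channel_gb_s   # ~102 GB/s

print(f"PCIe 5.0 x16        : {pcie5_x16_gb_s:6.1f} GB/s per direction")
print(f"DDR5-6400, 2 channel: {ddr5_dual_gb_s:6.1f} GB/s")

# The gap that matters here is latency (rough orders of magnitude):
dram_ns, nvme_ns = 100, 50_000
print(f"NVMe random read ~{nvme_ns // dram_ns}x the latency of a DRAM access")
```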

              And I think it’s rather suspicious that none of those articles are talking about latency. Without that being at least in the ballpark of DDR5, all this is is an alternative to NAND, which is of course also a nice thing, but not a game changer.

              • @just_another_person@lemmy.world
                7 days ago

                I don’t even think you know what you’re trying to say at this point, because it’s not making sense. Think what you will, but it’s obvious your conception of how computer architecture works is flawed. You’ll see this memory in machines and hopefully figure it out though. Good luck 🤞

                • @barsoap@lemm.ee
                  6 days ago

                  So… what’s wrong about my characterisation of computer hardware? Do you have any issue with the claim that RAM doesn’t talk directly to the SSD, but via the CPU? If yes, please show me the traces on the motherboard that enable that. Or with the point about the importance of latency to CPU-type computations?

                  Or do you want to tell me how it’s absolutely unsuspicious to bang out a tech press release and talk about “speed” without distinguishing between bandwidth and latency? Where are the fucking numbers? There’s no judging the tech without numbers, and them not being forward with those numbers means they’re talking to investors, not techies.

                  • @just_another_person@lemmy.world
                    6 days ago

                    I’m not even sure where you got this RAM-talking-to-storage thing from. This is why I’m saying you don’t know what you’re talking about. I think your fundamental understanding of this is flawed.