architectures that prevent freezing

  • . in a pre-emptive OS there should be no freezing;
    given the new concurrency model
    that includes the use of the graphics processor GPU
    to do the system's non-graphics processing,
    my current guess is that the freezes happen when
    something goes wrong in the GPU,
    and the CPU is just waiting forever .
    . the CPU needs to have some way of getting control back,
    and sending an exception message to
    any of the processes that were affected by the hung-up GPU .
    . could any of Apple's developers
    correct this theory or comment on it ?
  • On May 9, 2012, at 7:14 AM, Ph.T wrote:

    > . in a pre-emptive OS there should be no freezing;
    > given the new concurrency model
    > that includes the use of the graphics processor GPU
    > to do the system's non-graphics processing,

    Well, the GPU can _occasionally_ be used to do some non-graphics work, typically tasks that are highly parallelizable. I’d reckon this happens most often in games, less in general-purpose software.

    > my current guess is that the freezes happen when
    > something goes wrong in the GPU,
    > and the CPU is just waiting forever .
    > . the CPU needs to have some way of getting control back,
    > and sending an exception message to
    > any of the processes that were affected by the hung-up GPU .
    > . could any of Apple's developers
    > correct this theory or comment on it ?

    OS freezes tend to happen when kernel-level code gets into an infinite loop or deadlock. Sure, there “should be no freezing,” but there should be no bugs either, and that’s never true. (It’s exacerbated by the fact that some third-party device drivers need to run in kernel space.)

    _Some_ system freezes are due to the GPU completely locking up, usually due to a bug in the GPU vendor’s driver. My understanding is that when this happens it’s not really possible for the GPU to recover without a system reset. The CPU is probably still OK, but that doesn’t do any good if it can’t access the display.

    —Jens
  • On May 11, 2012, at 18:05, Jens Alfke wrote:

    > _Some_ system freezes are due to the GPU completely locking up, usually due to a bug in the GPU vendor’s driver. My understanding is that when this happens it’s not really possible for the GPU to recover without a system reset. The CPU is probably still OK, but that doesn’t do any good if it can’t access the display.
    >
    > —Jens

    While playing with GPU programming, I had a lot of such freezes, and they never locked up the CPU. I was always able to connect to my machine through SSH.
    Killing the affected processes is not enough; you may have to reinitialize the GPU driver. I think Windows is able to do this, but as far as I know Mac OS X is not.

    More info about how it works on Windows can be found here: http://msdn.microsoft.com/en-us/windows/hardware/gg487368

    -- Jean-Daniel
  • On May 11, 2012, at 9:23 AM, Jean-Daniel Dupas wrote:

    > While playing with GPU programming, I had a lot of such freezes, and they never locked up the CPU. I was always able to connect to my machine through SSH.

    So a regular user process can permanently lock up the display, requiring a reboot, just by executing some bad GPU code?! That’s kind of a bad privilege violation and could be considered a DoS exploit.

    —Jens
  • On May 11, 2012, at 19:55, Jens Alfke wrote:

    >
    > On May 11, 2012, at 9:23 AM, Jean-Daniel Dupas wrote:
    >
    >> While playing with GPU programming, I had a lot of such freezes, and they never locked up the CPU. I was always able to connect to my machine through SSH.
    >
    > So a regular user process can permanently lock up the display, requiring a reboot, just by executing some bad GPU code?! That’s kind of a bad privilege violation and could be considered a DoS exploit.
    >
    > —Jens

    It was on 10.6, and I never tested whether 10.7 handles a lack of VRAM (caused by a texture leak, for example) any better.
    That said, if you want a DoS exploit, just send a shutdown Apple Event to the system, and it will shut down the machine without asking for confirmation.

    osascript -e 'tell app "System Events" to shut down'

    -- Jean-Daniel
  • On May 11, 2012, at 12:55 PM, Jens Alfke wrote:

    > So a regular user process can permanently lock up the display, requiring a reboot, just by executing some bad GPU code?! That’s kind of a bad privilege violation and could be considered a DoS exploit.

    On the old 2008 non-unibody MacBook Pro that I used to have with the NVidia 8600M GT in it, you could do that just by executing some *good* GPU code.

    Charles
  • >> While playing with GPU programming, I had a lot of such freezes, and they never locked up the CPU. I was always able to connect to my machine through SSH.

    Sometimes you can, sometimes you can't.  It depends on exactly how things fail.

    > So a regular user process can permanently lock up the display, requiring a reboot, just by executing some bad GPU code?! That’s kind of a bad privilege violation and could be considered a DoS exploit.

    Yes, it is.  A particularly serious one.  It's been a pet peeve of mine for years that this is knowingly ignored by those who should know better.

    Last I checked, there was a watchdog mechanism on the GPU that would fire after some time (7 seconds on nVidia GPUs).  When it fires, it signals the driver (on the CPU) that the GPU is doing a watchdog reset.  Unfortunately, the drivers don't really handle that.  They could, and *should*, reinitialise and recover, but it's just not implemented (or at least wasn't, a year or two ago).

    [[ There are probably also timeouts implemented in the driver and various other layers, though I don't know the details. ]]

    If you've ever done any CUDA work, you'll be all too familiar with this problem.  Much of nVidia's own example code will trigger this failure mode, and most such failures require a reboot to recover from.

    In general, GPUs are in a comparative stone age when it comes to security and stability.  They're getting better (retracing the steps CPUs took thirty years ago, while thinking they're being very clever, in a cutely naive way), but it'll probably be many years before these problems are properly resolved.

    AMD & Intel are significantly ahead of nVidia in this regard, I hear.  But personally, after having every single nVidia-packing machine I've ever owned die (sometimes repeatedly) due to GPU-related hardware faults, I'd never buy an nVidia based machine to begin with.  But I digress...  back to trying to recover data from my 8800GS iMac...  again...
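The recovery path discussed in this thread (the CPU should not wait forever on the GPU, but should regain control after a watchdog timeout, reinitialise the device, and report an error to every affected process) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `SimulatedGPU`, `Job`, `wait_for` and `GPUHangError` are invented for illustration, threads stand in for GPU work, and nothing here models a real driver API such as Windows' Timeout Detection and Recovery or nVidia's watchdog.

```python
import threading
import time


class GPUHangError(Exception):
    """Hypothetical error delivered to a client whose GPU job was lost."""


class Job:
    def __init__(self):
        self.done = threading.Event()  # set on completion OR when a reset discards the job
        self.lost = False              # True if a reset discarded this job


class SimulatedGPU:
    """Hypothetical stand-in for a GPU whose jobs can hang forever."""

    def __init__(self):
        self.resets = 0
        self._lock = threading.Lock()
        self._pending = set()  # jobs currently in flight

    def submit(self, work_seconds):
        """Start a job; work_seconds=None simulates a job that never finishes."""
        job = Job()
        with self._lock:
            self._pending.add(job)

        def run():
            if work_seconds is None:
                return  # simulated hang: completion is never signalled
            time.sleep(work_seconds)
            with self._lock:
                self._pending.discard(job)
            job.done.set()

        threading.Thread(target=run, daemon=True).start()
        return job

    def reset(self):
        """Reinitialise the device and fail every job that was in flight."""
        with self._lock:
            lost, self._pending = self._pending, set()
            self.resets += 1
        for job in lost:
            job.lost = True   # mark first, so a woken waiter sees the failure
            job.done.set()    # wake any client still waiting on this job


def wait_for(gpu, job, watchdog_seconds):
    """Wait with a timeout instead of forever; reset the GPU if it expires."""
    if not job.done.wait(timeout=watchdog_seconds):
        gpu.reset()  # watchdog fired: the CPU takes control back
    if job.lost:
        raise GPUHangError("GPU job lost to a watchdog reset")
    return "ok"


if __name__ == "__main__":
    gpu = SimulatedGPU()
    print(wait_for(gpu, gpu.submit(0.01), 1.0))
    hung = gpu.submit(None)    # client A: a job that never finishes
    other = gpu.submit(30.0)   # client B: a long job still in flight
    for name, job in (("A", hung), ("B", other)):
        try:
            wait_for(gpu, job, 0.5)
        except GPUHangError as err:
            print(f"client {name} notified:", err)
```

The design choice worth noting is that `reset()` fails every in-flight job, not just the hung one: once the device itself has been reinitialised, all work queued on it is gone, so every client that had work on it must be told, which is why killing only the offending process is not enough.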