Leopard performance penalty (3x slower), NSPopAutoreleasePool

  • I have a complex Cocoa application, 10.4 SDK (no GC etc). Loading a
    document (and performing a large number of calculations triggered by
    that) takes 4 seconds under Tiger.

    Under Leopard, loading the same document takes 8 seconds until it
    displays, after which the application is unresponsive for another 4
    seconds, the entire time being spent in NSPopAutoreleasePool.

    I presume that performance penalty is largely due to Leopard's more
    complex memory handling. Has anybody else observed this, and does
    anyone know of any strategies to minimize that overhead?

    Thanks

    Gerd
  • On Nov 17, 2007, at 10:53 AM, Gerd Knops wrote:

    > I have a complex Cocoa application, 10.4 SDK (no GC etc). Loading a
    > document (and performing a large number of calculations triggered by
    > that) takes 4 seconds under Tiger.
    >
    > Under Leopard, loading the same document takes 8 seconds until it
    > displays, after which the application is unresponsive for another 4
    > seconds, the entire time being spent in NSPopAutoreleasePool.
    >
    > I presume that performance penalty is largely due to Leopard's more
    > complex memory handling. Has anybody else observed this, and knows
    > of any strategies to minimize that overhead?

    What "more complex memory handling" are you referring to? Are you
    comparing document loading between Tiger and Leopard on the same
    machine (same hardware and amount of RAM)?

    You should probably investigate trying to create fewer temporary
    objects, and fewer autoreleased objects, as a way to fix this problem
    on your end.

    You may also want to file a performance regression bug report with
    Apple. I would suggest including: Shark Time Profile, Shark Time
    Profile (All Thread States), System Profile, top output. Make sure to
    sample the whole duration of loading the document, and include reports
    from both Tiger and Leopard.

    j o a r
  • On Nov 17, 2007, at 1:12 PM, j o a r wrote:

    >
    > On Nov 17, 2007, at 10:53 AM, Gerd Knops wrote:
    >
    >> I have a complex Cocoa application, 10.4 SDK (no GC etc). Loading a
    >> document (and performing a large number of calculations triggered
    >> by that) takes 4 seconds under Tiger.
    >>
    >> Under Leopard, loading the same document takes 8 seconds until it
    >> displays, after which the application is unresponsive for another 4
    >> seconds, the entire time being spent in NSPopAutoreleasePool.
    >>
    >> I presume that performance penalty is largely due to Leopard's more
    >> complex memory handling. Has anybody else observed this, and knows
    >> of any strategies to minimize that overhead?
    >
    >
    > What "more complex memory handling" are you referring to? Do you
    > compare loading the document between Tiger and Leopard on the same
    > machine (same hardware and amount of RAM)?
    >
    Yes, at least initially. To speed up testing I am now running Tiger
    on a second machine (actually slower, but with the same setup,
    memory, etc.); the results are the same as when I run both on the
    same machine.

    > You should probably investigate trying to create fewer temporary
    > objects, and fewer autoreleased objects, as a way to fix this
    > problem on your end.
    >
    That would be a less than fun task, given >50.000 lines of code...

    I did sprinkle in a number of local NSAutoreleasePools, so that
    autoreleased objects do not amass. That made no (measurable)
    difference in performance at all. I no longer have the 4 second period
    of unresponsiveness, but the overall process takes 4 seconds longer,
    so that was a wash. By adding timestamps before and after each
    [autoReleasePool release] I can see that the pools now roughly share
    the burden (i.e. each takes about the same time). So having all the
    temporary objects in one large pool or in a number of smaller pools
    makes no difference.
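
    The timestamp approach described above can be sketched in plain C
    (used here only so the example stands alone; the code under
    discussion is Objective-C). This is a minimal sketch, assuming
    gettimeofday() as the wall-clock source. The names elapsed_seconds,
    time_call and spin are hypothetical, and spin merely stands in for
    the pool release being measured.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() samples. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Run fn(ctx) once and return the wall-clock seconds it took. */
double time_call(void (*fn)(void *), void *ctx)
{
    struct timeval start, end;

    assert(fn != NULL);
    gettimeofday(&start, NULL);   /* timestamp before the work */
    fn(ctx);
    gettimeofday(&end, NULL);     /* timestamp after the work */
    return elapsed_seconds(start, end);
}

/* Hypothetical workload: increments the long that ctx points to
 * one million times. Replace with the code being measured. */
void spin(void *ctx)
{
    long *counter = ctx;
    for (long i = 0; i < 1000000; i++)
        (*counter)++;
}
```

    Calling time_call(spin, &counter) returns the duration of one run;
    logging that value around each suspect region gives the same kind
    of per-pool comparison described above.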

    > You may also want to file a performance regression bug report with
    > Apple. I would suggest including: Shark Time Profile, Shark Time
    > Profile (All Thread States), System Profile, top output. Make sure
    > to sample the whole duration of loading the document, and include
    > reports from both Tiger and Leopard.

    I doubt that is going to be helpful, as Shark records the time spent
    in NSPopAutoreleasePool as being in main(), and nothing else really
    jumps out; the slowness is all across the board.

    <rant>Also do radars ever actually help? I have spent many hours
    filing detailed radars, and best case (if I get an answer at all) a
    year or so later I hear "we can no longer reproduce this in our
    upcoming release". Somewhat frustrating, I'd rather spend the time
    playing with the dogs... For all the time I invested, I have never
    ever got a single helpful line back (like do x and things will be
    better). Makes me feel like I am spending my time (and money) to help
    Apple debug their stuff, for zero (other than the hope it might be
    better in a year) return. So basically the ROI on radars is extremely
    poor, and I can't make a business case for them. You may call that
    anti-social, but not working for a large company these radars do cost
    me actual dollars. Sorry for the rant...</rant>

    Gerd
  • OK, color my face red... Xcode was responsible for the slowdown.

    When testing on Leopard, I started my application through Xcode
    (normal run, not debug). Apparently that slows it down a lot...

    Under Tiger I usually ran the App right from TextMate after an
    xcodebuild run, but sadly under Leopard xcodebuild is no longer usable
    (50 sec "checking dependencies"), so I started building and running
    the app through Xcode. I do not recall Tiger's Xcode slowing Apps down
    in a normal 'run', so that never occurred to me as the potential
    source of my troubles.

    Gerd
  • On Nov 17, 2007, at 11:45 AM, Gerd Knops wrote:

    >> You should probably investigate trying to create fewer temporary
    >> objects, and fewer autoreleased objects, as a way to fix this
    >> problem on your end.
    >>
    > That would be a less than fun task, given >50.000 lines of code...

    Perhaps not fun, but probably pretty effective. As you don't have to
    change your overall design patterns / architecture to make this
    change, I would also expect it to be fairly straightforward and safe.

    > I did sprinkle a number of local NSAutoReleasePools, so that
    > autoreleased objects do not amass. That made no (measurable)
    > difference in performance at all. I no longer have the 4 second
    > period of unresponsiveness, but the overall process takes 4 seconds
    > longer, so that was a wash. By adding timestamps before and after
    > the [autoReleasePool release] I can see that they now roughly share
    > the burden (eg take about the same time). So having all the
    > temporary objects in one large pool or a number of smaller pools
    > makes no difference.

    Unless you have a problem with the total amount of memory used per
    iteration through the run loop I don't think that you should expect
    that moving the pools around will make much of a difference.

    >> You may also want to file a performance regression bug report with
    >> Apple. I would suggest including: Shark Time Profile, Shark Time
    >> Profile (All Thread States), System Profile, top output. Make sure
    >> to sample the whole duration of loading the document, and include
    >> reports from both Tiger and Leopard.
    >
    > I doubt that is going to be helpful, as Shark records the time
    > spent in NSPopAutoreleasePool as being in main(), and nothing else
    > really jumps out, the slowness is all across the board.

    How much memory do you use in total? How much installed in the
    machine? Any paging / swapping (this is why I suggested including top
    reports)?

    > <rant>Also do radars ever actually help? I have spent many hours
    > filing detailed radars, and best case (if I get an answer at all) a
    > year or so later I hear "we can no longer reproduce this in our
    > upcoming release". Somewhat frustrating, I'd rather spend the time
    > playing with the dogs... For all the time I invested, I have never
    > ever got a single helpful line back (like do x and things will be
    > better). Makes me feel like I am spending my time (and money) to
    > help Apple debug their stuff, for zero (other than the hope it might
    > be better in a year) return. So basically the ROI on radars is
    > extremely poor, and I can't make a business case for them. You may
    > call that anti-social, but not working for a large company these
    > radars do cost me actual dollars. Sorry for the rant...</rant>

    They really do help, every single one of them. Radar is not a
    replacement for the Developer Technical Support that Apple provides
    and you shouldn't expect to use it as a communications channel with
    Apple. Your side comment about "the hope it might be better in a
    year" is spot on, and that's exactly what Radar provides: a way to
    increase the chance that an upcoming Software Update, or major OS
    release, will include a fix for the problem that you have reported. If
    you don't file the Radar, the likelihood that your problem will be
    resolved is lower than if you do. You will have to decide for
    yourself if it's worth the time or not, but I would urge you to make
    the effort.

    On Nov 17, 2007, at 12:38 PM, Gerd Knops wrote:

    > When testing on Leopard, I started my application through Xcode
    > (normal run, not debug). Apparently that slows it down a lot...

    That is surprising, and worth additional investigation. Perhaps total
    memory usage on the system is to blame after all?

    > [...] sadly under Leopard xcodebuild is no longer usable (50 sec
    > "checking dependencies") [...]

    File a bug report!  :-)

    j o a r
  • A count of the number of seconds taken is not very useful.
    However, Instruments (formerly Xray) or Shark should tell you exactly
    what is going on and why things are taking time.
    You need to do some spelunking and compare the times coming out of
    Instruments or Shark in each OS.
    And FWIW, if you haven't enabled garbage collection, I don't believe
    Leopard does anything more sophisticated memory-wise than Tiger.
  • Running your application with Leopard's Instruments.app and the object
    alloc instrument should make what's going on clear.

    Run your app and when you see the big run up in the memory graph,
    select that range of time and see what object is being allocated more
    than expected.

    Jon Hess
  • On Nov 17, 2007, at 11:45 AM, Gerd Knops wrote:

    > <rant>

    Rant on USENET, not on Cocoa-dev.  We don't want this list to be like
    Carbon-dev.  Apple employees post here because they feel like it, not
    because they have to.  Don't make it unpleasant for them.

    -jcr
  • >> On Nov 17, 2007, at 10:53 AM, Gerd Knops wrote:
    >>
    >> You should probably investigate trying to create fewer temporary
    >> objects, and fewer autoreleased objects, as a way to fix this
    >> problem on your end.
    >>
    > That would be a less than fun task, given >50.000 lines of code...
    >
    > I did sprinkle a number of local NSAutoReleasePools, so that
    > autoreleased objects do not amass. That made no (measurable)
    > difference in performance at all. I no longer have the 4 second period
    > of unresponsiveness, but the overall process takes 4 seconds longer,
    > so that was a wash. By adding timestamps before and after the
    > [autoReleasePool release] I can see that they now roughly share the
    > burden (eg take about the same time). So having all the temporary
    > objects in one large pool or a number of smaller pools makes no
    > difference.

    If Shark is reporting a lot of time popping an autorelease pool, then
    using fewer temporary/autoreleased objects is the obvious fix.

    -autorelease is a convenience.  It's handy, saves a couple lines of
    code, can provide a simpler API, and can eliminate extra code in
    places you expect exceptions to be thrown.

    It's also, relative to just -release, expensive.  In addition to all
    the extra work to keep track of the object until the pool is released,
    it requires more memory.  The memory for a temporary object can't be
    reused for something else until you actually release it.  So an
    autorelease pool extends the lifetime of those objects.  Sometimes
    that's a useful feature.  It can make an API less error prone.
    Sometimes it's just a performance drain.  Unnecessarily extending the
    lifetime of memory blocks means that the process needs more memory.
    It needs to compensate for all the temporary blocks that are pending
    in the autorelease pool, but never used again.  Growing the high
    watermark of a process heap means that more memory needs to be
    allocated from the OS.  That is much much much more expensive than
    -release.

    So using autoreleased objects within performance critical loops is
    very counterproductive.
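
    The high-watermark effect described above can be illustrated with a
    toy model in plain C. This is only a sketch under stated
    assumptions: DeferredPool, pool_defer, pool_drain and
    peak_live_blocks are hypothetical names, and the pool is a
    simplified stand-in for an autorelease pool, not how Foundation
    implements one. It counts how many temporary blocks are alive at
    once when each is freed immediately versus queued until a drain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an autorelease pool: blocks are queued and only
 * freed when the pool is drained. */
typedef struct {
    void  **blocks;
    size_t  count;
    size_t  capacity;
} DeferredPool;

/* Queue a block for later freeing. Returns 0 on success, -1 on failure. */
int pool_defer(DeferredPool *p, void *block)
{
    assert(p != NULL);
    if (p->count == p->capacity) {
        size_t cap = p->capacity ? p->capacity * 2 : 16;
        void **grown = realloc(p->blocks, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        p->blocks = grown;
        p->capacity = cap;
    }
    p->blocks[p->count++] = block;
    return 0;
}

/* Free every queued block and the bookkeeping array itself. */
void pool_drain(DeferredPool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->count; i++)
        free(p->blocks[i]);
    free(p->blocks);
    p->blocks = NULL;
    p->count = p->capacity = 0;
}

/* Allocate n temporaries in a loop and return the peak number alive
 * at once. deferred != 0 models autorelease; 0 models plain release. */
size_t peak_live_blocks(size_t n, int deferred)
{
    DeferredPool pool = { NULL, 0, 0 };
    size_t live = 0, peak = 0;

    for (size_t i = 0; i < n; i++) {
        void *tmp = malloc(64);
        if (tmp == NULL)
            break;
        live++;
        if (live > peak)
            peak = live;
        if (!deferred || pool_defer(&pool, tmp) != 0) {
            free(tmp);            /* released as soon as it is unused */
            live--;
        }
        /* otherwise tmp stays allocated until pool_drain() */
    }
    pool_drain(&pool);
    return peak;
}
```

    With immediate freeing the peak stays at one block, so the
    allocator can keep reusing the same memory. With deferred freeing
    the peak grows with the loop count, which is the extra heap the
    process has to obtain from the OS.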

    It also sounds like you're allocating a large number of objects.  You
    may be able to improve performance by allocating fewer objects
    overall.  This is more work than simply releasing temporary objects
    more aggressively, but can provide additional improvements.  Depending
    on what you need, you can reuse allocated objects (allocate a mutable
    object, and mutate instead), malloc a large buffer and divide it up
    yourself, or use the batch malloc APIs in malloc.h.
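
    The "malloc a large buffer and divide it up yourself" option can be
    sketched as a simple bump allocator in plain C. This is a minimal
    sketch, not a drop-in replacement for object allocation: Arena,
    arena_init, arena_alloc, arena_reset and arena_destroy are
    hypothetical names, and the fixed 16-byte rounding is an assumption
    made to keep returned pointers as aligned as the underlying buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One large buffer handed out in pieces; pieces are never freed
 * individually, only all at once via reset or destroy. */
typedef struct {
    unsigned char *base;
    size_t         size;
    size_t         used;
} Arena;

/* Obtain the backing buffer. Returns 0 on success, -1 on failure. */
int arena_init(Arena *a, size_t size)
{
    assert(a != NULL);
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out n bytes (rounded up to a multiple of 16), or NULL if the
 * arena does not have enough room left. */
void *arena_alloc(Arena *a, size_t n)
{
    size_t remaining, rounded;
    void *p;

    assert(a != NULL);
    remaining = a->size - a->used;
    if (n > remaining)
        return NULL;
    rounded = (n + 15) & ~(size_t)15;
    if (rounded < n || rounded > remaining)
        return NULL;
    p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Make the whole buffer available again without returning it to the OS. */
void arena_reset(Arena *a)
{
    assert(a != NULL);
    a->used = 0;
}

/* Release the backing buffer. */
void arena_destroy(Arena *a)
{
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

    The design trade-off is that one malloc and one free replace many
    small ones, at the cost of not being able to release individual
    pieces early; arena_reset between batches of work keeps the buffer
    for reuse.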

    - Ben
  • Hello,

    On Nov 17, 2007, at 9:38 PM, Gerd Knops wrote:
    > When testing on Leopard, I started my application through Xcode
    > (normal run, not debug). Apparently that slows it down a lot...
    >
    > (...) I do not recall Tiger's Xcode slowing Apps down in a normal
    > 'run',

    If I'm not mistaken, in Leopard there is no longer any distinction
    between "normal run" and "debug": Xcode 3 now always runs the program
    with GDB attached, but in "run" mode it simply deactivates all
    breakpoints. So if you "run" your app and something bad happens, the
    debugger is usable immediately; you don't have to re-run the app with
    the debugger attached and reproduce the error. You can also
    re-activate your breakpoints at any time to switch from "run" mode
    to "debug" mode.

    So, on Leopard both running and debugging should be slow now :)

    Best,
    Dirk Stegemann