CFDictionary callback on PPC vs Intel

  • Hello all,

    The included semi-pseudo code works on my MacBook Pro (Core Duo,
    32-bit) but doesn't work on my Power Mac G5.

    MacBook Pro behavior: Gets or creates the mref and adds it to the args_set.

    Power Mac G5 behavior: The CFNumberRef mref never changes after the
    first run, no matter what the CFStringRef smref value is. The strange
    thing is that execution does make it inside the if statement guarded by
    CFDictionaryGetValueIfPresent (according to debugging), but the mref
    pointer never changes. Thus, the args_set only ever has one pointer
    value in it, no matter what data is available.

    The following is a stripped down version that probably doesn't compile
    (missing file pointer and such) as I copied and pasted only the
    relevant parts:

    /* Begin code snippet */

    CFMutableDictionaryRef args = CFDictionaryCreateMutable(NULL, 0,
        &kCFCopyStringDictionaryKeyCallBacks, NULL);
    __strong CFMutableSetRef args_set = CFSetCreateMutable(NULL, 0, NULL);

    while (fgets(line, LINE_MAX, fp) != NULL && count < 26000) {

      count = count + 1;
      char *m = strtok(line, sep);
      int im = atoi(m);

      CFNumberRef mref;
      CFStringRef smref = CFStringCreateWithCString(NULL, m, kCFStringEncodingUTF8);
      CFStringRef srref = CFStringCreateWithCString(NULL, r, kCFStringEncodingUTF8);
      if (!CFDictionaryGetValueIfPresent(args, smref, (const void **)&mref)) {
        CFNumberRef nmref = CFMakeCollectable(CFNumberCreate(NULL,
            kCFNumberSInt16Type, &im));
        CFDictionaryAddValue(args, smref, nmref);
        CFDictionaryGetValueIfPresent(args, smref, (const void **)&mref);
      }
      CFRelease(smref);

      CFSetAddValue(args_set, mref);
    }

    /* End code snippet */

    In case you are wondering why I want to do this: The file I'm reading
    in can be many gigabytes in size. It is a csv representation of a
    database in which each row uses a few different values over and over.
    I basically need unique pointers to each value. In most cases the
    value itself is not needed at all. Doing this allowed me to turn the
    text of a value in a csv into an existing pointer to an in memory
    object, and if it doesn't exist, create it.

    Any ideas why this would work on intel and not on ppc?

    Or, better yet, does anyone have a better way to do this? I'm new at
    this and would appreciate any suggestions, code clarifications, and
    memory management help. I am using garbage collection.

    Thank you!

    --Derrek

    http://www.allofzero.com/~dleute/

    <dleute...> (preferred contact)
    home: (802) 347-1573
    cell: (516) 528-4619
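
    The get-or-create idea described in the message above can be sketched
    without Core Foundation. This is a minimal plain-C sketch under stated
    assumptions, not the poster's code: InternTable, Entry, intern,
    intern_table_free, and BUCKET_COUNT are all hypothetical names, and a
    fixed-size chained hash table stands in for the CFDictionary. Each
    distinct field text maps to exactly one entry, so two rows carry the
    same value exactly when their entry pointers are equal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 1024  /* arbitrary size chosen for this sketch */

typedef struct Entry {
    char *text;          /* private copy of the field text */
    struct Entry *next;  /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[BUCKET_COUNT];  /* must start out zeroed */
} InternTable;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_text(const char *s) {
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Returns the unique entry for text, creating it the first time the text
   is seen. Returns NULL only if allocation fails. */
static const Entry *intern(InternTable *table, const char *text) {
    assert(table != NULL && text != NULL);
    Entry **slot = &table->buckets[hash_text(text) % BUCKET_COUNT];
    for (Entry *e = *slot; e != NULL; e = e->next) {
        if (strcmp(e->text, text) == 0) {
            return e;  /* already present: hand back the existing pointer */
        }
    }
    Entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    size_t len = strlen(text) + 1;
    e->text = malloc(len);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->text, text, len);
    e->next = *slot;  /* push onto the front of the bucket's chain */
    *slot = e;
    return e;
}

/* Releases every entry and leaves the table empty. */
static void intern_table_free(InternTable *table) {
    for (size_t i = 0; i < BUCKET_COUNT; i++) {
        Entry *e = table->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e->text);
            free(e);
            e = next;
        }
        table->buckets[i] = NULL;
    }
}
```

    The lookup key can be the buffer returned by strtok directly, so no
    temporary string object has to be created and released on each pass
    through the loop; a copy is made only the first time a value appears.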
  • On Feb 17, 2008 10:51 AM, Derrek Leute <dleute...> wrote:
    > int im = atoi(m);

    You declare im as an int.

    > CFNumberRef nmref = CFMakeCollectable(CFNumberCreate(NULL,
    > kCFNumberSInt16Type, &im));

    But tell CFNumber it's a signed short. The only reason it ever worked
    is by luck. Don't do that.

    Since this is cocoa-dev, may I suggest using Cocoa instead of CF? As
    far as I can see you aren't taking advantage of anything CF offers
    above what Cocoa offers, and Cocoa is nicer-looking and will
    incidentally avoid this error since NSNumber takes parameters by value
    instead of by reference.

    Mike
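
    Why the mismatch happens to work on Intel and fails on PPC comes down
    to byte order. The sketch below is plain C with hypothetical helper
    names (read_first_two_bytes, host_is_little_endian, mismatched_read);
    it only imitates a callee that was told the pointer refers to a 16-bit
    value and does not call CFNumberCreate itself. On a little-endian host
    the first two bytes of a small int are its low-order half, so the value
    survives; on a big-endian host they are the high-order half, which is
    zero for any value below 65536. One plausible reading of the reported
    symptom (an assumption, not confirmed in the thread) is that every
    number on the G5 was therefore created as 0, and CF handed back the
    same object each time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Imitates a callee that was told "valuePtr points at a 16-bit integer":
   it copies only the first two bytes stored at that address. */
static int16_t read_first_two_bytes(const void *valuePtr) {
    assert(valuePtr != NULL);
    int16_t out;
    memcpy(&out, valuePtr, sizeof out);
    return out;
}

/* Returns 1 on a little-endian host (e.g. Intel), 0 on a big-endian one
   (e.g. PPC). */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* What the mismatched call sees when handed the address of a 32-bit int. */
static int16_t mismatched_read(int32_t im) {
    return read_first_two_bytes(&im);
}
```

    Passing a type constant that matches the variable (kCFNumberIntType
    for an int), or declaring the variable with the 16-bit type that is
    named in the call, removes the dependence on byte order.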
  • > You declare im as an int.
    >
    >> CFNumberRef nmref = CFMakeCollectable(CFNumberCreate(NULL,
    >> kCFNumberSInt16Type, &im));
    >
    > But tell CFNumber it's a signed short. The only reason it ever worked
    > is by luck. Don't do that.

    Wow. It had to be that simple. :) So, to sum up, it wasn't setting the
    value (the key was fine) because my types didn't match up. I get it.
    :)

    The project actually started in Cocoa, but I moved to CF because
    NSDictionary forces a string copy for keys. This made memory needs go
    through the roof (unless I'm misusing or misunderstanding how to use
    it). CF lets me use anything as a key. In some sense I would love to
    be in Cocoa, as I had to reimplement NSSet intersection and some other
    little basic things to do this. But it's fast, and it seems to be
    working quite well now. Not to mention that pointer comparison also
    seemed ridiculously faster on this amount of data.

    Thanks so much!

  • On Feb 17, 2008, at 10:43 AM, Derrek Leute wrote:
    > The project actually started in cocoa but I moved to CF because
    > NSDictionary forces a string copy for keys. This made memory needs go
    > through the roof (unless I'm misusing or misunderstanding how to use
    > it). CF lets me use anything as a key.

    Close but not quite exactly correct.

    NSDictionary requires that keys conform to the NSCopying protocol
    because it makes a copy of the key (regardless of whether it is an
    NSString or anything else).  The important part is that for an
    immutable NSString, the copy simply increases the reference count, so
    there is no increase in memory usage at all.  If you pass an
    NSMutableString as a key, it will make an (immutable) copy that does,
    in fact, increase memory usage.

    You may wonder why it does this - consider this:

    NSMutableString *s1 = [NSMutableString stringWithString: @"abc"];
    NSMutableString *s2 = [NSMutableString stringWithString: @"def"];
    NSMutableDictionary *d = [NSMutableDictionary dictionary];

    [d setObject: [NSNumber numberWithInt: 1] forKey: s1];
    [d setObject: [NSNumber numberWithInt: 2] forKey: s2];
    [s1 setString: @"def"];

    NSLog(@"%@", [d objectForKey: @"def"]);

    If it didn't make an immutable copy, you would end up with two values
    that both have the key "def".

    So, if you use immutable objects as keys, your memory usage won't go
    up.  If you use mutable objects as keys, it will make a copy because
    if it didn't, bad things would happen.

    Glenn Andreas                      <gandreas...>
      <http://www.gandreas.com/> wicked fun!
    quadrium | prime : build, mutate, evolve, animate : the next
    generation of fractal art
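
    The same hazard can be reproduced in plain C. The sketch below is a toy
    under stated assumptions: ToyDict, toy_set, toy_count_matches, and
    entries_keyed_def_after_mutation are hypothetical names, and the
    "dictionary" is a small array that never replaces an existing key. It
    replays the s1/s2 scenario twice, once storing the caller's key pointer
    as-is and once storing a private copy, and reports how many entries end
    up keyed "def".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 8  /* tiny fixed capacity, enough for the sketch */

typedef struct {
    const char *keys[MAX_SLOTS];
    int values[MAX_SLOTS];
    int count;
    int copies_keys;  /* nonzero: store a private copy of each key */
} ToyDict;

/* Appends key/value; returns 0 on success, -1 if full or out of memory. */
static int toy_set(ToyDict *d, const char *key, int value) {
    assert(d != NULL && key != NULL);
    if (d->count == MAX_SLOTS) {
        return -1;
    }
    const char *stored = key;
    if (d->copies_keys) {
        size_t len = strlen(key) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, key, len);
        stored = copy;
    }
    d->keys[d->count] = stored;
    d->values[d->count] = value;
    d->count++;
    return 0;
}

/* Counts how many stored entries currently compare equal to key. */
static int toy_count_matches(const ToyDict *d, const char *key) {
    int matches = 0;
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->keys[i], key) == 0) {
            matches++;
        }
    }
    return matches;
}

/* Inserts "abc" and "def", mutates the first key's buffer to "def", and
   returns the number of entries that are then keyed "def". */
static int entries_keyed_def_after_mutation(int copies_keys) {
    ToyDict d = { .count = 0, .copies_keys = copies_keys };
    char s1[8] = "abc";
    char s2[8] = "def";
    toy_set(&d, s1, 1);
    toy_set(&d, s2, 2);
    strcpy(s1, "def");  /* mutate the first key after insertion */
    int matches = toy_count_matches(&d, "def");
    if (copies_keys) {
        for (int i = 0; i < d.count; i++) {
            free((void *)d.keys[i]);
        }
    }
    return matches;
}
```

    Without copying, both entries end up under "def" and the table is
    inconsistent; with copying, the stored "abc" is unaffected by the later
    change to s1.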
  • On Feb 17, 2008 11:43 AM, Derrek Leute <dleute...> wrote:
    > The project actually started in cocoa but I moved to CF because
    > NSDictionary forces a string copy for keys. This made memory needs go
    > through the roof (unless I'm misusing or misunderstanding how to use
    > it). CF lets me use anything as a key. In some sense I would love to
    > be in Cocoa as I had to reimplement NSSet intersection and some other
    > little basic things to do this. But it's fast, and it seems to be
    > working quite well now. Not to mention that pointer comparison also
    > seemed ridiculously faster on this amount of data.

    Check out toll-free bridging:

    http://www.cocoadev.com/index.pl?TollFreeBridging

    In short, a lot of CF types are equivalent to their NS types and can
    be "converted" simply by casting. I put "converted" in quotes because
    no conversion takes place; the same object is simultaneously an NS
    object and a CF object.

    Note that a CFDictionary created with custom callbacks will still copy
    its keys if you use -setObject:forKey:, so don't do that. But
    otherwise you can take a CFSet and use NSSet methods, you can put
    NSNumbers in your CFDictionaries, and so forth. As far as I know, all
    custom callbacks will be respected aside from NSDictionary insisting
    on creating copies of the keys.

    Mike
  • On Feb 17, 2008, at 9:51 AM, Derrek Leute wrote:

    > CFMutableDictionaryRef args = CFDictionaryCreateMutable
    > (NULL,0,&kCFCopyStringDictionaryKeyCallBacks,NULL);
    Huh? How can you pass a reference to a CONST
    (kCFCopyStringDictionaryKeyCallBacks)? I thought a CONST was just a
    compiler replacement at compile-time, and doesn't actually have any
    storage. Or is this not a CONST?

  • >> CFMutableDictionaryRef args = CFDictionaryCreateMutable
    >> (NULL,0,&kCFCopyStringDictionaryKeyCallBacks,NULL);
    > Huh? How can you pass a reference to a CONST
    > (kCFCopyStringDictionaryKeyCallBacks)? I thought a CONST was just a
    > compiler replacement at compile-time, and doesn't actually have any
    > storage. Or is this not a CONST?

    I'm just doing as the Apple documentation and examples do. They say
    to pass it as a reference, so I do. That's all I know. :)

    --Derrek
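
    On the question of taking the address of a constant: in C, a
    const-qualified object is read-only but still occupies storage, so its
    address can be taken; only a preprocessor macro is a pure compile-time
    text replacement. As far as I know, kCFCopyStringDictionaryKeyCallBacks
    is an exported const CFDictionaryKeyCallBacks structure, and
    CFDictionaryCreateMutable takes a pointer to such a structure. The
    sketch below imitates that arrangement in plain C; KeyCallbacks,
    kStringKeyCallbacks, hash_c_string, hash_with, and MAX_KEYS are
    hypothetical stand-ins, not Core Foundation names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a callbacks structure such as
   CFDictionaryKeyCallBacks. */
typedef struct {
    int version;
    size_t (*hash)(const void *key);
} KeyCallbacks;

/* Toy hash for illustration only: the length of a C string. */
static size_t hash_c_string(const void *key) {
    return strlen((const char *)key);
}

/* A const-qualified object: read-only, but it lives in memory and
   therefore has an address. */
static const KeyCallbacks kStringKeyCallbacks = { 0, hash_c_string };

/* A macro, by contrast, is replaced textually before compilation and has
   no address; writing &MAX_KEYS would not compile. */
#define MAX_KEYS 26000

/* Receives the callbacks by pointer, in the same style as an API that
   takes a pointer to a const callbacks structure. */
static size_t hash_with(const KeyCallbacks *callbacks, const void *key) {
    assert(callbacks != NULL && callbacks->hash != NULL);
    return callbacks->hash(key);
}
```

    Passing &kStringKeyCallbacks to hash_with is the same pattern as
    passing &kCFCopyStringDictionaryKeyCallBacks in the original snippet.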
  • This is excellent to know. What about performance? Performance is
    *very* important in this case. A 1% increase can knock hours or days
    off of processing time on this size data set. Any performance metrics
    available?

    --Derrek

    On Feb 17, 2008 11:54 AM, glenn andreas <gandreas...> wrote:
    > So, if you use immutable objects as keys, your memory usage won't go
    > up.  If you use mutable objects as keys, it will make a copy because
    > if it didn't, bad things would happen.
  • On Feb 17, 2008, at 08:43, Derrek Leute wrote:

    > The project actually started in cocoa but I moved to CF because
    > NSDictionary forces a string copy for keys. This made memory needs go
    > through the roof (unless I'm misusing or misunderstanding how to use
    > it). CF lets me use anything as a key. In some sense I would love to
    > be in Cocoa as I had to reimplement NSSet intersection and some other
    > little basic things to do this. But it's fast, and it seems to be
    > working quite well now. Not to mention that pointer comparison also
    > seemed ridiculously faster on this amount of data.

    Sorry if I'm being dense here, but I don't see how it's NSDictionary's
    fault. NSDictionary is only going to copy the key when you insert
    something, and (as somebody already pointed out) if the key string is
    immutable there may be no actual copying.

    It looks like the real problem is that you're creating a large number
    of temporary objects (the strings you use to look up the dictionary)
    with a lifetime of 1 loop iteration, but doing nothing to reclaim the
    unused memory (in the code snippet you showed us, at least). Judicious
    use of GC's collectIfNeeded might take care of that.

    The reason CFDictionary seems to work better is nothing to do with
    string copying per se. It's actually that innocent-looking CFRelease,
    which reclaims the memory you used for the temporary string at each
    iteration of the loop. In effect, you've switched from GC to pseudo-
    non-GC mode for the duration of the loop. :) That's why memory usage
    stays moderate.
  • It was mostly the immutable issue. I was using mutable strings when I
    used NSDictionary, as I was unaware that it would simply retain an
    immutable string instead of copying it (is this documented
    somewhere that I missed?).

    When I was using NSDictionary I believe I was releasing the strings in
    a balanced fashion. I do realize I'm mixing GC with non-GC which
    probably isn't very comprehensible. But there are no apparent leaks
    and it is getting the expected output.

    Right now it's running very well. For the next version I'll try Cocoa
    and see how I do. :) As this was just a command-line utility, I didn't
    see an overwhelming need to go beyond Core Foundation.

    Thanks for everyone's help!

    On Feb 17, 2008 3:05 PM, Quincey Morris <quinceymorris...> wrote:
    > The reason CFDictionary seems to work better is nothing to do with
    > string copying per se. It's actually that innocent-looking CFRelease,
    > which reclaims the memory you used for the temporary string at each
    > iteration of the loop.