Skip navigation.
 
mlRe: Repetitive Appending of Strings
FROM : John Stiles
DATE : Tue Feb 12 23:01:09 2008

If you've got a little geekery left in you, I'd be highly curious to
know the times for doing a million push_back's into a vector<char>. I
wonder how it compares to NSMutableData.



Adam P Jenkins wrote:
> I agree with you for appending millions of chars.  My geekiness got
> the better of my and I wrote a little test program to time how long
> various methods of building a million digit numeric string 1 char at a
> time took.  On my computer the results were as follows:
>
> 0.42 seconds: Using NSMutableString's appendFormat:, starting with a
> string with 0 capacity
> 0.41 seconds: Using NSMutableString's appendFormat:, starting with
> capacity 1 million
> 0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
> seconds
> 0.0039 seconds: Assigning directly into a char array
>
> Notice that for NSMutableString appendFormat: it doesn't make much
> difference whether the string is initially created with enough
> capacity to hold the whole string or not, which shows that it does in
> fact use a smart allocation strategy like I described below to
> amortize the cost of growing the string.
>
> The fact that there's an order of magnitude difference between using
> NSMutableData appendBytes and assigning directly into an array also
> show that method call cost can be noticeable when we're talking
> millions of invocations a second.
>
> On the flip side though I would note that all these times are so small
> that it's really not worth worrying about this sort of thing in cases
> where the number of appends isn't so high.
>
> I've appended my test program in case you're interested.  If you
> create a Foundation Tool project in XCode, and paste this in as the
> main method, then you should be able to build and run it.  On my
> machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
>
> String append with initial capacity 0 took 0.419497 seconds
> String append with initial capacity 1000000 took 0.406551 seconds
> NSMutableData appendBytes took 0.033865 seconds
> Assigning into char array took 0.003881 seconds
>
>
> On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
>

>> Hi, Adam,
>>
>> I understand what you (and others) are saying about premature
>> optimization.  My feelings were only hunches, really, which is why
>> Shark might be a good idea.  A test of the buffer idea, however --
>> with an NSZoneCalloc'ed buffer of unichars -- and calculating a
>> quotient out to a million digits, yielded what was something like a
>> three-second time for the actual calculation -- compared with ten
>> seconds when using the -appendFormat: method.
>>
>> I can't imagine a case where a person would actually want more than a
>> thousand -- or even more than a hundred, really -- digits for their
>> equations, but I would like to make the program capable of this, if
>> only for completeness (now, there's also a practical limit, I
>> suppose, on how many bytes of data an NSTextView can hold, but I
>> don't know what that is).  Using a buffer seems to speed things up
>> immensely -- in fact, the slowest part seems now to be the actual
>> updating of the text view.
>>
>> Cheers,
>>    Andrew
>>
>> On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
>>

>>> I agree that the method invocation itself is probably not a big
>>> problem.  However appending to a string repeatedly could be very
>>> inefficient depending on how it's implemented.  In the worst case,
>>> append always makes a new internal copy of its character buffer with
>>> just the right amount of extra space at the end to hold the appended
>>> content, so building up a string by appending one character at a
>>> time becomes a O(n^2) operation.  I assume this is what the
>>> original poster was worried about.
>>>
>>> You can easily avoid this worst case behavior by using the
>>> +stringWithCapacity: method of NSMutableString, passing in the max
>>> length of your digit string, so that your string already has an
>>> internal buffer preallocated to the max number of characters the
>>> string will grow to.  Then you can append one char at a time up to
>>> the internal capacity without causing any reallocate/copies to
>>> occur. This seems more straightforward to me than using an explicit
>>> char array which you have to manage and null-terminate yourself.
>>>
>>> Also, I don't know about NSMutableString specifically, but other
>>> string classes whose internals I've examined actually use a
>>> heuristic allocation strategy to avoid this worst case append
>>> behavior, at the expense of possibly using more memory than
>>> necessary.  When an append is performed that causes the internal
>>> char buffer to need to be extended, rather than just extending it
>>> only as much as necessary to hold the new characters, they double
>>> the size of the internal buffer.  This optimizes for the common
>>> usage of strings, where strings are built up from many small
>>> appends.  The amortized cost of building up a string from n 1
>>> character appends then becomes O(n) rather than O(n^2).
>>>
>>> The upshot of all this is, don't prematurely optimize.  It may be
>>> just fine to use -appendFormat: repeatedly if NSMutableString uses a
>>> heuristic like I described above.  And in any case you can use
>>> +stringWithCapacity: to preallocate the internal buffer if you know
>>> how big your string can grow to.  So that leaves you with only
>>> method call cost to worry about.
>>>
>>> Adam

>
> #import <Foundation/Foundation.h>
>
> int main (int argc, const char * argv[]) {
>    NSAutoreleasePool * pool = [NSAutoreleasePool new];
>
>    NSDate *tic, *toc;
>    NSMutableString *theString;
>    int niters = 1000000;
>    int i;
>   
>    // start the string off at 0 capacity and let appendFormat expand it
>    tic = [NSDate date];
>    theString = [NSMutableString stringWithCapacity:0];
>    for (i = 0; i < niters; i++)
>        [theString appendFormat:@"%d", (i % 10)];
>    toc = [NSDate date];
>    printf("String append with initial capacity 0 took %f seconds\n",
>            [toc timeIntervalSinceDate:tic]);
>   
>    [pool release];
>    pool = [NSAutoreleasePool new];
>   
>    // start the string off with enough capacity to hold the whole
> string, so no
>    // reallocations are necessary
>    tic = [NSDate date];
>    theString = [NSMutableString stringWithCapacity:niters];
>    for (i = 0; i < niters; i++)
>        [theString appendFormat:@"%d", (i % 10)];
>    toc = [NSDate date];
>    printf("String append with initial capacity %d took %f seconds\n",
>            niters, [toc timeIntervalSinceDate:tic]);
>   
>    [pool release];
>    pool = [NSAutoreleasePool new];
>
>   
>    // try using a buffer, using appendBytes:length
>    tic = [NSDate date];
>    NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
>    for (i = 0; i < niters; i++) {
>        char ch = (i % 10) + '0';
>        [buffer appendBytes:&ch length:1];
>    }
>    toc = [NSDate date];
>    printf("NSMutableData appendBytes took %f seconds\n",
>            [toc timeIntervalSinceDate:tic]);
>   
>    [pool release];
>    pool = [NSAutoreleasePool new];
>   
>    // finally, try assigning directly to a memory buffer
>    tic = [NSDate date];
>    buffer = [NSMutableData dataWithCapacity:niters];
>    char *chars = (char*)[buffer mutableBytes];
>    for (i = 0; i < niters; i++) {
>        char ch = (i % 10) + '0';
>        chars[i] = ch;
>    }
>    toc = [NSDate date];
>    printf("Assigning into char array took %f seconds\n",
>            [toc timeIntervalSinceDate:tic]);
>   
>    [pool release];
>    return 0;
> }
>
> _______________________________________________
>
> Cocoa-dev mailing list (<email_removed>)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/<email_removed>
>
> This email sent to <email_removed>

Related mailsAuthorDate
mlRepetitive Appending of Strings Andrew Merenbach Feb 12, 17:47
mlRe: Repetitive Appending of Strings glenn andreas Feb 12, 19:06
mlRe: Repetitive Appending of Strings John Stiles Feb 12, 19:10
mlRe: Repetitive Appending of Strings Andrew Merenbach Feb 12, 19:21
mlRe: Repetitive Appending of Strings Michael Ash Feb 12, 19:22
mlRe: Repetitive Appending of Strings John Stiles Feb 12, 19:44
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 12, 19:53
mlRe: Repetitive Appending of Strings glenn andreas Feb 12, 20:05
mlRe: Repetitive Appending of Strings Jens Alfke Feb 12, 20:21
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 12, 20:33
mlRe: Repetitive Appending of Strings Andrew Merenbach Feb 12, 20:48
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 12, 22:07
mlRe: Repetitive Appending of Strings John Stiles Feb 12, 23:01
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 12, 23:51
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 13, 00:25
mlRe: Repetitive Appending of Strings Chris Suter Feb 13, 00:49
mlRe: Repetitive Appending of Strings Jean-Daniel Dupas Feb 13, 00:54
mlRe: Repetitive Appending of Strings Adam P Jenkins Feb 13, 01:21