FROM : John Stiles
DATE : Tue Feb 12 23:01:09 2008
If you've got a little geekery left in you, I'd be highly curious to
know the times for doing a million push_back's into a vector<char>. I
wonder how it compares to NSMutableData.
Adam P Jenkins wrote:
> I agree with you for appending millions of chars. My geekiness got
> the better of my and I wrote a little test program to time how long
> various methods of building a million digit numeric string 1 char at a
> time took. On my computer the results were as follows:
>
> 0.42 seconds: Using NSMutableString's appendFormat:, starting with a
> string with 0 capacity
> 0.41 seconds: Using NSMutableString's appendFormat:, starting with
> capacity 1 million
> 0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
> seconds
> 0.0039 seconds: Assigning directly into a char array
>
> Notice that for NSMutableString appendFormat: it doesn't make much
> difference whether the string is initially created with enough
> capacity to hold the whole string or not, which shows that it does in
> fact use a smart allocation strategy like I described below to
> amortize the cost of growing the string.
>
> The fact that there's an order of magnitude difference between using
> NSMutableData appendBytes and assigning directly into an array also
> show that method call cost can be noticeable when we're talking
> millions of invocations a second.
>
> On the flip side though I would note that all these times are so small
> that it's really not worth worrying about this sort of thing in cases
> where the number of appends isn't so high.
>
> I've appended my test program in case you're interested. If you
> create a Foundation Tool project in XCode, and paste this in as the
> main method, then you should be able to build and run it. On my
> machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
>
> String append with initial capacity 0 took 0.419497 seconds
> String append with initial capacity 1000000 took 0.406551 seconds
> NSMutableData appendBytes took 0.033865 seconds
> Assigning into char array took 0.003881 seconds
>
>
> On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
>
>> Hi, Adam,
>>
>> I understand what you (and others) are saying about premature
>> optimization. My feelings were only hunches, really, which is why
>> Shark might be a good idea. A test of the buffer idea, however --
>> with an NSZoneCalloc'ed buffer of unichars -- and calculating a
>> quotient out to a million digits, yielded what was something like a
>> three-second time for the actual calculation -- compared with ten
>> seconds when using the -appendFormat: method.
>>
>> I can't imagine a case where a person would actually want more than a
>> thousand -- or even more than a hundred, really -- digits for their
>> equations, but I would like to make the program capable of this, if
>> only for completeness (now, there's also a practical limit, I
>> suppose, on how many bytes of data an NSTextView can hold, but I
>> don't know what that is). Using a buffer seems to speed things up
>> immensely -- in fact, the slowest part seems now to be the actual
>> updating of the text view.
>>
>> Cheers,
>> Andrew
>>
>> On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
>>
>>> I agree that the method invocation itself is probably not a big
>>> problem. However appending to a string repeatedly could be very
>>> inefficient depending on how it's implemented. In the worst case,
>>> append always makes a new internal copy of its character buffer with
>>> just the right amount of extra space at the end to hold the appended
>>> content, so building up a string by appending one character at a
>>> time becomes a O(n^2) operation. I assume this is what the
>>> original poster was worried about.
>>>
>>> You can easily avoid this worst case behavior by using the
>>> +stringWithCapacity: method of NSMutableString, passing in the max
>>> length of your digit string, so that your string already has an
>>> internal buffer preallocated to the max number of characters the
>>> string will grow to. Then you can append one char at a time up to
>>> the internal capacity without causing any reallocate/copies to
>>> occur. This seems more straightforward to me than using an explicit
>>> char array which you have to manage and null-terminate yourself.
>>>
>>> Also, I don't know about NSMutableString specifically, but other
>>> string classes whose internals I've examined actually use a
>>> heuristic allocation strategy to avoid this worst case append
>>> behavior, at the expense of possibly using more memory than
>>> necessary. When an append is performed that causes the internal
>>> char buffer to need to be extended, rather than just extending it
>>> only as much as necessary to hold the new characters, they double
>>> the size of the internal buffer. This optimizes for the common
>>> usage of strings, where strings are built up from many small
>>> appends. The amortized cost of building up a string from n 1
>>> character appends then becomes O(n) rather than O(n^2).
>>>
>>> The upshot of all this is, don't prematurely optimize. It may be
>>> just fine to use -appendFormat: repeatedly if NSMutableString uses a
>>> heuristic like I described above. And in any case you can use
>>> +stringWithCapacity: to preallocate the internal buffer if you know
>>> how big your string can grow to. So that leaves you with only
>>> method call cost to worry about.
>>>
>>> Adam
>
> #import <Foundation/Foundation.h>
>
> int main (int argc, const char * argv[]) {
> NSAutoreleasePool * pool = [NSAutoreleasePool new];
>
> NSDate *tic, *toc;
> NSMutableString *theString;
> int niters = 1000000;
> int i;
>
> // start the string off at 0 capacity and let appendFormat expand it
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:0];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity 0 took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // start the string off with enough capacity to hold the whole
> string, so no
> // reallocations are necessary
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:niters];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity %d took %f seconds\n",
> niters, [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
>
> // try using a buffer, using appendBytes:length
> tic = [NSDate date];
> NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> [buffer appendBytes:&ch length:1];
> }
> toc = [NSDate date];
> printf("NSMutableData appendBytes took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // finally, try assigning directly to a memory buffer
> tic = [NSDate date];
> buffer = [NSMutableData dataWithCapacity:niters];
> char *chars = (char*)[buffer mutableBytes];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> chars[i] = ch;
> }
> toc = [NSDate date];
> printf("Assigning into char array took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> return 0;
> }
>
> _______________________________________________
>
> Cocoa-dev mailing list (<email_removed>)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/<email_removed>
>
> This email sent to <email_removed>
DATE : Tue Feb 12 23:01:09 2008
If you've got a little geekery left in you, I'd be highly curious to
know the times for doing a million push_back's into a vector<char>. I
wonder how it compares to NSMutableData.
Adam P Jenkins wrote:
> I agree with you for appending millions of chars. My geekiness got
> the better of my and I wrote a little test program to time how long
> various methods of building a million digit numeric string 1 char at a
> time took. On my computer the results were as follows:
>
> 0.42 seconds: Using NSMutableString's appendFormat:, starting with a
> string with 0 capacity
> 0.41 seconds: Using NSMutableString's appendFormat:, starting with
> capacity 1 million
> 0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
> seconds
> 0.0039 seconds: Assigning directly into a char array
>
> Notice that for NSMutableString appendFormat: it doesn't make much
> difference whether the string is initially created with enough
> capacity to hold the whole string or not, which shows that it does in
> fact use a smart allocation strategy like I described below to
> amortize the cost of growing the string.
>
> The fact that there's an order of magnitude difference between using
> NSMutableData appendBytes and assigning directly into an array also
> show that method call cost can be noticeable when we're talking
> millions of invocations a second.
>
> On the flip side though I would note that all these times are so small
> that it's really not worth worrying about this sort of thing in cases
> where the number of appends isn't so high.
>
> I've appended my test program in case you're interested. If you
> create a Foundation Tool project in XCode, and paste this in as the
> main method, then you should be able to build and run it. On my
> machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
>
> String append with initial capacity 0 took 0.419497 seconds
> String append with initial capacity 1000000 took 0.406551 seconds
> NSMutableData appendBytes took 0.033865 seconds
> Assigning into char array took 0.003881 seconds
>
>
> On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
>
>> Hi, Adam,
>>
>> I understand what you (and others) are saying about premature
>> optimization. My feelings were only hunches, really, which is why
>> Shark might be a good idea. A test of the buffer idea, however --
>> with an NSZoneCalloc'ed buffer of unichars -- and calculating a
>> quotient out to a million digits, yielded what was something like a
>> three-second time for the actual calculation -- compared with ten
>> seconds when using the -appendFormat: method.
>>
>> I can't imagine a case where a person would actually want more than a
>> thousand -- or even more than a hundred, really -- digits for their
>> equations, but I would like to make the program capable of this, if
>> only for completeness (now, there's also a practical limit, I
>> suppose, on how many bytes of data an NSTextView can hold, but I
>> don't know what that is). Using a buffer seems to speed things up
>> immensely -- in fact, the slowest part seems now to be the actual
>> updating of the text view.
>>
>> Cheers,
>> Andrew
>>
>> On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
>>
>>> I agree that the method invocation itself is probably not a big
>>> problem. However appending to a string repeatedly could be very
>>> inefficient depending on how it's implemented. In the worst case,
>>> append always makes a new internal copy of its character buffer with
>>> just the right amount of extra space at the end to hold the appended
>>> content, so building up a string by appending one character at a
>>> time becomes a O(n^2) operation. I assume this is what the
>>> original poster was worried about.
>>>
>>> You can easily avoid this worst case behavior by using the
>>> +stringWithCapacity: method of NSMutableString, passing in the max
>>> length of your digit string, so that your string already has an
>>> internal buffer preallocated to the max number of characters the
>>> string will grow to. Then you can append one char at a time up to
>>> the internal capacity without causing any reallocate/copies to
>>> occur. This seems more straightforward to me than using an explicit
>>> char array which you have to manage and null-terminate yourself.
>>>
>>> Also, I don't know about NSMutableString specifically, but other
>>> string classes whose internals I've examined actually use a
>>> heuristic allocation strategy to avoid this worst case append
>>> behavior, at the expense of possibly using more memory than
>>> necessary. When an append is performed that causes the internal
>>> char buffer to need to be extended, rather than just extending it
>>> only as much as necessary to hold the new characters, they double
>>> the size of the internal buffer. This optimizes for the common
>>> usage of strings, where strings are built up from many small
>>> appends. The amortized cost of building up a string from n 1
>>> character appends then becomes O(n) rather than O(n^2).
>>>
>>> The upshot of all this is, don't prematurely optimize. It may be
>>> just fine to use -appendFormat: repeatedly if NSMutableString uses a
>>> heuristic like I described above. And in any case you can use
>>> +stringWithCapacity: to preallocate the internal buffer if you know
>>> how big your string can grow to. So that leaves you with only
>>> method call cost to worry about.
>>>
>>> Adam
>
> #import <Foundation/Foundation.h>
>
> int main (int argc, const char * argv[]) {
> NSAutoreleasePool * pool = [NSAutoreleasePool new];
>
> NSDate *tic, *toc;
> NSMutableString *theString;
> int niters = 1000000;
> int i;
>
> // start the string off at 0 capacity and let appendFormat expand it
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:0];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity 0 took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // start the string off with enough capacity to hold the whole
> string, so no
> // reallocations are necessary
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:niters];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity %d took %f seconds\n",
> niters, [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
>
> // try using a buffer, using appendBytes:length
> tic = [NSDate date];
> NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> [buffer appendBytes:&ch length:1];
> }
> toc = [NSDate date];
> printf("NSMutableData appendBytes took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // finally, try assigning directly to a memory buffer
> tic = [NSDate date];
> buffer = [NSMutableData dataWithCapacity:niters];
> char *chars = (char*)[buffer mutableBytes];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> chars[i] = ch;
> }
> toc = [NSDate date];
> printf("Assigning into char array took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> return 0;
> }
>
> _______________________________________________
>
> Cocoa-dev mailing list (<email_removed>)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/<email_removed>
>
> This email sent to <email_removed>






Cocoa mail archive

