Repetitive Appending of Strings
-
Hi, all,
I have been working on a program that displays an extended output for
a division equation, which involves a loop that continually appends a
string, based on a number, to a longer string that is displayed to the
user at the end of the process. My algorithm for everything past the
decimal point (out to a number of digits determined by the variable
"scale") looks like this:
for (n = 0; n < scale && !self.isCancelled; ++n) {
b = 10 * (b - q * a);
q = floor(b / a);
[quotientString appendFormat:@"%i", q];
}
Note that call to -appendFormat. This is a beautiful solution in
terms of simplicity, but for long outputs it's calling a method
repeatedly.
What are my options here? The most simple optimization would involve
using an IMP and creating a function pointer to a method, yes?
Are there, however, other options available, ones that might be
better? For example, would it be more efficient (I'd imagine that it
would be) if I malloc'ed, or NSZoneMalloc'ed, a unichar pointer and
then used -initWithCharacters:length: or -
initWithCharactersNoCopy:length:freeWhenDone:?
But can people localize their numbers, thus rendering the unichar
option insufficient due to its lack of ability to be localized easily?
or am I missing an obvious solution for the unichar method, which --
if it's more efficient, and also sufficient -- I'd really like to use?
Cheers,
Andrew -
On Feb 12, 2008, at 10:47 AM, Andrew Merenbach wrote:
> Hi, all,
>
> I have been working on a program that displays an extended output
> for a division equation, which involves a loop that continually
> appends a string, based on a number, to a longer string that is
> displayed to the user at the end of the process. My algorithm for
> everything past the decimal point (out to a number of digits
> determined by the variable "scale") looks like this:
>
> for (n = 0; n < scale && !self.isCancelled; ++n) {
> b = 10 * (b - q * a);
> q = floor(b / a);
> [quotientString appendFormat:@"%i", q];
> }
>
> Note that call to -appendFormat. This is a beautiful solution in
> terms of simplicity, but for long outputs it's calling a method
> repeatedly.
>
> What are my options here? The most simple optimization would
> involve using an IMP and creating a function pointer to a method, yes?
>
First of all, use Shark to determine if this really is a problem - no
point in wasting time optimizing something that isn't taking a
noticeable portion of your programs time.
One possible approach is to use an array of components and then
combine them all together:
NSMutableArray *parts = [NSMutableArray arrayWithCapacity: scale];
for (n = 0; n < scale && !self.isCancelled; ++n) {
b = 10 * (b - q * a);
q = floor(b / a);
[parts addObject: [NSString stringWithFormat: @"%i",q]];
}
[quotientString appendString: [parts componentsJoinedBySeparator:@""]];
Not sure of this will be faster or slower - again, without using Shark
to determine the bottlenecks, it's unclear if this will help.
> But can people localize their numbers, thus rendering the unichar
> option insufficient due to its lack of ability to be localized
> easily? or am I missing an obvious solution for the unichar method,
> which -- if it's more efficient, and also sufficient -- I'd really
> like to use?
Yes, people can localize their numbers, but IIRC, you won't see any
difference for this specific case (i.e., the %i format looks the same
everywhere) - it's only once you get to floating point number (which
you'll have to handle upstream in your code) or formatters that things
get special. Fortunately, integer numbers are pretty standard (indo-
arabic numbers 0..9 are understood pretty much everywhere - while
various forms of gematria and other number forms can be used in
certain places, I'm not aware of any localization that requires them).
The ultimate question is what are these numbers that you are using?
If they are standard floats, doubles, or long doubles, why not just
specify the precision with the built in formatting strings - they will
do a better job than you can. If these number are something like a C+
+ numeric package for extended range numbers, said package probably
also has a formatting command - again, with higher capabilities than
what you're doing here.
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> wicked fun!
quadrium2 | build, mutate, evolve, animate | images, textures,
fractals, art -
If this is outputting a single digit per iteration of the loop, it seems
like an extremely simple, effective optimization would be to allocate a
buffer of "scale+1" bytes and then write one byte into it per loop
iteration, specifially "q + '0'". You can't get much faster than that.
Then at the end of the loop, append a zero terminator to your buffer and
you've got yourself a C string buffer with a number in it. You can
easily convert that to an NSString with a single method call.
glenn andreas wrote:
>
> On Feb 12, 2008, at 10:47 AM, Andrew Merenbach wrote:
>
>> Hi, all,
>>
>> I have been working on a program that displays an extended output for
>> a division equation, which involves a loop that continually appends a
>> string, based on a number, to a longer string that is displayed to
>> the user at the end of the process. My algorithm for everything past
>> the decimal point (out to a number of digits determined by the
>> variable "scale") looks like this:
>>
>> for (n = 0; n < scale && !self.isCancelled; ++n) {
>> b = 10 * (b - q * a);
>> q = floor(b / a);
>> [quotientString appendFormat:@"%i", q];
>> }
>>
>> Note that call to -appendFormat. This is a beautiful solution in
>> terms of simplicity, but for long outputs it's calling a method
>> repeatedly.
>>
>> What are my options here? The most simple optimization would involve
>> using an IMP and creating a function pointer to a method, yes?
>>
>
> First of all, use Shark to determine if this really is a problem - no
> point in wasting time optimizing something that isn't taking a
> noticeable portion of your programs time.
>
> One possible approach is to use an array of components and then
> combine them all together:
>
> NSMutableArray *parts = [NSMutableArray arrayWithCapacity: scale];
> for (n = 0; n < scale && !self.isCancelled; ++n) {
> b = 10 * (b - q * a);
> q = floor(b / a);
> [parts addObject: [NSString stringWithFormat: @"%i",q]];
> }
> [quotientString appendString: [parts componentsJoinedBySeparator:@""]];
>
> Not sure of this will be faster or slower - again, without using Shark
> to determine the bottlenecks, it's unclear if this will help.
>
>
>> But can people localize their numbers, thus rendering the unichar
>> option insufficient due to its lack of ability to be localized
>> easily? or am I missing an obvious solution for the unichar method,
>> which -- if it's more efficient, and also sufficient -- I'd really
>> like to use?
>
>
> Yes, people can localize their numbers, but IIRC, you won't see any
> difference for this specific case (i.e., the %i format looks the same
> everywhere) - it's only once you get to floating point number (which
> you'll have to handle upstream in your code) or formatters that things
> get special. Fortunately, integer numbers are pretty standard
> (indo-arabic numbers 0..9 are understood pretty much everywhere -
> while various forms of gematria and other number forms can be used in
> certain places, I'm not aware of any localization that requires them).
>
> The ultimate question is what are these numbers that you are using?
> If they are standard floats, doubles, or long doubles, why not just
> specify the precision with the built in formatting strings - they will
> do a better job than you can. If these number are something like a
> C++ numeric package for extended range numbers, said package probably
> also has a formatting command - again, with higher capabilities than
> what you're doing here.
>
>
> Glenn Andreas <gandreas...>
> <http://www.gandreas.com/> wicked fun!
> quadrium2 | build, mutate, evolve, animate | images, textures,
> fractals, art
-
Hi, Glenn and John,
I have a better idea of what I'll need to do now. I get the feeling
that, since the only method call in the loop is the line in question,
that will be the slow part. When users try to output hundreds of
thousands of digits (not really that useful, most likely, but
something which I feel should be supported -- if only for
completeness), it might be a good idea to use a buffer, yes.
But I will try Shark and see what it says about the actual time
involved in my loop.
Many thanks to you both!
Cheers,
Andrew
On Feb 12, 2008, at 10:10 AM, John Stiles wrote:
> If this is outputting a single digit per iteration of the loop, it
> seems like an extremely simple, effective optimization would be to
> allocate a buffer of "scale+1" bytes and then write one byte into it
> per loop iteration, specifially "q + '0'". You can't get much
> faster than that.
>
> Then at the end of the loop, append a zero terminator to your buffer
> and you've got yourself a C string buffer with a number in it. You
> can easily convert that to an NSString with a single method call.
>
>
> glenn andreas wrote:
>>
>> On Feb 12, 2008, at 10:47 AM, Andrew Merenbach wrote:
>>
>>> Hi, all,
>>>
>>> I have been working on a program that displays an extended output
>>> for a division equation, which involves a loop that continually
>>> appends a string, based on a number, to a longer string that is
>>> displayed to the user at the end of the process. My algorithm for
>>> everything past the decimal point (out to a number of digits
>>> determined by the variable "scale") looks like this:
>>>
>>> for (n = 0; n < scale && !self.isCancelled; ++n) {
>>> b = 10 * (b - q * a);
>>> q = floor(b / a);
>>> [quotientString appendFormat:@"%i", q];
>>> }
>>>
>>> Note that call to -appendFormat. This is a beautiful solution in
>>> terms of simplicity, but for long outputs it's calling a method
>>> repeatedly.
>>>
>>> What are my options here? The most simple optimization would
>>> involve using an IMP and creating a function pointer to a method,
>>> yes?
>>>
>>
>> First of all, use Shark to determine if this really is a problem -
>> no point in wasting time optimizing something that isn't taking a
>> noticeable portion of your programs time.
>>
>> One possible approach is to use an array of components and then
>> combine them all together:
>>
>> NSMutableArray *parts = [NSMutableArray arrayWithCapacity: scale];
>> for (n = 0; n < scale && !self.isCancelled; ++n) {
>> b = 10 * (b - q * a);
>> q = floor(b / a);
>> [parts addObject: [NSString stringWithFormat: @"%i",q]];
>> }
>> [quotientString appendString: [parts
>> componentsJoinedBySeparator:@""]];
>>
>> Not sure of this will be faster or slower - again, without using
>> Shark to determine the bottlenecks, it's unclear if this will help.
>>
>>
>>> But can people localize their numbers, thus rendering the unichar
>>> option insufficient due to its lack of ability to be localized
>>> easily? or am I missing an obvious solution for the unichar
>>> method, which -- if it's more efficient, and also sufficient --
>>> I'd really like to use?
>>
>>
>> Yes, people can localize their numbers, but IIRC, you won't see any
>> difference for this specific case (i.e., the %i format looks the
>> same everywhere) - it's only once you get to floating point number
>> (which you'll have to handle upstream in your code) or formatters
>> that things get special. Fortunately, integer numbers are pretty
>> standard (indo-arabic numbers 0..9 are understood pretty much
>> everywhere - while various forms of gematria and other number forms
>> can be used in certain places, I'm not aware of any localization
>> that requires them).
>>
>> The ultimate question is what are these numbers that you are
>> using? If they are standard floats, doubles, or long doubles, why
>> not just specify the precision with the built in formatting strings
>> - they will do a better job than you can. If these number are
>> something like a C++ numeric package for extended range numbers,
>> said package probably also has a formatting command - again, with
>> higher capabilities than what you're doing here.
>>
>>
>> Glenn Andreas <gandreas...>
>> <http://www.gandreas.com/> wicked fun!
>> quadrium2 | build, mutate, evolve, animate | images, textures,
>> fractals, art
-
On Feb 12, 2008 11:47 AM, Andrew Merenbach <andrew.merenbach...> wrote:
> Note that call to -appendFormat. This is a beautiful solution in
> terms of simplicity, but for long outputs it's calling a method
> repeatedly.
>
> What are my options here? The most simple optimization would involve
> using an IMP and creating a function pointer to a method, yes?
>
> Are there, however, other options available, ones that might be
> better? For example, would it be more efficient (I'd imagine that it
> would be) if I malloc'ed, or NSZoneMalloc'ed, a unichar pointer and
> then used -initWithCharacters:length: or -
> initWithCharactersNoCopy:length:freeWhenDone:?
An Objective-C message send takes on the order of a dozen cycles to
execute. Put it another way, on any Mac you can buy today, an
Objective-C message will take less than ten *nanoseconds* to complete.
Unless you're outputting a ridiculously large number of digits, and
you've actually measured the code and determined that this is where
the problem is, it's simply not worth worrying about.
It's tempting to fret about Objective-C messages because they appear
to do so much, but they're ridiculously fast and it almost never pays
to try to avoid them.
Mike -
Honestly I suspect the method call itself won't be that expensive, but
parsing and executing a format string does take a lot of time.
Andrew Merenbach wrote:
> Hi, Glenn and John,
>
> I have a better idea of what I'll need to do now. I get the feeling
> that, since the only method call in the loop is the line in question,
> that will be the slow part. When users try to output hundreds of
> thousands of digits (not really that useful, most likely, but
> something which I feel should be supported -- if only for
> completeness), it might be a good idea to use a buffer, yes.
>
> But I will try Shark and see what it says about the actual time
> involved in my loop.
>
> Many thanks to you both!
>
> Cheers,
> Andrew
>
> On Feb 12, 2008, at 10:10 AM, John Stiles wrote:
>
>> If this is outputting a single digit per iteration of the loop, it
>> seems like an extremely simple, effective optimization would be to
>> allocate a buffer of "scale+1" bytes and then write one byte into it
>> per loop iteration, specifially "q + '0'". You can't get much faster
>> than that.
>>
>> Then at the end of the loop, append a zero terminator to your buffer
>> and you've got yourself a C string buffer with a number in it. You
>> can easily convert that to an NSString with a single method call.
>>
>>
>> glenn andreas wrote:
>>>
>>> On Feb 12, 2008, at 10:47 AM, Andrew Merenbach wrote:
>>>
>>>> Hi, all,
>>>>
>>>> I have been working on a program that displays an extended output
>>>> for a division equation, which involves a loop that continually
>>>> appends a string, based on a number, to a longer string that is
>>>> displayed to the user at the end of the process. My algorithm for
>>>> everything past the decimal point (out to a number of digits
>>>> determined by the variable "scale") looks like this:
>>>>
>>>> for (n = 0; n < scale && !self.isCancelled; ++n) {
>>>> b = 10 * (b - q * a);
>>>> q = floor(b / a);
>>>> [quotientString appendFormat:@"%i", q];
>>>> }
>>>>
>>>> Note that call to -appendFormat. This is a beautiful solution in
>>>> terms of simplicity, but for long outputs it's calling a method
>>>> repeatedly.
>>>>
>>>> What are my options here? The most simple optimization would
>>>> involve using an IMP and creating a function pointer to a method, yes?
>>>>
>>>
>>> First of all, use Shark to determine if this really is a problem -
>>> no point in wasting time optimizing something that isn't taking a
>>> noticeable portion of your programs time.
>>>
>>> One possible approach is to use an array of components and then
>>> combine them all together:
>>>
>>> NSMutableArray *parts = [NSMutableArray arrayWithCapacity: scale];
>>> for (n = 0; n < scale && !self.isCancelled; ++n) {
>>> b = 10 * (b - q * a);
>>> q = floor(b / a);
>>> [parts addObject: [NSString stringWithFormat: @"%i",q]];
>>> }
>>> [quotientString appendString: [parts componentsJoinedBySeparator:@""]];
>>>
>>> Not sure of this will be faster or slower - again, without using
>>> Shark to determine the bottlenecks, it's unclear if this will help.
>>>
>>>
>>>> But can people localize their numbers, thus rendering the unichar
>>>> option insufficient due to its lack of ability to be localized
>>>> easily? or am I missing an obvious solution for the unichar method,
>>>> which -- if it's more efficient, and also sufficient -- I'd really
>>>> like to use?
>>>
>>>
>>> Yes, people can localize their numbers, but IIRC, you won't see any
>>> difference for this specific case (i.e., the %i format looks the
>>> same everywhere) - it's only once you get to floating point number
>>> (which you'll have to handle upstream in your code) or formatters
>>> that things get special. Fortunately, integer numbers are pretty
>>> standard (indo-arabic numbers 0..9 are understood pretty much
>>> everywhere - while various forms of gematria and other number forms
>>> can be used in certain places, I'm not aware of any localization
>>> that requires them).
>>>
>>> The ultimate question is what are these numbers that you are using?
>>> If they are standard floats, doubles, or long doubles, why not just
>>> specify the precision with the built in formatting strings - they
>>> will do a better job than you can. If these number are something
>>> like a C++ numeric package for extended range numbers, said package
>>> probably also has a formatting command - again, with higher
>>> capabilities than what you're doing here.
>>>
>>>
>>> Glenn Andreas <gandreas...>
>>> <http://www.gandreas.com/> wicked fun!
>>> quadrium2 | build, mutate, evolve, animate | images, textures,
>>> fractals, art
-
I agree that the method invocation itself is probably not a big
problem. However appending to a string repeatedly could be very
inefficient depending on how it's implemented. In the worst case,
append always makes a new internal copy of its character buffer with
just the right amount of extra space at the end to hold the appended
content, so building up a string by appending one character at a time
becomes a O(n^2) operation. I assume this is what the original
poster was worried about.
You can easily avoid this worst case behavior by using the
+stringWithCapacity: method of NSMutableString, passing in the max
length of your digit string, so that your string already has an
internal buffer preallocated to the max number of characters the
string will grow to. Then you can append one char at a time up to the
internal capacity without causing any reallocate/copies to occur. This
seems more straightforward to me than using an explicit char array
which you have to manage and null-terminate yourself.
Also, I don't know about NSMutableString specifically, but other
string classes whose internals I've examined actually use a heuristic
allocation strategy to avoid this worst case append behavior, at the
expense of possibly using more memory than necessary. When an append
is performed that causes the internal char buffer to need to be
extended, rather than just extending it only as much as necessary to
hold the new characters, they double the size of the internal buffer.
This optimizes for the common usage of strings, where strings are
built up from many small appends. The amortized cost of building up a
string from n 1 character appends then becomes O(n) rather than O(n^2).
The upshot of all this is, don't prematurely optimize. It may be
just fine to use -appendFormat: repeatedly if NSMutableString uses a
heuristic like I described above. And in any case you can use
+stringWithCapacity: to preallocate the internal buffer if you know
how big your string can grow to. So that leaves you with only method
call cost to worry about.
Adam
On Feb 12, 2008, at 1:22 PM, Michael Ash wrote:
> An Objective-C message send takes on the order of a dozen cycles to
> execute. Put it another way, on any Mac you can buy today, an
> Objective-C message will take less than ten *nanoseconds* to complete.
>
> Unless you're outputting a ridiculously large number of digits, and
> you've actually measured the code and determined that this is where
> the problem is, it's simply not worth worrying about.
>
> It's tempting to fret about Objective-C messages because they appear
> to do so much, but they're ridiculously fast and it almost never pays
> to try to avoid them.
>
> Mike
-
On Feb 12, 2008, at 12:21 PM, Andrew Merenbach wrote:
> Hi, Glenn and John,
>
> I have a better idea of what I'll need to do now. I get the feeling
> that, since the only method call in the loop is the line in
> question, that will be the slow part. When users try to output
> hundreds of thousands of digits (not really that useful, most
> likely, but something which I feel should be supported -- if only
> for completeness), it might be a good idea to use a buffer, yes.
>
If what you're trying to do is to basically allow the user to specify
(at run time) the precision of the number, you could just do:
[NSString stringWithFormat: [NSString stringWithFormat: @"%%%df",
numberOfDigits], n]
which will make the format string dynamically. So if numberOfDigits
were 1000, it would be equivalent to [NSString stringWithFormat:
@"%1000f", n] (with the exact format string with regards to decimal
places and the like left as an exercise for the reader and time spent
with the numeric formating page). And of course, asking for even 100
decimal places is pointless, since a double has approximately 16
decimal digits of precision, with long double giving around 34...
Alternately look into NSNumberFormatter for more formatting options
than you can shake a significant digit at.
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> wicked fun!
quadrium | prime : build, mutate, evolve, animate : the next
generation of fractal art -
On 12 Feb '08, at 10:44 AM, John Stiles wrote:
> Honestly I suspect the method call itself won't be that expensive,
> but parsing and executing a format string does take a lot of time.
I have run into exactly this as a real hot-spot in a particular app.
(It was generating a hex string from a block of data; basically the
same kind of thing.)
My solution was to build the output in an NSMutableData, adding one or
two ASCII characters at a time using -appendBytes:length:, and then
create an NSString from the data using the NSAsciiStringEncoding. This
is pretty efficient, especially if you initialize the data object with
sufficient capacity.
On the other hand, it would probably take more than 1000 digits for
one instance of this to consume any user-detectable amount of time.
—Jens -
Ah, you can do that with only a single stringWithFormat: call, since
the precision can be passed as an argument to format as well by using
* as the precision in the format string. For example
int scale = 4;
float f = 42.4242;
NSString *string = [NSString stringWithFormat:@"%.*f", scale, f];
is the same as
NSString *string = [NSString stringWithFormat:@"%.4f", f];
except the scale isn't hardcoded.
On Feb 12, 2008, at 2:05 PM, glenn andreas wrote:
> If what you're trying to do is to basically allow the user to
> specify (at run time) the precision of the number, you could just do:
>
> [NSString stringWithFormat: [NSString stringWithFormat: @"%%%df",
> numberOfDigits], n]
>
> which will make the format string dynamically. So if numberOfDigits
> were 1000, it would be equivalent to [NSString stringWithFormat:
> @"%1000f", n] (with the exact format string with regards to decimal
> places and the like left as an exercise for the reader and time
> spent with the numeric formating page). And of course, asking for
> even 100 decimal places is pointless, since a double has
> approximately 16 decimal digits of precision, with long double
> giving around 34...
>
> Alternately look into NSNumberFormatter for more formatting options
> than you can shake a significant digit at.
>
>
> Glenn Andreas <gandreas...>
> <http://www.gandreas.com/> wicked fun!
> quadrium | prime : build, mutate, evolve, animate : the next
> generation of fractal art
-
Hi, Adam,
I understand what you (and others) are saying about premature
optimization. My feelings were only hunches, really, which is why
Shark might be a good idea. A test of the buffer idea, however --
with an NSZoneCalloc'ed buffer of unichars -- and calculating a
quotient out to a million digits, yielded what was something like a
three-second time for the actual calculation -- compared with ten
seconds when using the -appendFormat: method.
I can't imagine a case where a person would actually want more than a
thousand -- or even more than a hundred, really -- digits for their
equations, but I would like to make the program capable of this, if
only for completeness (now, there's also a practical limit, I suppose,
on how many bytes of data an NSTextView can hold, but I don't know
what that is). Using a buffer seems to speed things up immensely --
in fact, the slowest part seems now to be the actual updating of the
text view.
Cheers,
Andrew
On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
> I agree that the method invocation itself is probably not a big
> problem. However appending to a string repeatedly could be very
> inefficient depending on how it's implemented. In the worst case,
> append always makes a new internal copy of its character buffer with
> just the right amount of extra space at the end to hold the appended
> content, so building up a string by appending one character at a
> time becomes a O(n^2) operation. I assume this is what the
> original poster was worried about.
>
> You can easily avoid this worst case behavior by using the
> +stringWithCapacity: method of NSMutableString, passing in the max
> length of your digit string, so that your string already has an
> internal buffer preallocated to the max number of characters the
> string will grow to. Then you can append one char at a time up to
> the internal capacity without causing any reallocate/copies to
> occur. This seems more straightforward to me than using an explicit
> char array which you have to manage and null-terminate yourself.
>
> Also, I don't know about NSMutableString specifically, but other
> string classes whose internals I've examined actually use a
> heuristic allocation strategy to avoid this worst case append
> behavior, at the expense of possibly using more memory than
> necessary. When an append is performed that causes the internal
> char buffer to need to be extended, rather than just extending it
> only as much as necessary to hold the new characters, they double
> the size of the internal buffer. This optimizes for the common
> usage of strings, where strings are built up from many small
> appends. The amortized cost of building up a string from n 1
> character appends then becomes O(n) rather than O(n^2).
>
> The upshot of all this is, don't prematurely optimize. It may be
> just fine to use -appendFormat: repeatedly if NSMutableString uses a
> heuristic like I described above. And in any case you can use
> +stringWithCapacity: to preallocate the internal buffer if you know
> how big your string can grow to. So that leaves you with only
> method call cost to worry about.
>
> Adam
>
>
> On Feb 12, 2008, at 1:22 PM, Michael Ash wrote:
>> An Objective-C message send takes on the order of a dozen cycles to
>> execute. Put it another way, on any Mac you can buy today, an
>> Objective-C message will take less than ten *nanoseconds* to
>> complete.
>>
>> Unless you're outputting a ridiculously large number of digits, and
>> you've actually measured the code and determined that this is where
>> the problem is, it's simply not worth worrying about.
>>
>> It's tempting to fret about Objective-C messages because they appear
>> to do so much, but they're ridiculously fast and it almost never pays
>> to try to avoid them.
>>
>> Mike
-
I agree with you for appending millions of chars. My geekiness got
the better of my and I wrote a little test program to time how long
various methods of building a million digit numeric string 1 char at a
time took. On my computer the results were as follows:
0.42 seconds: Using NSMutableString's appendFormat:, starting with a
string with 0 capacity
0.41 seconds: Using NSMutableString's appendFormat:, starting with
capacity 1 million
0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
seconds
0.0039 seconds: Assigning directly into a char array
Notice that for NSMutableString appendFormat: it doesn't make much
difference whether the string is initially created with enough
capacity to hold the whole string or not, which shows that it does in
fact use a smart allocation strategy like I described below to
amortize the cost of growing the string.
The fact that there's an order of magnitude difference between using
NSMutableData appendBytes and assigning directly into an array also
show that method call cost can be noticeable when we're talking
millions of invocations a second.
On the flip side though I would note that all these times are so small
that it's really not worth worrying about this sort of thing in cases
where the number of appends isn't so high.
I've appended my test program in case you're interested. If you
create a Foundation Tool project in XCode, and paste this in as the
main method, then you should be able to build and run it. On my
machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
String append with initial capacity 0 took 0.419497 seconds
String append with initial capacity 1000000 took 0.406551 seconds
NSMutableData appendBytes took 0.033865 seconds
Assigning into char array took 0.003881 seconds
On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
> Hi, Adam,
>
> I understand what you (and others) are saying about premature
> optimization. My feelings were only hunches, really, which is why
> Shark might be a good idea. A test of the buffer idea, however --
> with an NSZoneCalloc'ed buffer of unichars -- and calculating a
> quotient out to a million digits, yielded what was something like a
> three-second time for the actual calculation -- compared with ten
> seconds when using the -appendFormat: method.
>
> I can't imagine a case where a person would actually want more than
> a thousand -- or even more than a hundred, really -- digits for
> their equations, but I would like to make the program capable of
> this, if only for completeness (now, there's also a practical limit,
> I suppose, on how many bytes of data an NSTextView can hold, but I
> don't know what that is). Using a buffer seems to speed things up
> immensely -- in fact, the slowest part seems now to be the actual
> updating of the text view.
>
> Cheers,
> Andrew
>
> On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
>
>> I agree that the method invocation itself is probably not a big
>> problem. However appending to a string repeatedly could be very
>> inefficient depending on how it's implemented. In the worst case,
>> append always makes a new internal copy of its character buffer
>> with just the right amount of extra space at the end to hold the
>> appended content, so building up a string by appending one
>> character at a time becomes a O(n^2) operation. I assume this is
>> what the original poster was worried about.
>>
>> You can easily avoid this worst case behavior by using the
>> +stringWithCapacity: method of NSMutableString, passing in the max
>> length of your digit string, so that your string already has an
>> internal buffer preallocated to the max number of characters the
>> string will grow to. Then you can append one char at a time up to
>> the internal capacity without causing any reallocate/copies to
>> occur. This seems more straightforward to me than using an explicit
>> char array which you have to manage and null-terminate yourself.
>>
>> Also, I don't know about NSMutableString specifically, but other
>> string classes whose internals I've examined actually use a
>> heuristic allocation strategy to avoid this worst case append
>> behavior, at the expense of possibly using more memory than
>> necessary. When an append is performed that causes the internal
>> char buffer to need to be extended, rather than just extending it
>> only as much as necessary to hold the new characters, they double
>> the size of the internal buffer. This optimizes for the common
>> usage of strings, where strings are built up from many small
>> appends. The amortized cost of building up a string from n 1
>> character appends then becomes O(n) rather than O(n^2).
>>
>> The upshot of all this is, don't prematurely optimize. It may be
>> just fine to use -appendFormat: repeatedly if NSMutableString uses
>> a heuristic like I described above. And in any case you can use
>> +stringWithCapacity: to preallocate the internal buffer if you know
>> how big your string can grow to. So that leaves you with only
>> method call cost to worry about.
>>
>> Adam
#import <Foundation/Foundation.h>
int main (int argc, const char * argv[]) {
NSAutoreleasePool * pool = [NSAutoreleasePool new];
NSDate *tic, *toc;
NSMutableString *theString;
int niters = 1000000;
int i;
// start the string off at 0 capacity and let appendFormat expand it
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:0];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity 0 took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// start the string off with enough capacity to hold the whole
string, so no
// reallocations are necessary
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:niters];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity %d took %f seconds\n",
niters, [toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// try using a buffer, using appendBytes:length
tic = [NSDate date];
NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
[buffer appendBytes:&ch length:1];
}
toc = [NSDate date];
printf("NSMutableData appendBytes took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// finally, try assigning directly to a memory buffer
tic = [NSDate date];
buffer = [NSMutableData dataWithCapacity:niters];
char *chars = (char*)[buffer mutableBytes];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
chars[i] = ch;
}
toc = [NSDate date];
printf("Assigning into char array took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
return 0;
} -
If you've got a little geekery left in you, I'd be highly curious to
know the times for doing a million push_back's into a vector<char>. I
wonder how it compares to NSMutableData.
Adam P Jenkins wrote:
> I agree with you for appending millions of chars. My geekiness got
> the better of my and I wrote a little test program to time how long
> various methods of building a million digit numeric string 1 char at a
> time took. On my computer the results were as follows:
>
> 0.42 seconds: Using NSMutableString's appendFormat:, starting with a
> string with 0 capacity
> 0.41 seconds: Using NSMutableString's appendFormat:, starting with
> capacity 1 million
> 0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
> seconds
> 0.0039 seconds: Assigning directly into a char array
>
> Notice that for NSMutableString appendFormat: it doesn't make much
> difference whether the string is initially created with enough
> capacity to hold the whole string or not, which shows that it does in
> fact use a smart allocation strategy like I described below to
> amortize the cost of growing the string.
>
> The fact that there's an order of magnitude difference between using
> NSMutableData appendBytes and assigning directly into an array also
> show that method call cost can be noticeable when we're talking
> millions of invocations a second.
>
> On the flip side though I would note that all these times are so small
> that it's really not worth worrying about this sort of thing in cases
> where the number of appends isn't so high.
>
> I've appended my test program in case you're interested. If you
> create a Foundation Tool project in XCode, and paste this in as the
> main method, then you should be able to build and run it. On my
> machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
>
> String append with initial capacity 0 took 0.419497 seconds
> String append with initial capacity 1000000 took 0.406551 seconds
> NSMutableData appendBytes took 0.033865 seconds
> Assigning into char array took 0.003881 seconds
>
>
> On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
>
>> Hi, Adam,
>>
>> I understand what you (and others) are saying about premature
>> optimization. My feelings were only hunches, really, which is why
>> Shark might be a good idea. A test of the buffer idea, however --
>> with an NSZoneCalloc'ed buffer of unichars -- and calculating a
>> quotient out to a million digits, yielded what was something like a
>> three-second time for the actual calculation -- compared with ten
>> seconds when using the -appendFormat: method.
>>
>> I can't imagine a case where a person would actually want more than a
>> thousand -- or even more than a hundred, really -- digits for their
>> equations, but I would like to make the program capable of this, if
>> only for completeness (now, there's also a practical limit, I
>> suppose, on how many bytes of data an NSTextView can hold, but I
>> don't know what that is). Using a buffer seems to speed things up
>> immensely -- in fact, the slowest part seems now to be the actual
>> updating of the text view.
>>
>> Cheers,
>> Andrew
>>
>> On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
>>
>>> I agree that the method invocation itself is probably not a big
>>> problem. However appending to a string repeatedly could be very
>>> inefficient depending on how it's implemented. In the worst case,
>>> append always makes a new internal copy of its character buffer with
>>> just the right amount of extra space at the end to hold the appended
>>> content, so building up a string by appending one character at a
>>> time becomes a O(n^2) operation. I assume this is what the
>>> original poster was worried about.
>>>
>>> You can easily avoid this worst case behavior by using the
>>> +stringWithCapacity: method of NSMutableString, passing in the max
>>> length of your digit string, so that your string already has an
>>> internal buffer preallocated to the max number of characters the
>>> string will grow to. Then you can append one char at a time up to
>>> the internal capacity without causing any reallocate/copies to
>>> occur. This seems more straightforward to me than using an explicit
>>> char array which you have to manage and null-terminate yourself.
>>>
>>> Also, I don't know about NSMutableString specifically, but other
>>> string classes whose internals I've examined actually use a
>>> heuristic allocation strategy to avoid this worst case append
>>> behavior, at the expense of possibly using more memory than
>>> necessary. When an append is performed that causes the internal
>>> char buffer to need to be extended, rather than just extending it
>>> only as much as necessary to hold the new characters, they double
>>> the size of the internal buffer. This optimizes for the common
>>> usage of strings, where strings are built up from many small
>>> appends. The amortized cost of building up a string from n 1
>>> character appends then becomes O(n) rather than O(n^2).
>>>
>>> The upshot of all this is, don't prematurely optimize. It may be
>>> just fine to use -appendFormat: repeatedly if NSMutableString uses a
>>> heuristic like I described above. And in any case you can use
>>> +stringWithCapacity: to preallocate the internal buffer if you know
>>> how big your string can grow to. So that leaves you with only
>>> method call cost to worry about.
>>>
>>> Adam
>
> #import <Foundation/Foundation.h>
>
> int main (int argc, const char * argv[]) {
> NSAutoreleasePool * pool = [NSAutoreleasePool new];
>
> NSDate *tic, *toc;
> NSMutableString *theString;
> int niters = 1000000;
> int i;
>
> // start the string off at 0 capacity and let appendFormat expand it
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:0];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity 0 took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // start the string off with enough capacity to hold the whole
> string, so no
> // reallocations are necessary
> tic = [NSDate date];
> theString = [NSMutableString stringWithCapacity:niters];
> for (i = 0; i < niters; i++)
> [theString appendFormat:@"%d", (i % 10)];
> toc = [NSDate date];
> printf("String append with initial capacity %d took %f seconds\n",
> niters, [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
>
> // try using a buffer, using appendBytes:length
> tic = [NSDate date];
> NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> [buffer appendBytes:&ch length:1];
> }
> toc = [NSDate date];
> printf("NSMutableData appendBytes took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> pool = [NSAutoreleasePool new];
>
> // finally, try assigning directly to a memory buffer
> tic = [NSDate date];
> buffer = [NSMutableData dataWithCapacity:niters];
> char *chars = (char*)[buffer mutableBytes];
> for (i = 0; i < niters; i++) {
> char ch = (i % 10) + '0';
> chars[i] = ch;
> }
> toc = [NSDate date];
> printf("Assigning into char array took %f seconds\n",
> [toc timeIntervalSinceDate:tic]);
>
> [pool release];
> return 0;
> }
-
I tried it. It takes ~ 0.004 seconds for a million
vector<char>::push_back calls, if I build with -O3, so only very
slightly slower than the direct assignment into a char array.
That's not surprising since vector<char>::push_back is not a virtual
method and can be inlined, so the machine code should be the same as a
simple array element assignment + size increment. In fact, if instead
of using push_back I simply resize the vector to have a million
elements and then use simple element assignments, then the vector
performance is the same as using a raw array, in fact a little faster
for some reason.
Here's the output, and I've also attached the modified program.
String append with initial capacity 0 took 0.409247 seconds
String append with initial capacity 1000000 took 0.407078 seconds
NSMutableData appendBytes took 0.033926 seconds
Assigning into char array took 0.003776 seconds
vector.push_back took 0.004577 seconds
vector[i]=ch took 0.002886 seconds
On Feb 12, 2008, at 5:01 PM, John Stiles wrote:
> If you've got a little geekery left in you, I'd be highly curious to
> know the times for doing a million push_back's into a vector<char>.
> I wonder how it compares to NSMutableData.
#import <Foundation/Foundation.h>
#include <vector>
int main (int argc, const char * argv[]) {
NSAutoreleasePool * pool = [NSAutoreleasePool new];
NSDate *tic, *toc;
NSMutableString *theString;
int niters = 1000000;
int i;
// start the string off at 0 capacity and let appendFormat expand it
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:0];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity 0 took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// start the string off with enough capacity to hold the whole
string, so no
// reallocations are necessary
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:niters];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity %d took %f seconds\n",
niters, [toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// try using a buffer, using appendBytes:length
tic = [NSDate date];
NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
[buffer appendBytes:&ch length:1];
}
toc = [NSDate date];
printf("NSMutableData appendBytes took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// finally, try assigning directly to a memory buffer
tic = [NSDate date];
buffer = [NSMutableData dataWithCapacity:niters];
char *chars = (char*)[buffer mutableBytes];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
chars[i] = ch;
}
toc = [NSDate date];
printf("Assigning into char array took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
std::vector<char> vecchars;
vecchars.reserve(niters);
tic = [NSDate date];
for (i = 0; i < niters; i++)
vecchars.push_back((i%10) + '0');
toc = [NSDate date];
printf("vector.push_back took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
vecchars.clear();
vecchars.resize(niters);
tic = [NSDate date];
for (i = 0; i < niters; i++)
vecchars[i] = (i%10) + '0';
toc = [NSDate date];
printf("vector[i]=ch took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
return 0;
} -
Yeah, it was about 0.007 seconds without the reserve() call, but since
I was doing the equivalent of reserve with the NSMutableData case, it
seems more even to do the same with the vector<char> case.
As for clear(), I tried changing the code to use the swap trick you
showed, and the timing results were the same. I don't understand why
the vector[i]=ch example runs faster than the raw array access
example. It must just be an artifact of some allocation and caching
behavior. I'm fine with just calling them the same.
In any case it shows that for small methods where you don't actually
need dynamic dispatch, C++'s inlining and non-virtual methods can
yield much faster code than you can get with Objective-C classes.
Adam
On Feb 12, 2008, at 5:57 PM, John Stiles wrote:
> Wow. Go go standard library!
> I'd be interested in the timings of vector<char> without a call to
> reserve; I bet it's still pretty favorable.
> BTW, I'd suggest that "vecchars.clear()" is not an entirely fair
> test since a cleared vector is not entirely the same as a brand new
> vector. The cleared vector doesn't release its storage; vectors
> never shrink, only grow. In this case it's irrelevant since you
> don't seem to be timing the allocations, just the push-backs, but
> thought I'd mention it.
> You can really-truly clear out an existing vector by swapping it out
> for a newly constructed vector.
>
> vector<char>().swap(vecchars);
>
>
>
> Adam P Jenkins wrote:
>>
>> I tried it. It takes ~ 0.004 seconds for a million
>> vector<char>::push_back calls, if I build with -O3, so only very
>> slightly slower than the direct assignment into a char array.
>> That's not surprising since vector<char>::push_back is not a
>> virtual method and can be inlined, so the machine code should be
>> the same as a simple array element assignment + size increment. In
>> fact, if instead of using push_back I simply resize the vector to
>> have a million elements and then use simple element assignments,
>> then the vector performance is the same as using a raw array, in
>> fact a little faster for some reason.
>>
>> Here's the output, and I've also attached the modified program.
>>
>> String append with initial capacity 0 took 0.409247 seconds
>> String append with initial capacity 1000000 took 0.407078 seconds
>> NSMutableData appendBytes took 0.033926 seconds
>> Assigning into char array took 0.003776 seconds
>> vector.push_back took 0.004577 seconds
>> vector[i]=ch took 0.002886 seconds
>>
>>
>>
>> On Feb 12, 2008, at 5:01 PM, John Stiles wrote:
>>
>>> If you've got a little geekery left in you, I'd be highly curious
>>> to know the times for doing a million push_back's into a
>>> vector<char>. I wonder how it compares to NSMutableData.
>>
>> #import <Foundation/Foundation.h>
>> #include <vector>
>>
>> int main (int argc, const char * argv[]) {
>> NSAutoreleasePool * pool = [NSAutoreleasePool new];
>>
>> NSDate *tic, *toc;
>> NSMutableString *theString;
>> int niters = 1000000;
>> int i;
>>
>> // start the string off at 0 capacity and let appendFormat
>> expand it
>> tic = [NSDate date];
>> theString = [NSMutableString stringWithCapacity:0];
>> for (i = 0; i < niters; i++)
>> [theString appendFormat:@"%d", (i % 10)];
>> toc = [NSDate date];
>> printf("String append with initial capacity 0 took %f seconds\n",
>> [toc timeIntervalSinceDate:tic]);
>>
>> [pool release];
>> pool = [NSAutoreleasePool new];
>>
>> // start the string off with enough capacity to hold the whole
>> string, so no
>> // reallocations are necessary
>> tic = [NSDate date];
>> theString = [NSMutableString stringWithCapacity:niters];
>> for (i = 0; i < niters; i++)
>> [theString appendFormat:@"%d", (i % 10)];
>> toc = [NSDate date];
>> printf("String append with initial capacity %d took %f seconds
>> \n",
>> niters, [toc timeIntervalSinceDate:tic]);
>>
>> [pool release];
>> pool = [NSAutoreleasePool new];
>>
>>
>> // try using a buffer, using appendBytes:length
>> tic = [NSDate date];
>> NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
>> for (i = 0; i < niters; i++) {
>> char ch = (i % 10) + '0';
>> [buffer appendBytes:&ch length:1];
>> }
>> toc = [NSDate date];
>> printf("NSMutableData appendBytes took %f seconds\n",
>> [toc timeIntervalSinceDate:tic]);
>>
>> [pool release];
>> pool = [NSAutoreleasePool new];
>>
>> // finally, try assigning directly to a memory buffer
>> tic = [NSDate date];
>> buffer = [NSMutableData dataWithCapacity:niters];
>> char *chars = (char*)[buffer mutableBytes];
>> for (i = 0; i < niters; i++) {
>> char ch = (i % 10) + '0';
>> chars[i] = ch;
>> }
>> toc = [NSDate date];
>> printf("Assigning into char array took %f seconds\n",
>> [toc timeIntervalSinceDate:tic]);
>>
>> std::vector<char> vecchars;
>> vecchars.reserve(niters);
>> tic = [NSDate date];
>> for (i = 0; i < niters; i++)
>> vecchars.push_back((i%10) + '0');
>> toc = [NSDate date];
>> printf("vector.push_back took %f seconds\n",
>> [toc timeIntervalSinceDate:tic]);
>>
>> vecchars.clear();
>> vecchars.resize(niters);
>> tic = [NSDate date];
>> for (i = 0; i < niters; i++)
>> vecchars[i] = (i%10) + '0';
>> toc = [NSDate date];
>> printf("vector[i]=ch took %f seconds\n",
>> [toc timeIntervalSinceDate:tic]);
>>
>> [pool release];
>> return 0;
>> }
>>
-
On 13/02/2008, at 10:25 AM, Adam P Jenkins wrote:
> Yeah, it was about 0.007 seconds without the reserve() call, but
> since I was doing the equivalent of reserve with the NSMutableData
> case, it seems more even to do the same with the vector<char> case.
>
> As for clear(), I tried changing the code to use the swap trick you
> showed, and the timing results were the same. I don't understand
> why the vector[i]=ch example runs faster than the raw array access
> example. It must just be an artifact of some allocation and caching
> behavior. I'm fine with just calling them the same.
>
> In any case it shows that for small methods where you don't actually
> need dynamic dispatch, C++'s inlining and non-virtual methods can
> yield much faster code than you can get with Objective-C classes.
>
> Adam
I did point this out off-list, but I think it's worth mentioning it to
the list. I personally would be wary about these results because I
think they're too small. Time-slicing on OS X is the order of tens of
milliseconds (I believe) so a context switch could put these results
out.
Furthermore, the first two tests use format strings whilst the others
don't. I'd also worry about how one test was interacting with the next
test.
- Chris -
Le 13 févr. 08 à 00:25, Adam P Jenkins a écrit :
> Yeah, it was about 0.007 seconds without the reserve() call, but
> since I was doing the equivalent of reserve with the NSMutableData
> case, it seems more even to do the same with the vector<char> case.
>
> As for clear(), I tried changing the code to use the swap trick you
> showed, and the timing results were the same. I don't understand
> why the vector[i]=ch example runs faster than the raw array access
> example. It must just be an artifact of some allocation and caching
> behavior. I'm fine with just calling them the same.
>
> In any case it shows that for small methods where you don't actually
> need dynamic dispatch, C++'s inlining and non-virtual methods can
> yield much faster code than you can get with Objective-C classes.
>
> Adam
The String way is slow due to the format parsing. For bench I think I
will do this:
{
UniChar uch = (i % 10) + '0';
CFStringAppendCharacters((CFMutableStringRef)str, &uch, 1);
} -
I just added this test case. It takes about 0.05 seconds. Definitely
much better than using appendFormat:, and still lets you just use a
String object rather than resorting to lower level types. I like it.
On Feb 12, 2008, at 6:54 PM, Jean-Daniel Dupas wrote:
>
> The String way is slow due to the format parsing. For bench I think
> I will do this:
>
> {
> UniChar uch = (i % 10) + '0';
> CFStringAppendCharacters((CFMutableStringRef)str, &uch, 1);
> }
>
>



