FROM : Daniel Vollmer
DATE : Tue May 13 21:55:43 2008
On May 13, 2008, at 17:00, Jens Alfke wrote:
>
> On 12 May '08, at 11:38 PM, Daniel Vollmer wrote:
>
>> I'm parsing a rather large text-file (usually >20MB) and in doing
>> so I'm iterating over its lines with [String
>> getParagraphStart::::]. I've found a rather noticeable speed-up in
>> the parsing operation if I create the string in question from an
>> NSData object (created via initWithContentsOfMappedFile) using
>> [String initWithData:encoding:].
>
> It sounds like you're creating a single NSString containing the
> entire contents of the file, then?
Yes. Is that something I shouldn't do? I mean, I feel a tiny bit silly
creating such huge strings, but I didn't find a nice alternative (e.g.
something like Ruby's each-line iterators on file objects).
>> 2) Are substrings created from the original string (e.g.
>> substringWithRange etc.) still backed properly after the original
>> string and the NSData object are released?
>
> Yes. Even if the NSString is still using the NSData's contents for
> its buffer, it retained them, so releasing the NSData won't make it
> go away until the string is done with it.
But now that means that the strings are "endangered" from in-place
file modification for the lifetime of my objects created during
parsing, not just the initial parsing itself, correct?
Also, it feels a bit silly to have a retain on the 20MB NSData object
while I still hold references to about 5KB of string bytes from
various places in the file. Usually all this "behind-the-scenes"
storage retaining doesn't matter much, but I'd quite like to make sure
I drop most of the 20MB once I'm done parsing. This question of course
also applies if I'm not mapping the file and creating a String from it
directly.
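One way to sidestep the storage question, as a sketch only: copy the few characters that need to outlive the parse into a fresh string, so that nothing kept afterwards can depend on the big buffer. DetachedSubstring is a hypothetical helper name, not something from Foundation or from this thread. Whether substringWithRange: shares storage with its parent is an implementation detail this sketch does not rely on either way; it only assumes that stringWithCharacters:length: copies from the buffer it is given.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper (not part of Foundation): returns a string whose
// storage is independent of `source`, by copying the UTF-16 units in
// `range` into a temporary buffer and building a new string from it.
static NSString *DetachedSubstring(NSString *source, NSRange range)
{
    if (range.length == 0)
        return @"";

    unichar *buffer = malloc(range.length * sizeof(unichar));
    if (buffer == NULL)
        return nil;

    // Raises NSRangeException if `range` lies outside `source`.
    [source getCharacters:buffer range:range];

    // stringWithCharacters:length: copies the characters, so the
    // temporary buffer can be freed right away.
    NSString *copy = [NSString stringWithCharacters:buffer
                                             length:range.length];
    free(buffer);
    return copy;
}
```

The result is autoreleased, so it would need a retain under manual reference counting if it has to outlive the current pool. Once every kept piece is made this way, releasing the big string and the NSData should leave nothing referring to the mapped file.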
FWIW, my current iteration looks like this (String being the big 20MB
one):
    NSUInteger length = [String length];
    NSUInteger paraStart = 0, paraEnd = 0, contentsEnd = 0;
    while (paraEnd < length)
    {
        [String getParagraphStart:&paraStart end:&paraEnd
            contentsEnd:&contentsEnd forRange:NSMakeRange(paraEnd, 0)];
        line = [String substringWithRange:NSMakeRange(paraStart,
            contentsEnd - paraStart)];
        // do lots of menial parsing of line
    }
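For anyone who wants to try the loop in isolation, here is the same iteration wrapped in a small self-contained function. This is a sketch under assumptions: ParagraphsOfString is a hypothetical name, and collecting every line into an array is only for demonstration, since the real parser would process each line as it goes rather than keep them all.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the loop: returns the paragraphs of `text`
// with their terminators (\n, \r, \r\n, ...) stripped.
static NSArray *ParagraphsOfString(NSString *text)
{
    NSMutableArray *lines = [NSMutableArray array];
    NSUInteger length = [text length];
    NSUInteger paraStart = 0, paraEnd = 0, contentsEnd = 0;

    while (paraEnd < length)
    {
        // A zero-length range at paraEnd selects the paragraph that
        // starts there; paraEnd is then advanced past its terminator.
        [text getParagraphStart:&paraStart end:&paraEnd
            contentsEnd:&contentsEnd forRange:NSMakeRange(paraEnd, 0)];

        // contentsEnd excludes the terminator, paraEnd includes it.
        [lines addObject:[text substringWithRange:
            NSMakeRange(paraStart, contentsEnd - paraStart)]];
    }
    return lines;
}
```

Because each call advances paraEnd past the terminator, empty lines come back as empty strings, and a trailing newline at the end of the file does not produce an extra empty entry.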
If I leave the mmapped reading in, it sounds like a sensible idea to
check whether the file is on the same drive as the app. So thanks for
that suggestion.
Thanks for any further insight,
Daniel.






Cocoa mail archive

