Two Questions About Parsing a String
-
Hi,
I'm looking for some help parsing a string from a file.
Firstly, getting the string is causing some issues. I read the String
Programming Guide for Cocoa, and got this:
NSString *path = ...;
NSData *data = [NSData dataWithContentsOfFile:path];
// assuming data is in UTF8
NSString *string = [NSString stringWithUTF8String:[data bytes]];
along with a warning that you must not use:
stringWithContentsOfFile:
So, I tried to do as I was told, but with the first example, whether I
successfully get a string from the data seems random. When
I start the application, my first attempt to load the file may or may
not result in a string being created, but if not, repeatedly trying
to load the file eventually works and I get the string. I'm not sure
what the encoding is (I'm trying to create a .obj reader), but
setting it to UTF-8 in Xcode doesn't help.
So, I took a peek at the dark side and tried the soon-to-be-deprecated
stringWithContentsOfFile:, which works fine every time.
Any thoughts on why the first example might be failing randomly?
My second question is NSScanner related. Going through the .obj file,
I have managed to get a system going where it scans for key characters
(e.g. "v" for vertex, "#" for commented text etc). Roughly:
while (![theScanner isAtEnd]) {
    // Skip ahead to the next key character, then read it.
    [theScanner scanUpToCharactersFromSet:vCharacters intoString:NULL];
    if (![theScanner scanCharactersFromSet:vCharacters intoString:&testString])
        break;  // no key character left; testString would still hold the previous match
    if ([testString isEqualToString:@"#"]) {
        // Comment: consume the rest of the line.
        [theScanner scanUpToString:@"\n" intoString:&dumpString];
        NSLog(@"dumpString is %@", dumpString);
    }
    else if ([testString isEqualToString:@"o"]) {
        // Object name: the rest of the line.
        [theScanner scanUpToString:@"\n" intoString:&theObjectName];
        NSLog(@"name: %@", theObjectName);
    }
    else if ([testString isEqualToString:@"v"]) {
        // Vertex: three floats.
        [theScanner scanFloat:&xVert];
        [theScanner scanFloat:&yVert];
        [theScanner scanFloat:&zVert];
        NSLog(@"Vertex %i is: x = %f, y = %f, z = %f", ++i, xVert, yVert, zVert);
    }
}
However, the files also include identifiers such as "usemtl", which
could appear at any time. So any ideas how you go about searching for
a set of characters, and a set of strings simultaneously? i.e. how do
I search for the characters without momentarily ignoring the strings
or vice versa? This seems to be quite straightforward with fscanf, but
it seems a bit odd going to C, when I'm trying to do this in Objective-C.
Any help with either question would be much appreciated.
Thank you,
Ian.
-
On 22-Jul-08, at 11:09 AM, Ian Jackson wrote:
> I'm looking for some help parsing a string from a file.
>
> Firstly, getting the string is causing some issues. I read the
> String Programming Guide for Cocoa, and got this:
>
> NSString *path = ...;
> NSData *data = [NSData dataWithContentsOfFile:path];
>
> // assuming data is in UTF8
> NSString *string = [NSString stringWithUTF8String:[data bytes]];
>
> along with a warning that you must not use:
>
> stringWithContentsOfFile:
>
> So, I tried to do as I was told, but using the first example, I find
> that successfully getting a string from the data is quite random.
> When I start the application, my first attempt to load the file may
> or may not result in a string being created, but if not, repeatedly
> trying to load the file eventually works and I get the string. I'm
> not sure what the encoding is, (I'm trying to create a .obj reader),
> but setting it at UTF8 in Xcode doesn't help.
> So, I took a peek at the dark side, and tried the soon to be
> deprecated stringWithContentsOfFile: which works fine all the time.
> Any thoughts on why the first example might be failing randomly?
>
>
Have you tried the non-deprecated
stringWithContentsOfFile:usedEncoding:error: or
stringWithContentsOfFile:encoding:error:? The former attempts
to determine the encoding used for the file and returns it by
reference. Both also report failures through an NSError, so you can
find out why a file could not be read.
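How the usedEncoding: variant actually detects encodings is Foundation's business and isn't described here. As a rough illustration of the kind of check involved, here is a hypothetical heuristic in plain C (it also compiles unchanged in an Objective-C file). It looks for a byte-order mark, otherwise accepts the bytes as UTF-8 only if every sequence is well-formed, and otherwise gives up and calls the data "some legacy 8-bit encoding". EncodingGuess, guess_encoding and is_valid_utf8 are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for this sketch; not Cocoa's NSStringEncoding values. */
typedef enum {
    GUESS_UTF8_WITH_BOM,
    GUESS_UTF16_LE,
    GUESS_UTF16_BE,
    GUESS_UTF8,
    GUESS_LEGACY_8BIT   /* e.g. Latin-1 or MacRoman: bytes alone can't tell which */
} EncodingGuess;

/* Return 1 if s[0..len) is well-formed UTF-8: valid lead and continuation
   bytes, no truncated sequence, no overlong form, no surrogate, <= U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes that must follow the lead byte */
        unsigned long cp;   /* code point being decoded */
        if (c < 0x80) { i++; continue; }                  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;                                     /* invalid lead byte */
        if (len - i - 1 < extra) return 0;                 /* sequence cut off */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return 0;             /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000)) return 0;        /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Guess an encoding from raw file bytes (a heuristic, not Foundation's algorithm). */
EncodingGuess guess_encoding(const unsigned char *b, size_t len) {
    assert(b != NULL || len == 0);
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return GUESS_UTF8_WITH_BOM;
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE) return GUESS_UTF16_LE;
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF) return GUESS_UTF16_BE;
    if (is_valid_utf8(b, len)) return GUESS_UTF8;
    return GUESS_LEGACY_8BIT;
}
```

From Objective-C you could pass [data bytes] and [data length]. Note that a lone 0xE9 byte (é in Latin-1) fails the UTF-8 check. That is exactly the situation in which a UTF-8-only decode returns nothing.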
Cheers, Patrick
-
On 22 Jul '08, at 2:09 AM, Ian Jackson wrote:
> NSString *path = ...;
> NSData *data = [NSData dataWithContentsOfFile:path];
> // assuming data is in UTF8
> NSString *string = [NSString stringWithUTF8String:[data bytes]];
The reason this doesn't work is that +stringWithUTF8String: expects a
NUL-terminated C string, but [data bytes] just returns a pointer to the raw
contents of the data block, which is not guaranteed to be followed by a
zero byte. So the string factory method keeps reading past the end of the
data until it happens to find a zero byte in whatever memory comes next.
That means it reads garbage past the end of the file's contents, and if
that garbage isn't valid UTF-8, the call fails and you get no string.
Because that memory differs from one attempt to the next, the failure
looks random.
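To make the point concrete in plain C (which compiles unchanged in an Objective-C file): a pointer plus a length is not a C string until a zero byte follows it. The helper below has a hypothetical name, copy_as_c_string. It is a minimal sketch that copies the bytes into a buffer one byte longer and writes that terminator. In Cocoa, initWithData:encoding: makes this unnecessary because it takes the length from the NSData.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a length-delimited byte buffer (such as the one -[NSData bytes]
   points to) into a new buffer that ends with a terminating NUL, so it can
   be handed to an API that expects a C string. Returns NULL if allocation
   fails; the caller must free() the result. */
char *copy_as_c_string(const void *bytes, size_t length) {
    assert(bytes != NULL || length == 0);
    char *buf = malloc(length + 1);       /* one extra byte for the terminator */
    if (buf == NULL) return NULL;
    if (length > 0) memcpy(buf, bytes, length);
    buf[length] = '\0';                   /* the terminator the raw data lacks */
    return buf;
}
```

From Objective-C this could be called as copy_as_c_string([data bytes], [data length]) before +stringWithUTF8String:, with the buffer passed to free() afterwards. It still fails if the bytes aren't valid UTF-8, and any zero byte inside the data ends the C string early.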
The correct call to make would be
[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding]
which uses the data's length instead of looking for a terminator,
although as Patrick already replied, the best way to read a string
from a file is +stringWithContentsOfFile:usedEncoding:error:, which
will attempt to determine the encoding.
> However, the files also include identifiers such as "usemtl", which
> could appear at any time. So any ideas how you go about searching
> for a set of characters, and a set of strings simultaneously? i.e.
> how do I search for the characters without momentarily ignoring the
> strings or vice versa? This seems to be quite straightforward with
> fscanf, but it seems a bit odd going to C, when I'm trying to do
> this in Objective-C.
This is beyond what NSScanner can do. You have a number of options,
like scanning the string character by character using a 'for' loop,
using a parser generator like ANTLR, or simply calling fscanf.
(There's nothing wrong with using C APIs, when appropriate.)
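Here is a minimal sketch that combines the first and third options in plain C (valid in an Objective-C file). A character loop splits the text into lines. Then sscanf, the in-memory counterpart of fscanf, reads each line's first whitespace-delimited token, and strcmp dispatches on it. Because the whole keyword is read as one token, one-character keywords ("v", "o", "#") and longer ones ("usemtl") are matched the same way, so there is no need to search for characters and strings separately. ObjSummary, parse_obj and MAX_VERTS are hypothetical names, and only a handful of .obj keywords are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VERTS 1024   /* hypothetical fixed capacity for this sketch */

/* Hypothetical result type: just enough to show keyword dispatch. */
typedef struct {
    float v[MAX_VERTS][3];
    int vertex_count;
    int comment_count;
    char object_name[64];
    char material[64];
} ObjSummary;

/* Parse NUL-terminated .obj text one line at a time; the first
   whitespace-delimited token of each line is treated as its keyword. */
void parse_obj(const char *text, ObjSummary *out) {
    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    const char *p = text;
    char line[256];
    while (*p != '\0') {
        /* Character loop: find the end of the current line. */
        size_t n = 0;
        while (p[n] != '\0' && p[n] != '\n') n++;

        /* Copy it into a bounded, NUL-terminated buffer (long lines are cut). */
        size_t copy = n < sizeof line - 1 ? n : sizeof line - 1;
        memcpy(line, p, copy);
        line[copy] = '\0';
        p += n;
        if (*p == '\n') p++;

        char keyword[16];
        if (sscanf(line, "%15s", keyword) != 1) continue;   /* blank line */

        if (keyword[0] == '#') {
            out->comment_count++;                          /* comment line */
        } else if (strcmp(keyword, "o") == 0) {
            sscanf(line, " o %63s", out->object_name);
        } else if (strcmp(keyword, "v") == 0) {
            float x, y, z;
            if (out->vertex_count < MAX_VERTS &&
                sscanf(line, " v %f %f %f", &x, &y, &z) == 3) {
                out->v[out->vertex_count][0] = x;
                out->v[out->vertex_count][1] = y;
                out->v[out->vertex_count][2] = z;
                out->vertex_count++;
            }
        } else if (strcmp(keyword, "usemtl") == 0) {
            sscanf(line, " usemtl %63s", out->material);
        }
        /* Other keywords ("vn", "vt", "f", ...) would be further branches. */
    }
}
```

From Objective-C, parse_obj([string UTF8String], &summary) would feed it the contents of an NSString. A "usemtl" line can then appear anywhere without disturbing the vertex handling.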
--Jens
-
Thanks for your responses.
Looks like stringWithContentsOfFile:encoding:error: does what I need.
Jens, at least I know not to pursue the NSScanner thing any further in
this case.
Thanks,
Ian.
On 23/07/2008, at 3:39 AM, Jens Alfke wrote:
> This is beyond what NSScanner can do. You have a number of options,
> like scanning the string character by character using a 'for' loop,
> using a parser generator like ANTLR, or simply calling fscanf.
> (There's nothing wrong with using C APIs, when appropriate.)
>
> --Jens


