Invalid strings and related bugs (#5775749)
FROM : Nir Soffer
DATE : Sun Mar 02 03:44:20 2008

I found that it is possible to get invalid strings from a PowerPoint
file using AppleScript. Such a string cannot be converted to UTF-8,
and it corrupts NSXMLDocument output.

The problem occurs when iterating over the paragraphs in a shape;
iterating over lines returns a correct string. The real issue is that
NSString accepts the invalid data without any error and returns an
invalid instance. You can trim the invalid instance, get a mutable
copy, replace characters, etc. When you try to use it in
NSXMLDocument, it silently corrupts the output.
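One plausible explanation (an assumption; the report does not confirm
it) is that the junk consists of unpaired UTF-16 surrogates: NSString
stores UTF-16 units and will hold a lone surrogate, but a string
containing one has no valid UTF-8 form. Below is a minimal sketch in
plain C (which also compiles unchanged inside an Objective-C .m file)
that checks a buffer of UTF-16 units for that problem. The function
name utf16_is_well_formed is hypothetical, and uint16_t stands in for
Cocoa's unichar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if `units` (length `len`) is
   well-formed UTF-16, i.e. every high surrogate (0xD800-0xDBFF) is
   immediately followed by a low surrogate (0xDC00-0xDFFF) and no low
   surrogate appears on its own. uint16_t plays the role of unichar. */
static bool utf16_is_well_formed(const uint16_t *units, size_t len)
{
    assert(units != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: a low surrogate must come right after it. */
            if (i + 1 >= len || units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF)
                return false;
            i++; /* skip the low half of the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            return false;
        }
    }
    return true;
}
```

From the test program, one could copy the characters out with
-getCharacters:range: into a unichar buffer of [slideText length]
elements and reject or repair the string when this check fails.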

When you try to log such a string with NSLog, it fails silently: the
log line never appears! CFShow does print the string and shows some
junk inside it.

On 10.4.11, AppleScript truncates the string. It can be converted to
UTF-8 and logged with NSLog. When used in NSXMLDocument, it still
corrupts the document, but does not truncate it.
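Whatever the exact junk turns out to be, one defensive workaround (not
a fix for the underlying bug) is to clean the text before passing it
to NSXMLDocument. The sketch below, again plain C with hypothetical
names, replaces every UTF-16 unit that cannot appear in an XML 1.0
document with U+FFFD REPLACEMENT CHARACTER: unpaired surrogates, plus
BMP code points outside the XML 1.0 Char production (#x9 | #xA | #xD |
[#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]), such as U+000B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* XML 1.0 Char production restricted to single BMP units:
   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
   Supplementary characters (#x10000-#x10FFFF) are also legal; they
   arrive as surrogate pairs and are handled by the caller. */
static int is_xml_char_bmp(uint16_t u)
{
    return u == 0x9 || u == 0xA || u == 0xD
        || (u >= 0x20 && u <= 0xD7FF)
        || (u >= 0xE000 && u <= 0xFFFD);
}

/* Hypothetical helper: replaces, in place, every UTF-16 unit that
   cannot appear in XML 1.0 with U+FFFD and returns how many units were
   replaced. Well-formed surrogate pairs are kept as they are. */
static size_t xml_sanitize_utf16(uint16_t *units, size_t len)
{
    assert(units != NULL || len == 0);
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF
            && i + 1 < len && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* valid pair: a supplementary character, legal in XML */
        } else if (!is_xml_char_bmp(u)) {
            /* A lone surrogate (0xD800-0xDFFF is outside both BMP ranges)
               or a disallowed code point such as U+000B or U+FFFF. */
            units[i] = REPLACEMENT_CHAR;
            replaced++;
        }
    }
    return replaced;
}
```

Because U+FFFD is a single UTF-16 unit, the repair happens in place
without changing the length, so the cleaned buffer can be turned back
into a string with +[NSString stringWithCharacters:length:] and passed
to -setStringValue:. Replacing instead of deleting keeps the damage
visible in the output.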

I reported the bug (#5775749), but I guess that others would like to 
know about this issue.

Here is example code that shows the bug. To reproduce, download the
source and example files from
<http://nirs.freeshell.org/files/invalid-string.tbz>


// Run this in the directory where "Slide Text.scpt" is located.
// Compile: cc invalid-string.m -o invalid-string -framework Cocoa

#import <Cocoa/Cocoa.h>

int main () {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    NSDictionary *errorInfo = nil;

    // Get a list of shape text

    NSURL *url = [NSURL fileURLWithPath:@"Slide Text.scpt"];

    NSAppleScript *script = [[[NSAppleScript alloc] initWithContentsOfURL:url
                                                                    error:&errorInfo] autorelease];
    if (script == nil) {
        NSLog(@"Cannot load script: %@", errorInfo);
        exit(1);
    }

    NSAppleEventDescriptor *result = [script executeAndReturnError:&errorInfo];
    if (result == nil) {
        NSLog(@"Script error: %@", errorInfo);
        exit(1);
    }

    NSString *slideText = [result stringValue];


    //// Bugs:


    // 1. The string contains junk (probably a PowerPoint bug), but
    // NSString should return nil or truncate the invalid data.

    CFShow(slideText);


    // 2. NSLog fails silently when printing this string: the log
    // line is simply missing!

    NSLog(@"slide text: %@", slideText);


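    // 3. The string cannot be converted to UTF-8 (UTF8String returns NULL).
    // Check for NULL first, since passing NULL to printf's %s is undefined.

    const char *utf8 = [slideText UTF8String];
    printf("utf8 string: %s\n", utf8 != NULL ? utf8 : "(NULL)");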


    // 4. The XML data is corrupted without any error: the element
    // containing the invalid string is missing, part of the string
    // appears, and the document is truncated after the invalid string.

    NSXMLDocument *doc = [NSXMLDocument document];
    [doc setCharacterEncoding:@"UTF-8"];
    NSXMLElement *root = [NSXMLElement elementWithName:@"doc"];
    [doc setRootElement:root];
    NSXMLElement *slide = [NSXMLElement elementWithName:@"slide"];
    [root addChild:slide];
    [slide setStringValue:slideText];
    NSData *data = [doc XMLDataWithOptions:NSXMLNodePrettyPrint];
    NSString *xmlString = [[[NSString alloc] initWithData:data
                                                 encoding:NSUTF8StringEncoding] autorelease];

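    // xmlString is nil if data is not valid UTF-8; messaging nil returns NULL.
    const char *xmlUTF8 = [xmlString UTF8String];
    printf("%s\n", xmlUTF8 != NULL ? xmlUTF8 : "(XML data is not valid UTF-8)");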


    [pool release];
    return 0;
}



Best Regards,

Nir Soffer
