NSTask/NSPipe STDIN hangs on large data, sometimes...

  • Greetings!

    This being my first post to the list, permit me the obligatory passing
    mention that I've been having a blast with Cocoa and am very impressed with
    how quickly I've been able to build useful applications. I'd used PowerPlant
    in the past, but the combination of the supplied Frameworks with Objective-C
    is, IMHO, much, much better (read: easier to use!)...not to mention that
    awesome OS under the hood.

    Of course, I wouldn't be posting if I didn't have an issue.

    After months of learning on my own (many thanks to the great books that are
    out there) and trial-and-error with the classes, I'm finally completely
    stumped.

    I have fully functioning implementations of synchronous and asynch
    NSTasks/NSPipes to execute UNIX command line programs, passing and
    retrieving data via STDIO. [Many thanks to the Moriarty sample code, the G&M
    O'Reilly Book, and A/B/Y's Sams book!!!]

    Mostly, this works just fine.

    HOWEVER, some commands (or the pipe that's feeding them their STDIN) seem to
    choke if I'm sending a large amount of data into the command via STDIN.

    For example, on the terminal command line, I can do this:

    % cat /var/log/httpd/access_log | grep Jan

    And even though the access_log is > 1MB, it works fine, as expected.

    HOWEVER, if I try the same thing using NSTask/NSPipe in Cocoa, it hangs
    during writeData: to the STDIN file handle, but only if there is a lot of
    data being sent to grep.

    I have done some research on the problem (in part to be sure this is really
    a problem before posting to the list), and I can tell you that some commands
    (eg, tail, sort, wc) work great, even with large stdin writes (those three
    in particular were tested up to 8MB, and they worked fine!).

    In particular, grep (and/or egrep) starts to hang if the STDIN is 40K or
    larger and uniq hangs at about 32K. And I mean hang, the app doesn't die or
    exit, it just hangs during writeData: to the Task's stdIn pipe's file handle
    for writing. The exact same code is used for all the tests described above.

    I have tried changing the NSUnbufferedIO environment setting, no help.

    What else can I try? What am I missing?

    Has anyone else experienced this problem and solved it or am I a lone corner
    case? I googled a little and have looked at some of the Cocoa sites (and
    this list), but nothing jumped out at me that was addressing or solving this
    issue.

    Is this a problem with my code (I'll post a spike of the app if needed) or
    the frameworks or Darwin, OR?

    I'm running 10.1.5 (uname -a below) on a PBG4/550MHz/GBEther. I have a
    non-development Jaguar box I could test this on (actually, it's a
    half-sphere, not a box at all!), but it's burning a DVD for the next hour or
    two, so I don't really want to muck with it at the moment to test this.
    Especially since I want this to work even on 10.1.5 and hopefully since
    someone out there has already solved this.

    Thanks for your help, and thanks for Cocoa!

    Joe Pezzillo
    Boulder, CO | Santa Cruz, CA
    <joe...>

    % uname -a
    Darwin silvern 5.5 Darwin Kernel Version 5.5: Thu May 30 14:51:26 PDT 2002;
    root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC  Power Macintosh powerpc
    _______________________________________________
    cocoa-dev mailing list | <cocoa-dev...>
    Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
    Do not post admin requests to the list. They will be ignored.
  • On Thursday, Jan 16, 2003, at 21:11 US/Eastern,
    <cocoa-dev-request...> wrote:
    > Has anyone else experienced this problem and solved it or am I a lone
    > corner case? I googled a little and have looked at some of the Cocoa
    > sites (and this list), but nothing jumped out at me that was addressing
    > or solving this issue.
    >
    > Is this a problem with my code (I'll post a spike of the app if needed)
    > or the frameworks or Darwin, OR?

    Are you using the 'readInBackgroundAndNotify' mode on NSFileHandle?  If
    not, that is the cause of your problem as the buffering in NSFileHandle
    will end up blocking on itself.  This isn't so much a bug as fallout
    from the way the system works (and one for which an easy workaround
    exists -- readInBackgroundAndNotify).

    Also, availableData, readDataToEndOfFile, and-- I think--
    readDataOfLength: can all cause the NSFileHandle instance to block
    without readInBackgroundAndNotify having been activated.

    You can do non-blocking I/O without -readInBackgroundAndNotify.  An
    example that works across Windows, OS X, and Solaris (yes, the code is
    old -- still works, though you will need to rip out some stuff):

    // platform specific low-level I/O
    #ifdef WIN32
    #import <System/windows.h>
    #elif defined(__MACH__)
    #import <bsd/libc.h>
    #elif defined(__svr4__)
    #import <unistd.h>
    #import <sys/filio.h>
    #endif

    // import APIs to NeXT provided frameworks
    #import <Foundation/Foundation.h>

    // import local framework's API
    #import "CFBFoundation.h"

    // import API required throughout this file's scope
    #import "NSFileHandle_CFBNonBlockingIO.h"

    @implementation NSFileHandle (CFBNonBlockingIO)
    /*"
      * Adds non-blocking I/O API to NSFileHandle.
    "*/

    - (NSData *)availableDataNonBlocking;
        /*"
          * Returns an NSData object containing all of the currently
          * available data.  Does not block if there is no data; returns nil
          * instead.
        "*/
    {
        return [self readDataOfLengthNonBlocking: UINT_MAX];
    }

    - (NSData *)readDataToEndOfFileNonBlocking;
        /*"
          * Returns an NSData object containing all of the currently
          * available data.  Does not block if there is no data; returns nil
          * instead.  Cover for #{-availableDataNonBlocking}.
        "*/
    {
        return [self readDataOfLengthNonBlocking: UINT_MAX];
    }

    - (unsigned int) _availableByteCountNonBlocking
    {
    #ifdef WIN32
        HANDLE nativeHandle = [self nativeHandle];
        DWORD lpTotalBytesAvail;
        BOOL peekSuccess;

        // Ask the pipe how many bytes are waiting without consuming them.
        peekSuccess = PeekNamedPipe(nativeHandle, NULL, 0L, NULL,
                                    &lpTotalBytesAvail, NULL);

        if (peekSuccess == NO)
            [NSException raise: NSFileHandleOperationException
                        format: @"PeekNamedPipe() NT Err # %lu",
                                (unsigned long) GetLastError()];

        return lpTotalBytesAvail;
    #elif defined(__MACH__) || defined(__svr4__)
        int numBytes;
        int fd = [self fileDescriptor];

        // FIONREAD reports how many bytes can be read immediately.
        if (ioctl(fd, FIONREAD, (char *) &numBytes) == -1)
            [NSException raise: NSFileHandleOperationException
                        format: @"ioctl() Err # %d", errno];

        return numBytes;
    #else
    #warning Non-blocking I/O not supported on this platform....
        abort();
        return 0;   // not reached; the return type is an integer, not an object
    #endif
    }

    - (NSData *)readDataOfLengthNonBlocking:(unsigned int)length;
        /*"
          * Reads up to length bytes of data from the file handle.  If no
          * data is available, returns nil.  Does not block.
        "*/
    {
        unsigned int readLength;

        readLength = [self _availableByteCountNonBlocking];
        readLength = (readLength < length) ? readLength : length;

        if (readLength == 0)
            return nil;

        return [self readDataOfLength: readLength];
    }

    @end
  • Bill-

    Thanks for the prompt reply, that looks like some very useful code, too...I
    hadn't yet considered (or desired) porting, but it's good to know that it
    can be done!

    Sadly, yes, I am already doing readInBackgroundAndNotify, at least on the
    asynch version. The synchronous version uses readDataToEndOfFile.

    But remember that my problem is that writeData hangs as part of "launching"
    the command (not the specific [task launch] message, but the set-up to
    making the command do anything by piping it some STDIN to chew on after it's
    been launched), so in the synchronous version, I never get a chance to
    readDataToEndOfFile, it just hangs.

    Similarly, the asynch version posts a notification request, launches, and
    then tells the task's stdOut fileHandleForReading to
    readInBackgroundAndNotify. Then, once the task is launched I write the data
    to stdIn, but since it hangs there, I never get any notifications of data
    coming back.

    I looked for a "writeDataInBackgroundAndNotify" or anything else related to
    asynch writing in the NSFileHandle header file but I didn't find anything
    new.

    Since the previous post, I've tried synchronizeFile before writing, but that
    didn't work (NSFileHandleOperationException: invalid argument).

    Nor did trying to get NSFileHandle to give me fileHandleWithStandardInput
    and then write to that (bad file descriptor).

    Also note that this only seems to affect a few UNIX commands so far,
    /usr/bin/grep (or egrep), and /usr/bin/uniq. Other commands (specifically:
    tail, wc, sort) work just fine with large data written to STDIN using the
    same code. Doesn't mean it's not still my fault, but it does prove to me
    that, as long as I don't use grep or uniq with more than 32K of data, I've
    got a working implementation.

    I also tried making a method that chunks the write operations into blocks of
    less than 32K each, but that didn't help either...as soon as a cumulative
    32K of data has been written to the file handle, even in smaller chunks, it
    hangs...but only with grep/egrep or uniq (so far, those are the cmds that I
    have tested and found this problem). I tested the new chunking write method
    with other commands (tail, wc, sort) and they work fine.

    I guess I'll start getting the project ready to post...

    Thanks for your help!

    Joe
    <joe...>

    On 1/17/03 9:25 AM, "Bill Bumgarner" <bbum...> wrote:

    > [...]
  • Using pipes to write to and read from a command often produces deadlock
    unless great care is taken.  The reason that some commands work on the
    command line but not in your program is that many commands line-buffer
    their output to a tty (interactive), but block-buffer their output to a
    pipe.  This is why grep and uniq are causing problems.
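
    As a minimal illustration of the tty-versus-pipe distinction described
    above, the C sketch below mirrors the default choice stdio makes for
    stdout: line buffering when the descriptor is a terminal, full (block)
    buffering otherwise. The helper names buffering_mode_for_fd and
    force_line_buffered_stdout are made up for this example; only isatty(3)
    and setvbuf(3) are real library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the buffering mode stdio would pick by
 * default for a stream attached to fd. _IOLBF flushes at each newline,
 * _IOFBF flushes only when the buffer fills or the stream is closed. */
int buffering_mode_for_fd(int fd)
{
    return isatty(fd) ? _IOLBF : _IOFBF;
}

/* A filter you control can override the default before it writes anything
 * to stdout, so its output shows up promptly even on a pipe. */
int force_line_buffered_stdout(void)
{
    return setvbuf(stdout, NULL, _IOLBF, 0);
}
```

    This only helps for programs whose source you can change; an existing
    grep or uniq binary makes the choice itself, which is why the caller has
    to cope with block-buffered output.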

    In particular, grep will use an 8KB buffer for pipe output.  Until that
    buffer fills, it won't be flushed unless the input pipe is explicitly
    closed.  Thus a single instantiation of a grep process is often not
    useful as a general filtering mechanism.  One solution is to
    instantiate a new process each time filtering is required, send the
    data on the input pipe, close the input pipe.  Even then deadlock may
    result unless the output pipe is polled after each input line is sent.
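
    One way to keep both pipes moving can be sketched in plain POSIX C (not
    NSTask/NSPipe). A pipe holds only a limited amount of data in the kernel.
    If the parent writes all of its input before it reads anything, a filter
    that produces output as it consumes input can fill its stdout pipe, block,
    and stop reading stdin; the parent's write then blocks as well. The sketch
    interleaves the two directions with poll(2). The name run_filter is
    hypothetical, and the code assumes a system that provides poll(2); note
    that ignoring SIGPIPE affects the whole process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs cmd[0] with cmd as its argument vector, feeds it
 * in[0..in_len) on stdin and collects its stdout into a malloc'd,
 * NUL-terminated buffer (*out, *out_len). Returns the child's exit status,
 * or -1 on failure. Reads and writes are interleaved so that neither pipe
 * can fill up while the other side waits. */
int run_filter(char *const cmd[], const char *in, size_t in_len,
               char **out, size_t *out_len)
{
    int to_child[2], from_child[2], status = 0, failed = 0;
    if (pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: pipes become stdin/stdout */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(cmd[0], cmd);
        _exit(127);                      /* exec failed */
    }
    int wfd = to_child[1], rfd = from_child[0];
    close(to_child[0]);
    close(from_child[1]);
    signal(SIGPIPE, SIG_IGN);            /* process-wide: get EPIPE, not a signal */
    fcntl(wfd, F_SETFL, O_NONBLOCK);     /* write() may be partial, never blocks */

    char *buf = NULL;
    size_t cap = 0, used = 0, sent = 0;
    while (rfd != -1) {
        if (wfd != -1 && sent == in_len) {   /* all input sent: signal EOF */
            close(wfd);
            wfd = -1;                        /* poll() ignores negative fds */
        }
        struct pollfd fds[2] = { { rfd, POLLIN, 0 }, { wfd, POLLOUT, 0 } };
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR) continue;
            failed = 1;
            break;
        }
        if (fds[0].revents) {                /* output (or EOF) is waiting */
            if (cap - used < 4097) {         /* keep room for data plus NUL */
                size_t bigger = cap ? cap * 2 : 8192;
                char *grown = realloc(buf, bigger);
                if (grown == NULL) { failed = 1; break; }
                buf = grown;
                cap = bigger;
            }
            ssize_t n = read(rfd, buf + used, cap - used - 1);
            if (n > 0) used += (size_t)n;
            else if (n == 0 || errno != EINTR) { close(rfd); rfd = -1; }
        }
        if (wfd != -1 && fds[1].revents) {   /* room in the stdin pipe */
            ssize_t n = write(wfd, in + sent, in_len - sent);
            if (n > 0) sent += (size_t)n;
            else if (n == -1 && errno != EAGAIN && errno != EINTR)
                sent = in_len;               /* e.g. EPIPE: stop sending */
        }
    }
    if (wfd != -1) close(wfd);
    if (rfd != -1) close(rfd);
    if (waitpid(pid, &status, 0) == -1 || failed) { free(buf); return -1; }
    buf[used] = '\0';
    *out = buf;
    *out_len = used;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    A caller would pass something like { "grep", "Jan", NULL } as cmd and
    free(*out) afterwards. grep exits with status 1 when no line matches, so
    a non-zero return value is not necessarily an error.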

    Pseudo-terminals (ptys) are another approach that can trick commands
    like grep into using line-buffered output.  Feel free to google for
    details and then decide whether it's worth the effort.  The unix command
    set often tries to be ultra-efficient with its buffering, sometimes to
    the point of crippling its own usefulness.

    On Friday, January 17, 2003, at 12:22  PM, Joe Pezzillo wrote:

    > [...]
    Daryn
  • Daryn-

    Thanks for your insightful reply.

    Based on your info regarding the pipe buffer size, I tried yet another
    approach to the problem which was to not only chunk the STDIN data into
    smaller chunks, but then to try and writeData + close to flush the input
    pipe after each chunk is written. However, even if I try to get the Pipe and
    its fileHandleForWriting before each write, once I've closed the handle I
    can't get it back, so the first chunk gets written but that's it before an
    exception is raised. Did you have another idea of how I could implement
    around this 8KB buffer so I can send more data to (certain) UNIX commands
    via an NSTask/NSPipe? How would I do the output polling you mention (quoted
    below) other than via readInBackgroundAndNotify or a blocking read loop?
    Especially if the pipe hangs during the write command, I'll never get a
    chance to poll for any output on STDOUT.

    Luckily, I've also discovered another clue, but I haven't quite figured out
    how to use it to my advantage yet. Get this: if the large data that's sent
    to the Task/Pipe is from certain files, even if they are large, it also
    works. I suspect something related to the difference between Mac and
    UNIX line endings, but I haven't been able to confirm it (I've tried tacking
    on a trailing 0A and 0D0A to the STDIN data if it wasn't already there just
    before writing, but it still hangs, I also tried looking for a trailing
    0/NULL, but it wasn't there on the successful runs). I discovered this while
    building a test app to share regarding this, and wanting to provide the
    option of testing this against a known data source instead of just random
    data that I've also used. If I feed grep "/usr/share/dict/words" (about
    2.4MB) loaded via NSString's stringWithContentsOfFile:  using my Task/Pipe
    handler, it works! If I feed it "/var/log/httpd/access_log" (about 1.5MB),
    loaded the same way, it hangs. My original random data tests are a
    randomized NSString assemblage with some CRLFs in it every 80 chars or so.

    So, that said, I'm also glad to be able to report that I've written a
    workaround, at least for grep, that appears to function (and tests OK).
    Instead of doing the chunking of the STDIN after the task is created, I
    chunk the STDIN before creating the task, and then do as many tasks as it
    takes to handle the entire STDIN in the smaller chunks (instantiating and
    releasing each one as I go). [This may have been what you were implying
    below with the "new process" solution instead of what I tried (above), and
    in any event, I credit you with forcing me to think about how I could craft a
    workaround using a new NSTask each time. THANKS!]

    One reason it may only be good for grep, and not for another command like
    uniq, is that I'd need to bridge the "uniq" function across chunk
    boundaries, whereas the current workaround simply chunks on a line boundary
    before the chunk size, without regard for what is in any chunk or another.
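
    The line-boundary chunking described above can be sketched in C as
    follows. This is not the code from the project; next_chunk_len and
    count_chunks are hypothetical names, and the sketch assumes a chunk limit
    greater than zero. Each chunk ends just after the last newline that fits,
    so no input line is split between two tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes of data[0..len) to hand to the next
 * task. The chunk is at most max bytes (max must be greater than zero) and,
 * when possible, ends just after a newline. */
size_t next_chunk_len(const char *data, size_t len, size_t max)
{
    if (len <= max)
        return len;                 /* the remainder fits in one chunk */
    for (size_t i = max; i > 0; i--)
        if (data[i - 1] == '\n')
            return i;               /* include the newline itself */
    return max;                     /* one line longer than max: split it */
}

/* Walks the whole input the way a caller would, one chunk per task, and
 * returns how many tasks would be needed. */
size_t count_chunks(const char *data, size_t len, size_t max)
{
    size_t chunks = 0;
    for (size_t off = 0; off < len; chunks++) {
        size_t n = next_chunk_len(data + off, len - off, max);
        /* a real caller would launch one task for data[off .. off + n) here */
        off += n;
    }
    return chunks;
}
```

    Splitting this way suits a filter such as grep, which treats every line
    independently. A command whose result depends on neighbouring lines, such
    as uniq, would still need its results reconciled across chunk boundaries.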

    I also agree with your assessment that PTYs don't seem like they'd be worth
    the effort, plus they seem a little kludgy to me when NSTask/NSPipe is the
    recommended method and supposed to work, presumably consistently.

    And, perhaps best of all for the list, I have a project stub of this problem
    (including my workaround and the two chunking write methods and some fully
    working examples of other commands) ready to post to the list. I think it
    might also be yet another useful introduction for others on how to use
    NSTask/NSPipe both synchronously and asynch (of course, I learned it all
    from the existing books and sample code, and I think anyone else can, too).

    However, before I do so, this being my first time posting code or software
    here, what is the policy (if any) on such a post? I'm presuming I'm going to
    include a license to protect myself should this code crash anyone's machine
    or otherwise run afoul. Is that uncouth here? I could restrict it to a
    smaller set of routines and excerpts that I post in e-mail, but that doesn't
    seem as useful.

    Thanks!

    Joe
    <joe...>

    PS - I also finally ran many of the same tests on this same program on the
    10.2.3 machine with the same results.

    On 1/18/03 7:41 PM, "Daryn" <cryx...> wrote:

    > Using pipes to write to and read from a command often produces deadlock
    > unless great care is taken.  The reason that some commands work on the
    > command line but not in your program is that many commands line-buffer
    > their output to a tty (interactive), but block-buffer their output to a
    > pipe.  This is why grep and uniq are causing problems.
    >
    > In particular, grep will use an 8KB buffer for pipe output.  Until that
    > buffer fills, it won't be flushed unless the input pipe is explicitly
    > closed.  Thus a single instantiation of a grep process is often not
    > useful as a general filtering mechanism.  One solution is to
    > instantiate a new process each time filtering is required, send the
    > data on the input pipe, close the input pipe.  Even then deadlock may
    > result unless the output pipe is polled after each input line is sent.
    >
    > Pseudo-terminals (ptys) are another approach that can trick commands
    > like grep into using line-buffered output.  Feel free to google for
    > details and then decide whether it's worth the effort.  The unix command
    > set often tries to be ultra-efficient with its buffering, sometimes to
    > the point of crippling its own usefulness.
  • What I was actually suggesting, though I wasn't entirely clear, was to
    send the input line by line (newline delimited), and to poll for output
    after sending each line.  That should generally avoid deadlock cases.
    For polling, you can use fcntl(2) to set O_NONBLOCK on the file
    descriptor, or use select(2) with a zero-second timeout.
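
    Both polling options mentioned above can be sketched in a few lines of C.
    The helper names set_nonblocking and readable_now are made up for this
    example; with NSFileHandle, the descriptor would presumably come from
    -fileDescriptor, as in the category posted earlier in the thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: switches fd to non-blocking mode while preserving
 * its other status flags. Afterwards read() on an empty pipe fails with
 * EAGAIN instead of waiting. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: asks select(2), with a zero timeout, whether a read
 * on fd would return immediately. Returns 1 if data (or EOF) is waiting,
 * 0 if not, -1 on error. fd must be smaller than FD_SETSIZE. */
int readable_now(int fd)
{
    fd_set readable;
    struct timeval zero = { 0, 0 };     /* return at once, do not wait */

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    if (select(fd + 1, &readable, NULL, NULL, &zero) == -1)
        return -1;
    return FD_ISSET(fd, &readable) ? 1 : 0;
}
```

    Either helper is enough on its own: check readable_now() before each
    blocking read, or call set_nonblocking() once and treat EAGAIN as
    "nothing yet". The same applies on the write side if the stdin
    descriptor is made non-blocking.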

    I think CFStreams might be worth investigating too.

    On Monday, January 20, 2003, at 10:16  PM, Joe Pezzillo wrote:
    > Daryn-
    >
    > Thanks for your insightful reply.
    >
    > Based on your info regarding the pipe buffer size, I tried yet another
    > approach to the problem which was to not only chunk the STDIN data into
    > smaller chunks, but then to try and writeData + close to flush the input
    > pipe after each chunk is written. However, even if I try to get the Pipe
    > and its fileHandleForWriting before each write, once I've closed the
    > handle I can't get it back, so the first chunk gets written but that's it
    > before an exception is raised. Did you have another idea of how I could
    > implement around this 8KB buffer so I can send more data to (certain)
    > UNIX commands via an NSTask/NSPipe? How would I do the output polling you
    > mention (quoted below) other than via readInBackgroundAndNotify or a
    > blocking read loop? Especially if the pipe hangs during the write
    > command, I'll never get a chance to poll for any output on STDOUT.

    Daryn