NSTask/NSPipe STDIN hangs on large data, sometimes...
-
Greetings!
This being my first post to the list, permit me the obligatory passing
mention that I've been having a blast with Cocoa and am very impressed with
how quickly I've been able to build useful applications. I'd used PowerPlant
in the past, but the combination of the supplied Frameworks with Objective-C
is, IMHO, much, much better (read: easier to use!)...not to mention that
awesome OS under the hood.
Of course, I wouldn't be posting if I didn't have an issue.
After months of learning on my own (many thanks to the great books that are
out there) and trial-and-error with the classes, I'm finally completely
stumped.
I have fully functioning implementations of synchronous and asynch
NSTasks/NSPipes to execute UNIX command line programs, passing and
retrieving data via STDIO. [Many thanks to the Moriarty sample code, the G&M
O'Reilly Book, and A/B/Y's Sams book!!!]
Mostly, this works just fine.
HOWEVER, some commands (or the pipe that's feeding them their STDIN) seem to
choke if I'm sending a large amount of data into the command via STDIN.
For example, on the terminal command line, I can do this:
% cat /var/log/httpd/access_log | grep Jan
And even though the access_log is > 1MB, it works fine, as expected.
HOWEVER, if I try the same thing using NSTask/NSPipe in Cocoa, it hangs
during writeData: to the STDIN file handle, but only if there is a lot of
data being sent to grep.
I have done some research on the problem (in part to be sure this is really
a problem before posting to the list), and I can tell you that some commands
(eg, tail, sort, wc) work great, even with large stdin writes (those three
in particular were tested up to 8MB, and they worked fine!).
In particular, grep (and/or egrep) starts to hang if the STDIN is 40K or
larger and uniq hangs at about 32K. And I mean hang, the app doesn't die or
exit, it just hangs during writeData: to the Task's stdIn pipe's file handle
for writing. The exact same code is used for all the tests described above.
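A note on the thresholds above: a pipe is backed by a kernel buffer of limited size, and a blocking write waits once that buffer is full and the process at the other end has stopped reading. The C sketch below uses plain POSIX calls, which can sit in an Objective-C file unchanged, to measure how many bytes a fresh pipe accepts before a write would block. The helper name pipe_capacity is hypothetical, and the number it reports varies by system; it is not claimed to match the 32K or 40K figures reported in this post.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes a fresh pipe accepts
 * before a write would block, or -1 on error. */
static long pipe_capacity(void)
{
    int fds[2];
    char block[1024];
    long total = 0;

    if (pipe(fds) == -1)
        return -1;

    /* Make the write end non-blocking so a full pipe reports EAGAIN
     * instead of hanging the way a blocking write would. */
    int flags = fcntl(fds[1], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(block, 'x', sizeof block);
    for (;;) {
        ssize_t n = write(fds[1], block, sizeof block);
        if (n > 0)
            total += n;
        else if (n == -1 && errno == EINTR)
            continue;               /* interrupted: just retry */
        else
            break;                  /* EAGAIN (pipe full) or another error */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Nobody reads from the pipe in this sketch, which is the same situation a child process creates when it stops consuming its STDIN.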
I have tried changing the NSUnbufferedIO environment setting, no help.
What else can I try? What am I missing?
Has anyone else experienced this problem and solved it or am I a lone corner
case? I googled a little and have looked at some of the Cocoa sites (and
this list), but nothing jumped out at me that was addressing or solving this
issue.
Is this a problem with my code (I'll post a spike of the app if needed) or
the frameworks or Darwin, OR?
I'm running 10.1.5 (uname -a below) on a PBG4/550MHz/GBEther. I have a
non-development Jaguar box I could test this on (actually, it's a
half-sphere, not a box at all!), but it's burning a DVD for the next hour or
two, so I don't really want to muck with it at the moment to test this.
Especially since I want this to work even on 10.1.5 and hopefully since
someone out there has already solved this.
Thanks for your help, and thanks for Cocoa!
Joe Pezzillo
Boulder, CO | Santa Cruz, CA
<joe...>
% uname -a
Darwin silvern 5.5 Darwin Kernel Version 5.5: Thu May 30 14:51:26 PDT 2002;
root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC Power Macintosh powerpc
_______________________________________________
cocoa-dev mailing list | <cocoa-dev...>
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored. -
On Thursday, Jan 16, 2003, at 21:11 US/Eastern,
<cocoa-dev-request...> wrote:
> Has anyone else experienced this problem and solved it or am I a lone
> corner case? I googled a little and have looked at some of the Cocoa
> sites (and this list), but nothing jumped out at me that was addressing
> or solving this issue.
>
> Is this a problem with my code (I'll post a spike of the app if needed)
> or the frameworks or Darwin, OR?
Are you using the 'readInBackgroundAndNotify' mode on NSFileHandle? If
not, that is the cause of your problem as the buffering in NSFileHandle
will end up blocking on itself. This isn't so much a bug as fallout
from the way the system works (and one for which an easy workaround
exists -- readInBackgroundAndNotify).
Also, availableData, readDataToEndOfFile, and-- I think--
readDataOfLength: can all cause the NSFileHandle instance to block
without readInBackgroundAndNotify having been activated.
You can do non-blocking I/O without -readInBackgroundAndNotify. An
example that works across Windows, OS X, and Solaris (yes, the code is
old -- still works, though you will need to rip out some stuff):
// platform specific low-level I/O
#ifdef WIN32
#import <System/windows.h>
#elif defined(__MACH__)
// the original imported <bsd/libc.h>; these declare ioctl() and FIONREAD
#import <unistd.h>
#import <sys/ioctl.h>
#elif defined(__svr4__)
#import <unistd.h>
#import <sys/filio.h>
#endif
#import <errno.h>   // errno
#import <limits.h>  // UINT_MAX

// import APIs to NeXT provided frameworks
#import <Foundation/Foundation.h>

// import local framework's API
#import "CFBFoundation.h"

// import API required throughout this file's scope
#import "NSFileHandle_CFBNonBlockingIO.h"

@implementation NSFileHandle (CFBNonBlockingIO)
/*"
 * Adds non-blocking I/O API to NSFileHandle.
"*/

- (NSData *)availableDataNonBlocking
/*"
 * Returns an NSData object containing all of the currently
 * available data. Does not block if there is no data; returns nil
 * instead.
"*/
{
    return [self readDataOfLengthNonBlocking: UINT_MAX];
}

- (NSData *)readDataToEndOfFileNonBlocking
/*"
 * Returns an NSData object containing all of the currently
 * available data. Does not block if there is no data; returns nil
 * instead. Cover for #{-availableDataNonBlocking}.
"*/
{
    return [self readDataOfLengthNonBlocking: UINT_MAX];
}

// Number of bytes that can be read right now without blocking.
- (unsigned int) _availableByteCountNonBlocking
{
#ifdef WIN32
    HANDLE nativeHandle = [self nativeHandle];
    DWORD lpTotalBytesAvail;
    BOOL peekSuccess;

    peekSuccess = PeekNamedPipe(nativeHandle, NULL, 0L, NULL,
                                &lpTotalBytesAvail, NULL);
    if (!peekSuccess)
        [NSException raise: NSFileHandleOperationException
                    format: @"PeekNamedPipe() NT Err # %lu",
                            (unsigned long) GetLastError()];
    return lpTotalBytesAvail;
#elif defined(__MACH__) || defined(__svr4__)
    int numBytes = 0;
    int fd = [self fileDescriptor];

    if (ioctl(fd, FIONREAD, &numBytes) == -1)
        [NSException raise: NSFileHandleOperationException
                    format: @"ioctl() Err # %d", errno];
    return numBytes;
#else
#warning Non-blocking I/O not supported on this platform....
    abort();
    return 0;  // not reached; the original returned nil from an integer method
#endif
}

- (NSData *)readDataOfLengthNonBlocking:(unsigned int)length
/*"
 * Reads up to length bytes of data from the file handle. If no
 * data is available, returns nil. Does not block.
"*/
{
    unsigned int readLength;

    readLength = [self _availableByteCountNonBlocking];
    readLength = (readLength < length) ? readLength : length;
    if (readLength == 0)
        return nil;
    return [self readDataOfLength: readLength];
}
@end
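The category above relies on ioctl() with FIONREAD to learn how many bytes are waiting before it reads. As a rough illustration of just that step, here is a minimal C sketch on a bare file descriptor, with no Foundation involved. The helper names bytes_available and read_available are hypothetical and not part of the posted code.

```c
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes that can be read from fd right
 * now without blocking, or -1 if the ioctl fails. */
static int bytes_available(int fd)
{
    int count = 0;
    if (ioctl(fd, FIONREAD, &count) == -1)
        return -1;
    return count;
}

/* Hypothetical helper: read at most `max` bytes, but never more than are
 * already waiting, so the read itself cannot block. Returns the number
 * of bytes read, 0 if nothing was waiting, or -1 on error. */
static ssize_t read_available(int fd, void *buf, size_t max)
{
    int avail = bytes_available(fd);
    if (avail <= 0)
        return avail;
    size_t want = (size_t)avail < max ? (size_t)avail : max;
    return read(fd, buf, want);
}
```

One limitation of this approach: a return of 0 does not distinguish "no data yet" from "the other end has closed the pipe", so a caller still needs some other way to detect end of file.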
-
Bill-
Thanks for the prompt reply, that looks like some very useful code, too...I
hadn't yet considered (or desired) porting, but it's good to know that it
can be done!
Sadly, yes, I am already doing readInBackgroundAndNotify, at least on the
asynch version. The synchronous version uses readDataToEndOfFile.
But remember that my problem is that writeData hangs as part of "launching"
the command (not the specific [task launch] message, but the set-up to
making the command do anything by piping it some STDIN to chew on after it's
been launched), so in the synchronous version, I never get a chance to
readDataToEndOfFile, it just hangs.
Similarly, the asynch version posts a notification request, launches, and
then tells the task's stdOut fileHandleForReading to
readInBackgroundAndNotify. Then, once the task is launched I write the data
to stdIn, but since it hangs there, I never get any notifications of data
coming back.
I looked for a "writeDataInBackgroundAndNotify" or anything else related to
asynch writing in the NSFileHandle header file but I didn't find anything
new.
Since the previous post, I've tried synchronizeFile before writing, that
didn't work (NSFileHandleOperationException: invalid argument).
Nor did trying to get NSFileHandle to give me fileHandleWithStandardInput
and then write to that (bad file descriptor).
Also note that this only seems to affect a few UNIX commands so far,
/usr/bin/grep (or egrep), and /usr/bin/uniq. Other commands (specifically:
tail, wc, sort) work just fine with large data written to STDIN using the
same code. Doesn't mean it's not still my fault, but it does prove to me
that, as long as I don't use grep or uniq with more than 32K of data, I've
got a working implementation.
I also tried making a method that chunks the write operations into blocks of
less than 32K each, but that didn't help either...as soon as a cumulative
32K of data has been written to the file handle, even in smaller chunks, it
hangs...but only with grep/egrep or uniq (so far, those are the cmds that I
have tested and found this problem). I tested the new chunking write method
with other commands (tail, wc, sort) and they work fine.
I guess I'll start getting the project ready to post...
Thanks for your help!
Joe
<joe...>
On 1/17/03 9:25 AM, "Bill Bumgarner" <bbum...> wrote:
> [...]
-
Using pipes to write to and read from a command often produces deadlock
unless great care is taken. The reason that some commands work on the
cmdline but not in your program is that many commands will line buffer
output to a tty (interactive), but block buffer their output to a pipe.
This is why grep and uniq are causing problems.
In particular, grep will use an 8KB buffer for pipe output. Until that
buffer fills, it won't be flushed unless the input pipe is explicitly
closed. Thus a single instantiation of a grep process is often not
useful as a general filtering mechanism. One solution is to
instantiate a new process each time filtering is required, send the
data on the input pipe, close the input pipe. Even then deadlock may
result unless the output pipe is polled after each input line is sent.
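To make the "new process, send the input, close the input pipe" approach concrete, here is a minimal C sketch that also drains the output pipe while it writes, using poll(2). It assumes plain fork/exec rather than NSTask, and the helper name filter_through is hypothetical. One reading consistent with the symptoms in this thread is that a filter which emits output as it goes (grep, uniq) stalls once its unread output fills the pipe and then stops reading its input, whereas filters that print little or nothing until end of input (sort, wc, tail) never reach that point; that is an interpretation, not something confirmed here.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a child, feed it `in`, and collect
 * up to `cap` bytes of its output in `out`. Writing and reading are
 * interleaved with poll(2), so neither pipe can fill up while we are
 * stuck on the other. Returns the number of bytes stored, or -1. */
static ssize_t filter_through(char *const argv[], const char *in,
                              size_t in_len, char *out, size_t cap)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: wire up stdin and stdout */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    close(to_child[0]);
    close(from_child[1]);
    signal(SIGPIPE, SIG_IGN);            /* process-wide: child may exit early */
    fcntl(to_child[1], F_SETFL, O_NONBLOCK);  /* short writes, never a hang */

    size_t sent = 0, got = 0;
    int in_open = 1, out_open = 1;
    char scratch[4096];
    if (in_len == 0) { close(to_child[1]); in_open = 0; }

    while (out_open) {
        struct pollfd fds[2] = {
            { .fd = from_child[0], .events = POLLIN },
            { .fd = in_open ? to_child[1] : -1, .events = POLLOUT },
        };                               /* a negative fd is ignored by poll */
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR) continue;
            break;
        }
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) {
            ssize_t n = read(from_child[0], scratch, sizeof scratch);
            if (n <= 0) {
                out_open = 0;            /* EOF or error: stop collecting */
            } else {
                for (ssize_t i = 0; i < n && got < cap; i++)
                    out[got++] = scratch[i];  /* overflow is read but dropped */
            }
        }
        if (in_open && (fds[1].revents & (POLLOUT | POLLERR | POLLHUP))) {
            ssize_t n = write(to_child[1], in + sent, in_len - sent);
            if (n > 0)
                sent += (size_t)n;
            if ((n == -1 && errno != EAGAIN && errno != EINTR) || sent == in_len) {
                close(to_child[1]);      /* EOF lets the filter flush and exit */
                in_open = 0;
            }
        }
    }

    if (in_open) close(to_child[1]);
    close(from_child[0]);
    int status;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {}
    return (ssize_t)got;
}
```

A call such as filter_through(argv, data, len, out, sizeof out) with argv set to { "grep", "Jan", NULL } would correspond to the example that started this thread. Ignoring SIGPIPE changes the signal disposition for the whole process, which a real application may prefer to handle differently.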
Pseudo-terminals (ptys) are another approach that can trick commands
like grep into using line buffered output. Feel free to google for
details and then decide whether it's worth the effort. The unix command set
often tries to be ultra-efficient with its buffering, sometimes to the
point of crippling its own usefulness.
On Friday, January 17, 2003, at 12:22 PM, Joe Pezzillo wrote:
> [...]
Daryn
-
Daryn-
Thanks for your insightful reply.
Based on your info regarding the pipe buffer size, I tried yet another
approach to the problem which was to not only chunk the STDIN data into
smaller chunks, but then to try and writeData + close to flush the input
pipe after each chunk is written. However, even if I try to get the Pipe and
its fileHandleForWriting before each write, once I've closed the handle I
can't get it back, so the first chunk gets written but that's it before an
exception is raised. Did you have another idea of how I could implement
around this 8KB buffer so I can send more data to (certain) UNIX commands
via an NSTask/NSPipe? How would I do the output polling you mention (quoted
below) other than via readInBackgroundAndNotify or a blocking read loop?
Especially if the pipe hangs during the write command, I'll never get a
chance to poll for any output on STDOUT.
Luckily, I've also discovered another clue, but I haven't quite figured out
how to use it to my advantage yet. Get this: if the large data that's sent
to the Task/Pipe is from certain files, even if they are large, it also
works. I'm suspecting something related to the difference between Mac and
UNIX line feeds, but I haven't been able to confirm it (I've tried tacking
on a trailing 0A and 0D0A to the STDIN data if it wasn't already there just
before writing, but it still hangs, I also tried looking for a trailing
0/NULL, but it wasn't there on the successful runs). I discovered this while
building a test app to share regarding this, and wanting to provide the
option of testing this against a known data source instead of just random
data that I've also used. If I feed grep "/usr/share/dict/words" (about
2.4MB) loaded via NSString's stringWithContentsOfFile: using my Task/Pipe
handler, it works! If I feed it "/var/log/httpd/access_log" (about 1.5MB),
loaded the same way, it hangs. My original random data tests are a
randomized NSString assemblage with some CRLFs in it every 80 chars or so.
So, that said, I'm also glad to be able to report that I've written a
workaround, at least for grep, that appears to function (and tests OK).
Instead of doing the chunking of the STDIN after the task is created, I
chunk the STDIN before creating the task, and then do as many tasks as it
takes to handle the entire STDIN in the smaller chunks (instantiating and
releasing each one as I go). [This may have been what you were implying
below with the "new process" solution instead of what I tried (above), and
in any event, I credit you with forcing me to think about how I could craft a
workaround using a new NSTask each time. THANKS!]
One reason it may only be good for grep, and not for another command like
uniq, is that I'd need to bridge the "uniq" function across chunk
boundaries, whereas the current workaround simply chunks on a line boundary
before the chunk size, without regard for what is in any chunk or another.
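The "chunk on a line boundary before the chunk size" step described above could look roughly like the following C sketch. The function name next_chunk_len and the hard-split fallback for a window with no newline are assumptions made for illustration; the actual workaround code was not posted in this message.

```c
#include <stddef.h>

/* Hypothetical helper: length of the next chunk to hand to one filter
 * process. The chunk is at most `max` bytes (max must be > 0) and ends
 * just after the last newline inside that window, so no line is split
 * across two chunks. If the window holds no newline at all, the whole
 * window is returned and that line is split (a simplifying assumption). */
static size_t next_chunk_len(const char *data, size_t len, size_t max)
{
    if (len <= max)
        return len;                 /* the remainder fits in one chunk */
    for (size_t i = max; i > 0; i--) {
        if (data[i - 1] == '\n')
            return i;               /* include the newline itself */
    }
    return max;                     /* no newline found: hard split */
}
```

A caller would loop, passing each chunk to a fresh task and advancing by the returned length until nothing is left. As noted above, this is fine for grep because each line is judged on its own, but for uniq a run of identical lines that straddles two chunks would be reported once per chunk.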
I also agree with your assessment that PTYs don't seem like they'd be worth
the effort, plus they seem a little kludgy to me when NSTask/NSPipe is the
recommended method and supposed to work, presumably consistently.
And, perhaps best of all for the list, I have a project stub of this problem
(including my workaround and the two chunking write methods and some fully
working examples of other commands) ready to post to the list. I think it
might also be yet another useful introduction for others on how to use
NSTask/NSPipe both synchronously and asynch (of course, I learned it all
from the existing books and sample code, and I think anyone else can, too).
However, before I do so, this being my first time posting code or software
here, what is the policy (if any) on such a post? I'm presuming I'm going to
include a license to protect myself should this code crash anyone's machine
or otherwise run afoul. Is that uncouth here? I could restrict it to a
smaller set of routines and excerpts that I post in e-mail, but that doesn't
seem as useful.
Thanks!
Joe
<joe...>
PS - I also finally ran many of the same tests on this same program on the
10.2.3 machine with the same results.
On 1/18/03 7:41 PM, "Daryn" <cryx...> wrote:
> [...]
-
What I was actually suggesting, but I wasn't entirely clear, was to
send the input line by line (newline delimited), and to poll for output
after sending each line. That should generally avoid deadlock cases.
For polling, you can use fcntl(2) to set O_NONBLOCK on the file
descriptor, or use select(2) with a zero second timeout.
I think CFStreams might be worth investigating too.
On Monday, January 20, 2003, at 10:16 PM, Joe Pezzillo wrote:
> [...]
Daryn