Imaginary File Path

  • I'm working with a C++ command line tool - written for a PC platform
    but compiled for the Mac - that takes a file path as one of its
    parameters.

    I'm creating data that gets processed by this tool. Right now, I'm
    saving all the data to a temporary file and feeding the tool with the
    path to the temporary file.

    What I'd like to do instead, is to find a way to present my data as
    if it were a file, without actually saving the data to disk. If I
    have the data in memory, I'd like to be able to persuade the command
    line tool that the file is "really" on the disk, by giving it an
    "imaginary" file path.

    Ideally, I could even stream the data to the tool, as long as the
    tool thought it was working with a file.

    What I'm looking for is something that's the complement of an
    NSStream object. I can create an NSStream with either an NSData
    pointer, or an NSString pointer representing a file path. I'd like to
    take a stream, or a pointer to some data, and represent it as a path
    to a file.

    I can't tell if this idea is insane or if the solution is obvious.
    Would anyone like to set me straight?
  • John Goodman <johngoodman...> wrote (Friday, December 7, 2007
    7:35 AM -0500):
    > What I'd like to do instead, is to find a way to present my data as if
    > it were a file, without actually saving the data to disk. If I have
    > the data in memory, I'd like to be able to persuade the command line
    > tool that the file is "really" on the disk, by giving it an
    > "imaginary" file path.

    You'll have to involve the file system. By definition, a "path"
    is a name in a filesystem. To access the data associated with
    that path you'll have to create an entity in the filesystem or
    the filesystem simply won't know what your C++ tool is talking about.

    > Ideally, I could even stream the data to the tool, as long as the tool
    > thought it was working with a file.

    The only simple choices that I can think of are a pipe or a
    socket, both of which would require that the C++ tool read the
    data in the file sequentially. To use a pipe, your C++ tool would
    have to read the data from stdin. For a socket, your first process
    would create a UNIX domain socket using bind() (the socket will
    appear in the filesystem as a socket file), then pass the path
    of the socket to the tool.
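
    As a rough illustration of the pipe option (a sketch only: it
    assumes the tool can be made to read stdin, and it uses a
    hypothetical stand-in child process rather than the real tool), the
    parent can hand an in-memory buffer to the child through a pipe
    connected to its stdin. Python is used here for brevity:

```python
import subprocess
import sys

# Hypothetical stand-in for the real tool: it copies stdin to stdout.
STAND_IN_TOOL = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

def feed_via_stdin(argv, data):
    """Run argv with `data` piped to its stdin; return what it wrote to stdout."""
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout

if __name__ == "__main__":
    print(feed_via_stdin(STAND_IN_TOOL, b"in-memory payload\n"))
```

    Nothing is written to disk, but the child can only read the bytes
    in order, once.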

    If the tool needs to randomly access the data in the file,
    things will get considerably more complicated. You might
    experiment with mmap(), although I doubt this will help much
    because ultimately the data will still end up on the disk.

    The only other solution that leaps to mind is something like
    FUSE where you could create a virtual filesystem that presented
    your data as a file. But that's like renting a crane to take out
    the garbage.

    Or, just find yourself a RAM disk driver and write the data to a
    file on the RAM disk.

    --
    James Bucanek
  • There's a built-in RAM disk you can leverage for this sort of thing,
    actually. I've used it in internal tools. It should work for you. I
    wouldn't rely on it in a shipping app, though, because if one of your
    tools dies somewhere along the way, you've left a big honking RAM disk
    open on the user's system, chewing up resources. Also, it involves
    shelling out to a bunch of external processes and parsing their output,
    which always gives me the heebie-jeebies for a shipping app.

    Other than that, like James says, pipes are good, and shared memory
    would be another way to go.

    When I was creating RAM disks, I was actually working in Perl, but
    here's how you do it. This allocates a roughly 2 GB RAM disk (the
    '4600000' value is a count of 512-byte sectors; adjust it to change
    the disk size):

        use File::Path;

        File::Path::mkpath( "$tempPath/RAMDisk" );
        # Create the RAM disk device without mounting it
        $ramDiskA = `hdiutil attach ram://4600000 -nomount`;
        $ramDiskA =~ s|\s||g; # hdiutil pads the device name with whitespace
        # Format the device as HFS, then mount it
        system( "newfs_hfs \"$ramDiskA\"" );
        system( "mount_hfs \"$ramDiskA\" \"$tempPath/RAMDisk\"" );

    and this frees it:

        $ramDiskA =~ s|^/dev/||;          # keep only the device name
        `umount "$tempPath/RAMDisk"`;     # unmount the volume
        `hdiutil eject "$ramDiskA"`;      # release the RAM behind it

    As I was saying before, neglecting to free the RAM disk is the
    equivalent of spawning a new process on the user's computer, mallocing
    2GB of RAM with it, and then never letting that process quit.
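
    One way to reduce the risk of a leaked RAM disk is to tie the
    cleanup to a block that always runs on the way out. The sketch
    below is untested and macOS-only; it reuses the same commands as
    the Perl above, and the names `sectors_for` and `ram_disk` are made
    up for illustration. Python is used here for brevity:

```python
import os
import subprocess
from contextlib import contextmanager

SECTOR_BYTES = 512  # hdiutil's ram:// size is a count of 512-byte sectors

def sectors_for(megabytes):
    """Convert a size in MiB to the sector count hdiutil expects."""
    return megabytes * 1024 * 1024 // SECTOR_BYTES

@contextmanager
def ram_disk(mount_point, megabytes):
    """Attach, format and mount a RAM disk; release it when the block exits."""
    os.makedirs(mount_point, exist_ok=True)
    attach = subprocess.run(
        ["hdiutil", "attach", "-nomount", f"ram://{sectors_for(megabytes)}"],
        check=True, capture_output=True, text=True)
    device = attach.stdout.strip()  # e.g. a /dev/disk... node, whitespace-padded
    mounted = False
    try:
        subprocess.run(["newfs_hfs", device], check=True)
        subprocess.run(["mount_hfs", device, mount_point], check=True)
        mounted = True
        yield mount_point
    finally:
        # Runs on normal exit and on exceptions, so the RAM is given back.
        if mounted:
            subprocess.run(["umount", mount_point], check=False)
        subprocess.run(["hdiutil", "eject", device], check=False)
```

    By this arithmetic, 4600000 sectors is 2,355,200,000 bytes, a little
    over 2 GB. Note that the cleanup still won't run if the process is
    killed outright, so the caveat about shipping apps stands.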

  • > To use a pipe your C++ tool would have to read the data from stdin,

    I'm not the author of the C++ tool; I'm just compiling code I get
    from someone else's project. The tool unfortunately doesn't use
    stdin, or any stream, for the input data.

    > for a socket your first process would create a UNIX domain socket
    > using bind (the socket will appear in the filesystem as a socket
    > file), then pass the path of the socket to the tool.

    This sounds promising. My exposure to UNIX is limited to what I've
    seen funneled through Mac OS X, so there's a lot I don't know. But
    the description of sockets and bind that I'm seeing makes me think I
    can learn enough to make this work. ("A socket is created with no
    name. A remote process has no way to refer to a socket until an
    address is bound to the socket... The bind() call enables a process
    to specify the local address of the socket.")

    > If the tool needs to randomly access the data in the file, things
    > will get considerably more complicated.

    I don't believe this is the case, but you're making it clear to me
    that I'd better check. If the tool *does* need random access, then
    I'm going to punt on my stream idea.

    Thank you for the suggestions and for pointing me down a new
    (non-imaginary) path to explore.

  • John Goodman <johngoodman...> wrote (Friday, December 7, 2007
    10:02 AM -0500):

    >> To use a pipe your C++ tool would have to read the data from stdin,
    >
    > I'm not the author of the C++ tool; I'm just compiling code I
    > get from someone else's project. The tool unfortunately doesn't
    > use stdin, or any stream, for the input data.

    Check out the source of the tool. Virtually all well-behaved
    UNIX command-line tools that operate on the contents of a file
    use the convention: If the file is supplied as an argument, use
    the path specified. If either no path is supplied as an
    argument, or the argument is simply "-", read the contents from stdin.

    If the tool reads the file sequentially, then there's a better
    than 90% chance that the tool will read from stdin if you omit
    the filename in the argument list.
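
    When reading the tool's source, the convention usually shows up as
    a small branch near the top of main(). A minimal sketch of the
    pattern (in Python for brevity; `open_input` is a made-up name, and
    the real tool's argument handling may well differ):

```python
import sys

def open_input(argv, stdin=None):
    """Pick the input stream: no path argument, or "-", means stdin."""
    if len(argv) < 2 or argv[1] == "-":
        return stdin if stdin is not None else sys.stdin.buffer
    return open(argv[1], "rb")
```

    If the tool has a branch like this, it can be fed through a pipe
    with no temporary file; if it unconditionally opens argv[1], it
    cannot.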

    >> for a socket your first process would create a UNIX domain socket
    >> using bind (the socket will appear in the filesystem as a socket
    >> file), then pass the path of the socket to the tool.
    >
    > This sounds promising. My exposure to UNIX is limited to what I've
    > seen funneled through Mac OS X, so there's a lot I don't know. But the
    > description of sockets and bind that I'm seeing makes me think I can
    > learn enough to make this work. ("A socket is created with no name. A
    > remote process has no way to refer to a socket until an address is
    > bound to the socket... The bind() call enables a process to specify
    > the local address of the socket.")

    Using sockets would probably require source code changes to your
    tool, so this probably isn't the solution you're looking for.

    A UNIX domain socket works just like a TCP socket, except that
    instead of defining a port number you use the AF_UNIX address
    family and fill in a structure (sockaddr_un) that specifies a path
    in the filesystem. When the bind() call is finished, a new socket
    file will appear (look in /var/run for examples). These take up no
    filesystem space; they just define a named conduit by which two
    processes can send data to one another. Again, these are streams
    so it's strictly serial access.
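
    To make that concrete, here is a small sketch (Python for brevity;
    the function name and the "demo.sock" file name are arbitrary) that
    binds a socket to a path, confirms that a socket file appears, and
    then tries to open that path the way a tool expecting an ordinary
    file would:

```python
import os
import socket
import stat
import tempfile

def socket_file_demo():
    """Return (path_is_socket, plain_open_succeeded) for a freshly bound socket."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            server.bind(path)    # the socket file appears at `path` here
            server.listen(1)
            path_is_socket = stat.S_ISSOCK(os.stat(path).st_mode)
            try:
                open(path, "rb").close()   # what a file-reading tool would do
                plain_open_succeeded = True
            except OSError:
                plain_open_succeeded = False
            return path_is_socket, plain_open_succeeded
        finally:
            server.close()

if __name__ == "__main__":
    print(socket_file_demo())
```

    The path exists and is a socket, but a plain open() on it is
    refused; the reader has to call connect() instead, which is why the
    tool's source would need changing.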

    --
    James Bucanek
  • > Virtually all well-behaved UNIX command-line tools that operate on
    > the contents of a file use the convention: If the file is supplied
    > as an argument, use the path specified. If either no path is
    > supplied as an argument, or the argument is simply "-", read the
    > contents from stdin.

    I'll have to see how well behaved it is. It will be nice if I can
    work with stdin instead of a file path. I'm already using NSTask's
    setStandardOutput to get data *out* of the tool and into NSPipe's
    NSFileHandle. (The tool outputs only through printfs.)

    I'll see what I can learn about the tool's handling of its argument
    list. Maybe there's more than one way in.

    Thanks again for all the food for thought.

    John
