dropping incoming DO error

  • Anyone know what causes this?  Anyone know how I can figure out where
    it's coming from?

    *** -[NSMachPort handlePortMessage:]: dropping incoming DO message
    because the connection or ports are invalid

    -steve
  • On Nov 1, 2007, at 8:45 PM, Steve Gehrman wrote:
    > Anyone know what causes this?  Anyone know how I can figure out
    > where it's coming from?
    >
    > *** -[NSMachPort handlePortMessage:]: dropping incoming DO message
    > because the connection or ports are invalid
    >
    > -steve

    I can take a stab at this ...

    Once one side of a DO connection sends a message to the other side, a
    race begins.  Or rather, three races.  While the message is
    "in-flight" (stored in the kernel for this NSMachPort case), the
    process which is going to receive it might decide, for whatever
    reason, to shut down the connection.  Thus, when the message is
    pulled from the kernel, the connection is already invalid and its
    data structures torn down, and there's nothing to be done with the
    message.

    The second/third races are more interesting and likely.  While the
    message is in-flight, the sending process might decide to invalidate
    the connection, or terminate.  Either of those will invalidate the
    sender's Mach ports in the message.  Plus, port death notification
    messages get generated by the kernel and sent to the process with the
    other end of the connection (since it is interested).

    When the receiving process gets those port death notifications it is
    going to act on them by invalidating the connection and tearing down
    data structures.  Then the message might be pulled from the kernel.
    So this is kind of a variation on the first race but still
    interesting in its own right.
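
    The ordering problem in the first two races can be sketched with a
    toy model.  This is only an illustration in Python, not Mach or
    Cocoa code: the Receiver class, the event tuples and the
    "sender-port" key are all made up for the example.  It shows the
    same message being delivered or dropped depending only on whether it
    is handled before or after the receiver tears down the connection.

```python
from collections import deque


class Receiver:
    """Toy stand-in for the receiving side of a DO connection (hypothetical)."""

    def __init__(self):
        # Maps a sender's port to the connection it belongs to.
        self.connections = {"sender-port": "connection-to-sender"}
        self.delivered = []
        self.dropped = []

    def handle(self, event):
        kind, port, payload = event
        if kind == "port-death":
            # React to the notification by tearing down the connection.
            self.connections.pop(port, None)
        elif kind == "message":
            if port in self.connections:
                self.delivered.append(payload)
            else:
                # The connection is already gone: nothing to be done.
                self.dropped.append(payload)


def run(events):
    """Feed events to a fresh receiver in the given order."""
    receiver = Receiver()
    queue = deque(events)
    while queue:
        receiver.handle(queue.popleft())
    return receiver


message = ("message", "sender-port", "goodbye")
death = ("port-death", "sender-port", None)

message_first = run([message, death])
death_first = run([death, message])

print(message_first.delivered, message_first.dropped)
print(death_first.delivered, death_first.dropped)
```

    The "message first" ordering here stands for the lucky case where the
    message is handled while the sender's port is still alive.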

    Or, the message that was sent might be pulled from the kernel first
    before those port death notifications, but still all is not well.
    When a port goes invalid, the kernel scribbles MACH_PORT_DEAD over
    the previous port identifier for that port in all messages still
    waiting in the kernel.  When the receiving process pulls the message
    from the kernel, it sees MACH_PORT_DEAD rather than the sender's port
    identifier.  Well, the sender's port is part of the information that
    the lowest layers use to figure out which connection the message is
    destined for, since of course a process can have many connections to
    other processes.  The received message lacks the information needed
    to do the mapping, so it must be dropped (and likely, connection
    invalidation is imminent in any case).  And of course one could be
    using multiple threads and more than one thread could be doing some
    of these steps.
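
    A rough sketch of that last case, again as a toy Python model rather
    than real kernel behavior: ToyKernel, dispatch and the port numbers
    101 and 202 are invented for illustration, and MACH_PORT_DEAD is
    just a placeholder sentinel here, not the real constant's value.
    Once the queued message has had its sender port overwritten, the
    lookup that maps a message to a connection has nothing to go on.

```python
# Placeholder sentinel standing in for the real MACH_PORT_DEAD constant.
MACH_PORT_DEAD = object()


class ToyKernel:
    """Holds messages that were sent but not yet received (hypothetical)."""

    def __init__(self):
        self.queue = []

    def send(self, sender_port, payload):
        self.queue.append({"sender_port": sender_port, "payload": payload})

    def port_died(self, port):
        # Overwrite the dead port's identifier in every queued message.
        for message in self.queue:
            if message["sender_port"] == port:
                message["sender_port"] = MACH_PORT_DEAD

    def receive(self):
        return self.queue.pop(0)


def dispatch(connections, message):
    """Map a received message to its connection by the sender's port."""
    connection = connections.get(message["sender_port"])
    if connection is None:
        return "dropped"
    return f"delivered to {connection}"


connections = {101: "connection A", 202: "connection B"}

kernel = ToyKernel()
kernel.send(101, "first")
kernel.send(202, "second")
kernel.port_died(101)  # sender 101 invalidates or terminates mid-flight

results = [dispatch(connections, kernel.receive()) for _ in range(2)]
print(results)
```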

    All these races can potentially apply to all versions of Mac OS X, if
    the timing is "right".

    So one can imagine how this might occur pretty easily in the
    termination case.  A sender sends, say, a oneway DO message to
    another process and immediately quits.  Maybe it is sending a
    "goodbye" message.  There is then a race between the receipt of the
    in-flight message on the other side and the death of those ports in
    the sender; the sender is inadvertently (probably!) screwing its own
    last message by terminating "too quickly".  And the insidious thing
    is that this can work sometimes, and not others, depending on the
    timing.  Or it can work in one OS release, and not another, because,
    say, performance of something during the shutdown improves (or
    the kernel gets faster) and the sender now goes away a little
    quicker than it used to.  Or it might not happen on a slower machine
    but does on a faster (or multicore) machine.  Or whatever.
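
    To make the timing dependence concrete, here is a deliberately
    simplified Python sketch.  The function name and the millisecond
    figures are assumptions picked for illustration, and real timing is
    nowhere near this tidy.  The only point is that the outcome flips
    when either number moves, with no change to the sender's code.

```python
def goodbye_outcome(receive_latency_ms, sender_exit_delay_ms):
    """Toy model: does the receiver handle the message before the sender's
    ports die?  Ties are counted as drops here; that choice is arbitrary."""
    if receive_latency_ms < sender_exit_delay_ms:
        return "delivered"
    return "dropped"


scenarios = {
    # Sender takes a while to go away: the message gets through.
    "slow shutdown": goodbye_outcome(receive_latency_ms=5, sender_exit_delay_ms=20),
    # Same sender after shutdown got faster: the message is lost.
    "faster shutdown": goodbye_outcome(receive_latency_ms=5, sender_exit_delay_ms=2),
    # Same sender as the first case, but the receiver is busy.
    "busy receiver": goodbye_outcome(receive_latency_ms=50, sender_exit_delay_ms=20),
}

for name, outcome in scenarios.items():
    print(name, "->", outcome)
```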

    As for figuring out where it is coming from ...
    So the thing to look for I suppose is where you have processes
    terminating, or manual connection invalidations going on.  Do a
    thought experiment where you imagine messages that are sent back and
    forth remain in-flight for an hour, and think about whether you're
    doing something that might cause this race to crop up.  (Assuming you
    saw this message with one of your own apps.)
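
    The thought experiment can also be run mechanically over a log of
    what one process does.  This Python sketch assumes a hypothetical
    event log (the event names and the at_risk_sends helper are made up,
    not anything DO provides) and treats every message as staying
    in-flight until a reply for it has been seen.  Anything still
    in-flight when the process invalidates the connection or terminates
    is a candidate for the race.

```python
def at_risk_sends(events):
    """Return ids of messages that could still be in flight when the
    sender invalidates its connection or terminates."""
    in_flight = []
    at_risk = []
    for event in events:
        kind = event[0]
        if kind == "send":
            in_flight.append(event[1])
        elif kind == "reply":
            # A reply shows the other side already received this message.
            if event[1] in in_flight:
                in_flight.remove(event[1])
        elif kind in ("invalidate", "terminate"):
            # Everything unanswered at this point may be dropped.
            at_risk.extend(in_flight)
            in_flight = []
    return at_risk


log = [
    ("send", "hello"),
    ("reply", "hello"),
    ("send", "goodbye"),  # oneway: no reply will ever come back
    ("terminate",),
]

risky = at_risk_sends(log)
print(risky)
```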

    Chris Kane
    Cocoa Frameworks, Apple