NSMutableData capacity

  • Is there an easy way to figure out the capacity of an NSMutableData object, that is, the number of bytes that can be written to the data object without causing it to reallocate the internal buffer or invalidating the base pointer to the internal buffer?

    Example:

    NSMutableData* data  =  ..

    [data increaseLengthBy:1234];
    void * p = [data mutableBytes];
    NSUInteger data_capacity = [data capacity];  // An API like this would be fine!
    [data setLength: data_capacity];
    assert(p == [data mutableBytes]);

    That would be a "nice to have" feature. I wouldn't use an approach that requires tricky hacks into the allocator.

    Any ideas?

    Regards
    Andreas
  • On May 23, 2012, at 8:26 AM, Andreas Grosam wrote:

    > Is there an easy way to figure out the capacity of a NSMutableData object - that is the number of bytes that can be written to the data object without causing it to reallocate the internal buffer or invalidating the base pointer to the internal buffer?

    I don't think so. You should always assume that any call to -setLength: will invalidate the -bytes/-mutableBytes pointer.

    —Jens
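A minimal sketch of the rule above, assuming ARC and Foundation. The helper name ResizeAndRefetch is hypothetical and only illustrates re-fetching the pointer after every length change instead of reusing an older one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: change the length, then hand back a freshly fetched
// pointer. Any pointer obtained before this call is treated as invalid.
static unsigned char *ResizeAndRefetch(NSMutableData *data, NSUInteger newLength) {
    [data setLength:newLength];   // may move the underlying storage
    return [data mutableBytes];   // re-fetch instead of reusing the old pointer
}
```

Existing bytes survive a resize and -setLength: zero-fills any added bytes, so only the pointer has to be refreshed, not the contents.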
  • > Is there an easy way to figure out the capacity of a NSMutableData
    > object - that is the number of bytes that can be written to the data
    > object without causing it to reallocate the internal buffer or
    > invalidating the base pointer to the internal buffer?

    Can you not track the capacity yourself? Unless I'm missing something,
    if you use

    +dataWithCapacity:some_capacity

    or

    -initWithCapacity:some_capacity

    to guarantee it has the capacity (but not necessarily allocated yet)
    followed by

    -initWithLength:some_capacity

    or

    -setLength:some_capacity

    to guarantee the allocated length = capacity so far, surely thereafter
    you would know its capacity and could keep track of it based on any
    changes you make to the data / length / capacity yourself?

    I appreciate that if you didn't create the NSMutableData in the first
    place or you don't have complete control over who else puts data in to
    it then you can't track it like this, but your post didn't say if you
    had full control so I'm only guessing.

    --
    Jason Teagle
  • On 23 May 2012, at 1:41 PM, Jason Teagle wrote:

    > Unless I'm missing something, if you use
    ...
    > -initWithCapacity:some_capacity
    >
    > to guarantee it has the capacity (but not necessarily allocated yet) followed by
    >
    > -initWithLength:some_capacity

    DO NOT DO THIS.

    -init... is a one-time operation in Cocoa. The call is privileged to make destructive initializations to the object, to assume that its initial state can be ignored, and even to replace the object entirely. Subsequent -init... calls will in turn assume they can make destructive initializations, and the object will almost certainly (maybe not today, maybe not tomorrow, but soon, and for the rest of its life) break.

    I can't make assumptions about how NSMutableData is implemented, but I'd guess that -initWithLength: would assume that its buffer pointer is uninitialized, and discard the buffer it got from -initWithCapacity: without freeing it. And that's the best case I can imagine.

    — F
  • > DO NOT DO THIS.
    >
    > -init... is a one-time operation in Cocoa. The call is privileged to make
    > destructive initializations to the object, to assume that its initial state
    > can be ignored, and even to replace the object entirely. Subsequent -init...
    > calls will in turn assume they can make destructive initializations, and the
    > object will almost certainly (maybe not today, maybe not tomorrow, but soon,
    > and for the rest of its life) break.

    Fair point, and well noted. I think it's a shame that objects would be
    so cavalier about their existing state (isn't that just sloppy? Makes it
    far too easy to leak memory) but there *is* some logic to the whole
    -initXXX thing being destructive and if that's the way it's done... so
    be it. Glad I learned this now rather than *after* a problem...

    --
    Jason Teagle
  • On May 23, 2012, at 12:15 PM, Jason Teagle wrote:

    > Fair point, and well noted. I think it's a shame that objects would be so cavalier about their existing state (isn't that just sloppy? Makes it far too easy to leak memory)

    No, it's part of the contract of object initialization. Initializer methods may only be called on 'blank' objects returned from +alloc. They're like constructors in C++/Java, just not built into the language*.

    If you want a method to _reset_ an existing object, that's something different. Some classes have this, like -[NSMutableArray removeAllObjects]. It fundamentally doesn't make sense for immutable objects like NSString, however, since it could change their contents.

    I guess you could say it's 'easy' to leak memory by calling -init more than once, but only in the sense that it's 'easy' to leak by calling -retain too many times (without ARC), or to crash by writing past the end of a data's -mutableBytes. Objective-C isn't a totally safe language like Java or Ruby. :)

    —Jens

    * Keep in mind that Objective-C has been around since the early '80s and was primarily influenced by Smalltalk-80; the alloc/init dichotomy comes directly from Smalltalk. C++ existed at the time but hadn't become mainstream yet.
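A short sketch of the initializer contract discussed above, assuming ARC and Foundation. The helper names MakeZeroedBuffer and ResetBuffer are hypothetical. The point is one -init... call on the object returned by +alloc, with every later change going through ordinary mutators.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: exactly one -init... call on the object from +alloc,
// then mutators for everything else.
static NSMutableData *MakeZeroedBuffer(NSUInteger length) {
    NSMutableData *data = [[NSMutableData alloc] initWithCapacity:length];
    // Wrong: [data initWithLength:length];  // second initializer on a live object
    [data setLength:length];                 // mutator; added bytes are zero-filled
    return data;
}

// Hypothetical helper: "reset" through a mutator, never through -init...
static void ResetBuffer(NSMutableData *data) {
    [data setLength:0];
}
```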
  • > Can you not track the capacity yourself? Unless I'm missing something, if you use
    >
    > +dataWithCapacity:some_capacity

    The docs note that this doesn't necessarily pre-allocate the given capacity.  You can test that trivially by asking for a capacity of several gigabytes.

    In a nutshell, there's no way to "lock" the underlying bytes of NSMutableData (or NSMutableString, or anything else like them).  If you mutate them via method calls, you need to reset any interior pointers you may have.  Strictly speaking you should reset your interior pointers every time you invoke any method on them, since they're technically free to re-arrange their internals however they like, even for what are [externally] non-mutating methods.

    If there's a performance concern, use IMP caching.  The cost of the function call to retrieve the bytes pointer is really trivial.
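One way IMP caching for -mutableBytes might look, as a sketch assuming ARC and Foundation. The typedef MutableBytesFn and the function FillWithCachedIMP are hypothetical names. The cached function pointer is only assumed valid for the object it was looked up on.

```objc
#import <Foundation/Foundation.h>

// Function-pointer type matching -mutableBytes: (self, _cmd) -> void *
typedef void *(*MutableBytesFn)(id, SEL);

// Hypothetical example: write 0, 1, 2, ... into every byte of `data`,
// re-fetching the pointer on each pass through a cached IMP.
static void FillWithCachedIMP(NSMutableData *data) {
    SEL sel = @selector(mutableBytes);
    // Look the method up once, outside the loop.
    MutableBytesFn mutableBytes = (MutableBytesFn)[data methodForSelector:sel];

    NSUInteger length = [data length];
    for (NSUInteger i = 0; i < length; i++) {
        unsigned char *p = mutableBytes(data, sel);  // direct call, no dispatch
        p[i] = (unsigned char)i;
    }
}
```

This keeps the habit of re-fetching the pointer on every use while skipping the message dispatch each time.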
  • On May 23, 2012, at 7:30 PM, Wade Tregaskis wrote:

    > If there's a performance concern, use IMP caching.  The cost of the function call to retrieve the bytes pointer is really trivial.

    Or just manage your own memory block using malloc / realloc / free. It's not that much more work than using an NSData.

    —Jens
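A rough sketch of the malloc / realloc / free alternative, written as plain C that also compiles as Objective-C. The ByteBuffer type, its function names, the starting size of 64 bytes and the doubling growth policy are all assumptions made for illustration. Because the capacity is an explicit field, the code knows exactly when the base pointer can change.

```objc
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer with an explicit, queryable capacity.
typedef struct {
    unsigned char *begin;   // base pointer; changes only inside ByteBufferReserve
    size_t length;          // bytes in use
    size_t capacity;        // bytes allocated
} ByteBuffer;

// Ensure room for at least `needed` bytes. Returns false if allocation fails.
// (Overflow of the doubling step is not handled in this sketch.)
static bool ByteBufferReserve(ByteBuffer *b, size_t needed) {
    assert(b->length <= b->capacity);
    if (needed <= b->capacity) return true;      // no reallocation, pointer stable
    size_t cap = b->capacity ? b->capacity : 64;
    while (cap < needed) cap *= 2;
    unsigned char *p = realloc(b->begin, cap);
    if (p == NULL) return false;                 // old block is still valid
    b->begin = p;
    b->capacity = cap;
    return true;
}

// Append `n` bytes, growing the block if needed.
static bool ByteBufferAppend(ByteBuffer *b, const void *bytes, size_t n) {
    if (n == 0) return true;
    if (!ByteBufferReserve(b, b->length + n)) return false;
    memcpy(b->begin + b->length, bytes, n);
    b->length += n;
    return true;
}

static void ByteBufferFree(ByteBuffer *b) {
    free(b->begin);
    b->begin = NULL;
    b->length = b->capacity = 0;
}
```

A zero-initialized ByteBuffer (`ByteBuffer b = {0};`) is a valid empty buffer, since realloc on a NULL pointer behaves like malloc.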
  • >> +dataWithCapacity:some_capacity
    >
    > The docs note that this doesn't necessarily pre-allocate the given capacity.

    Which is exactly why I said, in the very same sentence, "to guarantee it
    has the capacity (but not necessarily allocated yet)".

    That's why you then call -setLength:, which writes out the entire length
    you specify, and so *has* to allocate the memory (because it fills any
    extra bytes with zeroes) - otherwise it isn't writing out the full
    length asked for.

    > In a nutshell, there's no way to "lock" the underlying bytes of NSMutableData

    We weren't talking about locking them - the OP was asking about knowing
    when it would need to allocate new memory. If you guarantee the capacity
    and force it to be allocated yourself, then you can track any data you
    put in yourself to know when you're going to need another allocation to
    cope with data to be stored.

    --
    Jason Teagle
  • On 24 May 2012, at 06:02, Jason Teagle wrote:

    >>> +dataWithCapacity:some_capacity
    >>
    >> The docs note that this doesn't necessarily pre-allocate the given capacity.
    >
    > Which is exactly why I said, in the very same sentence, "to guarantee it has the capacity (but not necessarily allocated yet)".

    I'm not sure what you're hoping to achieve from this.  NSMutableData's capacity is not a hard and fast upper limit to how much data can be put in there; it's just a hint to the API for roughly how much space should be found for the data.

    Bob
  • > I'm not sure what you're hoping to achieve from this.  NSMutableData's capacity is
    > not a hard and fast upper limit to how much data can be put in there; it's just a hint
    > to the API for roughly how much space should be found for the data.

    (The point - with setLength: - was to pre-allocate the memory so that
    the OP would get that out of the way at a known point in time, and would
    know when the next allocation would be required so it wouldn't suddenly
    allocate memory at what might be a crucial point in time when speed and
    smoothness were of the essence. If they track the size of data being set
    in and do the capacity / allocations themselves, they *should* (in
    theory) be able to guarantee no unexpected memory juggling.)

    --
    Jason Teagle
  • Thank you for your replies. I appreciate your comments.

    Maybe you are interested in the background and why I'm asking this. In fact, as some of you suspected, the reason for asking has to do with performance.

    The NSMutableData object serves as the internal buffer of a "Streambuffer" class, which has roughly this interface:

    @interface MutableDataStreambuffer :  NSObject <StreambufferProtocol>
    - (id) init;
    - (id) initWithData:(NSData*)data;

    // StreambufferProtocol
    - (void) writeBytes:(const void*)buffer length:(NSUInteger)length;

    // MutableDataStreambuffer methods
    - (NSData*) data;
    @end

    The -data method returns a *copy* of the internal buffer.
    -initWithData: initializes the internal buffer by copying the contents of the data parameter into it.

    Strictly speaking, the internal buffer does not need to be an NSMutableData. But sometimes, to avoid copying in private implementations, I would prefer to have an NSMutableData object, thus:

    @interface MutableDataStreambuffer (Private)
    - (NSMutableData*) buffer;
    @end

    When using an NSMutableData as the internal buffer, and in order to maintain good performance, I use interior pointers into the contents of the object, namely _begin, _p and _endcap.

    To be able to rely on the validity of the pointers, I use
    -increaseLengthBy: and -setLength:
    and manage the *actual* size of the buffer manually. This is slightly suboptimal, though.

    While I found that this approach is much faster than appending bytes using NSMutableData methods, the implementation could be better still with something like a -capacity method on NSMutableData that returns the size of the raw memory buffer from the underlying allocator, and not some potential "max size" or "hint".

    I understand that capacity may not make sense in certain concrete implementations of NSMutableData, though.

    Regards, and thanks for your tips and comments!

    Andreas
  • On May 24, 2012, at 12:43 AM, Andreas Grosam wrote:

    > While I experienced that this approach is much faster than appending bytes using NSMutableData methods, the above implementation could be still better when having something like a -capacity method for a NSMutableData object which is the size of the raw memory buffer of the underlying allocator, and not some potential "max size" or "hint".

    So don't worry about the internal implementation of the capacity of the data object; just set its .length explicitly to what you want, and then keep track yourself of how many bytes of it you're actually using.

    So if you want a capacity of 1MB, call data.length = 1024*1024. Keep your own 'actualLength' property, or a pointer to the end of the mutableBytes, or whatever. When you run out of that, increase data.length some more.

    Or as I said before, just keep your own malloc block instead of an NSData. Because really, NSMutableData is just a very thin OOP wrapper around malloc/realloc/free.

    —Jens
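The "set the length yourself and track what you use" approach could be sketched as follows, assuming ARC and Foundation. The class name SketchStreambuffer, the _used ivar, the 4096-byte starting length and the doubling policy are assumptions for illustration. Only -writeBytes:length: and -data come from the interface posted earlier in the thread.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical buffer class: [_buffer length] plays the role of "capacity",
// and _used counts the bytes actually written.
@interface SketchStreambuffer : NSObject
- (void)writeBytes:(const void *)bytes length:(NSUInteger)length;
- (NSData *)data;
@end

@implementation SketchStreambuffer {
    NSMutableData *_buffer;
    NSUInteger _used;
}

- (instancetype)init {
    if ((self = [super init])) {
        // Allocate (and zero-fill) the whole working area up front.
        _buffer = [[NSMutableData alloc] initWithLength:4096];
    }
    return self;
}

- (void)writeBytes:(const void *)bytes length:(NSUInteger)length {
    if (length == 0) return;
    NSUInteger needed = _used + length;
    if (needed > [_buffer length]) {
        // Out of room: grow at a point we control.
        NSUInteger newLength = [_buffer length];
        while (newLength < needed) newLength *= 2;
        [_buffer setLength:newLength];
    }
    // Fetch the pointer after any length change, never before.
    memcpy((unsigned char *)[_buffer mutableBytes] + _used, bytes, length);
    _used = needed;
}

- (NSData *)data {
    // Copy out only the bytes actually written.
    return [_buffer subdataWithRange:NSMakeRange(0, _used)];
}
@end
```

The buffer's -length is then the known capacity, so no separate -capacity query is needed.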