How best to archive in CSV format
-
Hi
I'm looking for advice on the best way to handle archiving my documents
in CSV (comma-separated values) format.
I have written a small Cocoa application to be used for data capture.
The program accepts text and numbers typed into an NSTableView and
stores them in an array of objects. Following the document-based
application examples in Aaron Hillegass's excellent book, my app is able
to archive the array to disk in a coded format. I have implemented
encodeWithCoder: and initWithCoder: methods on the class that is stored
in the array. My MyDocument class has dataRepresentationOfType: and
loadDataRepresentation:ofType: methods which use NSKeyedArchiver to save
or retrieve documents.
That all works very nicely (thanks Aaron), but in order to be useful I
need to change the document storage format to simple CSV. Can I do
this through the encodeWithCoder:/initWithCoder: mechanism? This seems
to me to be the logical place to write routines for transforming
objects to and from a disk file, but maybe I'm misunderstanding the
function of NSCoder. Is it sensible to write a coder that converts
objects to UTF-8 strings?
What I have at the moment is an extra method inside my object that
returns a string representation of the object by appending a
description of each item to an NSMutableString. Then I have replaced
the NSKeyedArchiver part of dataRepresentationOfType: with a loop that
retrieves this string from each item in the array and concatenates it
into a bigger string which is then returned with
return [string dataUsingEncoding: NSUTF8StringEncoding];
The encodeWithCoder method is not being used.
This works, but it looks clumsy. I doubt that it will scale,
as the string will grow very large. The init method for
NSMutableString requires that I specify a capacity, which it says is a
"hint" of how much memory to allocate. I don't know how binding (no,
not the Cocoa sort of binding) a "hint" is. Am I free to exceed the
capacity, or do I have to anticipate the maximum string size? These
doubts make me think I'm going about this all wrong. Should I be using
the encode and decode system to convert to and from CSV format?
regards
Denis
Denis Stanton
Orcon Internet Limited
(09) 480 9299
http://www.orcon.net.nz -
On May 17, 2005, at 4:51 AM, Denis Stanton wrote:
> Hi
>
> I'm looking for advice on the best way to handle archiving my
> documents in CSV (comma-separated values) format.
>
> These doubts make me think I'm going about this all wrong. Should
> I be using the encode and decode system to convert to and from CSV
> format?
Look at the reference documentation for NSString and NSArray, since
they have some methods for working with delimited data. You will find
a couple of methods for building and parsing strings from arrays of
strings, using whatever delimiter you want:
-[NSString componentsSeparatedByString:] and
-[NSArray componentsJoinedByString:]. Also, you will find read and
write methods for creating files.
Brian -
On May 16, 2005, at 9:37 PM, Brian Smith wrote:
>
> On May 17, 2005, at 4:51 AM, Denis Stanton wrote:
>
>
>> I'm looking for advice on the best way to handle archiving my
>> documents in csv (comma separated variable) format.
>>
>> These doubts make me think I'm going about this all wrong.
>> Should I be using the encode and decode system to convert to and
>> from csvs format?
>>
>
> Look at the references document to NSString and NSArray since they
> have some methods for working with delimited data. You will find a
> couple of methods for working with making and parsing strings from
> arrays of strings and setting the delimiter to whatever you want,
> (NSString)componentsSeparatedByString: and (NSArray)
> componentsJoinedByString:. Also, you will find read and write
> methods for creating files.
That sounds like a really bad idea because you need to deal with
quoting and such. You'll have to code this up by hand unless you can
find existing C or Objective-C code to do it. NSScanner might
help.
One alternative is to look at PyObjC, because Python ships with a csv
module, though it's not optimal since current versions have a
limitation such that it only knows how to deal with bytestrings, not
unicode, so you have to encode everything into UTF-8 before putting
it into the csv and decode it from UTF-8 after getting it out.
In other words, it's time to decide whether you REALLY need CSV :)
-bob -
On May 17, 2005, at 1:43 PM, Bob Ippolito wrote:
>
> On May 16, 2005, at 9:37 PM, Brian Smith wrote:
>
>>
>> On May 17, 2005, at 4:51 AM, Denis Stanton wrote:
>>
>>
>>> I'm looking for advice on the best way to handle archiving my
>>> documents in csv (comma separated variable) format.
>>>
>>> These doubts make me think I'm going about this all wrong. Should
>>> I be using the encode and decode system to convert to and from csvs
>>> format?
>>>
>>
>> Look at the references document to NSString and NSArray since they
>> have some methods for working with delimited data. You will find a
>> couple of methods for working with making and parsing strings from
>> arrays of strings and setting the delimiter to whatever you want,
>> (NSString)componentsSeparatedByString: and
>> (NSArray)componentsJoinedByString:. Also, you will find read and
>> write methods for creating files.
>
> That sounds like a really bad idea because you need to deal with
> quoting and such. You'll have to code this up by hand unless you can
> find existing C or Objective-C code to do it. NSStringScanner might
> help.
>
> One alternative is to look at PyObjC, because Python ships with a csv
> module.. though it's not optimal since current versions have a
> limitation such that it only knows how to deal with bytestrings, not
> unicode, so you have to encode everything into utf-8 before putting it
> into the csv and decode it from utf-8 after getting it out.
>
> In other words, it's time to decide whether you REALLY need CSV :)
>
> -bob
Thanks Brian and Bob
I have a reasonable idea of how to do the string handling parts with
componentsSeparatedByString: and componentsJoinedByString:.
That's not the part that worries me. My question is whether I should be
writing this CSV conversion stuff inside the standard methods
encodeWithCoder: and initWithCoder:, and if so, how. It seems the Cocoa
architecture has a well-thought-out mechanism for archiving and I should
try to work within it. My problem is that the example I have produces a
binary coded disk file and I need CSV text. I want to make the Cocoa
archive mechanism work with CSV.
I know that there are traps in this, as one of my data columns could
contain commas, so I need to worry about quotes. But for the present
task I do have to conform to CSV, because I am going to propose this
Cocoa application as a replacement for an existing web-browser-based
data entry program, and it's important to show that it can simply
replace the older program without requiring anybody else to change.
Denis
Denis Stanton
Orcon Internet Limited
(09) 480 9299
http://www.orcon.net.nz -
On May 17, 2005, at 9:43 AM, Bob Ippolito wrote:
> One alternative is to look at PyObjC, because Python ships with a
> csv module.. though it's not optimal since current versions have a
> limitation such that it only knows how to deal with bytestrings,
> not unicode, so you have to encode everything into utf-8 before
> putting it into the csv and decode it from utf-8 after getting it out.
I've used Python to read a CSV file, and it couldn't handle Mac line
endings, which the files I need to read have. So, with PyObjC, I
used NSString's componentsSeparatedByString: method to read the file.
I have found this to be useful, but obviously you have to
experiment depending on the CSV files you have. I did have to strip
some quote marks from the ends of the strings in the array, but I was
still able to do it easily with NSString and NSArray methods.
Brian -
On May 16, 2005, at 10:12 PM, Brian Smith wrote:
>
> On May 17, 2005, at 9:43 AM, Bob Ippolito wrote:
>
>
>> One alternative is to look at PyObjC, because Python ships with a
>> csv module.. though it's not optimal since current versions have a
>> limitation such that it only knows how to deal with bytestrings,
>> not unicode, so you have to encode everything into utf-8 before
>> putting it into the csv and decode it from utf-8 after getting it
>> out.
>
> I've used python to read a csv file and it can't handle mac line
> endings too, which the files I need to read have. So, with PyObjC,
> I used NSString's componentsSeparatedByString: method to read the
> file, so I have found this to be useful, but obviously you have to
> experiment given on csv files you have. I did have to strip some
> quote marks from the ends of the array of strings, but I was able
> to still do it easily with NSString and NSArray methods.
Actually it can read Mac line endings (bare '\r') just fine if you
open the file with universal newlines (the 'U' mode). I do this all
the time.
-bob -
On 17 maj 2005, at 04.08, Denis Stanton wrote:
> That's not the part that worries me. My question is whether I should
> be writing this CSV conversion stuff inside the standard methods
> encodeWithCoder: and initWithCoder:, and if so, how. It seems the
> Cocoa architecture has a well-thought-out mechanism for archiving
> and I should try to work within it. My problem is that the example I
> have produces a binary coded disk file and I need CSV text. I want
> to make the Cocoa archive mechanism work with CSV.
Why not do it in the dataRepresentationOfType: and
loadDataRepresentation:ofType: methods of your document subclass?
NSKeyedArchiver provides a way to store / restore an object graph,
and it provides its own storage format. Since neither of those seems
to be something that's really useful for you, I'd suggest that you
avoid using it for this particular purpose.
j o a r -
Hello...
No, you probably wouldn't want to implement this in terms of an
NSCoder subclass or pattern. It might be possible, but it would make
things significantly more complicated than they need to be. NSCoder
is designed from a perspective of allowing each object to determine
the best way to encode itself, and allows an object to store its
data in an arbitrary manner. If you attempted to implement CSV using
NSCoder, your goal would be the exact opposite: you would need to
make each object conform to a particular way of being encoded.
A more applicable pattern would be to add your own methods to the
relevant NS-base classes using categories and create your own
separate archiving and unarchiving process, similar to the methods
NSArray and NSDictionary objects have to read and write themselves in
XML.
Writing code to work for a variety of abstract cases is more
difficult than writing code to handle a specific case (and it's not
trivial to do it right). To start with, the best option is probably
just to implement the code within your document subclass, based on
the particular model you are using.
Essentially, in your document class implementation of
dataRepresentationOfType: and loadDataRepresentation:ofType:, instead
of using a flavor of NSArchiver, you would use your own code to read
and write CSV. Since you know the model you are using to feed the
tableview, you only need to worry about the objects that your model
uses or allows and implement the code to read and write your model as
the CSV format you've been provided requires.
You'll still need to work out all the "interesting" details of
reading and writing CSV, but you don't need to try to make it fit
within the existing NSCoder archiving mechanism.
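As a rough illustration of the approach described above (reading and writing the model directly as CSV in the document's own data methods), here is a minimal sketch in Python 3 rather than Objective-C. The function names rows_to_csv_data and csv_data_to_rows are hypothetical stand-ins for dataRepresentationOfType: and loadDataRepresentation:ofType:, and the model is assumed to be a plain list of rows of strings. Unlike the Python version current when this thread was written, Python 3's csv module works on text, so the UTF-8 encoding and decoding happens once for the whole document.

```python
import csv
import io


def rows_to_csv_data(rows):
    """Serialise a list of rows (lists of strings) to UTF-8 encoded CSV bytes."""
    # newline="" leaves line endings entirely to the csv writer.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    # Fields containing commas or quote marks are quoted automatically.
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def csv_data_to_rows(data):
    """Parse UTF-8 encoded CSV bytes back into a list of rows."""
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample model: one field has a comma, one has quote marks.
rows = [["Smith, J", 'said "hi"', "10"], ["Ünal", "x", "2"]]
data = rows_to_csv_data(rows)
restored = csv_data_to_rows(data)
```

The round trip through bytes mirrors what a document class does when it hands data to, and receives data from, the document architecture. No object graph information is written, only the cell contents.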
Hope that helps,
Louis
-
Hi Louis
Thanks a lot. This really answers my question. I have already gone
down this path, modifying dataRepresentationOfType: and
loadDataRepresentation:ofType:.
I was thinking that I could do something nicer using NSCoder
subclasses, but the problem is NSArchiver wants to put out a lot of
additional information about the object graph and all I want in the
file is the simple text contents of the end nodes of that graph. I'm
not going to be writing a generalised CSV export tool, as
this is really just a proof-of-concept program at this stage. The
concept being "that task that the accounts department hates would be so
much easier if you gave them this little program - and a Mac to run it
on".
Thanks for a really detailed answer, right on target.
Denis
On May 17, 2005, at 6:49 PM, Louis C. Sacha wrote:
> Hello...
>
> No, you probably wouldn't want to implement this in terms of an
> NSCoder subclass or pattern. It might be possible, but it would make
> things significantly more complicated than they need to be. NSCoder is
> designed from a perspective of allowing each object to determine the
> best way to encode itself, and allows an object to store its data in
> an arbitrary manner. If you attempted to implement CSV using NSCoder,
> your goal would be the exact opposite: you would need to make each
> object conform to a particular way of being encoded.
>
>
> A more applicable pattern would be to add your own methods to the
> relevant NS-base classes using categories and create your own separate
> archiving and unarchiving process, similar to the methods NSArray and
> NSDictionary objects have to read and write themselves in XML.
>
> Writing code to work for a variety of abstract cases is more difficult
> than writing code to handle a specific case (and it's not trivial to
> do it right). To start with, the best option is probably just to
> implement the code within your document subclass, based on the
> particular model you are using.
>
>
> Essentially, in your document class implementation of
> dataRepresentationOfType: and loadDataRepresentation:ofType:, instead
> of using a flavor of NSArchiver, you would use your own code to read
> and write CSV. Since you know the model you are using to feed the
> tableview, you only need to worry about the objects that your model
> uses or allows and implement the code to read and write your model as
> the CSV format you've been provided requires.
>
> You'll still need to work out all the "interesting" details of reading
> and writing CSV, but you don't need to try to make it fit within the
> existing NSCoder archiving mechanism.
>
> Hope that helps,
>
> Louis
>
>
>>
>> That's not the part that worries me. My question whether I should be
>> writing this CSV conversion stuff inside the standard methods
>> encodeWithCoder and initWithCoder, and if so how. It seems the
>> Cocoa architecture has a well-thoughtout mechanism for archiving and
>> I should try and work within it. My problem is the example I have
>> produces a binary coded disk file and I need csv text. I want to
>> make the Cocoa archive mechanism work with csv.
>>
>> I know that there are traps in this as one of my data columns could
>> contain commas, so I need to worry about quotes, but for the present
>> task I do have to conform to csv because I am going to propose this
>> Cocoa application as a replacement for an existing web-browser based
>> data entry program and it's important to show that it can simply
>> replace the older program with out requiring anybody else to change.
>>
>> Denis
>>
>>
Denis Stanton
Orcon Internet Limited
(09) 480 9299
http://www.orcon.net.nz -
Hi j o a r
Thank you, that answers my question precisely, even though the answer
is "don't do that". It matches what I have done.
- Do modify dataRepresentationOfType: and loadDataRepresentation:ofType:
- Don't mess with NSKeyedArchiver and coder. They have their own
agenda with handling an object graph. Even if I could write CSV
versions of the class instances, NSKeyedArchiver would want to record
that they were in an array, so it would add information
to the output file.
Thanks for your help
Denis
On May 17, 2005, at 6:00 PM, j o a r wrote:
>
> On 17 maj 2005, at 04.08, Denis Stanton wrote:
>
>> That's not the part that worries me. My question whether I should be
>> writing this CSV conversion stuff inside the standard methods
>> encodeWithCoder and initWithCoder, and if so how. It seems the
>> Cocoa architecture has a well-thoughtout mechanism for archiving and
>> I should try and work within it. My problem is the example I have
>> produces a binary coded disk file and I need csv text. I want to
>> make the Cocoa archive mechanism work with csv.
>
> Why not do it in the dataRepresentationOfType: and
> loadDataRepresentation:ofType: methods of your document subclass?
> NSKeyedArchiver provides a way to store / restore an object graph, and
> it provides its own storage format. Since neither of those seems to be
> something that's really useful for you, I'd suggest that you avoid
> using it for this particular purpose.
>
> j o a r
>
>
>
Denis Stanton
Orcon Internet Limited
(09) 480 9299
http://www.orcon.net.nz -
On 17-mei-2005, at 4:36, Bob Ippolito wrote:
>
> On May 16, 2005, at 10:12 PM, Brian Smith wrote:
>
>
>>
>> On May 17, 2005, at 9:43 AM, Bob Ippolito wrote:
>>
>>
>>
>>> One alternative is to look at PyObjC, because Python ships with a
>>> csv module.. though it's not optimal since current versions have
>>> a limitation such that it only knows how to deal with
>>> bytestrings, not unicode, so you have to encode everything into
>>> utf-8 before putting it into the csv and decode it from utf-8
>>> after getting it out.
>>>
>>
>> I've used python to read a csv file and it can't handle mac line
>> endings too, which the files I need to read have. So, with
>> PyObjC, I used NSString's componentsSeparatedByString: method to
>> read the file, so I have found this to be useful, but obviously
>> you have to experiment given on csv files you have. I did have to
>> strip some quote marks from the ends of the array of strings, but
>> I was able to still do it easily with NSString and NSArray methods.
Fields containing quotes and commas require additional work. Using
the csv module requires less thought.
>>
>
> Actually it can read Mac line endings (bare '\r') just fine if you
> open the file with universal newlines (the 'U' mode). I do this
> all the time.
Technically that is not correct; you need to use lineterminator='\r'
in the dialect.
Ronald -
On May 18, 2005, at 4:49 PM, Ronald Oussoren wrote:
>
> On 17-mei-2005, at 4:36, Bob Ippolito wrote:
>
>
>>
>> On May 16, 2005, at 10:12 PM, Brian Smith wrote:
>>
>>
>>
>>>
>>> On May 17, 2005, at 9:43 AM, Bob Ippolito wrote:
>>>
>>>
>>>
>>>
>>>> One alternative is to look at PyObjC, because Python ships with
>>>> a csv module.. though it's not optimal since current versions
>>>> have a limitation such that it only knows how to deal with
>>>> bytestrings, not unicode, so you have to encode everything into
>>>> utf-8 before putting it into the csv and decode it from utf-8
>>>> after getting it out.
>>>>
>>>>
>>>
>>> I've used python to read a csv file and it can't handle mac line
>>> endings too, which the files I need to read have. So, with
>>> PyObjC, I used NSString's componentsSeparatedByString: method to
>>> read the file, so I have found this to be useful, but obviously
>>> you have to experiment given on csv files you have. I did have
>>> to strip some quote marks from the ends of the array of strings,
>>> but I was able to still do it easily with NSString and NSArray
>>> methods.
>>>
> Fields containing quotes and comma's require additional work. Using
> the csv module requires less thought.
Yeah, stripping quotes is not enough, because you'll end up with
columns that are broken in the middle due to a comma being present.
Of course, you can hack around this until it works, but I'd highly
recommend using a mostly correct implementation of a CSV parser in
the first place :)
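To make the point about broken columns concrete, here is a minimal Python 3 sketch comparing a plain split on commas (the equivalent of componentsSeparatedByString:) with the csv module. The sample line is made up for illustration.

```python
import csv

# Hypothetical record whose second field contains a comma inside quotes.
line = 'Smith,"Acme, Inc.",42'

# Naive approach: split on every comma, ignoring the quoting.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the quoting rules.
parsed = next(csv.reader([line]))

print(naive)   # ['Smith', '"Acme', ' Inc."', '42']
print(parsed)  # ['Smith', 'Acme, Inc.', '42']
```

The naive split yields four pieces instead of three, and stripping the stray quote marks afterwards would still leave the company name cut in two.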
>> Actually it can read Mac line endings (bare '\r') just fine if you
>> open the file with universal newlines (the 'U' mode). I do this
>> all the time.
>>
>
> Technically that is not correct, you need to use
> lineterminator='\r' in the dialect.
It's perfectly correct for *reading* csv files. Universal newlines
will simply convert '\r' to '\n' before the csv reader sees it,
allowing you to read csv files of any line ending (even mixed)
without issues.
For writing csv files, you might care about the line terminator, but
for reading them, it's VERY convenient to use universal newlines,
especially when you're dealing with both the Windows and Macintosh
versions of Excel, interchangeably, for example.
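A minimal sketch of the reading side, written for Python 3 rather than the Python of this thread: recent Python 3 releases no longer accept the 'U' open mode, but a text stream created with newline=None performs the same universal-newline translation. The helper name read_csv_bytes and the sample data are assumptions made for illustration.

```python
import csv
import io


def read_csv_bytes(data, encoding="utf-8"):
    """Parse CSV bytes whose lines may end in '\\r', '\\n' or '\\r\\n'."""
    # newline=None turns on universal newlines: every line ending is
    # translated to '\n' before the csv reader sees the text.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    return list(csv.reader(stream))


# Hypothetical file mixing classic Mac, Windows and Unix line endings.
mixed = b'name,amount\r"Smith, J",10\r\nJones,20\n'
rows = read_csv_bytes(mixed)
```

One trade-off of translating first is that a line break embedded inside a quoted field is also rewritten as '\n'. When such breaks must be preserved exactly, passing newline="" and letting the csv reader handle the line endings itself is the other option.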
-bob -
On May 18, 2005, at 5:02 PM, Bob Ippolito wrote:
> For writing csv files, you might care about the line terminator.. but
> for reading them, it's VERY convenient to use universal newlines
> especially when you're dealing with both the Windows and Macintosh
> version of Excel, interchangeably, for example.
Speaking of Excel, Microsoft does not use standard CSV:
<http://ostermiller.org/utils/ExcelCSV.html>. This might or might not
matter to you -- just thought I'd point it out.
--Andy



