Re: Best way of identifying duplicate files in Cocoa
FROM : Frank Reiff
DATE : Tue Nov 20 22:42:53 2007

Hi Jean-Daniel,

Thanks for your response.

On 16 Nov 2007, at 14:46, Jean-Daniel Dupas wrote:

>
> On 16 Nov 2007, at 14:25, Frank Reiff wrote:

>> Another issue is of course performance. Comparing byte-by-byte is 
>> certainly the simplest and most reliable way of doing this, but 
>> it's SLOW. On the other hand, I don't really know what the 
>> performance characteristics of an MD5, CRC32, or SHA hash are, or 
>> whether you need to read in the whole file contents to apply 
>> them.
>>
>> It would thus be great if somebody, somewhere had published a 
>> ready-to-use 
>> - (BOOL)file:(NSString *)path isIdenticalTo:(NSString *)path2; 
>> method :-)
>>
>> I've spent the last two hours searching the web, but I haven't 
>> found anything that comes close..

>
> You don't have to check byte-by-byte if the two files have 
> different sizes.
> Beyond that, comparing byte-by-byte is not so slow, as you can abort 
> the comparison as soon as two bytes differ.
>
> Using a hash has no benefit when comparing two files on the disk. 
> It's only useful if you want to compare a remote file (with a 
> precomputed hash) and a local file.


I'll probably be going with:

* check length
* check last few bytes (many files begin with the same bytes but do 
not end with them)
* check byte-by-byte
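
The three steps above can be sketched roughly as follows. This is only an illustration in Python rather than Cocoa, and the function name `files_identical`, the `tail` size and the `chunk` size are assumptions made for the example, not anything from the thread:

```python
import os

def files_identical(path1, path2, tail=4096, chunk=65536):
    """Return True if both files have exactly the same contents."""
    size = os.path.getsize(path1)
    # Step 1: files of different length can never be identical.
    if size != os.path.getsize(path2):
        return False
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        # Step 2: compare the last few bytes first, since files of the
        # same type often share a header but rarely share an ending.
        n = min(tail, size)
        f1.seek(size - n)
        f2.seek(size - n)
        if f1.read(n) != f2.read(n):
            return False
        # Step 3: compare from the start, chunk by chunk, and stop at
        # the first chunk that differs.
        f1.seek(0)
        f2.seek(0)
        while True:
            b1 = f1.read(chunk)
            b2 = f2.read(chunk)
            if b1 != b2:
                return False
            if not b1:  # both files exhausted without a mismatch
                return True
```

Reading in chunks rather than single bytes keeps the early abort Jean-Daniel mentions while avoiding one read call per byte.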

Computing a hash could be interesting in situations where there are 
lots and lots of files with the same length. Instead of having to 
compare each file with all other files of the same length, one could 
compute each file's hash by traversing it once and then compare the 
hashes instead. Of course, in order to be 100% certain, one would 
then still need a byte-by-byte check of the files whose hashes match. 
Alternatively, one could cache the relationships between all files, 
e.g. A != B and B == C means A != C and C != A.
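
A minimal sketch of that idea, again in Python rather than Cocoa: group by length, hash only the files that share a length, then confirm each hash match byte-by-byte. The names `file_digest`, `candidate_duplicates` and `confirmed_duplicates` are made up for the example, and SHA-256 is an arbitrary choice of hash. The confirmation step compares each file against one representative per class only, which is the "B == C" shortcut in miniature.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk=65536):
    """Hash a file by reading it once, chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def candidate_duplicates(paths):
    """Group paths that share both length and hash."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique length never needs to be read at all
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups

def confirmed_duplicates(paths):
    """Confirm hash matches with a full byte-by-byte comparison."""
    result = []
    for group in candidate_duplicates(paths):
        classes = []  # each entry is a list of files proven identical
        for p in group:
            for cls in classes:
                # One comparison against the class representative is
                # enough, because equality is transitive.
                if filecmp.cmp(cls[0], p, shallow=False):
                    cls.append(p)
                    break
            else:
                classes.append([p])
        result.extend(c for c in classes if len(c) > 1)
    return result
```

With this arrangement each file is read at most once for hashing, and the expensive full comparison only runs on files that already agree on both length and hash.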

I can see this could be fun :-)

Best regards,

Frank