                                     FCHECK
                                     ======

          FCHECK is a suite of file checking and comparison routines


FDIFF

        diff = FDIFF(<file 1>, <file 2>)

File names can be supplied as strings, string expressions or names.

Does a byte-by-byte comparison of two files and returns the approximate
position of the first difference found, or zero. A zero return implies that
the files contain identical data.

A negative return implies some kind of file error in either file, or
insufficient memory.
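
For example, a quick check from the SBASIC command line could look like
this (the file names here are just placeholders):

        diff = FDIFF("win1_report_old", "win1_report_new")
        IF diff = 0 THEN PRINT "Files contain identical data"
        IF diff > 0 THEN PRINT "First difference near byte"; diff
        IF diff < 0 THEN PRINT "File or memory error"; diff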



FDIFL

        diff = FDIFL(<file 1>, <file 2>)

This is identical to FDIFF above, except its first port of call after
checking parameters and opening files is to check whether file lengths are
the same. If they are not, FDIFL returns 1.
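
For instance, when all you need is a yes/no answer (placeholder names
again):

        diff = FDIFL("win1_backup_dat", "win2_backup_dat")
        IF diff = 0 THEN PRINT "Same data"
        IF diff > 0 THEN PRINT "Files differ"
        IF diff < 0 THEN PRINT "File or memory error"; diff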


Both these routines use longword comparisons (apart from any trailing odd
bytes) to speed up the process. They reserve (and release) a buffer of
$8000 bytes for both files, so repeated rounds of file tests interleaved
with other heap activity could cause heap fragmentation.


FCHKSUM

        chksum = FCHKSUM(<filename>)

The file name can be supplied as a string, a string expression or a name.

This routine samples the file and computes a "checksum" of those samples,
i.e. it doesn't check the whole file, and the checksum routine is
home-baked (not CRC32 or anything really clever). This means that if the
checksums of two files differ, they are definitely not the same - but if
the checksums are identical, there is still a possibility that the files
are different!

In other words, if you wish to know with certainty whether two files
contain identical data, you need to use FDIFF. However, using FCHKSUM
first, especially if you are checking lots of files, can reduce the need
for byte-by-byte comparisons considerably, as you only need to compare
files that return the same checksum.
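
As an illustration, here is a sketch of that strategy for a single pair of
files (the names are hypothetical, and the checksum is treated simply as a
number to compare):

        c1 = FCHKSUM("win1_a_dat"): c2 = FCHKSUM("win1_b_dat")
        IF c1 <> c2 THEN
          PRINT "Files differ"        : REMark cheap rejection
        ELSE
          REMark checksums match: confirm with a full comparison
          IF FDIFF("win1_a_dat", "win1_b_dat") = 0 THEN
            PRINT "Files identical"
          ELSE
            PRINT "Files differ despite equal checksums"
          END IF
        END IF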


Duplicates
==========

The Duplicates program is a simple demo of how this command suite could be
used. It could easily be tailored to specific needs.

Note: Duplicates is not a polished program but a simple script to
demonstrate the use of these tools, and for you to mould to your own
requirements if desired. On slow systems, or with many tens of thousands
of files, it would be better to order the files by length first and only
do an FCHKSUM on files of the same length, as sketched below.
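
A minimal sketch of that length-first idea for one pair of files, using
TK2's FLEN to read the lengths (the file names are placeholders):

        REMark Only compute checksums when the lengths match
        IF FLEN(\win1_a_dat) = FLEN(\win1_b_dat) THEN
          IF FCHKSUM("win1_a_dat") = FCHKSUM("win1_b_dat") THEN
            PRINT "Possible duplicates - confirm with FDIFF"
          END IF
        END IF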

Requirements
------------

Needs a recent TK2. The compiled version also needs Qlib_run V3.36+ or
similar. To RUN or EXECute (SBASIC) the uncompiled version, the Dupl_bin
toolkit must be preloaded.

To use
------

        EX <path>Duplicates_obj [; <devN_dir_>]

If no device and directory are given on the command line, you will be
prompted for them. In that case, all output is directed to the program's
window instead of a file. On many systems the display will scroll too fast
to be read, but it will still give you an idea of what to expect.

If you provide a path on the command line, as suggested above, the
directory tree you supplied will be scanned and, in this case, the names
and details of any duplicate files found will be copied into a text file
called ram1_duplicates_txt. The program sounds a helpful BEEP when it has
finished, but otherwise gives no other sign of progress.
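
For example, to scan a (hypothetical) picture directory and leave the
report in ram1_duplicates_txt:

        EX win1_Duplicates_obj; "win2_photos_"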


Program notes
=============

The FCHECK toolkit suite was developed (long ago!) for use by various
utilities to back up, mirror, export, and tidy disks. Giving each file a
checksum would speed up comparisons and reduce the size of any file sets
that might need to be copied over unreliable networks or sent by slow
modems over the Internet. (For example, duplicates would only be included
once in an export file set and sorted out at the receiving end.)

I started work on a proper CRC32 checksum command, but in the end there is
no absolute guarantee that a 32-bit number can tell whether two files are
identical - only a byte-by-byte comparison can.

My FCHKSUM is really lazy. It doesn't even check the whole file, but
samples a buffer load from the beginning, middle and end of the file. I
haven't done any scientific tests, but on my normal file set of some 14000
files I had a failure rate of about 0.2%. But of course it all depends on
the kind of files: lots of files with minor incremental edits would return
worse statistics.
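
The machine code itself is not reproduced here, but the principle can be
sketched in SBASIC: grab a small sample from the beginning, middle and end
of the file and accumulate a simple sum. This is only an illustration of
the idea - the toolkit's actual sample sizes and checksum arithmetic
differ.

        DEFine FuNction SampleSum (f$)
          REMark Illustration only: sum 64 bytes from the start,
          REMark middle and end of a file. Not FCHKSUM's algorithm!
          LOCal ch, ln, i, j, b%, pos, sum
          ch = FOP_IN(f$): IF ch < 0 THEN RETurn ch
          ln = FLEN(#ch): sum = 0
          FOR i = 0 TO 2
            pos = INT(i * (ln - 64) / 2): IF pos < 0 THEN pos = 0
            BGET#ch \ pos               : REMark set file position
            FOR j = 1 TO 64
              IF EOF(#ch) THEN EXIT j
              BGET#ch, b%: sum = sum + b%
            END FOR j
          END FOR i
          CLOSE#ch
          RETurn sum
        END DEFine SampleSum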


Program status
==============

V0.01, pjw, 1995 Aug
V0.02, pjw, 2021 May 19
V0.02, pjw, 2026 Feb 26, packaged and documented.

               Conditions of use and DISCLAIMER as per Knoware.no
