Cogent Engineering Limited

ArrayDiff

ArrayDiff is a C++ class which provides a very simple and versatile API to perform comparisons between arrays. The code is based on Michael B Allen's code which has the following description:

This algorithm is basically Myers' solution to SES/LCS with the Hirschberg linear space refinement as described in the following publication:

E. Myers, "An O(ND) Difference Algorithm and Its Variations," Algorithmica 1, 2 (1986), 251-266.

This is the same algorithm used by GNU diff(1).

The output from the ArrayDiff class is the Shortest Edit Script (SES) which can be used to manipulate a Working Array in order to make it match a Target Array. The arrays can be of different types but as long as there is a common element to either type then the Target Array can be used to manipulate and re-order the Working Array. In addition, a comprehensive test module is included for verification purposes.

The source code is designed to be as platform independent as possible. It should run with only the smallest of modifications (if any) in any C++ system. The source code files can be obtained using the links below:

ArrayDiff.h Class definition header file with public API for ArrayDiff
ArrayDiff.cpp Class implementation file
ArrayDiff_Debug.cpp Debug / test functions
Array_TPtr.h Array template implementation to optionally replace std::vector

Some software systems have the Standard Template Library vector template available to them. This can be used directly with ArrayDiff because ArrayDiff uses the vector access functions. However, if vector is not available then my TPtrArray and TArray templates may be used instead. These are array templates which provide emulations of the same vector functions as are used in ArrayDiff and can be used as a direct replacement. TPtrArray and TArray do not provide all the member functions of vector, but they provide all the necessary functions for ArrayDiff.

The documentation (as included at the top of ArrayDiff.h) is as follows:

Terminology

  • Working Array - this is the array which is in use by a system and is to be compared to a Target Array and manipulated by the processing of an Edit Script until it matches the Target array.
  • Target Array - this is the array which defines the required ordering and content of the Working Array after processing of an Edit Script.
  • Edit Script - this is a list of Operations which are applied one by one on a Working Array in order to make it match a Target Array.
  • Operation - an individual function to be performed on a Working Array which will perform insertions, deletions and movements of blocks of items within the Working Array.
  • Client - this is the system which owns the Working and Target Arrays. The Client is the caller of functions within the ArrayDiff class, provides the comparison callback functions and processes the Operations within the Edit Script in order to make the Working Array match the Target Array.

The ArrayDiff class is used by Clients to generate the Edit Script in order to manipulate the Working Array to make it match the Target Array. Items in the Working and Target Arrays need not be of the same type, but there does need some commonality in order to perform comparisons.

Single Stage Matching

In this mode, the pModifiedFn parameter passed into initialise() is NULL. If the pCompareFn callback passed into initialise() returns true then there is a full match between the Working and Target Arrays for the specified items and the MATCH Operation is returned to the Client. In this mode, none of the *MODIFIED Operations will be returned to the Client.

Two Stage Matching

In this mode, the pModifiedFn parameter passed into initialise() is defined. Items are only passed to this callback if they have already passed the first stage i.e. the pCompareFn callback has returned true. If the pModifiedFn callback returns false then a full match is achieved and the MATCH Operation is returned to the Client, otherwise a partial match is achieved and the MODIFIED Operation is returned to the Client.

Overview

After constructing and initialising an object of the ArrayDiff class and performing the comparison function, each Operation is retrieved from the ArrayDiff class using getEditOp(), which passes an object of the type SesData_t in order to maintain the context and return the Operation parameters. All Operations are performed on the Working Array and are ordered sequentially from the start of the array towards its end such that SesData_t::workingIndex always increments. The SesData_t::moveIndex is another index into the Working Array for MOVE Operations, specifying the index to move from or to. SesData_t::targetIndex is an index into the Target Array. The Operations are described below. Either before or after compare() is called, but before getEditOp(), trackItemIndex() can be used to track a particular index during the Operations performed on the Working Array. For example, a highlighted item in a list can remain highlighted after the Operations are performed without having to search for it again.

Operation: MATCH

The Working and Target Arrays fully match for this block of items. The pCompareFn callback has returned true and either the pModifiedFn callback was not defined, or it returned false. No action need be taken. This Operation is only passed back to the Client if reportMatchOps is true in the call to getEditOp(). It is not normally required by the Client.

workingIndex index into the Working Array
moveIndex unused
targetIndex index into the Target Array
pseudo WorkingArray[workingIndex..] == TargetArray[targetIndex..]

Operation: MODIFY

The Working and Target Arrays partially match for this block of items. The pCompareFn callback has returned true and the pModifiedFn callback has returned true. This operation is normally used to keep an existing item in the Working Array, but modify some small part of it or even all of it as required.

workingIndex index into the Working Array
moveIndex unused
targetIndex index into the Target Array
pseudo WorkingArray[workingIndex..] <== TargetArray[targetIndex..]

Operation: DELETE

This Operation is used to delete a block of items from the Working Array because they were not found in the Target Array. There are two interpretations of this Operation depending on the value of checkMoved passed into initialise(). If checkMoved is false then the block of items was simply not found at that point in the Target Array. If checkMoved is true then the block of items was not found anywhere in the Target Array.

workingIndex index into the Working Array
moveIndex unused
targetIndex index into the Target Array
pseudo delete WorkingArray[workingIndex..]

Operation: INSERT

This Operation is used to insert a block of items into the Working Array because they were not previously there but they were found in the Target Array. There are two interpretations of this Operation depending on the value of checkMoved passed into initialise(). If checkMoved is false then the block of items were new at this point in the Target Array. If checkMoved is true then the block of items are new to the whole Target Array.

workingIndex index into the Working Array
moveIndex unused
targetIndex index into the Target Array
pseudo insert WorkingArray[workingIndex..] from TargetArray[targetIndex..]

Operation: MOVE_FORWARD

Only returned to the Client if checkMoved is true in the call to initialise(). This Operation is used to move a block of items from workingIndex forward to moveIndex within the Working Array using one of the following sequences:

  • Cut the block of opLength items from the Working Array at workingIndex (shifting the remaining items back to close up the gap)
  • Insert that block of items into the Working Array at index moveIndex (shifting the remaining items forward to make the space)
Alternatively:
  • Copy the block of items from the Working Array at workingIndex, block size opLength into a temporary store
  • Shift the block of items in the Working Array from (workingIndex+opLength) back to workingIndex, block size (moveIndex-workingIndex). Be careful of possible overlaps in the start and end positions of this block.
  • Copy the block of items from the temporary store back into the Working Array at index moveIndex

For example:

               0123456789
Working Array: abcdefghij
MOVE_FORWARD, workingIndex=2, opLength=3, moveIndex=6
               0123456789
Working Array: abfghicdej ("cde" moved forward in the array by 4 positions)
workingIndex index into the Working Array to move from
moveIndex index into the Working Array to move to
targetIndex index into the Target Array
pseudo move from WorkingArray[workingIndex..] to WorkingArray[moveIndex..]

Operation: MOVE_FORWARD_MODIFIED

This Operation is the same as MOVE_FORWARD except that the pModifiedFn callback was defined and returned true for this block of moved items.

Operation: MOVE_BACK

Only returned to the Client if checkMoved is true in the call to initialise(). This Operation is used to move a block of items from moveIndex backward to workingIndex within the Working Array using one of the following sequences:

  • Cut the block of opLength items from the Working Array at moveIndex (shifting the remaining items back to close up the gap)
  • Insert that block of items into the Working Array at workingIndex (shifting the remaining items forward to make the space)
Alternatively:
  • Copy the block of items from the Working Array at moveIndex, block size opLength into a temporary store
  • Shift the block of items in the Working Array from workingIndex forward to (workingIndex+opLength), block size (moveIndex-workingIndex). Be careful of possible overlaps in the start and end positions of this block.
  • Copy the block of items from the temporary store back into the Working Array at index workingIndex

For example:

               0123456789
Working Array: abcdefghij
MOVE_BACK, workingIndex=2, opLength=3, moveIndex=6
               0123456789
Working Array: abghicdefj ("ghi" moved backward in the array by 4 positions)
workingIndex index into the Working Array to move to
moveIndex index into the Working Array to move from
targetIndex index into the Target Array
pseudo move from WorkingArray[moveIndex..] to WorkingArray[workingIndex..]

Operation: MOVE_BACK_MODIFIED

This Operation is the same as MOVE_BACK except that the pModifiedFn callback was defined and returned true for this block of moved items.

Operations: SKIP, WAS_INSERTED, INVALID

None of these Operations are returned to the Client. They are only used internally. After construction of a ArrayDiff::SesData_t object, the op parameter will be INVALID, but this should never be interpretted by the application.

Usage Example

#include <vector>

// TargetRecord_t is a Target Array item
struct TargetRecord_t
{
    TargetRecord_t(U32 _id, time_t _setTime)
    {
        id      = _id;
        dummyB  = 0;
        setTime = _setTime;
    }

    U32     id;
    U32     dummyB;
    time_t  setTime;
};

// WorkingRecord_t is a Working Array item
struct WorkingRecord_t
{
    WorkingRecord_t(const TargetRecord_t& rData)
    {
        id              = rData.id;
        dummyA          = 0;
        lastUpdatedTime = rData.setTime;
    }

    U32     id;
    U32     dummyA;
    time_t  lastUpdatedTime;
};

// Declare the comparison arrays
static std::vector<TargetRecord_t*> gTargetArray;
static std::vector<WorkingRecord_t*> gWorkingArray;

// Comparison callback function
static bool compareArrayCB(U32 workingIndex, U32 targetIndex, void* pContextData)
{
    return (gWorkingArray[workingIndex]->id == gTargetArray[targetIndex]->id);
}

// Modified callback function
static bool modifiedArrayCB(U32 workingIndex, U32 targetIndex, void* pContextData)
{
    return (gWorkingArray[workingIndex]->lastUpdatedTime !=
            gTargetArray[targetIndex]->setTime);
}

static void runComparison(int& rHighlightedIndex)
{
    void* pContextData = NULL;

    // Construct an ArrayDiff object
    ArrayDiff diff;

    // Initialise the comparison parameters
    diff.initialise(compareArrayCB,modifiedArrayCB,pContextData,true);

    // Perform the comparison, defining the entire Working and Target Arrays
    int diffDistance = diff.compare(0,gWorkingArray.size(),0,gTargetArray.size());
    // If there are any Operations to perform on the Working Array
    if ((diffDistance > 0) || (diff.getNumModified() > 0))
    {
        diff.trackItemIndex(&rHighlightedIndex);

        // While there are Operations to perform
        ArrayDiff::SesData_t sesData;
        while (diff.getEditOp(sesData))
        {
            U32 index;

            switch (sesData.op)
            {
                case ArrayDiff::DiffOp_MODIFY :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        gWorkingArray[sesData.workingIndex+index]->lastUpdatedTime =
                            gTargetArray[sesData.targetIndex+index]->setTime;
                    }
                    break;

                case ArrayDiff::DiffOp_DELETE :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        delete gWorkingArray[sesData.workingIndex];
                        gWorkingArray.erase(gWorkingArray.begin()+sesData.workingIndex);
                    }
                    break;

                case ArrayDiff::DiffOp_INSERT :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        gWorkingArray.insert(
                            gWorkingArray.begin()+sesData.workingIndex+index,
                            new WorkingRecord_t(*gTargetArray[sesData.targetIndex+index]));
                    }
                    break;

                case ArrayDiff::DiffOp_MOVE_FORWARD_MODIFIED :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        gWorkingArray[sesData.workingIndex+index]->lastUpdatedTime =
                            gTargetArray[sesData.targetIndex+index]->setTime;
                    }
                    // Fall through to next case

                case ArrayDiff::DiffOp_MOVE_FORWARD :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        WorkingRecord_t* pTemp = gWorkingArray[sesData.workingIndex];
                        gWorkingArray.erase(gWorkingArray.begin()+sesData.workingIndex);
                        gWorkingArray.insert(
                            gWorkingArray.begin()+sesData.moveIndex+sesData.opLength-1,
                            pTemp);
                    }
                    break;

                case ArrayDiff::DiffOp_MOVE_BACK_MODIFIED :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        gWorkingArray[sesData.moveIndex+index]->lastUpdatedTime =
                            gTargetArray[sesData.targetIndex+index]->setTime;
                    }
                    // Fall through to next case

                case ArrayDiff::DiffOp_MOVE_BACK :
                    for (index = 0; index < sesData.opLength; index++)
                    {
                        WorkingRecord_t* pTemp = gWorkingArray[sesData.moveIndex+index];
                        gWorkingArray.erase(gWorkingArray.begin()+sesData.moveIndex+index);
                        gWorkingArray.insert(
                            gWorkingArray.begin()+sesData.workingIndex+index,
                            pTemp);
                    }
                    break;

                default :
                    // No other Operations should be reported
                    assert(false);
                    break;
            }
        }

        // The Working and Target Arrays should now be matching
    }
}

http://www.cogentengineering.co.uk
email
Visitors:Visitor Counter
Designed and developed by Mike Buckham
Artistic input from Sharon Buckham