File Compare
The file comparison task (filecompare) compares two files using one of the following comparison modes:
Mode | Description |
---|---|
binary | data is compared byte by byte |
layout | data is compared according to the provided layout(s) |
text | data is compared as SBCS text, depending on the encoding defined for repositories of the source and target files |
mixed | data is compared as Mixed data (shifted DBCS). Padding of DBCS string is driven by the attribute padmode |
key | data is compared using the keys specified in the provided layout(s). |
When comparing in layout mode, files must conform to the same record layout, but may differ in:
- record format (for example, IBM variable format vs Micro Focus variable format)
- encoding (ASCII vs EBCDIC)
- integer field endianness (little vs big endian)
To compare files in layout mode, the record layout must be provided. If the files contain more than one layout, a “matcher” class must also be defined to drive the selection of the proper layout for each record.
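The idea behind layout mode can be illustrated with a minimal Python sketch. It is not Ianus code: the two-field layout, the function names and the sample data are assumptions made up for this illustration. It shows why two records that differ byte by byte (EBCDIC vs ASCII, big vs little endian) can still be equal once each field is decoded according to the layout.

```python
# Illustrative sketch only, not Ianus code.
# Hypothetical layout: a 4-byte CHAR field followed by a 2-byte binary integer.
LAYOUT = [("NAME", "char", 4), ("COUNT", "int", 2)]

def decode_record(data: bytes, encoding: str, byteorder: str) -> dict:
    """Decode one fixed-length record into field values, following LAYOUT."""
    values, offset = {}, 0
    for name, ftype, length in LAYOUT:
        raw = data[offset:offset + length]
        if ftype == "char":
            values[name] = raw.decode(encoding)             # text depends on encoding
        else:
            values[name] = int.from_bytes(raw, byteorder)   # integer depends on endianness
        offset += length
    return values

def differing_fields(left: dict, right: dict) -> list:
    """Return the names of the fields whose decoded values differ."""
    return [name for name, _, _ in LAYOUT if left[name] != right[name]]

# Left record: EBCDIC (code page 037), big endian. Right record: ASCII, little endian.
left_raw = "ABCD".encode("cp037") + (258).to_bytes(2, "big")
right_raw = b"ABCD" + (258).to_bytes(2, "little")

left = decode_record(left_raw, "cp037", "big")
right = decode_record(right_raw, "ascii", "little")
print(left_raw == right_raw)          # False: a binary comparison would report a difference
print(differing_fields(left, right))  # []: field by field, the records are equal
```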
As an alternative, an external comparator class can be provided to compare the records. In this case, Ianus invokes the Compare() method of the comparator class for each record, delegating the comparison to the user code.
Sometimes, even when they contain the same data, records may be ordered differently, typically because the source systems use different character encodings and collating sequences (EBCDIC vs ASCII). To work around the issue and avoid false positives, Ianus can invoke an external sort engine (MFSORT) before executing the data comparison.
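The ordering problem can be reproduced with a few lines of Python. This is only an illustration of the collating difference, not of how Ianus or MFSORT work internally; the sample keys and the choice of EBCDIC code page 037 are assumptions.

```python
# Illustrative sketch only: the same keys sort differently by ASCII and EBCDIC byte values.
keys = ["a1", "A1", "11"]

# ASCII order: digits < upper case < lower case
ascii_order = sorted(keys, key=lambda k: k.encode("ascii"))

# EBCDIC (code page 037) order: lower case < upper case < digits
ebcdic_order = sorted(keys, key=lambda k: k.encode("cp037"))

print(ascii_order)   # ['11', 'A1', 'a1']
print(ebcdic_order)  # ['a1', 'A1', '11']
```

Two files holding the same records, each sorted natively on its own platform, would therefore be compared record by record in a different order unless one side is re-sorted first.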
When comparing in key mode, the same rules as in layout mode apply. In key mode, the records are compared like rows of a database table: the key parts are used to match the records and to determine the differences.
Key fields are considered only when comparable, therefore you must pay attention to REDEFINES and multi-record layouts.
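The table-like matching can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Ianus algorithm: records are already decoded into dictionaries, keys are unique on each side, and the function name and sample data are hypothetical. The indicators `<`, `!` and `>` are the ones used in the comparison log.

```python
# Illustrative sketch only, not the Ianus implementation.
def compare_by_key(left, right, key_fields):
    """Match records by key and return (indicator, key, differing_fields) tuples."""
    def key_of(rec):
        return tuple(rec[f] for f in key_fields)

    left_by_key = {key_of(r): r for r in left}     # assumes unique keys
    right_by_key = {key_of(r): r for r in right}

    diffs = []
    for key in sorted(left_by_key.keys() | right_by_key.keys()):
        l, r = left_by_key.get(key), right_by_key.get(key)
        if r is None:
            diffs.append(("<", key, []))           # on the left side only
        elif l is None:
            diffs.append((">", key, []))           # on the right side only
        else:
            changed = [f for f in l if f not in key_fields and l[f] != r.get(f)]
            if changed:
                diffs.append(("!", key, changed))  # same key, other fields differ
    return diffs

left = [
    {"KEY01": "A", "KEY02": "1", "ZONED01": 10},
    {"KEY01": "B", "KEY02": "1", "ZONED01": 20},
    {"KEY01": "C", "KEY02": "1", "ZONED01": 30},
]
right = [
    {"KEY01": "A", "KEY02": "1", "ZONED01": 10},
    {"KEY01": "B", "KEY02": "1", "ZONED01": 99},
    {"KEY01": "D", "KEY02": "1", "ZONED01": 40},
]
diffs = compare_by_key(left, right, ["KEY01", "KEY02"])
for indicator, key, fields in diffs:
    print(indicator, key, fields)
```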
The file compare task is defined by the XML element filecompare, whose attributes configure the execution parameters.
The following attributes configure the task:
Attribute | Type | Purpose | Default |
---|---|---|---|
left | string | Name of the left repository | |
right | string | Name of the right repository | |
mode | string | Comparison mode (see above). | layout |
diffslimit | integer | Differences limit. When greater than zero, the comparison stops when the specified number of differences is detected. | Zero |
firstastext | bool | If true, the first record is compared as a single SBCS text field | false |
lastastext | bool | If true, the last record is compared as a single SBCS text field | false |
padmode | string | Pad mode for mixed data when converting in mixed mode. See Pad Mode | As defined in the environment file |
whitespaces | string | Defines the whitespace handling mode. See Whitespaces | asis |
ondbcserror | string | Defines the behavior in case of error converting DBCS. See DBCS Error Management | error |
warnmissingdbcs | bool | Warns when DBCS bytes are not found in the code page. | true |
warnmmalformeddbcs | bool | Warns when malformed DBCS sequences are found. | true |
warneverydbcserror | bool | If true, every instance of a DBCS error is logged; otherwise only the first one is traced | true |
compareebcdicbycollate | bool | If true, EBCDIC CHAR fields are compared using the EBCDIC collating sequence. If false, the comparison is done by ASCII value | false |
The following elements configure the task:
Element | Purpose |
---|---|
left | Name and attributes of the source (left) file. Refer to Left and Right. |
right | Name and attributes of the target (right) file. Refer to Left and Right. |
layouts | Defines the record layout(s) contained in the file, when working in layout mode. Refer to Layouts. |
sort | Defines the sort rules. Refer to Sort. |
compare | Defines the user comparison routine. |
Left and Right
The left and right elements provide the names of the files being compared.
The following attributes configure the file:
Attribute | Type | Purpose | Default |
---|---|---|---|
recfmt | string | Defines the record format of the file. Refer to File Formats | fixed |
reclen | integer | Defines the maximum record length | |
varfmt | string | Defines the variable record format. Refer to Variable Record Formats | If not specified, the repository default is assumed. |
dd | string | When used within a JCL interface, the name of the DD JCL statement referencing the file. The file characteristics are extracted from the file catalog and, therefore, the other attributes (recfmt, reclen and varfmt) are ignored. | |
trim | bool | If true, trailing spaces and NULLs (0x00) are removed from line sequential records | false |
encoding | string | Name of the text encoding to be used for Line Sequential files. | Latin1 |
cache | bool | If false, file copy is refreshed (applies to cached repositories only) | true |
Note
The available encodings are listed here: https://docs.microsoft.com/it-it/dotnet/api/system.text.encoding?view=net-5.0
Note
For MARS repositories, recfmt, reclen and varfmt are ignored. The file characteristics are extracted from the MARS file catalog.
File Formats
Type | Synonym | Description |
---|---|---|
fixed | fb | Fixed Record Length |
variable | vb | Variable Record Length |
lineseq | ls | Line Sequential |
Variable Record Formats
Type | Description |
---|---|
ibm | IBM variable record format |
microfocus | Micro Focus variable record format |
Layouts
The layouts element defines one or more record layouts used to compare/convert the files, as well as the match script in case of multiple layouts.
The following elements configure the layouts:
Element | Purpose |
---|---|
layout | Adds a layout definition. Refer to Layout |
match | Defines the match algorithm. Refer to Match |
Layout
The layout element defines one or more record layouts. Each layout must be identified by a unique name (task-wide).
By default, the layout fields are defined using XML:
- Each layout element defines one single layout
- Record fields are defined using field elements
Alternatively, the layout can be defined using COBOL data definition syntax:
- Each layout element defines one or more layouts (one for each level 01)
- Record fields are defined using COBOL notation
For further information on layout definition, refer to Record Layouts.
Match
The match element defines the "match class". Whenever more than one layout is provided, Ianus needs a match class to drive the selection of the correct layout for each record compared.
This class must implement the interface HPE.Ianus.Scripting.ILayoutMatch. For a detailed description of the match class, refer to the Layout Matcher section of Record Layouts.
The class can be provided as:
- C# script: the code is compiled on the fly by Ianus and declared either:
  - inline, by adding the code in the element value, or
  - by reference, providing the filename of the script in the path attribute. If the script file name is not absolute, the script is searched for in the same path as the job script.
- Class library: the class can be coded in any .NET language, such as C#, VB.NET or COBOL for .NET, and compiled as a .NET class library. The class can be referenced either:
  - directly: providing the filename of the DLL containing the class in the assembly attribute and the full name of the class in the class attribute, or
  - by alias: providing the name of the plugin alias defined in the Plug-ins section of the Environment Configuration File.
The following attributes configure the element:
Attribute | Type | Purpose | Default |
---|---|---|---|
path | string | Indicates to load the class script from the specified file | |
assembly | string | Indicates to load the class assembly from the specified DLL | |
class | string | Indicates the name of the class to use | |
plugin | string | Indicates to load the class from the plugin alias defined in Plug-ins | |
Pad Mode
When converting/comparing mixed data from EBCDIC to ASCII, Ianus replaces SOSI bytes (0x0E/0x0F) with blanks (0x20). The pad mode indicates how the replacement is done:
Mode | Description |
---|---|
end | Text is shifted left on 0x0E and 0x0F, and the area (field or record) processed is padded on the right with blanks (0x20) |
shift-out | Text is shifted left on 0x0E and 0x0F, and two blanks (0x20) are added at 0x0F |
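The two pad modes can be pictured with the following Python sketch. It is one possible reading of the table above, not the Ianus conversion routine: the function name is hypothetical, the DBCS bytes in the sample are placeholders, and only the handling of the 0x0E/0x0F bytes is modelled (no character conversion takes place).

```python
# Illustrative sketch only: models the SOSI replacement, not the DBCS conversion itself.
SO, SI, BLANK = 0x0E, 0x0F, 0x20

def replace_sosi(area: bytes, padmode: str) -> bytes:
    """Remove 0x0E/0x0F from an area and re-insert blanks according to padmode."""
    out = bytearray()
    for byte in area:
        if byte == SO:
            continue                          # dropped: the following text shifts left
        if byte == SI:
            if padmode == "shift-out":
                out += bytes([BLANK, BLANK])  # two blanks added where 0x0F was
            continue
        out.append(byte)
    if padmode == "end":
        # pad the processed area on the right, back to its original length
        out += bytes([BLANK]) * (len(area) - len(out))
    return bytes(out)

# "AB", a shifted string of two placeholder DBCS bytes, then "CD"
area = b"AB" + bytes([SO]) + b"\x82\xa0" + bytes([SI]) + b"CD"

padded_end = replace_sosi(area, "end")
padded_shift_out = replace_sosi(area, "shift-out")
print(padded_end)        # b'AB\x82\xa0CD  '
print(padded_shift_out)  # b'AB\x82\xa0  CD'
```

In both cases the length of the area is preserved; the modes differ only in where the blanks end up.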
Whitespaces
The comparison of white spaces in text areas (char fields as well as text/mixed records) is controlled by the attribute whitespaces.
The following comparison modes are supported:
Mode | Description |
---|---|
asis | Whitespaces are compared as they are (so A_B is not equal to A_B_ or A_____B ) |
trim | Trailing whitespaces are ignored (so A_B is equal to A_B_ but not to A_____B ) |
ignore | Multiple non-trailing whitespaces are considered as one and trailing whitespaces are ignored (so A_B is equal to A____B____ ) |
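The three modes can be expressed as a normalization step applied to both sides before comparing. The Python sketch below reproduces the examples from the table; it is an illustration, not the Ianus implementation, and it assumes that "whitespace" means the blank character only. The function names are hypothetical.

```python
# Illustrative sketch only: whitespace handling modeled as normalization before comparison.
import re

def normalize(text: str, mode: str) -> str:
    """Normalize a text area according to the whitespace handling mode."""
    if mode == "asis":
        return text                                  # compared as it is
    if mode == "trim":
        return text.rstrip(" ")                      # trailing blanks ignored
    if mode == "ignore":
        return re.sub(r" +", " ", text).rstrip(" ")  # runs of blanks count as one
    raise ValueError(f"unknown whitespace mode: {mode}")

def equal(a: str, b: str, mode: str) -> bool:
    return normalize(a, mode) == normalize(b, mode)

print(equal("A B", "A B ", "asis"))          # False
print(equal("A B", "A B ", "trim"))          # True
print(equal("A B", "A     B", "trim"))       # False
print(equal("A B", "A    B    ", "ignore"))  # True
```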
Sort
The sort element sets the sort directives and instructs Ianus to invoke the external sort engine for one or both files before running the data comparison.
Sort can be processed on one or both sides of the comparison, as specified by the side attribute. Normally it is sufficient to sort only one of the files being compared, but there can be cases when both files need sorting. Therefore, one of the following scenarios may apply:
Scenario | Script instructions |
---|---|
Only one side to be sorted | One sort element with side set to left or right |
Both sides to be sorted with the same directives | One sort element with side set to both |
Both sides to be sorted with different directives | Two sort elements, each with a different side and different directives |
Each sort operation is defined by a sort element. The following attributes configure the sort:
Attribute | Type | Purpose | Default |
---|---|---|---|
side | string | Side where the sort directive applies: left, right or both | both |
collate | string | Collate sequence to use for the sort: native or ebcdic | native |
signebcdic | bool | When set to true, zoned fields sign is handled according to EBCDIC conventions | true |
The following elements configure the sort:
Element | Purpose |
---|---|
key | Adds a key to the sort operation. |
Key
The key element defines a sort key. Keys can be defined:
- By referencing a field previously defined in a layout by the layout element.
- By providing offset, length and type (according to the sort engine formalism)
The following attributes configure the key:
Attribute | Type | Purpose | Default |
---|---|---|---|
field | string | Name of the field to use as key (mandatory). | |
layout | string | Name of the layout containing the field. If omitted, the first field whose name matches the provided one is selected. | |
order | string | Sort direction: asc or desc | asc |
offset | integer | Offset (starting from 0) of the sort key | |
length | integer | Length in bytes of the sort key | |
type | string | If provided, sets/overrides the key data type, according to the sort engine key formalism (for example, with MFSORT: CH, CX, PD) | |
Example
<sort side="right" collate="ebcdic">
  <key type="CH" offset="0" length="2" order="asc" />
</sort>
<sort side="both" collate="ebcdic">
  <key field="char01" order="desc"/>
  <key field="char02" order="desc"/>
  <key field="zoned01" order="desc"/>
  <key field="zoned02" order="desc"/>
  <key field="comp01" order="desc"/>
  <key field="comp02" order="desc"/>
  <key field="comp03" order="desc"/>
  <key field="comp04" order="desc"/>
  <key field="pack01" order="desc"/>
  <key field="pack02" order="desc"/>
</sort>
Compare
The compare element defines a comparator class. When provided, Ianus takes care of reading the files but delegates the record comparison to the user-provided class.
This class must implement the interface HPE.Ianus.Scripting.IRecordComparator. For a detailed description of the comparator class, refer to Record Comparator.
The class can be provided as:
- C# script: the code is compiled on the fly by Ianus and declared either:
  - inline, by adding the code in the element value, or
  - by reference, providing the filename of the script in the path attribute. If the script file name is not absolute, the script is searched for in the same path as the job script.
- Class library: the class can be coded in any .NET language, such as C#, VB.NET or COBOL for .NET, and compiled as a .NET class library. The class can be referenced either:
  - directly: providing the filename of the DLL containing the class in the assembly attribute and the full name of the class in the class attribute, or
  - by alias: providing the name of the plugin alias defined in the Plug-ins section of the Environment Configuration File.
The following attributes configure the element:
Attribute | Type | Purpose | Default |
---|---|---|---|
path | string | Indicates to load the class script from the specified file | |
assembly | string | Indicates to load the class assembly from the specified DLL | |
class | string | Indicates the name of the class to use | |
plugin | string | Indicates to load the class from the plugin alias defined in Plug-ins | |
Example
<?xml version="1.0" encoding="utf-8"?>
<job name="ZZFILE_EXTERNAL_COMP">
  <filecompare name="EXTCMP_CS_SCRIPT" left="REPOEBCDIC" right="REPOASCII">
    <log>**** EXTERNAL comparison - NO DIFF ***</log>
    <left recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.EBCDIC.DAT</left>
    <right recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.ASCII.DAT</right>
    <compare>
      <![CDATA[
        using System.Collections.Generic;
        using HPE.Ianus;
        using HPE.Ianus.Log;
        using HPE.Ianus.File;
        using HPE.Ianus.Scripting;
        namespace TestScript
        {
          public class DummyComparator : HPE.Ianus.Scripting.IRecordComparator
          {
            public int Compare(LoggerFacade log, Record left, Record right)
            {
              return 0; // dummy - always equal
            }
          }
        }
      ]]>
    </compare>
  </filecompare>
  <filecompare name="EXTCMP_CS_DLL" left="REPOEBCDIC" right="REPOASCII">
    <log>**** EXTERNAL comparison - NO DIFF - DLL ***</log>
    <left recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.EBCDIC.DAT</left>
    <right recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.ASCII.DAT</right>
    <compare assembly="C:\SOME\PATH\ExternalCompareCS.dll"
             class="ExternalCompareCS.TestComparator01"/>
  </filecompare>
  <filecompare name="EXTCMP_COBOL_DLL_01" left="REPOEBCDIC" right="REPOASCII">
    <log>**** EXTERNAL comparison COBOL - NO DIFF - DLL ***</log>
    <left recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.EBCDIC.DAT</left>
    <right recfmt="fixed" reclen="80">IANUS.TEST.SEQ01.ASCII.DAT</right>
    <compare assembly="C:\SOME\PATH\ExternalCompareCOBOL.dll"
             class="ExternalCompareCOBOL.TestCOBOL01"/>
  </filecompare>
  <filecompare name="FILECMP02" left="REPOEBCDIC" right="REPOASCII" mode="key">
    <left recfmt="fixed" reclen="80">IANUS.TEST.KEYSEQ01.EBCDIC.DAT</left>
    <right recfmt="fixed" reclen="80">IANUS.TEST.KEYSEQ03.ASCII.DAT</right>
    <layouts>
      <layout type="cobol" format="free" length="80">
        01 FILE01-REC-COBOL.
          03 FILLER.
            @ianus* keypart=true
            05 KEY01 PIC X(8).
            05 CHAR03 PIC X(16).
            @ianus* keypart=true
            05 KEY02 PIC X(4).
          03 ZONED-FIELDS.
            05 ZONED01 PIC 9(8).
            05 ZONED02 PIC 9(8).
          03 FILLER PIC X(36).
      </layout>
    </layouts>
  </filecompare>
</job>
Comparison Log
Ianus reports the result of the comparison on both the console and the log file, indicating the differences, when detected.
For each difference detected, Ianus logs a message structured as follows:
- Difference indicator, a single character (<, ! or >) indicating:
  - When a record is on the left side only (<)
  - When a row with the same keys is on both sides but some other fields are different (!, key mode only)
  - When a record is on the right side only (>)
- Number of the record causing the difference
- Names of the field(s) containing different data (after the @ character)
Example
[info] Parallel comparison started
[warn] ! record 2/2 @ ZONED02, COMP02, COMP04, PACK02
[warn] < record 3
[warn] < record 4
[info] Compared 4 vs 2 rows
[warn] REPOEBCDIC/IANUS.TEST.SEQ02.EBCDIC.DAT vs REPOASCII/IANUS.TEST.SEQ02.ASCII.DAT reported 3 differences
Where you can see:
- The indicator character
- The number of the record(s) causing the difference
- The names of the field(s) with different data
Furthermore, Ianus will trace in the log file the actual content of the columns causing the difference.
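When the log has to be post-processed, the difference lines can be picked out with a small parser. The Python sketch below is based only on the sample log shown above; the exact log format is an assumption, and the function name and the regular expression are hypothetical.

```python
# Illustrative sketch only: parses difference lines shaped like the sample log above.
import re

DIFF_LINE = re.compile(
    r"^\[warn\] (?P<indicator>[<!>]) record (?P<record>\S+)(?: @ (?P<fields>.+))?$"
)

def parse_difference(line: str):
    """Return the parts of a difference line, or None for any other log line."""
    match = DIFF_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.group("fields")
    return {
        "indicator": match.group("indicator"),
        "record": match.group("record"),
        "fields": [f.strip() for f in fields.split(",")] if fields else [],
    }

log = [
    "[info] Parallel comparison started",
    "[warn] ! record 2/2 @ ZONED02, COMP02, COMP04, PACK02",
    "[warn] < record 3",
    "[warn] < record 4",
    "[info] Compared 4 vs 2 rows",
]
differences = [d for d in map(parse_difference, log) if d is not None]
for diff in differences:
    print(diff)
```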
Example
<filecompare name="FILECMP03" left="REPOEBCDIC" right="REPOEBCDIC">
  <log>**** SEQ VAR IBM vs IBM comparison ***</log>
  <left recfmt="variable" reclen="932">SYS053.bin</left>
  <right recfmt="variable" reclen="932">SYS053.bin</right>
  <layouts>
    <layout name="SYS053-DESCR">
      <field name="header-sup" length="1" type="binary"/>
      <field name="header-cle-tab" length="4" type="zoned" signed="false"/>
      <field name="header-cle-cenr" length="1" type="zoned" signed="false"/>
    </layout>
  </layouts>
</filecompare>
<filecompare name="FILECMP03" left="REPOEBCDIC" right="REPOEBCDIC">
  <log>**** SEQ VAR IBM vs IBM comparison ***</log>
  <left recfmt="variable" reclen="932">SYS053.bin</left>
  <right recfmt="variable" reclen="932">SYS053.bin</right>
  <layouts>
    <layout type="cobol" format="free" length="auto">
      01 SYS053-DESCR.
        03 HEADER-SUP PIC X.
        03 HEADER-CLE-TAB PIC 9(4).
        03 HEADER-CLE-CENR PIC 9.
    </layout>
  </layouts>
</filecompare>
Status codes
Status | Status code | Description |
---|---|---|
Ready | -1 | Task is initialized, but not yet started |
Running | -2 | Task is running |
Success | 0 | Task completed successfully, no difference detected |
Warnings | 1 | Task completed with warnings, one or more records differ |
Errors | 2 | Task completed with errors, severe error detected |
Aborted | 9 | Task cannot be executed |