Monday, September 26, 2011

Extract records from File1 which are not available in File2

Assume there are two files, whose keys (unique) are the first 12 bytes. File1 has record length (say) 600 and File2 has record length (say) 150. What we need is to Extract records from File1 which are not available in File2, where Field2 in File2 matches with Field1 in File1.

i.e. if we put it in SQL it would be something like :
Select * from File1
Where File1.Field1 NOT in 
(Select File2.Field2 from File2) 


Here are two methods we know:
A. In four steps using SORT :
i. To file1, concatenate an identifier (say) '1' to the end of the output record structure.
ii. Extract all records from file2; make the output file length = the length of the file1; Concatenate an identifier(say) '2' to the end of the output record structure.
iii.Sort the two files generated in steps ii. and iii. above : 
SORT by the key(first 12 bytes), and SUM the new identifier field added. Now : 
a. The Sum is 1, if the record is available only in file1
b. The Sum is 2, if the record is available only in file2
c. The Sum is 3, if the record is available in both files.
iv. Extract all the records having the SUM '1' from this output file - they satisfy our requirement. 

B. In a single step using SYNCTOOL/ICETOOL :


//STEP010 EXEC PGM=SYNCTOOL
//SSMSG DD SYSOUT=*
//TOOLMSG DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//IN1 DD DISP=SHR,DSN=userid.tsts.file1
//IN2 DD DISP=SHR,DSN=userid.tsts.file2
//OUT1 DD DSN=&&TEMP1,
// DISP=(NEW,DELETE,DELETE),
// SPACE=(CYL,(5,5),RLSE),
// DCB=(RECFM=FB,LRECL=601)
//OUT2 DD DSN=&&TEMP2,
// DISP=(NEW,DELETE,DELETE),
// SPACE=(CYL,(5,5),RLSE),
// DCB=(RECFM=FB,LRECL=601)
//*------------------------------------------------------------------*
//* NOTE : THE FOLLOWING TEMPORARY FILE NEEDS TO HAVE DISP = MOD *
//* SINCE IT IS REQUIRED TO CONCATENATE TWO DATASETS -
//* OUT1 and OUT2
//*------------------------------------------------------------------*
//OUT3 DD DSN=&&TEMP3,
// DISP=(MOD,DELETE,DELETE),
// SPACE=(CYL,(5,5),RLSE),
// DCB=(RECFM=FB,LRECL=601)
//OUT4 DD DSN=&&TEMP4,
// DISP=(NEW,DELETE,DELETE),
// SPACE=(CYL,(5,5),RLSE),
// DCB=(RECFM=FB,LRECL=601)
//OUT5 DD DSN=DSN=userid.tsts.output,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(5,5),RLSE),
// DCB=(RECFM=FB,LRECL=600)
//SYSOUT DD SYSOUT=*
//TOOLIN DD *
COPY FROM(IN1) USING(CTL1)
COPY FROM(IN2) USING(CTL2)
COPY FROM(OUT1) USING(CTL3)
COPY FROM(OUT2) USING(CTL3)
SELECT FROM(OUT3) TO(OUT4) ON(1,12,CH) NODUPS
COPY FROM(OUT4) USING(CTL4)
/*
//CTL1CNTL DD *
* ADD IDENTIFIER '1' TO THE END OF THE FILE1 RECORD
OUTFIL FNAMES=OUT1,OUTREC=(1,600,C'1')
/*
//CTL2CNTL DD *
* MAKE THE RECORD LENGTH EQUAL TO THAT OF FILE1 BY PADDING WITH
* SPACES, AND ADD IDENTIFIER '2' TO THE END OF THE RECORD
OUTFIL FNAMES=OUT2,OUTREC=(1,12,588X,C'2')
/*
//CTL3CNTL DD *
* DIRECT COPY FROM INPUT TO OUTPUT;USED TO CONCATENATE THE TWO
* FILES OUT1 and OUT2 to OUT3
OUTFIL FNAMES=OUT3
/*
//CTL4CNTL DD *
* KEEP ONLY THOSE RECORDS IN DS1 THAT AREN'T IN DS2(i.e. WITH
* IDENTIFIER '1')AND REMOVE THE '1' IDENTIFIER FROM THE RECORD
OUTFIL FNAMES=OUT5,INCLUDE=(601,1,CH,EQ,C'1'),OUTREC=(1,600)
/*
//*

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.