I am supposed to convert an EBCDIC file to ASCII using Java. So far I have this code:

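import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

public class Migration {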
    InputStreamReader reader;
    StringBuilder builder;

    public Migration(){
        try {
            reader = new InputStreamReader(new FileInputStream("C:\\TI3\\Legacy Systemen\\Week 3\\Oefening 3\\inputfile.dat"),
                   java.nio.charset.Charset.forName("ibm500") );
        } catch(FileNotFoundException e){
            e.printStackTrace();
        }
        builder = new StringBuilder();
    }

    public void read() throws IOException {
        int theInt;
        while((theInt = reader.read()) != -1){
            char theChar = (char) theInt;
            builder.append(theChar);

        }

        reader.close();
    }

    @Override
    public String toString(){
        return builder.toString();
    }
}

The file description is the following:

 02 KDGEX.
      05 B1-LENGTH PIC S9(04) USAGE IS COMP.
      05 B1-CODE PIC S9(04) USAGE IS COMP.
      05 B1-NUMBER PIC X(08).
      05 B1-PPR-NAME PIC X(06).
      05 B1-PPR-FED PIC 9(03).
      05 B1-PPR-RNR PIC S9(08) USAGE IS COMP.
      05 B1-DATA.
        10 B1-VBOND PIC 9(02).
        10 B1-KONST.
          20 B1-AFDEL PIC 9(03).
          20 B1-KASSIER PIC 9(03).
          20 B1-DATZIT-DM PIC 9(04).
        10 B1-BETWYZ PIC X(01).
        10 B1-RNR PIC X(13).
        10 B1-BETKOD PIC 9(02).
        10 B1-VOLGNR-INF PIC 9(02).
        10 B1-QUAL-PREST PIC 9(03).
        10 B1-REKNUM PIC 9(12).
        10 B1-REKNR REDEFINES B1-REKNUM.
          20 B1-REKNR-PART1 PIC 9(03).
          20 B1-REKNR-PART2 PIC 9(07).
          20 B1-REKNR-PART3 PIC 9(02).
        10 B1-VOLGNR-M30 PIC 9(03).
        10 B1-OMSCHR.
          15 B1-OMSCHR1 PIC X(14).
          15 B1-OMSCHR2 PIC X(14).
        10 B1-OMSCHR-INF REDEFINES B1-OMSCHR.
          15 B1-AANT-PREST PIC 9(02).
          15 B1-VERSTR PIC 9(01).
          15 B1-LASTDATE PIC 9(06).
          15 B1-HONOR PIC 9(06).
          15 B1-RIJKN PIC X(13).
        10 FILLER--1 PIC 9(02).
        10 B1-INFOREK PIC 9(01).
        10 B1-BEDRAG-EUR PIC 9(08).
        10 B1-BEDRAG-DV PIC X(01).
        10 B1-BEDRAG-RMG-DV REDEFINES B1-BEDRAG-DV PIC X(01).
      05 FILLER PIC X(5).

We can ignore the first 2 bytes on every line. The problem is the fields declared with USAGE IS COMP: the reader is not converting them properly. I think I am supposed to read these as raw bytes, but I have no idea how.

Cœur
Robin-Hoodie
  • COMP with 1-4 digits is a two-byte binary. COMP with 5-9 digits is a four-byte binary. It's coming from an IBM Mainframe (most likely) so it will be Big Endian, if that matters. X'0010' will be a value of 16, as will X'00000010'. All the other data is plain unsigned character data, so could be treated as big chunks of characters if more convenient. It *may* be that the first four bytes are not required. A variable-length record is preceded by two two-byte binary fields, containing length and zero. This may be a coincidence here. – Bill Woodger Dec 03 '13 at 12:19
  • Just to add, that whoever gives you that file is making it more difficult for you. If *all* the fields that you need were plain character fields, the EBCDIC to ASCII conversion can just be done by whatever utility is giving you the file - you'd have no program to write, no wheel to re-invent. – Bill Woodger Dec 03 '13 at 12:21
  • If you can't get the change at the other end, here's an existing wheel http://stackoverflow.com/questions/17448008/convert-mainframe-binary-to-ascii-using-any-open-source-code-or-tool – Bill Woodger Dec 03 '13 at 12:30
  • Thank you for all the replies, I will take a look at it – Robin-Hoodie Dec 03 '13 at 17:51
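
As a rough sketch of the COMP layout described in the comments above (two bytes for 1-4 digits, four bytes for 5-9 digits, big-endian), the raw bytes can be decoded with `java.nio.ByteBuffer`. The class and method names here (`CompDecode`, `decodeComp2`, `decodeComp4`) are made up for illustration and are not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class CompDecode {
    // Decodes a 2-byte big-endian binary field, e.g. PIC S9(04) COMP.
    static short decodeComp2(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getShort();
    }

    // Decodes a 4-byte big-endian binary field, e.g. PIC S9(08) COMP.
    static int decodeComp4(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // The two examples from the comment: X'0010' and X'00000010'
        System.out.println(decodeComp2(new byte[] {0x00, 0x10}));             // 16
        System.out.println(decodeComp4(new byte[] {0x00, 0x00, 0x00, 0x10})); // 16
    }
}
```

`ByteBuffer` is big-endian by default, so the explicit `order(...)` call is only there to make the assumption visible.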

2 Answers

If I am interpreting this format correctly, you have a binary file format with fixed-length records. Some of the fields in these records are not character data (COBOL computational fields?).

So you will have to read the records with a lower-level approach, processing the individual fields of each record:

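import java.io.*;

public class Record {
  // Raw bytes of the first three fields, sized from the file description
  private byte[] b1length = new byte[2]; // B1-LENGTH, PIC S9(04) COMP
  private byte[] b1code = new byte[2];   // B1-CODE, PIC S9(04) COMP
  private byte[] b1number = new byte[8]; // B1-NUMBER, PIC X(08), DISPLAY
  // other fields

  // Reads the fields in record order; readFully blocks until each array is full
  public void read(DataInput data) throws IOException {
    data.readFully(b1length);
    data.readFully(b1code);
    data.readFully(b1number);
    // other fields
  }

  // Writes the fields back out in the same order, unchanged
  public void write(DataOutput out) throws IOException {
    out.write(b1length);
    out.write(b1code);
    out.write(b1number);
    // other fields
  }
}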

Here I've used byte arrays for the first three fields of the record but you could use other more suitable types where appropriate (like a short for the first field with readShort.) Note: my interpretation of the field widths is likely wrong; it is just an example.

The DataInputStream is generally used as a DataInput implementation.
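
For illustration, here is a minimal sketch of reading the leading fields of one record through a `DataInputStream`, using `readShort` and `readInt` for the COMP fields (both read big-endian two's-complement values). It assumes the widths 2, 2, 8, 6, 3 and 4 bytes taken from the file description in the question; the class name `RecordHeaderDemo`, the `parse` helper and the sample bytes are hypothetical, and the character fields are kept as raw bytes here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RecordHeaderDemo {
    final short length;                 // B1-LENGTH   PIC S9(04) COMP, 2 bytes
    final short code;                   // B1-CODE     PIC S9(04) COMP, 2 bytes
    final byte[] number = new byte[8];  // B1-NUMBER   PIC X(08), raw EBCDIC
    final byte[] pprName = new byte[6]; // B1-PPR-NAME PIC X(06), raw EBCDIC
    final byte[] pprFed = new byte[3];  // B1-PPR-FED  PIC 9(03), raw EBCDIC digits
    final int pprRnr;                   // B1-PPR-RNR  PIC S9(08) COMP, 4 bytes

    RecordHeaderDemo(DataInputStream in) throws IOException {
        length = in.readShort();        // big-endian signed 16-bit
        code = in.readShort();
        in.readFully(number);
        in.readFully(pprName);
        in.readFully(pprFed);
        pprRnr = in.readInt();          // big-endian signed 32-bit
    }

    // Convenience helper for in-memory data; wraps IOException as unchecked.
    static RecordHeaderDemo parse(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return new RecordHeaderDemo(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = new byte[25];   // 2 + 2 + 8 + 6 + 3 + 4 bytes, made-up data
        sample[1] = 0x19;               // B1-LENGTH  = X'0019'
        sample[3] = 0x01;               // B1-CODE    = X'0001'
        sample[24] = 0x10;              // B1-PPR-RNR = X'00000010'
        RecordHeaderDemo r = parse(sample);
        System.out.println(r.length + " " + r.code + " " + r.pprRnr); // 25 1 16
    }
}
```

For the real file, the same constructor could be fed a `DataInputStream` wrapped around a `FileInputStream`.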

Since both the source and target encodings use one octet per code point, you should be able to transcode the character data fields using a method like this:

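// requires: import java.nio.charset.Charset;
public static byte[] transcodeField(byte[] source, Charset from, Charset to) {
  // decode with the source charset, then re-encode with the target charset
  byte[] result = new String(source, from).getBytes(to);
  // both encodings are single-byte, so the field width must not change
  if (result.length != source.length) {
    throw new AssertionError(result.length + "!=" + source.length);
  }
  return result;
}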
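
A possible way to call it, as a sketch only: the wrapper class `TranscodeDemo` and the sample bytes are invented for the example, and it assumes the JDK in use ships the `IBM500` charset (the one named in the question).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Same method as above, repeated so the example compiles on its own.
    static byte[] transcodeField(byte[] source, Charset from, Charset to) {
        byte[] result = new String(source, from).getBytes(to);
        if (result.length != source.length) {
            throw new AssertionError(result.length + "!=" + source.length);
        }
        return result;
    }

    public static void main(String[] args) {
        Charset ebcdic = Charset.forName("IBM500"); // assumed to be available
        // Made-up field content: EBCDIC bytes for "ABC123"
        byte[] field = {(byte) 0xC1, (byte) 0xC2, (byte) 0xC3,
                        (byte) 0xF1, (byte) 0xF2, (byte) 0xF3};
        byte[] ascii = transcodeField(field, ebcdic, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ABC123
    }
}
```

Only the character (PIC X and unsigned PIC 9 DISPLAY) fields should go through this; the COMP fields must be left out of the transcoding.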

I suggest tagging your question with COBOL (assuming that is the source of this format) so that someone else can speak with more authority on the format of the data source.

McDowell
  • As Bill Woodger pointed out in his comments... An IBM COBOL S9(04) COMP field is a two byte 2's complement big-endian binary number with the sign in the leftmost bit. An S9(08) COMP is similar but occupies 4 bytes. – NealB Dec 03 '13 at 15:50
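
To make the sign handling in that comment concrete, here is a small sketch that assembles the value by hand with shifts and masks. The names `SignedComp`, `comp2` and `comp4` are hypothetical, and the byte values are invented test data.

```java
class SignedComp {
    // 2-byte two's-complement, big-endian; the cast to short restores the sign.
    static int comp2(byte hi, byte lo) {
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    // 4-byte two's-complement, big-endian; the top bit of b[0] is the sign bit.
    static int comp4(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(comp2((byte) 0x00, (byte) 0x10)); // 16
        System.out.println(comp2((byte) 0xFF, (byte) 0xF0)); // -16
        System.out.println(comp4(new byte[] {(byte) 0xFF, (byte) 0xFF,
                                             (byte) 0xFF, (byte) 0xF0})); // -16
    }
}
```

The `& 0xFF` masks matter because Java bytes are signed; without them the lower bytes would sign-extend and corrupt the result.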

I faced the same issue of converting an EBCDIC string to ASCII. The code below converts a single EBCDIC string to an ASCII string.

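import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class EbcdicConverter
{
    public static void main(String[] args) 
        throws Exception
    {
        String ebcdicString =<your EBCDIC string>;
        // convert String into InputStream
        // (note: getBytes() with no argument uses the platform default charset)
        InputStream is = new ByteArrayInputStream(ebcdicString.getBytes());
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        // copy the bytes through unchanged, one at a time
        int b;
        while ((b = is.read()) != -1) {
            baos.write(b);
        }
        // decode the collected bytes as EBCDIC (code page 500)
        String str = baos.toString("Cp500");
        System.out.println(str);
    }
}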
Pang
Murali
  • How would that deal with the non-character fields in the question? If you don't have non-character data, just do it in the file transfer, don't write convoluted code for it. – Bill Woodger Nov 08 '16 at 20:39