17

I am confused about COBOL data types. Interviews often ask for the difference between COMP-3 and COMP. What exactly is the difference? What do usage modes mean in COBOL, and how are they related to data types?

Michele La Ferla
Manasi

5 Answers

18

USAGE in COBOL describes how a data item is to be used. A few examples of USAGE are:

  • DISPLAY. This identifies an item that may be printed on a terminal or report. This may or may not be a number (e.g. could be a text value). The description of the DISPLAY item is given by the PICture clause. For example: PIC 9(5) USAGE DISPLAY describes a 5 digit number that may be displayed (printed). Often USAGE DISPLAY is left off because it is implied if missing.
  • INDEX. This identifies an item used as an index into a table (OCCURS).
  • COMPsomething indicates that the data item is to be used in arithmetic operations (i.e. it is a number of some type).

There are various types of numeric item. Two of the most commonly used numeric data types are:

  • COMPUTATIONAL or COMP. This is equivalent to BINARY
  • COMPUTATIONAL-3 or COMP-3. This is equivalent to PACKED-DECIMAL

COMP (BINARY) data items are generally the most efficient way to perform calculations on data items that represent integer values.

COMP-3 (PACKED-DECIMAL) data items are used in COBOL because they maintain a fixed number of decimal places. All computations lead to a result having the prescribed number of decimal places. This is particularly useful in accounting-type operations. Floating point numbers let the number of digits after the decimal point vary (i.e. the decimal point can "float"), which is not the way financial amounts are usually represented.
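
The contrast between fixed decimal places and floating point can be illustrated outside COBOL. The following is a minimal Python sketch, not COBOL: `Decimal` stands in for a fixed-point item such as PIC S9(5)V99 COMP-3, and the names `price` and `tax` are made up for the example.

```python
from decimal import Decimal

# Binary floating point: 0.10 and 0.20 have no exact binary representation,
# so the sum is not exactly 0.30.
float_total = 0.10 + 0.20

# Fixed-point decimal, similar in spirit to PIC S9(5)V99 COMP-3:
# each value carries exactly two decimal places.
price = Decimal("0.10")  # hypothetical field
tax = Decimal("0.20")    # hypothetical field
decimal_total = price + tax

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
print(decimal_total)                     # 0.30
```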

You can find a complete list of COMPutational items for IBM Enterprise COBOL here

One of the problems many programmers have when beginning with COBOL is understanding that a COMP item is great for doing math but cannot be displayed (printed) until it is converted into a DISPLAYable item through a MOVE statement. If you MOVE a COMP item into a report or onto a screen it will not present very well. It needs to be moved into a DISPLAY item first.

The other thing that you may want to research a bit more is the relationship between the PICture and the USAGE when defining variables in COBOL. Here is a link to a very good introductory COBOL Tutorial from the University of Limerick.

Rick Smith
NealB
  • How many COMP types of variables do we have? I was under the impression we only have COMP and COMP-3, of which COMP is binary storage and COMP-3 is packed decimal storage. From the replies to my question I gather that these data types differ in storage, that is, the memory it takes to store the data. What is COMP-5? – Manasi Jun 11 '10 at 04:35
  • 2
    @Manasi For IBM Enterprise COBOL there are 5 distinct COMPUTATIONAL items, COMP-1 through COMP-5. I provided a link to the IBM manual describing these in my original post - you should review it. Note that some computational types have multiple names (e.g. COMP/BINARY and COMP-3/PACKED-DECIMAL). Each COBOL vendor supports a similar set of COMP-x items (there may be vendor differences in the way rounding, precision and truncation are handled). Some vendors (eg. RM) provided a COMP-6 item. COMP-5 is a native binary format having 2, 4 or 9 bytes of storage. – NealB Jun 11 '10 at 13:47
  • 1
    oops... last sentence should read: 2, 4, or 8 bytes of storage. – NealB Jun 11 '10 at 14:16
  • 1
    That'd be COMP and COMP-1 through COMP-5. BINARY is the same as COMP and COMP-4. PACKED-DECIMAL is the same as COMP-3. Since COMP/COMP-4/BINARY are the same, we still have five :-) – Bill Woodger Jan 21 '13 at 09:21
  • RM COMP-6 is unsigned COMP-3. I doubt they invented it: it would have been put in for compatibility with some existing COBOL. – user207421 May 01 '13 at 23:04
  • @EJP, as in a "binary" field which only contains decimal values? X'123456', rather than X'0123456F'? That is Binary Coded Decimal (BCD). Partly an "old" thing (to save on disk/tape storage) and partly a data-type which is still in use, something to watch out for. The IBM Mainframe has no "native" BCD, so you have to code it when needed... – Bill Woodger May 03 '13 at 10:30
16

COBOL really only has two data types: Numbers and strings.

The layout of each field in a COBOL record is precisely specified by a PICTURE (usually abbreviated PIC) clause. The most common ones are:

  • PIC X for strings. PIC X(100) means a 100-byte string.
  • PIC 9 for numbers, optionally with S (sign) or V (implicit decimal point). For example, PIC S9(7)V99 means a signed number with 7 digits to the left of the implicit decimal point and 2 digits to the right.

Numeric fields can have a USAGE clause to optimize their storage. The most common USAGEs are DISPLAY, COMP, and COMP-3.

DISPLAY stores each digit as a character. For example, PIC 9(4) VALUE 123 stores the number as if it were the string "0123". And PIC 9(4)V99 VALUE 123.45 stores it as "012345". Note that the decimal point is not actually stored.

This is an inefficient format in that it requires 8 bits to represent each digit. But it does have an "optimization" for signed numbers by using half of the last byte to store the sign. Normally, EBCDIC digits all have a high nybble of F, so 0123 is F0 F1 F2 F3. But -0123 is F0 F1 F2 D3; the D indicates negative. C means positive, and F means unsigned (i.e., positive). (Similar formats are used in ASCII versions of COBOL, but not as standardized.)
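
The zone and sign nybbles described above can be modelled in a few lines. This is a rough Python sketch of the EBCDIC layout only; `zoned_decimal` is a made-up helper, not part of any COBOL runtime, and it assumes the C/D/F sign convention given in the paragraph.

```python
def zoned_decimal(value: int, digits: int, signed: bool = False) -> bytes:
    """Model the EBCDIC zoned decimal (USAGE DISPLAY) bytes for an integer."""
    if value < 0 and not signed:
        raise ValueError("unsigned field cannot hold a negative value")
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    # Every digit gets zone nybble F, i.e. EBCDIC '0'..'9' are F0..F9.
    out = bytearray(0xF0 | int(ch) for ch in text)
    if signed:
        # The zone of the last byte carries the sign: C positive, D negative.
        zone = 0xD0 if value < 0 else 0xC0
        out[-1] = zone | (out[-1] & 0x0F)
    return bytes(out)


print(zoned_decimal(123, 4).hex())                # f0f1f2f3
print(zoned_decimal(-123, 4, signed=True).hex())  # f0f1f2d3
print(zoned_decimal(123, 4, signed=True).hex())   # f0f1f2c3
```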

COMP-3 is binary-coded decimal with trailing sign nybble. PIC 9(3) COMP-3 VALUE 123 becomes the two bytes 12 3F.
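
As an illustration of that layout, here is a small Python sketch that packs an integer the same way. `pack_comp3` is a hypothetical helper written for this answer; it assumes the usual sign nybbles (C positive, D negative, F unsigned) and pads with a leading zero nybble when the digit count is even.

```python
def pack_comp3(value: int, digits: int, signed: bool = True) -> bytes:
    """Model the packed-decimal (COMP-3) bytes for an integer."""
    if value < 0 and not signed:
        raise ValueError("unsigned field cannot hold a negative value")
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    if signed:
        sign = 0xD if value < 0 else 0xC
    else:
        sign = 0xF
    # One nybble per digit, followed by the trailing sign nybble.
    nibbles = [int(ch) for ch in text] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad to a whole number of bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(pack_comp3(123, 3, signed=False).hex())  # 123f
print(pack_comp3(-123, 3).hex())               # 123d
print(pack_comp3(1234, 4).hex())               # 01234c
```

The result is always `digits // 2 + 1` bytes long.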

COMP or BINARY is native binary format, just like short, int, or long in C.
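
For comparison with the two decimal formats, the sketch below shows the raw bytes of a binary field using Python's `struct` module. Big-endian byte order is an assumption that matches IBM mainframes; other platforms and compilers may store COMP differently.

```python
import struct

# ">h" is a big-endian signed 16-bit integer, ">i" a big-endian signed 32-bit one.
halfword = struct.pack(">h", 123)   # comparable to PIC S9(4) COMP
fullword = struct.pack(">i", -123)  # comparable to PIC S9(9) COMP

print(halfword.hex())  # 007b
print(fullword.hex())  # ffffff85
```

None of these bytes is a printable digit, which is why a binary item has to be moved to a DISPLAY item before it is printed.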

dan04
  • Thanks for the answers. I want to know the factors for deciding the data types suitable for different scenarios. like the memory consumption for each data type is different. COMP will take I guess 4 bytes of memory and COMP-3 takes (digits/2)+1 bytes. – Manasi Jun 09 '10 at 06:16
  • COMP uses the smallest data type that will hold all the digits, but often it has to be a power of two. So, if 16-, 32-, and 64-bit types are available, then 1-4 digits take 2 bytes, 5-9 digits take 4 bytes, and 10-18 digits take 8 bytes. This makes COMP-3 optimal for fields with 1, 5, or 10-13 digits. – dan04 Jun 09 '10 at 07:44
  • As for deciding which data type to use, I wouldn't know. I don't actually *write* COBOL. I'm a C++ programmer who had to learn to read COBOL layouts in order to pass data to programs on our mainframe. – dan04 Jun 10 '10 at 05:41
  • 1
    Tempted to give a -1 for omitting all the other Cobol data types besides numbers and strings -- Objects? Dates? Pointers? Files (in some variants)? – Joe Zitzelberger Apr 29 '11 at 05:30
  • Cobol does not have "strings", as in a field with an end-of-data marker of some description. Cobol has fixed-length fields and variable-length fields (where the length is held externally to the data). For dan04's comment, since he doesn't know Cobol, forget about the "optimal for"... – Bill Woodger Jan 21 '13 at 09:24
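
The byte counts quoted in the comments above (2, 4 or 8 bytes for COMP, digits/2 + 1 for COMP-3) can be tabulated with a short Python sketch. It only compares storage under those stated sizes; it says nothing about speed, and actual sizes depend on the compiler and its options.

```python
def comp_bytes(digits: int) -> int:
    """Storage for a COMP item, assuming 16-, 32- and 64-bit binary types."""
    if 1 <= digits <= 4:
        return 2
    if 5 <= digits <= 9:
        return 4
    if 10 <= digits <= 18:
        return 8
    raise ValueError("digits must be between 1 and 18")


def comp3_bytes(digits: int) -> int:
    """Storage for a COMP-3 item: one nybble per digit plus a sign nybble."""
    return digits // 2 + 1


smaller = [d for d in range(1, 19) if comp3_bytes(d) < comp_bytes(d)]
print(smaller)  # [1, 5, 10, 11, 12, 13]
```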
4

As for deciding which data type to use, it can be made very complicated - BUT - a simple set of guidelines is:

DISPLAY and edited zoned decimal should only be used for displaying numerics in a report or sysout. Move COMP and COMP-3 fields to a DISPLAY/edited field before putting them in a report or sending them to sysout.

COMP - has the fastest calculation speed for integers

COMP-3 (PACKED Decimal) - should be used when decimal positions should be maintained.

COMP and COMP-3 fields can be used together in calculations. The compiler will determine, based on its rules, which field type will be converted (under the covers) to a single common numeric data type.

user396088
2

As the other replies suggest, COMP means big-endian binary. COMP-3 is packed decimal, which means one decimal digit is mapped to each nibble.

I am not sure the previous reply got the issue around precision correct though.

PIC S9(9)V9(9) COMP and PIC S9(9)V9(9) COMP-3

Have exactly the same precision. That is part of the ANSI85 standard. It is the job of the compiler and runtime to ensure that the binary representation in the COMP has the appropriate transformations placed upon it to ensure exactly the same results are achieved as would be if usage was display or COMP-3.
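
One way to picture why the binary form loses nothing is to treat the field as a scaled integer. The Python sketch below is only a model of that idea, not how any particular compiler implements it; `SCALE`, `to_scaled_int` and `from_scaled_int` are names invented for the example.

```python
from decimal import Decimal

SCALE = 9  # digits after the implied decimal point in PIC S9(9)V9(9)


def to_scaled_int(value: Decimal) -> int:
    """Shift the implied decimal point out, leaving a plain integer."""
    return int(value.scaleb(SCALE))


def from_scaled_int(raw: int) -> Decimal:
    """Put the implied decimal point back."""
    return Decimal(raw).scaleb(-SCALE)


amount = Decimal("123456789.123456789")
raw = to_scaled_int(amount)

print(raw)                             # 123456789123456789
print(from_scaled_int(raw) == amount)  # True
print(raw.bit_length() <= 63)          # True
```

All 18 digits fit in a signed 64-bit integer, so the binary item can hold every value the packed one can.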

IBM mainframe computers have packed decimal calculations in hardware. This is very helpful, because the conversion of decimal to binary scales as n squared, where n is the length of the number. This means that COMP-3 is very often the fastest format on the mainframe, but is less likely to be on distributed systems. However, this again is not always the case. For example, the Micro Focus native COBOL solution will tend to be faster in COMP-3 than COMP-5 for very large decimal precision (>18 digits), but the reverse otherwise. The Managed COBOL system from Micro Focus is almost always fastest in COMP (actually, COMP-5 is the best, which is similar to COMP but uses the hardware's endianness rather than enforcing a big-endian memory layout).

Finally, may I suggest that for intermediate values and general mathematics, the newer data definitions of binary-long and binary-double are a better choice, because then the compiler can make the decisions about how to store and optimize for you.

For more on COBOL on distributed and Managed COBOL check out this knol: http://knol.google.com/k/alex-turner/micro-focus-managed-cobol/2246polgkyjfl/4 and also feel free to look up cobol on facebook :)

  • You should be warned that there is variance among COBOL compilers with respect to how precision is handled for binary data types, particularly when truncation occurs. Understanding how intermediate results in complex calculations are managed is also a fairly complex subject. For example, see [Intermediate Results](http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/IGY3PG31/APPENDIX1.1?DT=20060329003636) as they apply to IBM Enterprise COBOL. – NealB Jun 09 '10 at 14:09
  • +1 for mentioning that IBM mainframe computers have packed decimal calculations in hardware. Packed decimal was originally an IBM enhancement to Cobol. – Gilbert Le Blanc Jun 09 '10 at 23:14
  • For COMP-3, COMP and COMP-5 there should not be any variance because their intermediate results are defined in the standard. However, different compiler vendors do have their own extensions. COMP-1 and COMP-3 are poorly defined though. – alex turner Jun 10 '10 at 13:53
  • 1
    Agreed for COMP-3 and COMP-5, but beware of COMP/BINARY items because, at least for IBM Enterprise COBOL, the TRUNC(BIN/OPT/STD) compiler option affects how truncation is managed. – NealB Jun 10 '10 at 20:43
0

To clarify when you would select a particular type and usage for a data item.

Any character data then PIC X(n) of the appropriate size for the string. Shorter strings will be padded with trailing spaces.

Numbers which are seldom used in calculations but are displayed often (e.g. AGE, ZIPCODE, CUSTOMER_NUMBER) then PIC 9(n) USAGE DISPLAY.

Whole numbers used to count things which are used in calculations (e.g. QTY_AVAILABLE) then PIC S9(4) COMP. S9(4) is a smallint (16-bit) on most platforms; S9(8) is a 32-bit integer on most platforms.

Currency values used in calculations (e.g. PRICE, DELIVERY_COST, TAX ) then PIC S9(4)V99 COMP or COMP-3. This will enable accounting calculations with the correct rounding.

If platform is an IBM mainframe or similar which has hardware support for packed decimal then choose COMP-3, otherwise COMP is more efficient.

Note that to show COMP values on a screen or report you must first move them to a DISPLAY type item, so "PIC S9(4)V99 COMP" should be moved to a "PIC ----9.99 DISPLAY" item to make it human readable (four minus signs leave room for all four integer digits). This would display numbers as "   12.45" and " -123.45".
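
The effect of a floating-minus edited picture can be imitated in Python. This is a rough sketch rather than the COBOL editing rules in full: `edit_floating_minus` is a hypothetical helper, it assumes a picture made of some minus signs followed by 9, a decimal point and a run of 9s, and it truncates extra decimal places. A picture with four minus signs, ----9.99, is used so that all four integer digits of S9(4)V99 fit.

```python
from decimal import Decimal, ROUND_DOWN


def edit_floating_minus(value: Decimal, minus_count: int, decimals: int) -> str:
    """Imitate a picture of minus_count '-' signs, then '9', '.', and decimals '9's."""
    q = value.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_DOWN)
    body = f"{abs(q):f}"
    # The leftmost '-' only holds the sign; the others and the '9' hold digits.
    if len(body.split(".")[0]) > minus_count:
        raise ValueError("value does not fit in the picture")
    if q < 0:
        body = "-" + body
    width = minus_count + 1 + 1 + decimals
    return body.rjust(width)


print(repr(edit_floating_minus(Decimal("12.45"), 4, 2)))    # '   12.45'
print(repr(edit_floating_minus(Decimal("-123.45"), 4, 2)))  # ' -123.45'
```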

James Anderson