I am making a program to convert characters to ASCII code.

The user will enter characters and then those characters will be stored into an array and the program will convert those characters to their ASCII value.

Below is my code:

package chartoascii;

import java.io.DataInputStream;
import java.io.IOException;
import java.util.Scanner;

public class CharToAscii {

    public static void main(String[] args) throws IOException 
    {
        DataInputStream in=new DataInputStream(System.in);
        int n;
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter number of Characters you want to insert : ");
        n = scan.nextInt();
        char character[] = new char[n];
        System.out.println("Enter Characters : ");
        for (int i=0; i<n; i++)
        {
            character[i] = in.readChar() ;
        } // for loop

        for (int i=0; i<character.length; i++)
        {
           int ascii = (int) character[i];
           System.out.println(ascii);
        }
    }

}

My program runs fine, but the output I get is not the ASCII codes.

This is my output:

Enter number of Characters you want to insert : 
4
Enter Characters : 
a
b
c
d
24842
25098
25354
25610
Zabuzard
Atif Ali
  • When casting `char` to `int` you get Unicode. Note that Unicode includes ASCII, so a code like `(int) 'a'` correctly gives `97` (a in ASCII). Check what exactly you are converting, add `System.out.println(character[i]);` to your loop. – Zabuzard Dec 16 '17 at 13:58
  • What does the javadoc of DataInputStream.readChar() say? https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html#readChar--. Use Scanner.nextLine(), and take the first character of the string. – JB Nizet Dec 16 '17 at 13:59
  • @Zabuza: More accurately, you get UTF-16 (except that surrogate pairs may be isolated from one another). – T.J. Crowder Dec 16 '17 at 13:59
  • As @JBNizet said, the `DataInputStream` always reads **two bytes** but Unicode is of variable length. Especially small values like the ASCII values are represented by one byte only. – Zabuzard Dec 16 '17 at 14:00
  • @Zabuza: No, *Unicode* is not of variable length. Some *transformation formats* are (UTF-8, UTF-16). Others (UTF-32) are not. More: http://www.unicode.org/faq/utf_bom.html – T.J. Crowder Dec 16 '17 at 14:05
  • thanks @JBNizet you were right, it worked with scan.next() – Atif Ali Dec 16 '17 at 14:17

3 Answers

1

After the line:

n = scan.nextInt();

Add the line:

scan.nextLine();

Then, in your first for loop, use

character[i] = scan.nextLine().charAt(0);

The Scanner will then return the first character of each line you enter, so casting it to int later gives its ASCII value. As the comments said, DataInputStream is the issue here.
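A minimal sketch of the steps above put together, assuming one character per line. The class name `ScannerAsciiSketch` and the helper `readCodes` are made up for illustration; the loop that skips blank lines is one possible way to avoid the exception `charAt(0)` throws on an empty line.

```java
import java.util.Scanner;

class ScannerAsciiSketch {

    // Reads a count, then that many characters (one per line),
    // and returns the numeric code of each character.
    static int[] readCodes(Scanner scan) {
        int n = scan.nextInt();
        scan.nextLine(); // consume the rest of the line that held the number

        int[] codes = new int[n];
        for (int i = 0; i < n; i++) {
            String line = scan.nextLine();
            while (line.isEmpty()) {
                // Blank line: ask again instead of failing in charAt(0)
                line = scan.nextLine();
            }
            codes[i] = line.charAt(0); // char widens to int
        }
        return codes;
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter the number of characters, then one character per line:");
        for (int code : readCodes(scan)) {
            System.out.println(code);
        }
    }
}
```

If the input ends before enough characters were entered, `nextLine()` throws a `NoSuchElementException`; the sketch does not handle that case.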

siralexsir88
  • Of course, if the user is naughty and doesn't type anything before pressing Enter, this will fail with an error. :-) – T.J. Crowder Dec 16 '17 at 14:10
  • Indeed. Maybe surround with a while loop and keep checking if input String is "" and when it finally isn't, then assign charAt(0) to character[i]. Or surround with try catch. – siralexsir88 Dec 16 '17 at 14:14
1

Explanation

Your code has two problems. The bigger one is that DataInputStream doesn't read the way you think it does; the other is that you connect two resources to System.in, namely a DataInputStream and a Scanner. You should use only the Scanner to read all input.

The problem with linking both is that DataInputStream will also interpret the previously entered 4, since it is consumed only by the Scanner and not by the DataInputStream. That being said, I can't reproduce your exact values. If I enter 4 and after that a, b, c, then I won't be able to enter d since the DataInputStream also reads the 4 (I think the reason is that your machine uses \n for newline and mine \r\n). So its input finally is

4
a
b
c

And if I adjust your loop to also show what it prints (as character):

for (int i = 0; i < character.length; i++) {
    int ascii = (int) character[i];
    System.out.println(character[i] + " -> " + ascii);
}

I get this:

? -> 24845
? -> 2658
? -> 3338
? -> 25357

Okay, so why all the ? instead of the correct inputs? To answer that, we need to look at how DataInputStream#readChar works. According to its documentation:

Returns: the next two bytes of this input stream, interpreted as a char.

However, to get the ASCII values we would need to interpret the byte stream as ASCII. ASCII is of fixed length too, but with one byte per character instead of two. If you would also like to read characters beyond ASCII, such as ä or é, you need to interpret the byte stream not with a fixed length but with an encoding scheme such as UTF-16. Note that UTF-16 is not of fixed length.

To understand the values, let us look at the exact byte stream:

01100001 00001101 // ? -> 24845
00001010 01100010 // ? -> 2658
00001101 00001010 // ? -> 3338
01100011 00001101 // ? -> 25357

As you see, if we group the bytes like this (always two bytes together) we get the corresponding values in decimal format. For ASCII we would need to read the bytes one at a time, like this:

01100001  //  a -> 97
00001101  // \r -> 13
00001010  // \n -> 10
01100010  //  b -> 98
00001101  // \r -> 13
00001010  // \n -> 10
01100011  //  c -> 99
00001101  // \r -> 13

As you see, the byte stream contains more characters than just a, b and c, namely \r and \n. Together these two form the newline sequence (see Wikipedia).
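The pairing can be reproduced without a console. The following is a small sketch, with the hypothetical names `ReadCharDemo` and `readPairs`, that feeds the typed bytes to a DataInputStream through a ByteArrayInputStream and prints what readChar returns. It assumes each typed character arrives as a single ASCII byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ReadCharDemo {

    // Feeds the text (as ASCII bytes) to a DataInputStream and
    // collects the value of every readChar() call.
    static int[] readPairs(String typed) throws IOException {
        byte[] bytes = typed.getBytes(StandardCharsets.US_ASCII);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));

        int[] values = new int[bytes.length / 2];
        for (int i = 0; i < values.length; i++) {
            // readChar() combines two bytes: (first << 8) | second
            values[i] = in.readChar();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Input with \r\n line endings: prints 24845, 2658, 3338, 25357
        for (int value : readPairs("a\r\nb\r\nc\r")) {
            System.out.println(value);
        }
        // Input with \n line endings: prints 24842, 25098, 25354, 25610
        for (int value : readPairs("a\nb\nc\nd\n")) {
            System.out.println(value);
        }
    }
}
```

The second call yields the numbers from the question, which fits the guess that the asker's console sends \n only.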


Solution

The easiest fix would be to use Scanner and its next method (documentation). This method blocks until the next complete token has been entered, where a token is determined by the delimiter pattern. To set it up for one UTF-16 character at a time, we delimit by an empty String (see Take a char input from the Scanner):

Scanner scanner = new Scanner(System.in);
scanner.useDelimiter("");

After that you can read 4 String values. However, we still have the problem of \r\n being fed to the Scanner.
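To see the newline characters show up as tokens, here is a short sketch that runs a Scanner with an empty delimiter over a fixed String instead of System.in. The class name `EmptyDelimiterDemo` and the helper `tokens` are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EmptyDelimiterDemo {

    // Splits the input into tokens using an empty delimiter,
    // so each token is a single character.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints 97, 13, 10, 98: \r and \n come back as tokens too
        for (String token : tokens("a\r\nb")) {
            System.out.println((int) token.charAt(0));
        }
    }
}
```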

The easiest way of eliminating that is by using Scanner#nextLine instead (documentation). So instead of just reading one character, we read a whole line. The method automatically throws \r\n away for us:

Scanner scanner = new Scanner(System.in);

System.out.println("Enter number of Characters you want to insert : ");
int n = Integer.parseInt(scanner.nextLine());

char[] character = new char[n];
System.out.println("Enter Characters : ");
for (int i = 0; i < n; i++) {
    // Only use first character of line
    character[i] = scanner.nextLine().charAt(0);
}

for (int i = 0; i < character.length; i++) {
    int ascii = (int) character[i];
    System.out.println(character[i] + " -> " + ascii);
}

This now correctly prints the ASCII values:

a -> 97
b -> 98
c -> 99
d -> 100

To be precise it prints the UTF-16 values, but ASCII is included in UTF-16.
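If you want to tell the two cases apart, a simple range check is enough, because ASCII covers the codes 0 to 127. A minimal sketch follows; `AsciiCheck` and `isAscii` are hypothetical names, and the non-ASCII character is written as a Unicode escape so the example does not depend on the source file encoding.

```java
class AsciiCheck {

    // ASCII covers the code points 0..127; everything above is not ASCII.
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        char plain = 'a';
        char umlaut = '\u00e4'; // the letter ä

        // Prints "97 true"
        System.out.println((int) plain + " " + isAscii(plain));
        // Prints "228 false"
        System.out.println((int) umlaut + " " + isAscii(umlaut));
    }
}
```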

Zabuzard
0

You typed another character after each letter: a line feed (U+000A). Together with the error mentioned above, namely that readChar does not do what its name suggests, this gives the values you received: 25098 is hexadecimal 0x620A, 0x62 being b and 0x0A being the line feed. By using readLine, you get rid of the line feed.
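The decomposition can be checked with a few lines of bit arithmetic. This is only a sketch; `SplitCharValue`, `highByte` and `lowByte` are names invented for the example.

```java
class SplitCharValue {

    // Upper 8 bits of a 16-bit value: the first byte readChar() consumed.
    static int highByte(int value) {
        return (value >> 8) & 0xFF;
    }

    // Lower 8 bits of a 16-bit value: the second byte readChar() consumed.
    static int lowByte(int value) {
        return value & 0xFF;
    }

    public static void main(String[] args) {
        int[] received = {24842, 25098, 25354, 25610}; // values from the question
        for (int value : received) {
            int high = highByte(value);
            int low = lowByte(value);
            // e.g. "25098 = 0x620A -> 'b' (98), then byte 10"
            System.out.println(String.format("%d = 0x%04X -> '%c' (%d), then byte %d",
                    value, value, (char) high, high, low));
        }
    }
}
```

Each value splits into the typed letter (97 to 100) followed by 10, the line feed.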

Jonathan Rosenne