2

I'm currently working with Java I/O. While it's easy to work with byte streams and character streams, I was just wondering how Java manages to convert any type of file (image, PDF, etc.) into its byte array representation.

Ole V.V.
GeekyCoder
  • The OS reads the bytes from disk and loads blocks of bytes into memory, so there’s not much Java has to do by itself. I think you will find better explanations through your search engine than you can expect from the answers to a Stack Overflow question. – Ole V.V. Sep 07 '18 at 07:59
  • Files are just bytes, and the operating system provides a file system that can read those bytes, either byte by byte or in blocks of bytes. FileInputStream does that (using native calls). If those bytes represent text, an InputStreamReader can read them as char/String, given the charset/encoding of those bytes. Likewise, ImageIO will read an Image. – Joop Eggen Sep 07 '18 at 08:00
  • There is no conversion. A file already consists of bytes. – Ole V.V. Sep 07 '18 at 08:01
  • Although basic, this is a valid question, and should not be closed. – Raedwald Sep 07 '18 at 08:14

2 Answers

4

To a computer, a file is nothing more than a collection of bytes (plus some metadata such as name, path, date...) on disk. There isn't really such a thing as a file 'type'.

But then what does 'a PDF file' even mean? Well, it's a convention: we say a PDF file has a name ending in '.pdf' (also called the extension), and the first bytes stored in the file are 25 50 44 46 in hexadecimal, which is "%PDF" in ASCII (the magic number, see https://en.wikipedia.org/wiki/List_of_file_signatures).
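As a rough illustration of this convention, the sketch below reads the first four bytes of a file and compares them with the PDF signature. The class name `PdfSignatureCheck`, the helper `looksLikePdf`, and the file name `example.pdf` are made up for this example, and it assumes Java 11 or later (for `readNBytes` and `Path.of`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfSignatureCheck {
    // The bytes 25 50 44 46 are the ASCII characters "%PDF".
    private static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    /** Hypothetical helper: true if the file starts with the PDF signature. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // Reads up to 4 bytes; a shorter file yields a shorter array.
            byte[] header = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(header, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // "example.pdf" is a placeholder file name for this sketch.
        Path file = Path.of(args.length > 0 ? args[0] : "example.pdf");
        System.out.println(file + " looks like a PDF: " + looksLikePdf(file));
    }
}
```

Note that the check only looks at the signature; a file can start with these bytes and still be a broken PDF.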

To answer your question more directly: the OS is responsible for reading a file from disk. Java only issues the appropriate system call, and the native code that makes this call is part of the JVM implementation for each platform.
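From the Java side this might look like the sketch below: the program only asks for bytes and receives them in a buffer, with no per-file-type conversion involved. The class name `ReadRawBytes` and the helper `readAll` are invented for this example; the default file name is borrowed from elsewhere on this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadRawBytes {
    /** Hypothetical helper: reads the whole file into a byte array, one buffer at a time. */
    static byte[] readAll(String fileName) throws IOException {
        try (FileInputStream in = new FileInputStream(fileName);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // read() fills the buffer and returns the number of bytes read, or -1 at end of file.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = readAll(args.length > 0 ? args[0] : "bishnu.jpg");
        System.out.println("Read " + bytes.length + " bytes");
    }
}
```

The same code works unchanged for an image, a PDF, or a text file, because at this level they are all just bytes.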

PS: If you want to investigate this yourself you can use a hex editor to view every file as its bytes. (Pick your favorite: https://en.wikipedia.org/wiki/Comparison_of_hex_editors) In the editor you will see that a file is really nothing more than bytes.
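If you would rather stay in Java than install a hex editor, something like the following sketch prints the first 16 bytes of a file in hexadecimal. The class name `HexPeek` and the helper `toHex` are hypothetical, the default file name is a placeholder, and Java 11 or later is assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HexPeek {
    /** Hypothetical helper: formats bytes as space-separated, two-digit uppercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative byte values print as 80..FF.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "some.data");
        try (InputStream in = Files.newInputStream(file)) {
            // Print only the first 16 bytes (or fewer for a short file).
            System.out.println(toHex(in.readNBytes(16)));
        }
    }
}
```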

Toonijn
1

Java itself does not convert a file into a byte array. As @Toonijn mentioned, a program (in Java or any other language) makes system calls to fetch bytes from a disk, URL, memory, or another source. What matters is how you choose to interpret those bytes: as an image, multiple images, some custom file format, a thread dump, or anything else.

Moreover, Java has objects, and an object can represent anything: a char sequence, a stream, a byte array, a temporary buffer, a remote file, and so on. For example, if you know that some file is an image, you can simply treat its bytes as an image. Example:

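import java.awt.Image;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Decode the file's bytes as an image (read() throws IOException on a read error)
File imageFile = new File("bishnu.jpg");
Image image = ImageIO.read(imageFile);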

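Or you know that the file some.data is a text file with a custom extension. In the same way, since you know what the file content is, you can just read it as text.

// Decode the bytes as text; naming the charset avoids depending on the platform default
String content = new String(Files.readAllBytes(Paths.get("some.data")), StandardCharsets.UTF_8);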

The same goes for PDF. All you need is to add a library (it can be written in Java or in another language; the latter requires some workarounds to call from Java, e.g. Python, C++, or even a bash script).

Another example is an Excel file, read here with Apache POI:

Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));

Note that if a file is of one type, for example an image, but you process its bytes as another type, for example Excel, you will read the data incorrectly or even get errors.
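A small sketch of what such a mismatch can look like, using only the standard library: the bytes of a plain text string are handed to ImageIO as if they were an image. The class name `WrongTypeDemo` and the helper `tryDecodeImage` are made up for this example. According to the ImageIO documentation, `read` returns null when no registered image reader recognizes the input, so that is the outcome to expect here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.ImageIO;

public class WrongTypeDemo {
    /** Hypothetical helper: tries to decode arbitrary bytes as an image; null means no reader recognized them. */
    static BufferedImage tryDecodeImage(byte[] bytes) throws IOException {
        return ImageIO.read(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        // These bytes are text, not an image.
        byte[] text = "just some plain text".getBytes(StandardCharsets.UTF_8);
        BufferedImage image = tryDecodeImage(text);
        System.out.println(image == null ? "not a recognizable image" : "decoded an image");
    }
}
```

Other libraries may signal the mismatch differently, for example by throwing an exception instead of returning null.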

Yan Khonski