1

I have a Java application that should load and print data containing French special characters from a .dbf (dBase III) file, but it doesn't work: the characters are not showing.

I asked this question thinking that the problem was related only to printing, but as you can see from the comments there, I figured out that the problem lies in the data coming from the database and not in the printing: when I add a special character to my JTextPane directly, it prints normally. I also tried changing the character set of the JTextPane, but the problem remained.

To complicate the question further for those who love solving difficult problems: when I open my .dbf file with MS Access, the characters are there. So I think the error probably happens while loading the data from the database. By the way, to fetch the data I'm using an API called xBaseJ, which doesn't use SQL but its own implementation.

I hope I have given all the necessary details. I'd really appreciate any help; any idea could help me figure out the solution (and the problem too).

Edit: Thanks to Ethan Furman's answer, we now know that the problem is the encoding recorded in the database, which is plain old ASCII, and that it is not related to the xBaseJ API.

So the question should now be: is it possible to change the encoding of a dBase database, and how can I do it? Thank you @Ethan Furman, and thanks in advance for any help with this question.

user3610008

4 Answers

1

All dbf files use some legacy encoding (code page), and it is not UTF-8. Which encoding was used is part of the metadata stored in the header, within the first few bytes of the file. You are facing one of two scenarios:

  • The encoding is stored properly in the dbf file

    If this is happening then MS Access is properly using that information to decode the raw dbf data into Unicode, and xBaseJ is not.

  • The encoding is not stored properly in the file

    If this is happening then MS Access is getting a lucky guess on the encoding, and xBaseJ is refusing to guess.

You need to find a tool that will examine the dbf file and tell you which encoding was stored in it. If you don't know of any, and you don't mind having Python on your machine, you can use a dbf module I wrote to figure it out:

import dbf

table = dbf.Table('/path/to/some_table.dbf')
print(table)

which will print out the encoding, number of fields, size of a record, field names, etc.
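
If installing a module is not an option, the code page mark can also be read by hand. The following is a minimal sketch, assuming the common dBase/FoxPro header layout in which byte 29 of the 32-byte header holds the language driver ID; `read_language_driver` and the `LANGUAGE_DRIVERS` table are names made up for this illustration, and the table lists only a few well-known values:

```python
# Partial map of language driver IDs (header byte 29) to Python codec names.
# Only a few common values are listed here; 0x00 means nothing was recorded.
LANGUAGE_DRIVERS = {
    0x00: None,       # no code page stored in the file
    0x01: "cp437",    # U.S. MS-DOS
    0x02: "cp850",    # International MS-DOS
    0x03: "cp1252",   # Windows ANSI
    0x57: "cp1252",   # ANSI
}


def read_language_driver(path):
    """Return (driver_id, codec_name) for the DBF file at `path`.

    codec_name is None when the ID is 0x00 or is missing from the
    partial table above.
    """
    with open(path, "rb") as f:
        header = bytearray(f.read(32))  # bytearray indexes to int on Python 2 and 3
    if len(header) < 32:
        raise ValueError("file is too short to contain a DBF header")
    driver_id = header[29]
    return driver_id, LANGUAGE_DRIVERS.get(driver_id)
```

Calling `read_language_driver('/path/to/some_table.dbf')` and getting `(0, None)` back would correspond to the second scenario above, where no encoding was stored.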

Note on installing (which can be such a pain)

Ideally, you should be able to install pip, and then do a pip install enum34 dbf --upgrade which will put the latest versions of those two libraries in the correct spot on your system.

Failing that, you'll want to grab both enum34 and dbf from PyPI and put enum.py and dbf.py in your Python's site-packages folder:

c:\python27\lib\site-packages  # I think, it's been a while since I used Windows

Update

If, after doing all that, you discover that the codepage/encoding was never set in the file (it's amazing how often this happens), then you can also use dbf to change it (if you know what it should be):

table.open()
table.codepage = dbf.CodePage('cp1252') # for example
table.close()
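
At the byte level, a change like this amounts to rewriting the code page mark in the file header. Here is a rough sketch, assuming the mark lives in header byte 29 and that `0x03` stands for cp1252 (Windows ANSI); the function name `set_language_driver` is hypothetical, and this is not a description of how the dbf module does it internally. It writes to a copy and does not convert any record data; it only changes what readers are told about the encoding:

```python
import shutil


def set_language_driver(src_path, dst_path, driver_id):
    """Copy src_path to dst_path, then set header byte 29 of the copy."""
    if not 0 <= driver_id <= 0xFF:
        raise ValueError("driver_id must fit in one byte")
    shutil.copyfile(src_path, dst_path)   # leave the original untouched
    with open(dst_path, "r+b") as f:      # update in place, binary mode
        f.seek(29)                        # offset of the language driver ID
        f.write(bytearray([driver_id]))
```

For example, `set_language_driver('old.dbf', 'new.dbf', 0x03)` would mark the copy as cp1252 (the file names are placeholders).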
Ethan Furman
  • I already have Python and I downloaded your module. I'm guessing I should put it in the "lib" directory of Python's folder on my hard drive, which I did, and since it's the first time I use Python I'd appreciate some help. I tried your command line but this error is shown: 'Traceback (most recent call last): File "<stdin>", line 1, in <module> *new line* File "C:\Python27\lib\dbf.py", line 67, in <module> *new line* from enum import Enum, IntEnum *new line* ImportError: No module named enum' (Sorry for the bad formatting :p ) – user3610008 Jun 26 '14 at 22:38
  • Well, I did all of that, thank you very much... Now the encoding or the code page is ASCII, or as you put it in your module: Plain ol' ascii, haha! So now we know that the problem is in the charset of the database and not an xBaseJ-related problem, which is kind of a relief. Now, do you know a way to change the encoding of a .dbf file? – user3610008 Jun 27 '14 at 10:14
  • That's great, man. I'll keep this in mind next time I try to do something with a dbf file. Meanwhile, as mentioned in my answer (submitted below), I found a solution, and now I find that yours is simpler. So have a look at my answer, and again, thanks a lot. Your answer really helped! – user3610008 Jun 27 '14 at 21:37
1

Finally, I found the answer...

First of all, as mentioned, thanks to Ethan Furman I figured out that the problem was related to the encoding of the dbf database and not to the xBaseJ API.

Then I had to search for hours for a tool that could help me change the charset of the database, which is ASCII. I found out that OpenOffice from Apache does that, but I don't have OpenOffice on my Windows machine. I tried to download it 5 or 6 times, but every time it was interrupted, since my internet connection is really, really bad (it downloads at 6 to 7 KB/s) and the .exe file is 209 MB. So I had to search even more for other software to do the job, and I don't know how, but I found DBF Commander, which does more than just change the charset. Anyway, I downloaded the trial version, which does everything but shows a window telling you to buy it every time you do anything :D.

Finally, I changed the charset from ASCII (850 International MS-DOS or something) to 1252 Windows ANSI... aaaaand boom! It works!
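
To illustrate why the declared code page matters, here is a small sketch. It assumes the text in the file was written as cp1252 bytes (an assumption; the thread does not say which program created the file), and the helper name `decode_as` is made up for this example:

```python
def decode_as(raw, codec):
    """Decode raw bytes with the given codec, replacing undecodable bytes."""
    return raw.decode(codec, errors="replace")


text = u"\u00e9t\u00e9"        # the French word "été"
raw = text.encode("cp1252")    # the bytes as a cp1252 writer would store them

as_cp1252 = decode_as(raw, "cp1252")  # round-trips to the original text
as_cp850 = decode_as(raw, "cp850")    # same bytes, different letters (0xE9 is U+00DA in cp850)
as_ascii = decode_as(raw, "ascii")    # 0xE9 is not ASCII, so each one becomes U+FFFD
```

Only the decode that matches how the bytes were written gives back the accented letters, which is consistent with the characters appearing once the file was marked as 1252.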

I still think that there's a difference between the terms "code page", "charset" and "encoding", and I'm using them interchangeably. But at least now I know they exist, and that's a new thing I learned.

Anyway, thank you again Ethan Furman, and I'd also like to thank Google for making this possible :D!

user3610008
0

I could be wrong but try setting your database to UTF-8. I'm guessing this problem has to do with character encoding.

Arno_Geismar
  • Probably yes, but at the same time, as I mentioned, when opening the database with MS Access (which is possible since Access supports dbf files) the characters are shown normally. I'm not an expert in character sets, but logically, if the database wasn't set to UTF-8, Access wouldn't show the characters (I'm not sure of what I'm saying, it's just logic, or at least my logic :p ). But I'll try that anyway: I'll look up "setting dbf files to UTF-8", since I don't know how to do it, and I'll give you feedback. – user3610008 Jun 26 '14 at 15:18
  • Yes I did actually, there's an accepted answer to this question, take a look at it. – user3610008 Jul 16 '14 at 13:28
0

You can try this library: xbase4j. As I learned, in many DBF files the "language" flag is set incorrectly or is not set at all. To work around this, just specify the proper language before opening the DBF file. Something like this:

new XBase().withLanguage(Language.WinANSI).open(new File("..."));

Feel free to contact me if you need some help.

Regards,

Yasas