Vigenère cipher in Java for all UTF-8 characters

Question

I have this simple function for encrypting strings via Vigenère in Java. I omitted the decryption as this is just a "-" instead of the "+" in the line where the new value is calculated.

But this function works only for the normal alphabet A-Z. How can I change the function so that it supports lowercase letters as well as uppercase letters and all other UTF-8 chars?

public static String vigenere_encrypt(String plaintext, String key) {
    String encryptedText = "";
    for (int i = 0, j = 0; i < plaintext.length(); i++, j++) {
        if (j == key.length()) { j = 0; } // use key again if end reached
        encryptedText += (char) ((plaintext.charAt(i)+key.charAt(j)-130)%26 + 65);
    }
    return encryptedText;
}

Thank you very much for your help!

What kind of characters do you want to support? UTF-8 is an encoding which supports all unicode characters, many of which won't even print in your favourite font. Furthermore, the Vigenère cipher is not really suited for any kind of modern cryptography...why would you want to apply it to a modern encoding? Of course, just supporting 52 uppercase/lowercase chars is easy... — Maarten Bodewes, Apr 23 '12 at 20:21
If possible, I'd like to support ALL unicode characters to offer best compatibility. Missing characters in fonts is no problem as the encrypted data will never be printed but only stored. I would like to use it because it's simple, fast and can easily be implemented in various programming languages (with interchangeable results). — caw, Apr 23 '12 at 20:57
Interchangeable results using Unicode, are you sure about that? You could interpret each character as a 32-bit Unicode code point. The key would then be a random string of 32-bit numbers. You are, however, *much* better off using UTF-8 encoding for the plain text and Base64 encoding for the cipher text, and then encrypting the bytes with any simple cipher (an XOR cipher if you really *must* make it simple and broken). — Maarten Bodewes, Apr 23 '12 at 21:04
Thank you very much for your explanations, owlstead! Unfortunately, I don't really get yet why a full unicode alphabet is not possible. But a-z, A-Z and a few definable characters will probably do. I just thought it would be as easy to do this for full UTF-8 as it would be for full ASCII. I don't want to be the customer forcing you to give him what he wants instead of what he needs ;) — caw, Apr 23 '12 at 22:33
Unicode defines code points, which is an ever growing "alphabet". This can be encoded using e.g. UTF-8. What you need is to define a group to do the modulus calculations on. This won't happen for Unicode too easily. I'll show you another answer that uses two groups, high and low, but I would strongly urge you to look to encrypting character encodings (bytes) instead of characters, even for simple obfuscation techniques. — Maarten Bodewes, Apr 23 '12 at 22:37
It's simple and fast, yes, but it's also trivially defeated. Please don't use this for anything that matters. — Nick Johnson, Apr 24 '12 at 01:19
Thanks for your comments, you two. You're right, AES (for example) would be safer. But actually, I only want it for obfuscation, anyway. The point is: When I give my mobile application to the client, it comes with the SQLite database. I don't want to enable everyone to simply copy the SQLite file and have the database ready to work. So I want to encrypt all entries in the database with Vigenère (key stored in code) in order to obfuscate it. So Vigenère will be a good solution, I guess. — caw, Apr 24 '12 at 13:38
If your database can handle bigger strings I would still seriously advise you to use `Base64(Enc_AES_CBC(UTF-8(string)))`. — Maarten Bodewes, Apr 24 '12 at 13:46
I try to avoid this approach as AES-encrypted strings need far more disk space and it's hard to implement the AES encryption in different programming languages so that you have interchangeable results. — caw, Apr 24 '12 at 13:54
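
The byte-oriented approach suggested in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `ByteVigenere` and its methods are hypothetical, it assumes Java 8 or later for `java.util.Base64`, and it uses a Vigenère-style addition modulo 256 on the UTF-8 bytes rather than AES or XOR. It is obfuscation only and offers no real security.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: Vigenere-style shift on UTF-8 bytes, Base64 for storage.
final class ByteVigenere {

    // Adds (or subtracts) the key bytes; the (byte) cast wraps around modulo 256.
    private static byte[] shift(byte[] data, byte[] key, boolean encrypt) {
        if (key.length == 0) {
            throw new IllegalArgumentException("Key must not be empty");
        }
        final byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            final int k = key[i % key.length]; // use key again if end reached
            out[i] = (byte) (encrypt ? data[i] + k : data[i] - k);
        }
        return out;
    }

    // Assumes well-formed input strings (no unpaired surrogates).
    static String encrypt(String plaintext, String key) {
        final byte[] data = plaintext.getBytes(StandardCharsets.UTF_8);
        final byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(shift(data, keyBytes, true));
    }

    static String decrypt(String ciphertext, String key) {
        final byte[] data = Base64.getDecoder().decode(ciphertext);
        final byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return new String(shift(data, keyBytes, false), StandardCharsets.UTF_8);
    }
}
```

Because the arithmetic runs on bytes, the group is always the 256 byte values, whatever characters the text contains. The cost is that the stored Base64 text is about a third longer than the UTF-8 bytes.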

Maarten Bodewes · Answer 1 · 2012-04-23T22:25:44.553

Well, you asked for it and I felt like puzzling, but print out the cipher text and you will know what you just asked for...

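public static String vigenereUNICODE(String plaintext, String key, boolean encrypt) {

    // iterate over code points rather than UTF-16 chars, so characters outside
    // the BMP count as one unit in both text and key (needs Java 8+;
    // assumes well-formed strings without unpaired surrogates)
    final int[] textPoints = plaintext.codePoints().toArray();
    final int[] keyPoints = key.codePoints().toArray();
    final int textSize = textPoints.length;
    final int keySize = keyPoints.length;

    // the group is every code point except the surrogates U+D800..U+DFFF,
    // which are not characters and cannot be encoded on their own
    final int surrogateCount = Character.MAX_SURROGATE - Character.MIN_SURROGATE + 1;
    final int groupSize = Character.MAX_CODE_POINT + 1 - surrogateCount;

    final StringBuilder encryptedText = new StringBuilder(plaintext.length());
    for (int i = 0; i < textSize; i++) {
        // close the surrogate gap so that the numbers form one contiguous range
        final int plainPoint = textPoints[i];
        final int plainNR = plainPoint < Character.MIN_SURROGATE ? plainPoint : plainPoint - surrogateCount;
        final int keyPoint = keyPoints[i % keySize];
        final int keyNR = keyPoint < Character.MIN_SURROGATE ? keyPoint : keyPoint - surrogateCount;

        // floorMod keeps the result in [0, groupSize), also for negative differences
        final int cipherNR = Math.floorMod(encrypt ? plainNR + keyNR : plainNR - keyNR, groupSize);

        // reopen the gap to get back a valid code point
        encryptedText.appendCodePoint(
                cipherNR < Character.MIN_SURROGATE ? cipherNR : cipherNR + surrogateCount);
    }

    return encryptedText.toString();
}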

EDIT: Please don't ever use this in production code, as I haven't got a clue whether the resulting code points can always be encoded and decoded. Not all code points have been defined, as far as I know, and the standard is a moving target.

Thank you! I don't want a code that puzzles its creator, of course ;) I just don't understand why the code that I posted in the question can't be expanded to cover all UTF-8 characters as it would be possible for ASCII characters. UTF-8 must have a finite amount of characters, because MySQL (for example) offers encodings such as `utf8_general_ci` as well?! — caw, Apr 24 '12 at 14:11
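
For comparison, the ASCII case mentioned in the comment above is easy because printable ASCII is one fixed, contiguous range. The formula from the question, `(p + k - 130) % 26 + 65`, is just `((p - 65) + (k - 65)) % 26 + 65` for the range starting at `'A'` with 26 members. The sketch below applies the same formula to the 95 printable ASCII characters from space to `~`. The class name `AsciiVigenere` and the choice to reject characters outside the range are assumptions made for illustration.

```java
// Hypothetical helper: the question's formula generalized to printable ASCII.
final class AsciiVigenere {

    private static final int FIRST = ' ';          // 32, first printable ASCII character
    private static final int SIZE = '~' - ' ' + 1; // 95 printable ASCII characters

    static String apply(String text, String key, boolean encrypt) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("Key must not be empty");
        }
        final StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            // position of each character inside the range, starting at 0
            final int p = text.charAt(i) - FIRST;
            final int k = key.charAt(i % key.length()) - FIRST;
            if (p < 0 || p >= SIZE || k < 0 || k >= SIZE) {
                throw new IllegalArgumentException("Only printable ASCII is supported");
            }
            // floorMod also handles the negative differences that occur when decrypting
            final int c = Math.floorMod(encrypt ? p + k : p - k, SIZE);
            out.append((char) (FIRST + c));
        }
        return out.toString();
    }
}
```

The same pattern works for any range whose start and size are fixed in advance. Unicode is harder mainly because the range has a gap (the surrogates) and contains many code points that are unassigned.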

score 4 · Answer 2 · answered Apr 24 '12 at 23:39

If full unicode support is not possible and you have to define your list of valid characters, anyway, why not just use a function like this?

public static String vigenere_cipher(String plaintext, String key, boolean encrypt) {

    String alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.-"; // including some special chars
    final int alphabetSize = alphabet.length();
    final int textSize = plaintext.length();
    final int keySize = key.length();
    final StringBuilder encryptedText = new StringBuilder(textSize);

    for (int i = 0; i < textSize; i++) {
        final char plainChar = plaintext.charAt(i); // get the current character to be shifted
        final char keyChar = key.charAt(i % keySize); // use key again if the end is reached
        final int plainPos = alphabet.indexOf(plainChar); // plain character's position in alphabet string
        if (plainPos == -1) { // if character not in alphabet just append unshifted one to the result text
            encryptedText.append(plainChar);
        }
        else { // if character is in alphabet shift it and append the new character to the result text
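            final int keyPos = alphabet.indexOf(keyChar); // key character's position in alphabet string
            if (keyPos == -1) { // a key character outside the alphabet would yield a wrong or out-of-range index
                throw new IllegalArgumentException("Invalid character in key: " + keyChar);
            }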
            if (encrypt) { // encrypt the input text
                encryptedText.append(alphabet.charAt((plainPos+keyPos) % alphabetSize));
            }
            else { // decrypt the input text
                int shiftedPos = plainPos-keyPos;
                if (shiftedPos < 0) { // negative numbers cannot be handled with modulo
                    shiftedPos += alphabetSize;
                }
                encryptedText.append(alphabet.charAt(shiftedPos));
            }
        }
    }

    return encryptedText.toString();

}

This should be a very short and working version. And the alphabet can easily be stored in a string that can always be extended (which results in different ciphertexts).

Sure, this works well, but it requires a slightly expensive lookup per character. It's shorter and easier to understand. Both methods are equally valid. — Maarten Bodewes, Apr 25 '12 at 13:12
This solution works great. For my task, I used a custom character set containing all the characters on a computer keyboard rather than just a-z or A-Z. With my character set, this solution works well. — goyalshub1509, Feb 17 '19 at 23:39

score 3 · Accepted Answer · answered Apr 23 '12 at 22:41

Another answer, which applies the Vigenère cipher to upper- and lower-case letters and simply copies all other characters through unchanged. Use the same technique to add further groups of characters to encode.

public static String vigenere(String plaintext, String key, boolean encrypt) {

    final int textSize = plaintext.length();
    final int keySize = key.length();

    final int groupSize1 = 'Z' - 'A' + 1; 
    final int groupSize2 = 'z' - 'a' + 1;
    final int totalGroupSize = groupSize1 + groupSize2;

    final StringBuilder encryptedText = new StringBuilder(textSize);
    for (int i = 0; i < textSize; i++) {
        final char plainChar = plaintext.charAt(i);

        // this should be a method, called for both the plain text as well as the key
        final int plainGroupNumber; 
        if (plainChar >= 'A' && plainChar <= 'Z') {
            plainGroupNumber = plainChar - 'A';
        } else if (plainChar >= 'a' && plainChar <= 'z') {
            plainGroupNumber = groupSize1 + plainChar - 'a';
        } else {
            // simply leave spaces and other characters
            encryptedText.append(plainChar);
            continue;
        }

        final char keyChar = key.charAt(i % keySize);
        final int keyGroupNumber; 
        if (keyChar >= 'A' && keyChar <= 'Z') {
            keyGroupNumber = keyChar - 'A';
        } else if (keyChar >= 'a' && keyChar <= 'z') {
            keyGroupNumber = groupSize1 + keyChar - 'a';
        } else {
            throw new IllegalStateException("Invalid character in key");
        }

        // this should be a separate method
        final int cipherGroupNumber;
        if (encrypt) {
            cipherGroupNumber = (plainGroupNumber + keyGroupNumber) % totalGroupSize;
        } else {
            // some code to go around the awkward way of handling % in Java for negative numbers
            final int someCipherGroupNumber = plainGroupNumber - keyGroupNumber;
            if (someCipherGroupNumber < 0) {
                cipherGroupNumber = (someCipherGroupNumber + totalGroupSize);
            } else {
                cipherGroupNumber = someCipherGroupNumber;
            }
        }

        // this should be a separate method
        final char cipherChar;
        if (cipherGroupNumber < groupSize1) {
            cipherChar = (char) ('A' + cipherGroupNumber);
        } else {
            cipherChar = (char) ('a' + cipherGroupNumber - groupSize1);
        }
        encryptedText.append(cipherChar);
    }

    return encryptedText.toString();
}

Again, this is unsafe code as the cipher used has been broken for ages. Don't use too many 'A' characters in your keys :) But the character encoding should be sound.

I know that the cipher can be defeated - although it can also serve as a one-time pad. But as it is only for obfuscation, it's okay. With too many A's it would almost be the Caesar cipher, wouldn't it? — caw, Apr 24 '12 at 14:09
Oh yes, of course - how embarrassing I didn't understand directly :) Last question: How do I add another group of characters, for example the German special chars `ÄÖÜäöüß` to this function? — caw, Apr 24 '12 at 22:48
Please take a look at my answer - shouldn't this do the same while being shorter? — caw, Apr 24 '12 at 23:40
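
One possible way to add a further group, such as the German characters `ÄÖÜäöüß` asked about above, is to keep the groups in a list and move the mapping between characters and group numbers into two methods, as the code comments in the answer suggest. The following is a rough sketch under that assumption. The class `GroupedVigenere` and its method names are hypothetical, and the German letters are written as Unicode escapes so the source does not depend on the file encoding.

```java
// Hypothetical refactoring of the group technique with a third group added.
final class GroupedVigenere {

    // Each string is one group; their concatenation forms the cipher alphabet.
    private static final String[] GROUPS = {
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "abcdefghijklmnopqrstuvwxyz",
        // A, O, U with umlaut (upper and lower case) and sharp s
        "\u00C4\u00D6\u00DC\u00E4\u00F6\u00FC\u00DF"
    };

    static int totalSize() {
        int size = 0;
        for (String group : GROUPS) {
            size += group.length();
        }
        return size;
    }

    // Number of the character in the combined groups, or -1 if it is in none.
    static int groupNumber(char c) {
        int offset = 0;
        for (String group : GROUPS) {
            final int pos = group.indexOf(c);
            if (pos >= 0) {
                return offset + pos;
            }
            offset += group.length();
        }
        return -1;
    }

    // Inverse of groupNumber for 0 <= number < totalSize().
    static char groupChar(int number) {
        for (String group : GROUPS) {
            if (number < group.length()) {
                return group.charAt(number);
            }
            number -= group.length();
        }
        throw new IllegalArgumentException("Group number out of range");
    }

    static String vigenere(String text, String key, boolean encrypt) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("Key must not be empty");
        }
        final int size = totalSize();
        final StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            final char plainChar = text.charAt(i);
            final int plainNumber = groupNumber(plainChar);
            if (plainNumber < 0) {
                out.append(plainChar); // simply leave spaces and other characters
                continue;
            }
            final int keyNumber = groupNumber(key.charAt(i % key.length()));
            if (keyNumber < 0) {
                throw new IllegalArgumentException("Invalid character in key");
            }
            final int shifted = encrypt ? plainNumber + keyNumber : plainNumber - keyNumber;
            out.append(groupChar(Math.floorMod(shifted, size)));
        }
        return out.toString();
    }
}
```

Adding or reordering groups changes the total group size and therefore the ciphertext, so data encrypted with one set of groups has to be decrypted with the same set.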
