How can I find the number of unique characters in a string?

Question

I have found nothing specific for this purpose.

I am trying to write a function that counts the occurrences of each character in a string, so that at the end I can subtract the repeats from the length and find how many different characters are used in that string.

I tried a nested loop: the outer loop picks each character and the inner loop scans the string, storing the character only if it does not appear elsewhere in the string:

size_t CountUniqueCharacters(char *str)
{
    int i,j;
    char unique[CHAR_MAX];
    for(i=strlen(str); i>=0; i--)
    {
        for(j=strlen(str); j>=0; j--)
        {
            if(str[i] != unique[j])
                unique[j] = str[i];
        }
    }
    return strlen(unique);
}

This didn't work well.

This is useful if you want to prevent someone from typing lazy names such as "aaaaaaaaaaaaa".

This really isn't much of a question, it's rather close to "can I have teh codez" which is not generally optimal for this forum. — unwind, Jul 04 '14 at 11:19
I didn't ask for "teh codes"; the title clearly says what the question is. I'm looking for a function or method. I haven't said I need full code. Pseudocode would work for me too. — Edenia, Jul 04 '14 at 11:21
I don't entirely understand the question (the second sentence makes no sense to me) - are you looking for a method that takes a string and a character and returns the number of occurrences of the character in the string? — Daniel Kleinstein, Jul 04 '14 at 11:23
@Daniel noo.. as the title says "all the character occurrences". Which means something like.. `char* str = "Cannono";` `printf("%i", ccnt(str));` Which returns 4 because in "Cannono" we have 4 different characters. — Edenia, Jul 04 '14 at 11:25
Oh, you're looking for the number of *unique* characters in the string. What have you tried? — Daniel Kleinstein, Jul 04 '14 at 11:26
`for(i=strlen(str); i>=0; i--) { for(j=strlen(str); j>=0; j--) { if(str[i] != unique[j]) unique[j] = str[i]; } } return strlen(unique);` — Edenia, Jul 04 '14 at 11:28

score 7 · Answer 1 · edited Apr 15 '20 at 20:05

Here's a simple C++ solution:

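#include <string>
#include <unordered_map>

int countDistinct(const std::string& s)
{
    // Maps each character to its number of occurrences
    std::unordered_map<char, int> m;

    for (char c : s) {
        m[c]++;
    }

    // One key per distinct character
    return static_cast<int>(m.size());
}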

answered Apr 15 '20 at 15:57 by Nishil Shah; edited Apr 15 '20 at 20:05 by coelhudo

You should take the complexity of the `unordered_map` operations into account; the overall complexity is not necessarily O(n), I think. – Amir Fo May 30 '21 at 07:04

Daniel Kleinstein · Accepted Answer · 2014-07-04T11:41:57.733

This method has O(n^2) complexity, but it's very possible (though a bit more complex) to do this in O(n).

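#include <stdbool.h>
#include <string.h>

int CountUniqueCharacters(char* str){
    int count = 0;
    size_t len = strlen(str);   // computed once instead of on every iteration

    for (size_t i = 0; i < len; i++){
         bool appears = false;
         // Look for str[i] among the characters before position i
         for (size_t j = 0; j < i; j++){
              if (str[j] == str[i]){
                  appears = true;
                  break;
              }
         }

         if (!appears){
             count++;
         }
    }

    return count;
}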

The method iterates over all the characters in the string. For each character, it checks whether the same character already appeared earlier in the string. If it didn't, the character is being seen for the first time, and the count is incremented.

I did exactly the same thing, but with one logical mistake.. didn't put the condition inside the first loop after the second one. — Edenia, Jul 04 '14 at 11:38

score 2 · Answer 3 · answered Jul 05 '19 at 03:07

I find the following way of counting distinct characters very simple, and it runs in O(n). The logic is to traverse the character array and set the flag for each character to 1; if the character repeats, the flag is simply overwritten with 1 again. After the traversal, sum all the flags.

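#include <limits.h>

/* Assumed definition (not given in the original): one slot per possible unsigned char value */
#define MAX_CHAR (UCHAR_MAX + 1)

int count_distinc_char(const char *a){
     int c_arr[MAX_CHAR] = {0};
     int i, count = 0;
     for( i = 0; a[i] != '\0'; i++){
         /* Index by the unsigned character value so any character stays in range */
         c_arr[(unsigned char)a[i]] = 1;
     }
     for( i = 0; i < MAX_CHAR; i++){
         count += c_arr[i];
     }
     return count;
}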

score 2 · Answer 4 · answered Nov 04 '20 at 19:57

You can use a HashSet or unordered_set for this, but each insertion has a worst-case time complexity of O(N). Hence it's best to use an array of 256 entries, e.g. arr[256], indexed by character value. Each lookup and update is then O(1), and the final count costs O(256) ~ O(1) on top of the single pass over the string.

answered by Hash2
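The 256-entry array idea from this answer could be sketched in C roughly as follows. This is only an illustration: the function name `count_unique_bytes` is made up for the example, and it assumes the string is treated as a sequence of bytes (so a multi-byte UTF-8 character would count as several values).

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts the distinct byte values in a NUL-terminated string. */
size_t count_unique_bytes(const char *str)
{
    bool seen[UCHAR_MAX + 1] = { false };  /* one flag per possible byte value */
    size_t count = 0;

    for (; *str != '\0'; str++) {
        unsigned char c = (unsigned char)*str;  /* avoids negative indices when char is signed */
        if (!seen[c]) {
            seen[c] = true;   /* first time this value shows up */
            count++;
        }
    }
    return count;
}
```

With the example from the question's comments, `count_unique_bytes("Cannono")` would return 4. Counting while scanning avoids the second loop over the array.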

score 1 · Answer 5 · answered Jul 04 '14 at 11:40

Create a linked list to store the characters found in the string and their occurrence counts, with the node structure as follows:

struct tagCharOccurence 
{
    char ch;
    unsigned int iCount;
    struct tagCharOccurence *next;   /* link to the next node in the list */
};

Now read all the characters in the string one by one. For each character, check whether it is already present in the linked list: if it is, increase its count; if it is not, insert a new node with 'ch' set to that character and the count initialized to one.

In this way you get the occurrence count of each character in a single pass over the string, and the number of nodes in the list is the number of unique characters. You can also use the linked list to print each character as many times as it was encountered.
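A minimal sketch of this linked-list approach in C might look like the following. The `next` member and the helper names (`build_occurrence_list`, `list_length`, `free_occurrence_list`) are assumptions made for the example; they are not part of the answer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Node layout from the answer, plus an assumed `next` link. */
struct tagCharOccurence {
    char ch;
    unsigned int iCount;
    struct tagCharOccurence *next;
};

/* Hypothetical helper: releases every node of the list. */
void free_occurrence_list(struct tagCharOccurence *head)
{
    while (head != NULL) {
        struct tagCharOccurence *next = head->next;
        free(head);
        head = next;
    }
}

/* Hypothetical helper: builds one node per distinct character.
   Returns NULL for an empty string or if an allocation fails. */
struct tagCharOccurence *build_occurrence_list(const char *str)
{
    struct tagCharOccurence *head = NULL;

    for (; *str != '\0'; str++) {
        /* Search the list for the current character. */
        struct tagCharOccurence *node = head;
        while (node != NULL && node->ch != *str)
            node = node->next;

        if (node != NULL) {
            node->iCount++;               /* seen before: bump its count */
        } else {
            node = malloc(sizeof *node);  /* new character: prepend a node */
            if (node == NULL) {
                free_occurrence_list(head);
                return NULL;
            }
            node->ch = *str;
            node->iCount = 1;
            node->next = head;
            head = node;
        }
    }
    return head;
}

/* Hypothetical helper: the number of nodes equals the number of distinct characters. */
size_t list_length(const struct tagCharOccurence *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Note that every character triggers a search through the list, so although the string is read only once, the total work grows with the string length times the number of distinct characters.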

score 1 · Answer 6 · answered Sep 04 '18 at 14:08

I just came across this question while looking for some other stuff on Stack Overflow, but I'll still post a solution that might be helpful to some.

The same technique is used in implementations of Huffman coding. There you need to know the frequency of each character, so it gives a bit more than what you need.

#include <climits>
const int UniqueSymbols = 1 << CHAR_BIT;
const char* SampleString = "this is an example for huffman encoding";

The left shift operator shifts its left-hand side (i.e. 1) CHAR_BIT places to the left, which gives 2^8 = 256 on most computers, because a char of 8 bits can take 256 distinct values.

In your main you then have:

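int main() {
    // Build frequency table
    int frequencies[UniqueSymbols] = {0};
    const char* ptr = SampleString;
    while (*ptr != '\0') {
        // Cast to unsigned char so a negative char value cannot index out of bounds
        ++frequencies[static_cast<unsigned char>(*ptr++)];
    }
}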

I found it quite minimal and helpful. The only downside is that the size of frequencies is 256 here. Counting the distinct characters is then just counting the entries that are nonzero (an entry equal to 1 marks a character that occurs exactly once).
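The answer stops after building the table, so here is a hedged C sketch of the remaining step: counting how many entries are nonzero. The names `UNIQUE_SYMBOLS` and `count_distinct_from_frequencies` are invented for this example.

```c
#include <limits.h>
#include <stddef.h>

#define UNIQUE_SYMBOLS (1 << CHAR_BIT)  /* 256 when CHAR_BIT is 8 */

/* Hypothetical helper: fills `frequencies` (UNIQUE_SYMBOLS entries) with the
   occurrence count of each byte value and returns how many entries are nonzero. */
size_t count_distinct_from_frequencies(const char *str, int *frequencies)
{
    size_t distinct = 0;

    for (size_t i = 0; i < UNIQUE_SYMBOLS; i++)
        frequencies[i] = 0;

    /* Build the frequency table, indexing by the unsigned byte value. */
    for (; *str != '\0'; str++)
        ++frequencies[(unsigned char)*str];

    /* Every nonzero entry corresponds to one distinct character. */
    for (size_t i = 0; i < UNIQUE_SYMBOLS; i++) {
        if (frequencies[i] != 0)
            distinct++;
    }
    return distinct;
}
```

The caller keeps the table afterwards, so the per-character frequencies remain available, for example for Huffman coding.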

Thanks for your follow-up. This seems like a variant of a lookup table. Their downsides are usually the huge chunk of memory being used, but are quite fast and easy to implement. — Edenia, Sep 10 '18 at 09:47

score 0 · Answer 7 · answered Apr 15 '20 at 15:59

Here is the source code of a C program to count the number of unique words. The program compiles and runs successfully on a Linux system.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 0, e, j, d, k, space = 0;
    // Sized so that any 49-character line fits: at most 50 words of at most 49 characters
    char a[50], b[50][50], c[50][50];

    printf("Read a string:\n");
    if (scanf("%49[^\n]", a) != 1)           //bounded read of one line
        return 1;

    for (i = 0; a[i] != '\0'; i++)           //loop to count no of words
    {
        if (a[i] == ' ')
            space++;
    }

    i = 0;
    for (j = 0; j < (space + 1); i++, j++)   //loop to store each word into a 2D array
    {
        k = 0;
        while (a[i] != '\0' && a[i] != ' ')
            b[j][k++] = a[i++];
        b[j][k] = '\0';
    }

    i = 0;
    strcpy(c[0], b[0]);                      //the first word is always new
    for (e = 1; e < j; e++)                  //loop to check whether the string is already present in the 2D array or not
    {
        for (d = 0; d <= i; d++)
        {
            if (strcmp(c[d], b[e]) == 0)
                break;
        }
        if (d > i)                           //not found: store it as a new unique word
            strcpy(c[++i], b[e]);
    }

    printf("\nNumber of unique words in %s are:%d", a, i + 1);
    return 0;
}
