This assignment is a good exercise in optimizing I/O.
The file will be read into a block of memory, a.k.a. buffer.
Let's use an array for the frequency counting, since indexing into an array is a fast, constant-time way to update a counter.
    #include <fstream>
    #include <iostream>

    // Size of the read buffer, in bytes (1 MiB).
    static const unsigned int BUFFER_SIZE = 1024 * 1024;

    int main()
    {
        // Declare the buffer as "static" so it is not placed on the stack.
        static char buffer[BUFFER_SIZE];

        // One counter per letter, zero-initialized.
        unsigned int upperCounts[26] = {0};
        unsigned int lowerCounts[26] = {0};

        /* Use the same file opening as in your original code. */

        // read() reports failure on the final, partial block, so also check
        // gcount(); otherwise the tail of the file would be skipped.
        while (file.read(buffer, BUFFER_SIZE) || file.gcount() > 0)
        {
            const unsigned int characters_read =
                static_cast<unsigned int>(file.gcount());
            for (unsigned int i = 0; i < characters_read; ++i)
            {
                const char ch = buffer[i];
                if (ch >= 'A' && ch <= 'Z')
                {
                    ++upperCounts[ch - 'A'];
                }
                else if (ch >= 'a' && ch <= 'z')
                {
                    ++lowerCounts[ch - 'a'];
                }
            }
        }

        /* Insert code to print frequencies */
        return 0; // Indicate success to the operating system.
    }
In the above code, a block of characters is read into memory using the read() method. Reading in blocks is generally faster than reading one character at a time, because the per-call overhead is paid once per block rather than once per character. Although the C++ stream facilities may already buffer the input, we take control so that we can choose the buffer size.
The buffer is then searched for alphabetic characters and the frequency counts are updated. Scanning data that is already in memory is generally much faster than going back to the file for each character.
Edit 1: Optimizing the calculation
In the code above and in the OP's code, much of the execution time is spent on the comparisons used to classify each character.
We can save more time by moving the specialization to after the input and counting the frequency of all characters.
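    unsigned int frequencies[256] = {0}; // One counter per possible byte value.
    // Also check gcount() so the final, partial block is not skipped.
    while (file.read(buffer, BUFFER_SIZE) || file.gcount() > 0)
    {
        const unsigned int characters_read =
            static_cast<unsigned int>(file.gcount());
        for (unsigned int i = 0; i < characters_read; ++i)
        {
            // Index by the character's value, not by its position. The cast
            // keeps bytes above 127 from becoming negative indices.
            ++frequencies[static_cast<unsigned char>(buffer[i])];
        }
    }
    // Now print out the frequencies:
    for (char ch = 'A'; ch <= 'Z'; ++ch)
    {
        std::cout << ch << ": " << frequencies[static_cast<unsigned char>(ch)] << "\n";
    }
    for (char ch = 'a'; ch <= 'z'; ++ch)
    {
        std::cout << ch << ": " << frequencies[static_cast<unsigned char>(ch)] << "\n";
    }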
In the above code, the input loop has been simplified to one purpose: calculating frequencies. No need to check for ranges; range checking is performed after the input.
After input, all the frequencies are output for the alphabetic characters, and only the alphabetic characters.
This example shows that a program can run faster when the most frequently executed section is kept general. The specialization, or details, is performed after or outside the performance-critical section.