-6

I recently did this question

Specification:

Input Format The first line contains the number of test cases, T. Next, T lines follow each containing a long string S.

Output Format For each long string S, display the number of times SUVO and SUVOJIT appears in it.

I wrote the following code for this :

#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;

    while (t--) {
        int suvo = 0;
        int suvojit = 0;
        string s;
        cin >> s;

        for (int i = 0; i <= s.size() - 7; i++) {
            if (s.substr(i, 7) == "SUVOJIT")
                suvojit++;
        }
        for (int i = 0; i <= s.size() - 4; i++) {
            if (s.substr(i, 4) == "SUVO")
                suvo++;
        }

        cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
    }
    return 0;
}

The code about gave out of bounds exception for substr() function for this test case:

15
RSUVOYDSUVOJITNSUVOUSUVOJITESUVOSUVOSGSUVOKSUVOJIT
SUVOJITWSUVOSUVOJITTSUVOCKSUVOJITNSUVOSUVOJITSUVOJITSUVOSUVOSUVOJITTSUVOJ
SUVOSUVOSUVOJITASUVOJITGCEBISUVOJITKJSUVORSUVOQCGVHRQLFSUVOOHPFNJTNSUVOJITKSSUVO
SUVOJITSUVOJITJGKSUVOJITISUVOJITKJLUSUVOJITUBSUVOX
MMHBSUVOFSUVOFMSUVOJITUMSUVOJITPSVYBYPMCSUVOJIT
OASUVOSUVOJITSUVOSTDYYJSUVOJITSUVOJITSUVO
RLSUVOCPSUVOJITYSUVOSUVOOGSUVOOESUVOJITMSUVO
WVLFFSUVOJITSUVOVSUVORLESUVOJITPSUVOJITSUVO
RSUVOSUVOJITQWSUVOUMASUVOSUVOJITXNNRRUNUSUVOJIT
HYLSSUVOSUVOSUVOJITPOSUVOJIT
DGMUCSSSUVOJITMJSUVOHSUVOCWTGSUVOJIT
OBNSSUVOYSUVOSUVOJITSUVOJITRHFDSUVODSUVOJITEGSUVOSUVOSUVOJITSUVOSUVOJITSSUVOSUVOSUVOSSUVOJIT
AG
NSUVOJITSUVOSUVOJIT
CGJGDSUVOEASUVOJITSGSUVO

However, when instead of using the s.size() function, I converted the string into a char constant and took the length of it using strlen, then the code caused no error and everything went smoothly.

So, my question is... Why did this happen?

This is my working code with the change:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;

    while (t--) {
        int suvo = 0;
        int suvojit = 0;
        string s;
        cin >> s;
        int le = strlen(&s[0]);
        for (int i = 0; i <= le - 7; i++) {
            if (s.substr(i, 7) == "SUVOJIT")
                suvojit++;
        }
        for (int i = 0; i <= le - 4; i++) {
            if (s.substr(i, 4) == "SUVO")
                suvo++;
        }

        cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
    }
    return 0;
}
chqrlie
  • 98,886
  • 10
  • 89
  • 149
Vonn
  • 43
  • 1
  • 7
  • 3
    this is no C... – Sourav Ghosh Nov 05 '16 at 17:01
  • @SouravGhosh - strlen is generally used for c style strings, that's why i put it in c and c++ sections – Vonn Nov 05 '16 at 17:03
  • 2
    On your algoritm: 1. If you look for "SUVO", then only if you find a match is it worth considering looking for "SUVOJIT" at that offset. 2. Consider using std::find to repeatedly find "SUVO". 3. Look up "Boyer-Moore" (I'm not sure if "SUVO" is quite long enough to be worth it.) – Martin Bonner supports Monica Nov 05 '16 at 17:19
  • 1
    Off topic: [Why should I not `#include `?](http://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) and [Why is “using namespace std” considered bad practice?](http://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) – user4581301 Nov 05 '16 at 17:20

2 Answers2

3

In one case, you use size_t, in the other case you use int.

If the length is for example 6 characters, then s.size () - 7 is not -1, but one huge number and everything goes wrong. But if you write int len = strlen (...), then len - 7 is indeed -1 and everything is fine.

When I see a number subtracted from size_t, that's an immediate red flag. Write "i + 7 ≤ s.size()", not "i ≤ s.size() - 7".

gnasher729
  • 47,695
  • 5
  • 65
  • 91
1

First of all, in my testing your second leads to a problem as well:

enter image description here

Second, especially with older compilers (well, libraries, really) this can be horrendously inefficient, creating a huge number of temporary strings that you only use to compare with another string1.

So, let's consider how the job should be done instead. std::string has a member named find for situations like this. It returns the position of one string inside another, or std::string::npos if there is none. It allows you to specify a starting position at which to begin searching, when you don't want to start from the beginning.

We also, of course, have two instances of essentially identical code, once to search for SUVO, the other to search for SUVOJIT. The code would be much better off with the search code moved into a function, so we only have the search code in one place.

int count_pos(std::string const &haystack, std::string const &needle) {
    size_t pos = 0;
    int ret = 0;

    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        ++ret;
        ++pos;
    }
    return ret;
}

Note that this also eliminates quite a bit more messy "stuff" like having to compute the maximum possible position at which at match could take place.


1. Why does compiler/library age matter? Older libraries often used a COW string that dynamically allocated storage for every string. More recent ones typically include what's called a "short string optimization", where storage for a short string is allocated inside the string object itself, avoiding the dynamic allocation.

Jerry Coffin
  • 437,173
  • 71
  • 570
  • 1,035