difference between string size() function and strlen in this particular case

Question

I recently did this question

Specification:

Input Format The first line contains the number of test cases, T. Next, T lines follow each containing a long string S.

Output Format For each long string S, display the number of times SUVO and SUVOJIT appears in it.

I wrote the following code for this :

#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;

    while (t--) {
        int suvo = 0;
        int suvojit = 0;
        string s;
        cin >> s;

        for (int i = 0; i <= s.size() - 7; i++) {
            if (s.substr(i, 7) == "SUVOJIT")
                suvojit++;
        }
        for (int i = 0; i <= s.size() - 4; i++) {
            if (s.substr(i, 4) == "SUVO")
                suvo++;
        }

        cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
    }
    return 0;
}

The code about gave out of bounds exception for substr() function for this test case:

15
RSUVOYDSUVOJITNSUVOUSUVOJITESUVOSUVOSGSUVOKSUVOJIT
SUVOJITWSUVOSUVOJITTSUVOCKSUVOJITNSUVOSUVOJITSUVOJITSUVOSUVOSUVOJITTSUVOJ
SUVOSUVOSUVOJITASUVOJITGCEBISUVOJITKJSUVORSUVOQCGVHRQLFSUVOOHPFNJTNSUVOJITKSSUVO
SUVOJITSUVOJITJGKSUVOJITISUVOJITKJLUSUVOJITUBSUVOX
MMHBSUVOFSUVOFMSUVOJITUMSUVOJITPSVYBYPMCSUVOJIT
OASUVOSUVOJITSUVOSTDYYJSUVOJITSUVOJITSUVO
RLSUVOCPSUVOJITYSUVOSUVOOGSUVOOESUVOJITMSUVO
WVLFFSUVOJITSUVOVSUVORLESUVOJITPSUVOJITSUVO
RSUVOSUVOJITQWSUVOUMASUVOSUVOJITXNNRRUNUSUVOJIT
HYLSSUVOSUVOSUVOJITPOSUVOJIT
DGMUCSSSUVOJITMJSUVOHSUVOCWTGSUVOJIT
OBNSSUVOYSUVOSUVOJITSUVOJITRHFDSUVODSUVOJITEGSUVOSUVOSUVOJITSUVOSUVOJITSSUVOSUVOSUVOSSUVOJIT
AG
NSUVOJITSUVOSUVOJIT
CGJGDSUVOEASUVOJITSGSUVO

However, when instead of using the s.size() function, I converted the string into a char constant and took the length of it using strlen, then the code caused no error and everything went smoothly.

So, my question is... Why did this happen?

This is my working code with the change:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;

    while (t--) {
        int suvo = 0;
        int suvojit = 0;
        string s;
        cin >> s;
        int le = strlen(&s[0]);
        for (int i = 0; i <= le - 7; i++) {
            if (s.substr(i, 7) == "SUVOJIT")
                suvojit++;
        }
        for (int i = 0; i <= le - 4; i++) {
            if (s.substr(i, 4) == "SUVO")
                suvo++;
        }

        cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
    }
    return 0;
}

@SouravGhosh - strlen is generally used for c style strings, that's why i put it in c and c++ sections — Vonn, Nov 05 '16 at 17:03
On your algoritm: 1. If you look for "SUVO", then only if you find a match is it worth considering looking for "SUVOJIT" at that offset. 2. Consider using std::find to repeatedly find "SUVO". 3. Look up "Boyer-Moore" (I'm not sure if "SUVO" is quite long enough to be worth it.) — Martin Bonner supports Monica, Nov 05 '16 at 17:19
Off topic: [Why should I not `#include `?](http://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) and [Why is “using namespace std” considered bad practice?](http://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) — user4581301, Nov 05 '16 at 17:20

score 3 · Accepted Answer · answered Nov 05 '16 at 17:08

3

In one case, you use size_t, in the other case you use int.

If the length is for example 6 characters, then s.size () - 7 is not -1, but one huge number and everything goes wrong. But if you write int len = strlen (...), then len - 7 is indeed -1 and everything is fine.

When I see a number subtracted from size_t, that's an immediate red flag. Write "i + 7 ≤ s.size()", not "i ≤ s.size() - 7".

answered Nov 05 '16 at 17:08

gnasher729

47,695
5
65
91

2

Recommend an edit to make sure readers know that `int` is signed and `size_t` isn't. – user4581301 Nov 05 '16 at 17:16
In addition, if the input string is "S", then `s.substr(0, 7)` will give an out-of-range error; the corresponding `memcmp` will be undefined behaviour - but will probably silently work. – Martin Bonner supports Monica Nov 05 '16 at 17:20
@MartinBonner - Well that would not happen in this implementation as, if the input string is just "S", then the for loop would not even run as ( i <= le - 7) would give false – Vonn Nov 06 '16 at 17:43
@Vonn I beg your pardon: Yes, you carefully *don't* run off the end of the string - my mistake. – Martin Bonner supports Monica Nov 06 '16 at 17:47

score 1 · Answer 2 · answered Nov 05 '16 at 18:08

First of all, in my testing your second leads to a problem as well:

Second, especially with older compilers (well, libraries, really) this can be horrendously inefficient, creating a huge number of temporary strings that you only use to compare with another string¹.

So, let's consider how the job should be done instead. std::string has a member named find for situations like this. It returns the position of one string inside another, or std::string::npos if there is none. It allows you to specify a starting position at which to begin searching, when you don't want to start from the beginning.

We also, of course, have two instances of essentially identical code, once to search for SUVO, the other to search for SUVOJIT. The code would be much better off with the search code moved into a function, so we only have the search code in one place.

int count_pos(std::string const &haystack, std::string const &needle) {
    size_t pos = 0;
    int ret = 0;

    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        ++ret;
        ++pos;
    }
    return ret;
}

Note that this also eliminates quite a bit more messy "stuff" like having to compute the maximum possible position at which at match could take place.

^{1. Why does compiler/library age matter? Older libraries often used a COW string that dynamically allocated storage for every string. More recent ones typically include what's called a "short string optimization", where storage for a short string is allocated inside the string object itself, avoiding the dynamic allocation.}

difference between string size() function and strlen in this particular case

2 Answers2