
I have this assignment that I just can't figure out. I want my function to get a line from an HTML file and extract an email address from it, then split the address into email, username, and domain. Then I want a third function to get the next email in the HTML file.

void get_line_emails(ifstream &in_stream, ofstream &out_stream, string email[], string users[], string domain[])
{
    int location, end;
    string mail;    
    getline(in_stream, mail);
    location = mail.find("mailto:");
    end = mail.find(">");
    mail = mail.substr(location, (end - 1));
    cout << mail << endl;
}

void get_next_email(ifstream &in_stream, string mail)
{
        getline(in_stream, mail);
        int location = mail.find("mailto:");
        int end = mail.find(">");
        mail = mail.substr(location, (end - 1));

}

void split_email(string email[], string domain[], string users)
{
    int count = 300;
    string mail;
    for (int i = 1; i < count; ++i) //For loop to input stream.
        {
            mail = email[i];
            int location = mail.find("@");
            int end = mail.find(">");
            string domain[i] = mail.substr(location, (end - 1));
            string users[i] = mail.substr(0, location);
        }
}

I also get this error when I run the program:

terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 4294967295) > this->size() (which is 244)
Abort (core dumped)

If it helps, here's my main function:

int main()
{
    string email[1000];
    string users[1000];
    string domain[1000];
    int count = 300;
    string filename;

    ifstream in_stream;
    ofstream out_stream;
    cout << "Enter input filename: " << endl;
    cin >> filename; //Input of filename.
    in_stream.open(filename.c_str()); //Opening the input file for population and other information.
        if (in_stream.fail()) //Checking to see if file opens.
        {
            cout << "Error opening input/output files" << endl; //Telling user file isn't opening.
            exit(1); //Exiting program.
        }

    out_stream.open("Emails.txt");//If it does not exist it will be created. If it exists it will be overwritten.
    out_stream << "Email " << right << setw(20) << "User " << right << setw(20) << "Domain" << endl;
    out_stream << "_______________________________________________________________________________" << endl;
    get_line_emails(in_stream, out_stream, email, users, domain);
    //split_email(email, domain, users);
    sort(email, users, domain, count);
    in_stream.close(); //Closing the in stream.
    out_stream.close(); //Closing the out stream.

    cout << "A new file Emails has been created with the emails extracted. Thank you." << endl; //End message.

    return 0;
}

Part of the HTML file I am inputting:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <!-- Content Copyright Ohio University Server ID: 2-->
<!-- Page generated 2016-03-22 14:55:21 by CommonSpot Build 9.0.3.119 (2015-08-14 15:00:01) -->
<!-- JavaScript & DHTML Code Copyright &copy; 1998-2015, PaperThin, Inc. All Rights Reserved. --> <head>
<meta name="Description" id="Description" content="Faculty" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="Keywords" id="Keywords" content="engineering" />
<meta name="Generator" id="Generator" content="CommonSpot Content Server Build 9.0.3.119" />
<link rel="stylesheet" href="/style/ouws_0111_allin1_nonav.css" type="text/css" />
<link rel="stylesheet" href="/engineering/upload/engineeringEV.css" type="text/css" />
<link rel="stylesheet" href="/engineering/upload/gridpak.css" type="text/css" />
<style type="text/css">
.mw { color:#000000;font-family:Verdana,Arial,Helvetica;font-weight:bold;font-size:xx-small;text-decoration:none; }
a.mw:link   {color:#000000;font-family:Verdana,Arial,Helvetica;font-weight:bold;font-size:xx-small;text-decoration:none;}
a.mw:visited    {color:#000000;font-family:Verdana,Arial,Helvetica;font-weight:bold;font-size:xx-small;text-decoration:none;}
a.mw:hover  {color:#0000FF;font-family:Verdana,Arial,Helvetica;font-weight:bold;font-size:xx-small;text-decoration:none;}
</style> <script type="text/javascript">
<!--
var gMenuControlID = 0;
var menus_included = 0;
var jsDlgLoader = '/engineering/about/people/loader.cfm';
var jsSiteID = 1;
var jsSubSiteID = 6148;
var js_gvPageID = 2177477;
var jsPageID = 2177477;
var jsPageSetID = 0;
var jsPageType = 0;
var jsControlsWithRenderHandlers = ",1366057,1407941,1408984,1409120,1409220,1463564,1653027,1464282,1484855,1663987,1703445,1714178,1719109,1716274,1719109,1719109,1722161,1748941,1743237,1767756,1771704,1240950,1795856,1799077,1806233,1814378,1814378,1814378,36,1156323,958270,959997,36,1239784,1239535,1240103,1264495,1264559,1240832,1241026,1268776,1269019,1365662,1365798,1367666,1367112,1367146,1403322,1236239,1644435,1707482,36,1707482,1708185,1708185,1707846,1718301,1718356,1722082,1735273,1156092,1736675,1738340,1758445,1487747,1740183,1750814,1755341,36,4,1241075,1320447,1410344,1440455,1462605,1463564,1642797,1644920,1644955,1659254,1656252,1707459,1692320,1290294,1705469,1705596,1707846,1708163,1708367,1719109,1719109,1719109,1728460,1718356,1706218,1725200,1739433,1193755,1782561,1806244,1781609,1783821,1784445,1783821,1788664,1750814,1781533,1781788,1812661,1810778,1822088,1644219,39,36,36,438722,443887,523857,542895,36,867909,671210,733944,1074794,671213,671222,671225,671231,671234,1190981,1190914,1190943,1193755,1236239,1239497,1280404,1284325,860732,860741,1080236,671204,1237273,671216,671219,671228,671237,671207,1190973,1243855,1264544,1264564,1241172,1267910,1240840,1240849,1241220,1264699,1241365,1264571,1289737,8,1290184,1321465,1322500,1363024,1365670,1365954,1365998,1366014,2214456,2068897,1837521,1190931,1190931,2239453,1992371,1967400,1992371,1808005,1792195,1792195,1156323,1716646,1967400,1763595,1080236,1971121,1960374,1290151,2007514,2013290,2012663,2012302,2012026,2012663,2021773,1191128,426028,1808005,2108357,426028,36,36,36,2145522,2145522,2186158,1792195,1827509,1827486,1827486,1840641,1843869,1843869,1843879,1843879,1827509,1827486,635375,1190931,1853586,1854295,1854509,1854614,1855117,1855125,1859942,1232520,996841,999747,1074782,801933,1156092,1231112,1240950,1264518,1264536,1240828,1241280,1241033,1241322,1265043,1268750,1269805,1287352,1290231,1321501,1322534,1368599,1407796,1407917,1408156,1408447,1461409,1463586,1466072,1660460,17
04499,1701618,1704211,1701596,1707383,1706218,1713783,1713443,1715100,1716646,1714352,1723376,1706218,1717134,1717134,1759841,1740127,1740183,1737868,1755222,1763595,1750814,1812661,1784600,860732,1785700,1786558,1786640,1788366,1788803,1787835,1758851,1802116,1802116,1802116,1802116,1810778,1870892,1827509,1854528,1859942,1859942,1870780,1865837,1905202,1905202,1750814,1243855,1763595,1806295,1806280,860741,1893429,1893243,1893429,1898989,1913110,1915322,1921065,1871293,1872541,1900928,1708367,1874008,1827509,1808005,1948002,1708367,1859942,1827509,1243851,1959041,1243851,1746007,1243851,1243851,1967400,1967400,1191128,1780116,1960374,1960374,1780116,1827486,1156092,1153939,36,1827486,1859942,1974908,1156092,1156323,1763595,1080236,1763595,1854295,1854641,1865837,1867230,1867211,1869328,738180,8,1191128,1808005,1967400,1156323,2104541,2058309,2013290,2047047,2068897,2010928,2087246,2010928,2104541,2104541,2104578,2115265,1708185,2120941,426028,2129783,1663761,2166426,2068897,1967400,1967400,1967400,2068897,1808005,1716646,1833649,1827509,2010085,36,2167570,2068897,1706218,1156092,2012337,2186146,1191128,2191212,1190931,1156323,1716646,2012663,2508370,1992371,1080236,2280950,1808005,36,36,1156323,1808005,1819898,1191128,1243855,2281280,2013290,2239453,1837521,1156323,1644219,1849105,1849105,2376567,2381406,1808005,1808005,1156092,2552104,2552104,2281280,1805958,1967400,2068897,2390125,1808005,2444428,2459222,2013290,2568057,2508370,1661786,1763595,2349059,2349059,2438289,1708367,2120941,2508370,2120951,2596819,1156323,1191128,2239453,2367160,2012337,2451225,1808005,2615851,1808005,1849105,55,55,2734901,1191128,55,55,2012663,2734829,1967400,1967400,1996683,1992371,2013290,2018337,2012337,2018364,1156092,1363024,1967400,1888191,1888191,1805958,1967400,2057362,39,1153939,1708185,2010085,2010085,2010085,2079659,2079659,2010928,2010928,2087246,1808005,36,1190931,2369360,2380491,1808005,2120941,1153939,1708367,2511867,2540778,1704499,1787140,1758479,1716646,1827486,223945
3,1808005,1808005,1080236,2451225,2120941,1808005,";
var jsDefaultRenderHandlerProps = ",,";
var jsAuthorizedControls = ",65684,62081,62169,62236,62658,67860,70371,70560,70645,70911,71567,71570,71579,71582,71585,71588,71630,71645,73051,73055,73135,73175,73177,73179,73181,73183,73185,75593,75596,75598,75600,75602,75604,75943,77337,77339,77367,77369,77371,77397,77399,77401,77403,77406,77408,77423,77425,77429,77431,77433,77435,77454,77456,77458,77460,77462,77464,77524,77526,77528,77530,77533,77535,77564,77566,77569,77572,77579,77581,77755,77759,77771,77940,78254,78304,78759,81449,81447,81452,81454,86430,95027,110992,112176,114559,122476,122590,122592,122594,122998,123000,123002,123004,123010,123012,123014,123016,123113,123115,123117,123119,123121,123123,123125,123127,123129,123131,123133,123135,123137,123139,123141,123143,123193,123217,123219,123221,543,1784,1786,1791,1829,1901,1903,3434,3062,10165,17470,19113,17964,17975,20458,18450,19246,20461,20532,20535,20631,22975,22976,29043,29065,29198,29497,29894,32565,37812,42989,50270,50283,51427,51770,51940,51987,52309,52306,52325,52338,52440,52727,52935,53585,53717,54936,55739,56170,57624,70375,57659,58549,60274,60859,65324,65375,65378,65630,341266,341268,341270,343681,344120,344123,344125,344127,344129,344131,344133,1155418,344136,344142,344918,344920,346066,349254,349260,353078,353096,353249,353368,353500,353518,356036,356519,356527,356534,359303,359315,359619,365645,365647,365651,372637,372642,373892,409046,385136,402687,408565,416225,423380,423445,423634,423934,424407,424503,426545,425757,425785,426028,426263,433478,438722,440105,440778,441424,441447,441488,441530,441743,441914,441917,441920,441923,442181,442184,442228,442231,442767,443887,444519,444536,448085,446524,447856,448121,450241,450489,450583,451031,123223,123225,123227,123229,123231,123233,123235,123237,123239,123241,123243,123245,123247,123249,133712,138458,138462,138472,138493,140917,152719,152941,155012,174553,176272,182475,185313,185545,185572,185600,185653,189527,189717,189912,189915,209638,190014,209612,209640,210772,233752,233754,240835,242005,
245048,245061,246392,247905,253143,255217,258368,258370,258448,259352,259507,259535,259540,259557,259597,270079,272462,272484,273374,275946,276171,281359,281731,281886,285356,285362,285364,289279,290246,293573,293580,293990,306206,306372,307096,307117,1409047,1410292,1410344,1440455,1462692,1462605,1463206,1463358,1463363,1463559,1463575,1466067,1466072,1466949,565361,577664,577666,580782,580785,586106,593209,631308,631375,671204,671207,671210,671213,671216,671219,671222,671225,671228,630659,630928,631186,631230,671231,703507,703512,872630,872675,951724,1070639,1070773,1071579,1074782,1074794,1116648,1118602,1153954,1153962,310170,319781,325794,326607,326613,331241,331243,331248,338287,338305,338307,338805,340095,340098,341260,341264,523857,523883,540187,541324,542748,542895,543075,543442,543531,545031,545034,545925,550439,550694,551327,551342,551843,551848,554801,557468,563421,563522,564335,564350,564362,565392,565403,565430,565440,565460,578908,580751,589443,589691,589825,631522,631342,671234,704390,704500,730405,733189,733195,733931,733944,735045,721050,721061,720116,803640,807230,860741,867909,869754,878921,872399,911315,951437,952815,952921,954983,956036,958270,960899,960901,960903,960912,960914,960916,959997,990601,993320,996841,999438,999472,999741,999747,999871,1034551,1034553,1035679,1035681,1070829,1080236,1111202,1112587,1112594,1116088,1117180,566481,567951,635375,671237,705089,708277,738180,738270,738274,756640,808480,993241,993247,993326,998452,999162,1034549,1034793,1034795,1118837,1121340,1150407,1152064,1153928,1153933,1153939,1153948,1154637,1156092,1156320,753746,754822,754960,755002,755412,755426,755453,801854,801933,802037,802071,802077,802080,802083,802087,802091,802417,802525,804060,860732,753752,754885,753748,754422,802568,451785,453349,452911,452935,454345,454916,464533,465324,476013,469286,469308,470126,472222,476011,476015,489860,478066,482338,482852,492048,486517,489015,489681,492017,492050,492052,498151,516411,516413,516415,516417,516419
,516422,1935063,1939712,1992371,1996683,2010928,2012302,2012840,2013290,2021773,2047047,2058309,2079659,2104541,2108357,2115265,2120941,2120951,2135749,2145522,2157693,2157775,1193061
<a href="http://www.youtube.com/user/OhioUnivRussCollege"><img border="0" alt="YouTube" title="YouTube" src="/engineering/images/icon_youtube.png" /><span class="imageCaption" style="display:none;"></span></a>
</div>
<div class="imageImg">
<a href="http://www.linkedin.com/groups?home=&gid=3000035&trk=anet_ug_hm"><img border="0" alt="LinkedIn" title="LinkedIn" src="/engineering/images/icon_linkedin.png" /><span class="imageCaption" style="display:none;"></span></a>
</div>
<div class="imageImg">
<a href="http://www.facebook.com/ohio.engineering"><img border="0" alt="Facebook" title="Facebook" src="/engineering/images/icon_fb.png" /><span class="imageCaption" style="display:none;"></span></a>
</div>
<div class="imageImg">
<a href="https://twitter.com/russcollege"><img border="0" alt="Twitter" title="Twitter" src="/engineering/images/icon_twitter.png" /><span class="imageCaption" style="display:none;"></span></a>
</div>
<div class="imageImg">
<a href="http://instagram.com/russcollege"><img border="0" alt="Instagram" title="Instagram" src="/engineering/images/russ_instagram.png" /><span class="imageCaption" style="display:none;"></span></a>
</div>
</div></div></div><div id="cs_control_2398199" class="cs_control CS_Element_Custom"></div></div></div><div id="cs_control_2142700" class="contentWrap col row"><div  title="" id="CS_Element_2177477_2142700"><div id="cs_control_2142767" class="cs_control col pageTitle">
<!-- Portal Content -->
<div class="content-element">
<h2>Faculty</h2>
<p></p>
<br />
</div>
<!-- Portal Content -->
</div><div id="cs_control_2142762" class="mainContent col"><div  title="" id="CS_Element_2177477_2142762"><div id="cs_control_2142772" class="cs_control CS_Element_Custom">
<!-- Portal Content -->
<div class="content-element">
<p>  </p>
</div>
<!-- Portal Content -->
</div><div id="cs_control_2177314" class="cs_control">
<style type="text/css">
/* This fixes some issues with the anchor links from the A-Z bar at the top */
.group a[name]
{
position: absolute;
}
</style>
<div id="staffAlpha">
<ul class="azList">
<li class="children "><a href="#A">A</a></li>
<li class="children "><a href="#B">B</a></li>
<li class="children "><a href="#C">C</a></li>
<li class="children "><a href="#D">D</a></li>
<li class="children "><a href="#E">E</a></li>
<li class="children "><a href="#F">F</a></li>
<li class="children "><a href="#G">G</a></li>
<li class="children "><a href="#H">H</a></li>
<li class="children "><a href="#I">I</a></li>
<li class="children "><a href="#J">J</a></li>
<li class="children "><a href="#K">K</a></li>
<li class="children "><a href="#L">L</a></li>
<li class="children "><a href="#M">M</a></li>
<li class="children "><a href="#N">N</a></li>
<li class="children "><a href="#O">O</a></li>
<li class="children "><a href="#P">P</a></li>
<li>Q</li>
<li class="children "><a href="#R">R</a></li>
<li class="children "><a href="#S">S</a></li>
<li class="children "><a href="#T">T</a></li>
<li class="children "><a href="#U">U</a></li>
<li class="children "><a href="#V">V</a></li>
<li class="children "><a href="#W">W</a></li>
<li class="children "><a href="#X">X</a></li>
<li class="children "><a href="#Y">Y</a></li>
<li class="children last"><a href="#Z">Z</a></li>
</ul>
<div id="azContent">
<div class="group">
<a id="A" name="A"></a>
<h3 class="letter">A</h3>
<a href="profiles.cfm?profile=abukamai">Nasseef Abukamail</a><br />
Electrical Engineering and Computer Science <br />
Associate Lecturer <br />
<a href="mailto:abukamai@ohio.edu">abukamai@ohio.edu</a> <br />
740.593.1229 
<div><br />
</div><a href="profiles.cfm?profile=alam">Khairul Alam</a><br />
Mechanical Engineering, Center for Advanced Materials Processing, ESP Lab <br />
Professor <br />
<a href="mailto:alam@ohio.edu">alam@ohio.edu</a> <br />
740.593.1558 
<div><br />
</div><a href="profiles.cfm?profile=alim1">Muhammad Ali</a><br />
Biomedical Engineering, Mechanical Engineering, ESP Lab <br />
Associate Professor <br />
<a href="mailto:alim1@ohio.edu">alim1@ohio.edu</a> <br />
740.593.1389 
<div><br />
</div><a href="profiles.cfm?profile=arch">Deak Arch</a><br />
Aviation <br />
Associate Professor, Assistant Chair <br />
<a href="mailto:arch@ohio.edu">arch@ohio.edu</a> <br />
740.597.2688
Kevin Swagger
  • Perhaps if you added proper error checking before using the results of calls, things will improve. It looks like you're passing -1 in as the first parameter of a `.substr()` call. – mah Mar 23 '16 at 16:58
  • I was trying to subtract one from the location of end. If it helps, here's my main program. – Kevin Swagger Mar 23 '16 at 17:11
  • How do you know it found anything? – Galik Mar 23 '16 at 17:15
  • I don't. I'm trying to get the line starting at "mailto:", then copy the email address after it, then split it. But I'm not quite sure how to do that... Sorry, this is my first time taking a programming class and actually doing programming, so I'm lost. – Kevin Swagger Mar 23 '16 at 17:19
  • It may help if you post some of the *text* file you are reading from **after removing any personal information**. – Galik Mar 23 '16 at 17:21
  • off topic: `while(!in_stream.eof())` See this a couple times a day and it's almost always wrong. [How can this not be part of the curriculum?](http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong) – user4581301 Mar 23 '16 at 17:21
  • http://www.cplusplus.com/reference/string/string/npos/ . The find function returns `npos` if nothing is found, which shows up as -1 when stored in an `int`. If you use it without checking, you may encounter strange cases like this one. – 88877 Mar 23 '16 at 17:23
  • So how do I find the first "mailto:" without getting a negative one returned? – Kevin Swagger Mar 23 '16 at 17:29
  • @KevinSwagger sorry for the delay, just saw this. Your response to me misses the point I intended... When considering `mail = mail.substr(location, (end - 1));` for example, my comment was not concerned with `(end - 1)` (though that _does_ warrant error checking since you cannot be certain `end` is set to a valid location without it). Rather, my concern is with `location` from `int location = mail.find("@");`. What if `@` isn't found? From your exception, `basic_string::substr: __pos (which is 4294967295)`... 4294967295 is the 32 bit unsigned representation of `-1`. – mah Mar 24 '16 at 01:16
  • I see. But I feel like that is irrelevant, because if I find "mailto:" there will be an @ symbol. – Kevin Swagger Mar 24 '16 at 02:44
  • Okay, I see your point: it returns negative one. But how do I say "if it returns negative one, get the next line"? – Kevin Swagger Mar 24 '16 at 14:58
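The failure mah describes can be reproduced in isolation with a small sketch (the function names `unchecked_extract_throws` and `checked_extract` are hypothetical, not from the question). When `find` does not locate its argument it returns `std::string::npos`, the largest value of `std::string::size_type`; the 4294967295 in the error message is that value on what is presumably a 32-bit build. Passing it to `substr` throws `std::out_of_range` because it lies past the end of the string. Checking for `npos` first lets the caller simply move on to the next line.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Mimics the question's pattern: find "mailto:" and pass the result straight to
// substr without checking it. Returns true if substr threw std::out_of_range.
bool unchecked_extract_throws(const std::string& line)
{
    std::string::size_type location = line.find("mailto:");  // npos when absent
    try
    {
        (void)line.substr(location);  // throws if location > line.size()
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}

// Checked version: test for npos first, so a line without a link is simply
// reported as "nothing here" and the caller can read the next line.
bool checked_extract(const std::string& line, std::string& rest)
{
    std::string::size_type location = line.find("mailto:");
    if (location == std::string::npos)
        return false;
    rest = line.substr(location);  // safe: location is a valid index
    return true;
}

// Example:
//   bool threw = unchecked_extract_throws("<!DOCTYPE html>");  // true: same exception as the question
//   std::string rest;
//   bool found = checked_extract("<!DOCTYPE html>", rest);     // false: skipped, no crash
//   assert(threw && !found);
```

The first line of the posted HTML file contains no "mailto:", so the question's `get_line_emails` runs into exactly this case on its very first `getline`.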

2 Answers


Divide the problem up into tasks. You have four tasks and they should be tackled individually. Do not proceed to the next task until you know the current task does exactly what you want. Working on more than one task at a time widens the problem area, and this turns out to be more than a geometric expansion. Bugs tend to interact with other bugs. A bug in task 1 may make a bug in task 2 look different, causing you to debug the wrong symptoms.

Consider giving each task its own function or, if the task is complex, its own file. This way each task can easily be tested on its own. Why? What if you change the code from task 1 and want to know if it broke? Sure, you can test the whole program, but what if you broke two things? If you want to test the splitter logic with a few hundred addresses to make sure you correctly handle all of the weird edge cases, you can just call the splitter function with those few hundred strings and not have to invent a complicated file.

Task 1: read a file line by line.

This is first because until you can do this, you can't do much else.

std::string line;
while (std::getline(in_stream, line))
{
    // output line to compare with source
}

This loop will read a file until it cannot be read anymore, be it end of file, corrupt data, some joker pulling out the USB drive while you're reading it, or sundry other problems. How do you test this? An easy way is to read the file line by line and print each line to the console. This is a pretty big file and the eye is only so useful for comparing large amounts of text, so write all received lines to an output file and then diff the two files. If they match, you win; move on to task 2. If they don't, debug.
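A minimal sketch of that task 1 test, assuming you want to diff the copy against the original. `copy_lines` is a hypothetical helper that takes any input and output stream, so you can pass an `std::ifstream` and `std::ofstream` for the real file, or string streams for a quick check without files.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out` one line at a time and returns how many lines were read.
// std::getline strips the '\n', so one is written back after every line.
std::size_t copy_lines(std::istream& in, std::ostream& out)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))  // stops at end of file or on any read error
    {
        out << line << '\n';
        ++count;
    }
    return count;
}

// With real files (names are placeholders):
//   std::ifstream in("faculty.html");
//   std::ofstream out("copy.html");
//   copy_lines(in, out);
// then compare faculty.html and copy.html with diff (or fc on Windows).
//
// Quick in-memory check, no files needed:
//   std::istringstream in("a\nb\nc\n");
//   std::ostringstream out;
//   std::size_t n = copy_lines(in, out);
//   assert(n == 3 && out.str() == "a\nb\nc\n");
```

Because a newline is written after every line, a source file that does not end with a newline will make `diff` flag the last line; that difference is expected.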

Task 2: Look for "mailto".

This takes a line from task 1 and looks for "mailto:".

size_t loc = line.find("mailto:");
if (loc != std::string::npos)
{
    std::cout << "found: " << line << std::endl;
}

This is easier to test, so we can get away with the Mk 1 eyeball, or Notepad and Ctrl+F, to confirm that all mailto lines were printed.
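To back up the eyeball check with a number, a hypothetical helper `count_mailto_lines` can count the lines that contain "mailto:". On a Unix-like system, `grep -c 'mailto:' faculty.html` (file name assumed) counts matching lines the same way, so the two numbers should agree.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts the lines of `in` that contain "mailto:" at least once.
std::size_t count_mailto_lines(std::istream& in)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.find("mailto:") != std::string::npos)  // found somewhere on this line
            ++count;
    }
    return count;
}

// Quick in-memory check:
//   std::istringstream in("<br />\n<a href=\"mailto:alam@ohio.edu\">alam@ohio.edu</a>\n");
//   std::size_t n = count_mailto_lines(in);
//   assert(n == 1);
```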

Task 3: Isolate the address.

You've found a line containing "mailto" in task 2. Now you have to isolate the address on that line. You have the starting location from task 2, and you may be able to extract the string between the ':' after "mailto" and the next '"'. I'm not going to spend much time here because this is the meat and potatoes of this assignment. If I do too much here, I pass the course, not you, but basically this is a find and a substr similar to what OP has in their question.
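A minimal sketch of that find-and-substr step, assuming the address always sits inside a double-quoted `href` as in the sample HTML (`extract_address` is a hypothetical name). Note that the second argument of `substr` is a length, not an end position, which is one more reason `mail.substr(location, (end - 1))` in the question does not return what was intended.

```cpp
#include <cassert>
#include <string>

// Extracts the address from a line such as
//   <a href="mailto:alam@ohio.edu">alam@ohio.edu</a> <br />
// by taking the text between "mailto:" and the next double quote.
// Returns false, leaving `address` untouched, if either marker is missing.
bool extract_address(const std::string& line, std::string& address)
{
    const std::string marker = "mailto:";
    std::string::size_type start = line.find(marker);
    if (start == std::string::npos)
        return false;
    start += marker.size();                              // first character of the address
    std::string::size_type end = line.find('"', start);  // closing quote of the href
    if (end == std::string::npos)
        return false;
    address = line.substr(start, end - start);           // substr takes (position, length)
    return true;
}

// Example:
//   std::string address;
//   bool ok = extract_address("<a href=\"mailto:alam@ohio.edu\">", address);
//   assert(ok && address == "alam@ohio.edu");
```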

Task 4: Split the address from task 3.

This is more work with find and substr to isolate the parts of the address.
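The same pattern handles task 4. This sketch assumes the user name is everything before the '@' and the domain everything after it, with the '@' itself dropped; the assignment does not say whether the '@' should be kept, so adjust the `substr` start if needed. `split_address` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Splits "user@domain" at the first '@'. On success `user` receives the part
// before it and `domain` the part after it; the '@' itself is not kept.
// Returns false, leaving both outputs untouched, if there is no '@'.
bool split_address(const std::string& address, std::string& user, std::string& domain)
{
    std::string::size_type at = address.find('@');
    if (at == std::string::npos)
        return false;
    user = address.substr(0, at);     // length `at` = everything before '@'
    domain = address.substr(at + 1);  // from just past '@' to the end
    return true;
}

// Example:
//   std::string user, domain;
//   bool ok = split_address("abukamai@ohio.edu", user, domain);
//   assert(ok && user == "abukamai" && domain == "ohio.edu");
```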

user4581301
  • What do size_t and npos mean? Or is that a dumb question? I guess I don't understand what your example in task two means. – Kevin Swagger Mar 23 '16 at 21:04
  • `size_t` is a C++ data type that is an unsigned integer large enough to store the size of the largest object. If it can count that high, it can index anything you can create. More here: http://en.cppreference.com/w/cpp/types/size_t. `std::string::npos` is a special value (the largest value `std::string::size_type` can hold) that means "no position". If a find returns `npos`, whatever you were looking for was not found. More here: http://en.cppreference.com/w/cpp/string/basic_string/npos – user4581301 Mar 23 '16 at 21:16
  • Okay, but I can't use size_t in my program. So how would I grab just one line at a time? – Kevin Swagger Mar 23 '16 at 23:34
  • How can you not use `size_t`? It's an integral datatype. Sucker is built in and has been for decades. If this is some bone-head instructor's whim, sorry, but sucks to be you. Use `std::string::size_type` or if that is also banned, an `unsigned int` instead. That should work well enough. – user4581301 Mar 24 '16 at 18:31
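One caveat about the `unsigned int` fallback, shown as a small sketch (all three function names are hypothetical). `std::string::size_type` is always safe to compare with `std::string::npos`, but on common 64-bit platforms `unsigned int` is narrower than `size_type`, so storing a failed `find` result in it truncates `npos` and a direct comparison no longer detects "not found". If `unsigned int` is truly the only option, truncate `npos` the same way before comparing.

```cpp
#include <cassert>
#include <string>

// Recommended: keep find's result in std::string::size_type, the type it returns.
bool contains(const std::string& text, const std::string& needle)
{
    std::string::size_type pos = text.find(needle);
    return pos != std::string::npos;
}

// Pitfall: when size_type is wider than unsigned int (typical 64-bit builds),
// npos is truncated on the way in, so this comparison misses "not found".
bool naive_not_found(const std::string& text, const std::string& needle)
{
    unsigned int pos = static_cast<unsigned int>(text.find(needle));
    return pos == std::string::npos;  // pos is widened back, but keeps the truncated value
}

// If unsigned int really is mandatory, truncate npos the same way before comparing.
bool unsigned_not_found(const std::string& text, const std::string& needle)
{
    unsigned int pos = static_cast<unsigned int>(text.find(needle));
    return pos == static_cast<unsigned int>(std::string::npos);
}

// Example:
//   bool wide = sizeof(std::string::size_type) > sizeof(unsigned int);
//   assert(naive_not_found("<br />", "mailto:") == !wide);  // false on typical 64-bit builds
```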

You need to make a loop and test every line until you find one with the string "mailto:".

Here is some example code to give you an idea of how you can do that:

#include <fstream>
#include <string>

int main()
{
    std::ifstream ifs("test.txt");

    std::string line; // general buffer

    // read each line
    while(std::getline(ifs, line))
    {
        // try to find "mailto:"
        std::string::size_type pos = line.find("mailto:");

        // ignore if not found
        if(pos == std::string::npos)
            continue;

        // we found it! extract address from line here
        // remember that pos holds the start of the information
        // ...
    }
}
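Wrapping that loop in a function gives something close to the `get_next_email` the question asks for. This is a sketch with an assumed signature (a stream and an output string, returning `false` when the file runs out of lines), and it assumes at most one `mailto:` link per line, as in the sample HTML; only the first link on a line is returned.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads lines from `in` until one contains a mailto: link, stores the address
// (text between "mailto:" and the next '"') in `address`, and returns true.
// Returns false once the stream has no more lines.
bool get_next_email(std::istream& in, std::string& address)
{
    const std::string marker = "mailto:";
    std::string line;
    while (std::getline(in, line))
    {
        std::string::size_type pos = line.find(marker);
        if (pos == std::string::npos)
            continue;  // no link on this line: try the next one
        pos += marker.size();
        std::string::size_type end = line.find('"', pos);
        if (end == std::string::npos)
            continue;  // malformed link: skip it
        address = line.substr(pos, end - pos);
        return true;
    }
    return false;
}

// Example with an in-memory stream:
//   std::istringstream in("<h3>A</h3>\n<a href=\"mailto:alam@ohio.edu\">alam@ohio.edu</a>\n");
//   std::string address;
//   bool ok = get_next_email(in, address);
//   assert(ok && address == "alam@ohio.edu");
```

Called as `while (get_next_email(in_stream, address)) { ... }`, it visits every address in the file, and lines without a link are skipped instead of reaching `substr`.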
Galik