
I'm new to programming, so please forgive me in advance.

I was wondering whether there is a way to detect if a text file I am opening with an OpenFileDialog is delimited, before loading it into a ListView using StreamReader.

Will I have to read the contents of the file to determine this, or is there some .NET foo? My gut says I need to read the entire file to determine this.

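    // Dispose the dialog and the reader so the file handle is released
    using (OpenFileDialog ofd = new OpenFileDialog())
    {
        if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            string fileToOpen = ofd.FileName;
            using (System.IO.StreamReader sr = new System.IO.StreamReader(fileToOpen))
            {
                // Something more: inspect the content, then load it into the ListView
            }
        }
    }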
  • You need to read a file to verify what's in it. Depending on your issue, you might be able to assume it from the file extension, but you'll never actually know until you parse the content – Allan S. Hansen Mar 24 '16 at 08:25
  • Thank you for the prompt reply. I figured I'd have to read the whole contents to confirm; I just wanted a second opinion. Thank you very much. – FreeThirst Mar 24 '16 at 08:37
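As the first comment says, the file extension can give a hint without opening the file at all. Below is a minimal sketch of that idea; the class and method names are hypothetical, and the mapping (".csv" to comma, ".tsv" to tab) is only a convention, so the guess says nothing about what the file really contains.

```csharp
using System;
using System.IO;

public static class ExtensionGuess
{
    // Hypothetical helper: guesses a delimiter from the file extension alone.
    // The content is NOT inspected, so a ".csv" file may still use ';' or anything else.
    // Returns null when the extension gives no hint.
    public static char? GuessDelimiterFromExtension(string path)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".csv": return ',';
            case ".tsv": return '\t';
            default: return null;
        }
    }
}
```

Treat a non-null result as a starting assumption to verify against the content, not as an answer.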

1 Answer


Assuming it's a text file, reading the first line is often enough to check whether it contains comma-, semicolon-, or tab-delimited data. However, you still have to decide this yourself, since apart from CSV, I don't remember the others having any single standard. Write X regexes that check whether the line is one of the X formats you want to handle specially, and test the line against each of these rules. Better still, test the first 3 or 5 lines. If there is a match, you can also count the number of columns; it should usually be constant from line to line.
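A minimal sketch of the "first few lines, constant column count" check, written in C# to match the question. Instead of one regex per format it simply counts delimiter occurrences, which amounts to the same rule for these simple formats. The class name, the candidate list (`,` `;` tab `|`) and the default of 5 lines are assumptions, and the naive counting ignores quoting, so a CSV field like `"a,b"` will throw it off.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DelimiterSniffer
{
    // Candidate delimiters, tried in this order; the list is an assumption, extend as needed.
    static readonly char[] Candidates = { ',', ';', '\t', '|' };

    // Reads up to maxLines non-empty lines and returns the first candidate that splits
    // every sampled line into the same number (> 1) of fields, or null if none does.
    // Naive: delimiters inside quoted fields are counted like any other.
    public static char? Sniff(TextReader reader, int maxLines = 5)
    {
        var sample = new List<string>();
        string line;
        while (sample.Count < maxLines && (line = reader.ReadLine()) != null)
        {
            if (line.Length > 0) sample.Add(line);
        }
        if (sample.Count == 0) return null;

        foreach (char d in Candidates)
        {
            // Number of fields = number of delimiter occurrences + 1
            int expected = sample[0].Count(c => c == d) + 1;
            if (expected > 1 && sample.All(l => l.Count(c => c == d) + 1 == expected))
                return d;
        }
        return null;
    }
}
```

With the question's code, this would be called as `char? delimiter = DelimiterSniffer.Sniff(sr);` inside the `using` block. Note that it consumes the sampled lines, so reopen the file (or keep the sample) before loading everything into the ListView.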

... but there is always ambiguity: does word-tab-word really mean the data is tab-delimited? Or is it just random text where someone happened to format some bits like that? Unless the file has some (semi-)standardized header line, it's hard to guess. You can have "certainty" only by reading the whole file and verifying that each line follows the format. Let me warn you, though: even after reading the whole file, you cannot be certain. Consider this line (my editor sucks; here four spaces denote a TAB character):

Mary,"    had; a lit",tle    ;    lamb

Is it semicolon-delimited, tab-delimited, or standardized CSV?
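To make the ambiguity concrete, here is a small sketch that parses the Mary line all three ways. The `SplitCsv` helper is a hypothetical, minimal RFC 4180-style splitter (commas inside double quotes don't split, `""` is an escaped quote); it is not a complete CSV parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AmbiguityDemo
{
    // The example line, with real TAB characters where the text shows four spaces.
    public const string Line = "Mary,\"\thad; a lit\",tle\t;\tlamb";

    // Minimal CSV field splitter: no multi-line fields, no error handling for bad quoting.
    public static List<string> SplitCsv(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; }
                else if (c == '"') inQuotes = false;
                else current.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',') { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```

Every interpretation succeeds: `Line.Split(';')` gives 3 fields, `Line.Split('\t')` gives 4, and `SplitCsv(Line)` gives 3 (`Mary`, a tab followed by `had; a lit`, and `tle`, tab, `;`, tab, `lamb`). Nothing in the line itself says which one the author meant.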

Since you cannot decide for a single line, you can't decide for a whole file either, because the file could consist entirely of such lines. Simply put, these text formats are NOT designed to be autodetectable.
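Reading the whole file still has value: it can rule a delimiter out, even though it can never prove one. A minimal sketch of such a whole-file check, assuming the same naive "constant column count" rule (no quote handling); the class and method names are hypothetical. `File.ReadLines` streams the file line by line instead of loading it into memory at once.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WholeFileCheck
{
    // True if every non-empty line splits on `delimiter` into the same number (> 1)
    // of fields. True only means "consistent with"; it does not prove the format.
    // Returns false for a file with no non-empty lines.
    public static bool IsConsistentlyDelimited(string path, char delimiter)
    {
        int expected = -1;
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0) continue;
            int fields = line.Count(c => c == delimiter) + 1;
            if (fields < 2) return false;                      // delimiter missing on this line
            if (expected == -1) expected = fields;
            else if (fields != expected) return false;         // ragged: rules this delimiter out
        }
        return expected != -1;
    }
}
```

A file made only of lines like the Mary example passes this check for both `;` and tab, which is exactly the point above: a consistent result narrows the options but does not pick one.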

If you don't want to write such a "detector" (I'd suggest you do; it's a simple and good exercise for learning why detection is hard when a file format is not designed with it in mind), then I remember once seeing a trick that may help you. I have no idea how much it can detect, but it may be worth trying. Please see this thread about autodetection of MIME types, and see here for the list of types. For example, CSV is text/csv.

quetzalcoatl