9

I have a number of data files created by many different programs. Is there a way to determine the database and version of the database that was used to create the data file.

For example, I'd like to identify which files are created from Microsoft Access, dBASE, FileMaker, FoxPro, SQLite or others.

I really just want to somehow quickly scan the files, and display information about them, including source Database and Version.

For reference, I'm using Delphi 2009.

lkessler
  • 19,414
  • 31
  • 125
  • 196
  • 3
    Surely you are aware of file extensions. Secondly, surely you are aware that perhaps many programs do NOT use a database file format at all, and that the number of private binary file formats in the world is for all intents and purposes, uncountable. – Warren P Jun 04 '11 at 17:04
  • @Warren: I am specifically looking at genealogy software. They tend to use database drivers within their programs but save their data to disk as project files with their own proprietary extensions. I, for example, will be using DISQLite3 and will save my .bhp file in SQLite format. – lkessler Jun 04 '11 at 19:48
  • But i would expect, that this "renamed" database files are often not in their "native" format. I think that they often are encapsulated into a bigger structure ("container") with some other (non-DB) information and maybe compression. At least i do that with my projects. – Andreas Jun 04 '11 at 22:45

3 Answers3

10

First of all, check the file extension. Take a look at the corresponding wikipedia article, or other sites.

Then you can guess the file format from its so called "signature".

This is mostly the first characters content, which is able to identify the file format.

You've an updated list at this very nice Gary Kessler's website.

For instance, here is how our framework identify the MIME format from the file content, on the server side:

function GetMimeContentType(Content: Pointer; Len: integer;
  const FileName: TFileName=''): RawUTF8;
begin // see http://www.garykessler.net/library/file_sigs.html for magic numbers
  result := '';
  if (Content<>nil) and (Len>4) then
    case PCardinal(Content)^ of
    $04034B50: Result := 'application/zip'; // 50 4B 03 04
    $46445025: Result := 'application/pdf'; //  25 50 44 46 2D 31 2E
    $21726152: Result := 'application/x-rar-compressed'; // 52 61 72 21 1A 07 00
    $AFBC7A37: Result := 'application/x-7z-compressed';  // 37 7A BC AF 27 1C
    $75B22630: Result := 'audio/x-ms-wma'; // 30 26 B2 75 8E 66
    $9AC6CDD7: Result := 'video/x-ms-wmv'; // D7 CD C6 9A 00 00
    $474E5089: Result := 'image/png'; // 89 50 4E 47 0D 0A 1A 0A
    $38464947: Result := 'image/gif'; // 47 49 46 38
    $002A4949, $2A004D4D, $2B004D4D:
      Result := 'image/tiff'; // 49 49 2A 00 or 4D 4D 00 2A or 4D 4D 00 2B
    $E011CFD0: // Microsoft Office applications D0 CF 11 E0 = DOCFILE
      if Len>600 then
      case PWordArray(Content)^[256] of // at offset 512
        $A5EC: Result := 'application/msword'; // EC A5 C1 00
        $FFFD: // FD FF FF
          case PByteArray(Content)^[516] of
            $0E,$1C,$43: Result := 'application/vnd.ms-powerpoint';
            $10,$1F,$20,$22,$23,$28,$29: Result := 'application/vnd.ms-excel';
          end;
      end;
    else
      case PCardinal(Content)^ and $00ffffff of
        $685A42: Result := 'application/bzip2'; // 42 5A 68
        $088B1F: Result := 'application/gzip'; // 1F 8B 08
        $492049: Result := 'image/tiff'; // 49 20 49
        $FFD8FF: Result := 'image/jpeg'; // FF D8 FF DB/E0/E1/E2/E3/E8
        else
          case PWord(Content)^ of
            $4D42: Result := 'image/bmp'; // 42 4D
          end;
      end;
    end;
  if (Result='') and (FileName<>'') then begin
    case GetFileNameExtIndex(FileName,'png,gif,tiff,tif,jpg,jpeg,bmp,doc,docx') of
      0:   Result := 'image/png';
      1:   Result := 'image/gif';
      2,3: Result := 'image/tiff';
      4,5: Result := 'image/jpeg';
      6:   Result := 'image/bmp';
      7,8: Result := 'application/msword';
      else begin
        Result := RawUTF8(ExtractFileExt(FileName));
        if Result<>'' then begin
          Result[1] := '/';
          Result := 'application'+LowerCase(Result);
        end;
      end;
    end;
  end;
  if Result='' then
    Result := 'application/octet-stream';
end;

You can use a similar function, from the GAry Kessler's list.

Arnaud Bouchez
  • 40,947
  • 3
  • 66
  • 152
  • Perfect! That's exactly what I wanted to know. And Gary Kessler's page (no relation) is a great reference. – lkessler Jun 04 '11 at 19:57
2

There are lots of database engines with hundreds (if not thousands) of versions and formats. (Binary, CSV, XML...) Many of them are encrypted to protect the content. It is quite "impossible" to identify every database and every format and it is a subject of constant changes.

So first of all you have to limit your task to a list of database engines you want to scan. Thats what i would do...

Andreas
  • 1,329
  • 1
  • 9
  • 19
  • Okay. How about if I start with the list I gave. What's the best way to identify them? If there's any others that are in common use and easy to identify, then I can add them. Maybe there's a Delphi component somewhere that already does this? – lkessler Jun 04 '11 at 16:57
  • I seriously doubt that there is a tool that would identify "ANY DATABASE FILE". There are no common traits you could use, you would have to combine all possible database access libraries into a mega-library to be sure. And what do you propose for your tool to do when a file is of a particular type but is damaged? Your question is not well defined, or really, ultimately answerable. – Warren P Jun 04 '11 at 17:13
2

First, I do not believe you could do more in a "quick scan" than provide a "possible format". Also, it's very difficult to imagine that any quick technique could be reliable.

DBASE files commonly use the extension .dbf. There are variants of the dBase file format used by FoxPro, and Clipper. Wikipedia documents these as xBase. Any dBase library that can open dBase files will also probably be able to (a) show that this is in fact a true dBase file by opening it, and (b) allow you to see which supported variants of the xBase file format are in use.

Access files are usually using the .mdb file format, but can be encrypted with a password. You could probably write your own library that could postiively identify the internal content as being of the "Jet database engine" (internal type of file used by Access) but not read the content, but I doubt that short of cracking the password, you could do this reliably.

FileMaker files can have many file extensions, and their internal file formats are not well documented. According to wikipedia, .fm .fp3 .fp5 and .fp7 are common file extensions. You will have similar "password" problems with filemaker databases, as with Access. I am not aware of any way to read filemaker files in delphi except through ODBC, and even then, I don't think you could provide an "omni-reader" in Delphi that was powered by ODBC, since ODBC requires careful setup and knowledge of the originating file into an odbc data source before it becomes readable through ODBC. Browse/Discovery is not a phase that is supported by ODBC.

SQLite files can have any file extension at all. The easiest way to try to detect it would be to load/open the file using SQLite and see if it opens.

The rest of the list is more or less infinite, and the technique would be the same. Just keep rolling more database engines and access layer libraries into your Katamari Damaci Database Detector Tool.

If you want to start with old database formats as you seem to be, I would investigate using BDE (ancient, but hey, you're talking about ancient stuff), plus ADO, to try to auto-detect and open files.

Warren P
  • 58,696
  • 38
  • 168
  • 301