Possible Duplicate:
What are important points when designing a (binary) file format?

I am going to develop a program which will store data in a file.

The file can be big. The data in the file is basically made up of variable-length records, and I need random access to the records.

I just want to read some resources/books about how to design the structure of a data file, but I can't find any yet.

Any suggestion is much appreciated.

limi
  • what kind of variables will be in data? long strings with different notations? – billz Jan 03 '13 at 08:45
  • Not a duplicate, because this question is much more specific about its need (random access to variable length records). – MSalters Jan 03 '13 at 09:06
  • @limi: Why have you decided on a file? Databases exist for this very purpose, and they already implement all the required logic to map your data to permanent storage. – MSalters Jan 03 '13 at 09:08

4 Answers

You might find http://decoy.iki.fi/texts/filefd/filefd useful. It's a general starting point to the techniques to consider.

Also look at this question here on SO: What are important points when designing a (binary) file format?

Philipp

The problem you describe is a central theme of Database Theory.

Any decent text on the subject should give you some good ideas. The standard text from uni was:

Fundamentals of Database Systems, Elmasri & Navathe

Another approach is to use a memory mapped array of structs, take a look at my bountied answer to a similar question
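
The memory-mapped idea can be sketched roughly as below. This is only an illustration under stated assumptions, not necessarily what the linked answer does: it assumes fixed-size slots (so variable-length data would have to fit in, or be padded to, a maximum size), and the record layout `RECORD`, the helper names `create_file`, `store`, `load`, and the file name are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: a 4-byte id plus a 16-byte name field.
RECORD = struct.Struct("<I16s")

def create_file(path, count):
    """Pre-size the file so it can hold `count` fixed-size slots."""
    with open(path, "wb") as f:
        f.truncate(RECORD.size * count)

def store(mm, i, rec_id, name):
    """Write slot i in place; names shorter than 16 bytes are zero-padded."""
    RECORD.pack_into(mm, i * RECORD.size, rec_id, name)

def load(mm, i):
    """Read slot i directly from the mapping, stripping the zero padding."""
    rec_id, name = RECORD.unpack_from(mm, i * RECORD.size)
    return rec_id, name.rstrip(b"\0")

# Usage: random access is just index arithmetic on the mapping.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
create_file(path, 4)
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
    store(mm, 2, 42, b"limi")
    print(load(mm, 2))
```

Because every slot has the same size, record i lives at byte `i * RECORD.size`, which is what makes random access cheap here; the cost is wasted space when records vary a lot in length.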

Yet another approach is to use a binary protocol like Google protobuf and "send" your data to the file when writing and "receive" it when reading.

Andrew Tomazos

If the answer you're looking for is "what book to read", I can't help.

If "how to do that" works for you as well, I have some suggestions.

One good solution is the one suggested by Srikar; I would just add that I'd use SQLite instead of MySQL. It's an open source C library that you can embed in your program. It lets you store data in a DB just the way you would with SQL statements, but by calling the library's C functions instead. In your case you may keep everything in memory and then save the data to disk at the proper time.

Reference: http://www.sqlite.org
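
As a rough sketch of the SQLite route: the answer talks about the C library, but Python's standard `sqlite3` module wraps that same library, so it is used here for brevity. The table name `records`, its columns, and the helpers `open_store`, `put`, and `get` are hypothetical names chosen for illustration.

```python
import sqlite3

def open_store(path):
    """Open (or create) the database; ':memory:' keeps everything in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
    )
    return conn

def put(conn, payload):
    """Store one variable-length record and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO records (payload) VALUES (?)", (payload,)
        )
    return cur.lastrowid

def get(conn, record_id):
    """Random access by id; returns None if the record does not exist."""
    row = conn.execute(
        "SELECT payload FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    return None if row is None else row[0]

# Usage: records of any length, fetched in any order.
conn = open_store(":memory:")
first = put(conn, b"short")
second = put(conn, b"a considerably longer record")
print(get(conn, second), get(conn, first))
```

The primary key gives indexed lookups without a hand-written index, and inserts and deletes are handled by the library rather than by a custom file layout.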

Another option is the old "do it yourself" way. I mean: there's nothing very complicated about storing your data to a file (unless your data is very highly structured, but I'd go with option nr. 1 in that case).

You write down a plan of how you want the structure of your file to be, and you follow that plan both when writing the file to disk and when reading it back to restore the data into memory.

If you have n records, write n to disk, then write each record.

If the records have variable length, write the length of each record before writing the record itself.
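
A minimal sketch of that plan, assuming 4-byte little-endian integers for both the count and the length prefixes (the field widths and the function names `write_records` and `read_records` are illustrative choices, not something the plan prescribes):

```python
import io
import struct

COUNT = struct.Struct("<I")   # record count: 4-byte little-endian unsigned int
LENGTH = struct.Struct("<I")  # length prefix written before each record

def write_records(f, records):
    """Write the record count, then each record preceded by its length."""
    f.write(COUNT.pack(len(records)))
    for rec in records:
        f.write(LENGTH.pack(len(rec)))
        f.write(rec)

def read_records(f):
    """Read back everything written by write_records, in order."""
    (n,) = COUNT.unpack(f.read(COUNT.size))
    records = []
    for _ in range(n):
        (size,) = LENGTH.unpack(f.read(LENGTH.size))
        records.append(f.read(size))
    return records

# Usage with an in-memory buffer; a file opened in binary mode works the same.
buf = io.BytesIO()
write_records(buf, [b"alpha", b"", b"a longer record"])
buf.seek(0)
print(read_records(buf))
```

Note that this layout only supports sequential reading: to reach record k you still have to walk over the k length prefixes before it.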

You talk about "random access" in your question. Probably you mean that the file is very big and at access time you want to read from disk only the portion you're interested in.

If so, plan to build an index; that index will tell you the offset of each element, in bytes, from the beginning of the file. Store the index at the beginning of the file and then store the data.

When you read the file, you start by reading the index, get the offset of the data you need, and read only that portion of the file.
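
The index idea might look roughly like this. It is a sketch under assumptions: each index entry stores both an absolute offset and a length (so no length prefix is needed in the data area), the file is written starting at position 0, and the names `write_indexed` and `read_record` are hypothetical.

```python
import io
import struct

HEADER = struct.Struct("<I")   # number of records
ENTRY = struct.Struct("<QQ")   # (absolute offset, length) for one record

def write_indexed(f, records):
    """Write the header, then the index, then the record data."""
    # The data area begins right after the header and the index.
    offset = HEADER.size + ENTRY.size * len(records)
    f.write(HEADER.pack(len(records)))
    for rec in records:
        f.write(ENTRY.pack(offset, len(rec)))
        offset += len(rec)
    for rec in records:
        f.write(rec)

def read_record(f, i):
    """Fetch record i by reading only its index entry and its own bytes."""
    f.seek(0)
    (n,) = HEADER.unpack(f.read(HEADER.size))
    if not 0 <= i < n:
        raise IndexError(i)
    f.seek(HEADER.size + ENTRY.size * i)
    offset, length = ENTRY.unpack(f.read(ENTRY.size))
    f.seek(offset)
    return f.read(length)

# Usage: jump straight to the third record without touching the others.
buf = io.BytesIO()
write_indexed(buf, [b"alpha", b"", b"gamma-gamma"])
print(read_record(buf, 2))
```

With the index placed at the front, appending or deleting records means rewriting the index (and shifting the data), which is part of why a hand-rolled format gets harder once updates are needed.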

These are very basic examples, just to get the idea...

Hope they help!

Paolo
  • Thanks. After some investigation, I think SQLite is the best choice for me. Developing my own file format is not worth the effort: supporting delete/insert while also getting good performance and quality is not easy. – limi Jan 03 '13 at 09:41

Is there any reason you are not considering putting this data in a persistent DB store like MySQL? These systems are built to deal with random data access, with proper indexes to speed up your data retrieval. Plus, when reading from a plain file, you would have to read the entire file to get what you want, as there are no indexes and no query language.

In addition, they have mechanisms in place to make sure multiple running processes can access the same data without it getting corrupted. They also provide data recovery in case of inconsistencies.

So just storing the data is the simple part; it does not end there. You would have to provide all the other solutions eventually. Better to use what's available.

Srikar Appalaraju
  • or you can just create a zip archive and put everything in it, it's the main concept adopted by .jar files or .apk files or billions of other similar solutions. – user1824407 Jan 03 '13 at 08:41
  • zip is an archive. but the question states `And I need random access to the records.` hence I suggested mysql... – Srikar Appalaraju Jan 03 '13 at 08:43
  • ... and the title says data-file ... probably this question needs an explanation, as far as I know I could also suggest serialization for this – user1824407 Jan 03 '13 at 08:44