
I have been working with databases for quite a while, and I've found my sweet spot is storing the data a shell script collects inside the script itself. Sometimes it's single-line logs for tracking data in quick projects; other times it's JSON objects written there (which I find load quicker than opening a separate .json file).

Here's an example of what it looks like in Python:

# ====================================================
#   Don't write below this line, live database below.
# ====================================================
# "scores":{"Adam":2,"John":6,"Steve":1,"Tim":35}
# "teams":{"shockwave":{"wins":5,"losses":2},"jellyfish":{"wins":3,"losses":4}}

With this method I've found nothing but benefits from keeping it all in one file: security, performance, ease of use, ease of transport and backup, minimal chance of database collisions, and overall simplicity. I also know that SQLite takes a similar single-file approach to stay easy for developers to implement.

I do however care about writing good code and would like to know any downfalls to this method.

  • When the database gets big, will the mass of data sitting in comments weigh the program down even when it isn't being read, during times when other data is being processed?

  • Is there a limit to how large a file can get before it becomes unstable?

  • Why isn't this method used more often?

  • Is there a source where I can learn to implement this better?

  • Is there anything I'm not thinking of?

Before you answer, though: I know a single 10TB file would be an impractical thing for one process to deal with. I like building clusters and neurons that work both together and individually, and I typically use this method for smaller implementations like that. Please keep that in mind when answering, because I know there are better large-scale solutions for industry use.

  • Keep in mind that not all file systems support large files, and code editors may not always like files that are too large. If your program terminates in the middle of writing data, you might end up with a database you need to repair; most other databases already handle this for you. It may load faster, but that may be a limited use case. You will also need to be careful when writing to the source file, since OSes may allow multiple processes write access to it. – Joelmob Sep 11 '14 at 19:24
  • 1
  • I agree, and I've already learned my lesson with limited code editors. Normally my source is split across multiple files, because tackling a mountain of code just makes things more complicated, and I use a script to tie everything together into one compiled yet still readable file when I'm done. As for rewriting the database, I use a "busy"/"available" system to prevent write collisions, and all of the writing is done through the echo command to minimize the time spent writing. – codykochmann Sep 11 '14 at 19:48
  • Related: [What's the Pythonic way to store a data block in a Python script?](http://stackoverflow.com/q/6942843/2157640) – Palec Oct 07 '14 at 00:11
  • That does work to a certain extent, but it is limited by the amount of RAM the system can handle at the time. The problem with that implementation is that every time the Python code runs it absolutely has to load that dictionary or string; what if the string is the size of 12 libraries and all that one run needed to do was ping a server? Since writing this question, further developments have pushed me to convert the storage block at the bottom of the script to base64, just for safer error handling and stability (both the locking and the base64 block are sketched below). – codykochmann Apr 23 '15 at 15:02
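
A rough sketch of the "busy"/"available" locking and the base64-encoded block described in the comments above. The file name LOCK_PATH and the helper names are hypothetical, not taken from the actual scripts.

    import base64
    import json
    import os
    import time

    LOCK_PATH = "db.lock"    # hypothetical marker file meaning "busy"

    def acquire_lock(timeout=5.0):
        """Spin until the lock file can be created exclusively ("available")."""
        deadline = time.time() + timeout
        while True:
            try:
                fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return
            except FileExistsError:
                if time.time() > deadline:
                    raise TimeoutError("database is still busy")
                time.sleep(0.05)

    def release_lock():
        os.remove(LOCK_PATH)

    def encode_block(data):
        """Pack all data into one base64 comment line instead of raw JSON."""
        raw = json.dumps(data, separators=(",", ":")).encode()
        return "# " + base64.b64encode(raw).decode() + "\n"

    def decode_block(line):
        """Reverse of encode_block: strip the leading '# ' and decode."""
        return json.loads(base64.b64decode(line.strip()[2:]))

A writer would call acquire_lock(), rewrite the block, then release_lock(); readers only need decode_block().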

0 Answers