
I have a huge JSON file that grows every minute. If every entry ended with \n, it would be easy to read the file using the many upvoted answers here on SO. However, my JSON file has no line-ending characters; the data is stored like this:

[{a:1,b:"test{}ing"},{a:4,b:"aga,in"},{a:6,b:"another test with \" character"},...]

I want to read, for example, the last 100 entries {} of this file (entries are always appended to the end). While reading, I want to check the a value: if it gets bigger than a specified number OR I have reached 100 entries, stop reading the file and output the JSON object.

How can I do this using PHP? That is, how can I read the contents at the end of the file and parse the JSON as I read it? I have no idea how to do this, because how can I know whether the content can be parsed if it may still be incomplete or malformed?

danronmoon
Samul
  • Just to confirm, is this JSON or CSV? It looks a lot like JSON to me. – Max Carroll Jun 28 '20 at 21:08
  • Well, if it's JSON then you can parse it into a PHP object, and because it's essentially an array, you could use the PHP array functions to select the last 100 elements, just like you would with any other array. – Max Carroll Jun 28 '20 at 21:09
  • @MaxCarroll I am exhausted, I wrote CSV when I actually meant JSON! Sorry, I am really tired. Also, I can't parse the entire file with PHP because there will not be enough memory available; the file is almost 2 TB. I need to read the last 100 {} elements only. – Samul Jun 28 '20 at 21:10
  • Use json_decode to translate the JSON string into a PHP value, which I'm guessing would be an array, since it appears to be an array at the top level: https://www.php.net/manual/en/function.json-decode.php – Max Carroll Jun 28 '20 at 21:11
  • Then it looks like you can slice the array to get the last 100 items: https://stackoverflow.com/questions/3591867/how-to-get-the-last-n-items-in-a-php-array-as-another-array. Again, please check this; I'm not an expert in PHP, and these are just concepts that are available in many programming languages. – Max Carroll Jun 28 '20 at 21:12
  • @MaxCarroll I can't slice, because I can't load the file into an array in the first place. The file is huge; I can't load it into memory and then get only the last N items. – Samul Jun 28 '20 at 21:14
  • Look at these answers. Maybe you can get something out of them? https://stackoverflow.com/questions/2961618/how-to-read-only-5-last-line-of-the-text-file-in-php – Zim84 Jun 28 '20 at 21:25

1 Answer


I think this is the perfect use case for a document-store NoSQL database such as MongoDB, which is built for storing, retrieving, and manipulating large volumes of JSON data.

Please manipulate and access large volumes of data using a suitable solution, such as a NoSQL document store, instead of storing it in a text file.

Here are some links to reading material:

https://medium.com/cracking-the-data-science-interview/an-introduction-to-big-data-nosql-96b882f35e50

This one gives a good explanation of what NoSQL is and the problems it solves.

Which is the suitable database for storing a large JSON?

This one has various discussions about which databases might be good at this.

https://www.sisense.com/en-gb/blog/postgres-vs-mongodb-for-storing-json-data/

This one compares Postgres and MongoDB, which are two possible options you have.

If you really must keep writing 2 TB of JSON data to a text file, then you could read only the end of the file and use a regex to match the elements.

You could try using file_get_contents https://www.php.net/manual/en/function.file-get-contents.php to extract the last 10 KB (assuming the last 10 KB will definitely contain the 100 elements you need; adjust accordingly):

$path = './2TBFile.JSON';
$section = file_get_contents($path, false, null, max(0, filesize($path) - 10000), 10000);

$regex = '/({Shape Of An Element},?){100}\]?$/';

Replace {Shape Of An Element} with a regex that matches exactly one element in your dataset. The pattern should then return the last 100 elements, since the dollar sign anchors the match to the end of the string. Just make sure you use the right regex options for your string type (e.g. multiline).
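
As an alternative to hand-writing a regex for one element, the same "read only the tail" idea can be sketched with json_decode doing the validation. This is a rough sketch, not a definitive implementation, and it rests on several assumptions: the file is valid JSON (quoted keys, unlike the shortened sample in the question), it ends with `]` at the moment it is read, every entry is an object with an `a` key, and PHP is a 64-bit build so that filesize and fseek can handle a 2 TB file. The function name `readLastEntries` and its parameters are hypothetical. It treats each `},{` in the tail as a candidate element boundary and keeps the first one from which the rest of the tail decodes cleanly; a candidate that sits inside a string value fails to decode and is skipped.

```php
<?php
// Sketch only: assumes valid JSON (quoted keys) and a file that ends with "]".

/**
 * Hypothetical helper: return up to $limit of the newest entries, newest first.
 * Stops early at the first entry whose "a" value exceeds $maxA (one possible
 * reading of the stop condition in the question).
 */
function readLastEntries(string $path, int $limit, $maxA, int $tailBytes = 65536): array
{
    $size = filesize($path);
    $start = max(0, $size - $tailBytes);

    // Read only the tail of the file instead of loading all of it.
    $handle = fopen($path, 'rb');
    fseek($handle, $start);
    $tail = rtrim(stream_get_contents($handle));
    fclose($handle);

    $entries = null;
    if ($start === 0) {
        // The whole file fit in the window, so it can be decoded directly.
        $entries = json_decode($tail, true);
    } else {
        // Try each "},{" as a candidate element boundary. A candidate inside
        // a string value makes json_decode return null, so it is skipped.
        $pos = -1;
        while (($pos = strpos($tail, '},{', $pos + 1)) !== false) {
            $entries = json_decode('[' . substr($tail, $pos + 2), true);
            if (is_array($entries)) {
                break;
            }
        }
    }
    if (!is_array($entries)) {
        return []; // no sync point found: enlarge $tailBytes or retry later
    }

    // Walk backwards from the newest entry, applying both stop conditions.
    $result = [];
    foreach (array_reverse($entries) as $entry) {
        if (count($result) >= $limit || $entry['a'] > $maxA) {
            break;
        }
        $result[] = $entry;
    }
    return $result;
}
```

If the call returns fewer entries than expected, the window was probably too small, so calling it again with a larger `$tailBytes` is the obvious adjustment. An empty result can also mean the writer was in the middle of an append and the file did not yet end with `]`, in which case retrying a moment later is the simplest option.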

Max Carroll
  • The line `$arrayObject = json_decode($jsonString)` will not work for me. The string `$jsonString` will be so big (2 TB) that I will never be able to own a machine with 2 TB of RAM to hold that content in memory. – Samul Jun 28 '20 at 21:20
  • Hmm, so the file is so large it can't be loaded into memory. Perhaps you can use a stream reader to read backwards from the end of the file and use some kind of regex to count how many elements it has iterated over? Or perhaps don't write this stuff to a file and use a database instead. – Max Carroll Jun 28 '20 at 21:23
  • Database performance would be terrible, with 2 TB of raw content becoming more like 10 TB of disk storage because of indexes, types... I don't know what to search for on Google anymore to find this answer! – Samul Jun 28 '20 at 21:25
  • If you really must select the last 100 elements of the file, I would guess how many bytes that is, perhaps add some contingency, read that many bytes from the end of the file using a stream reader, and extract the elements using a regex. – Max Carroll Jun 28 '20 at 21:40