Convert a .txt dictionary into a data frame while skipping some values

Question

I have a .txt performance log in a (mostly) dictionary format that looks something like this:

10:07:49.1396 Info {"message":"Killing processes...","level":"Information","logType":"User","timeStamp":"2020-10-19T10:07:49.1386035+02:00"}

10:07:49.4102 Info {"message":"Opening applications...","level":"Information","logType":"User","timeStamp":"2020-10-19T10:07:49.4092373+02:00"}

I'd like to put it into a data frame like this:

message                  level          logType   timeStamp
Killing processes...     Information    User      2020-10-19T10:07:49.1386035+02:00
Opening applications...  Information    User      2020-10-19T10:07:49.4092373+02:00

So basically only the stuff within the curly brackets. I don't need the "10:07:49.1396 Info" at the beginning of log entries.

I'm learning NumPy and Pandas now, but being an absolute beginner I'm not even sure if it's possible with just those two libraries. Do I need to use something else as well?

Try modifying the text file to be a list of dictionaries (JSON format), then read that with pandas. You can do this without numpy -- you can use builtin string methods. — jakub, Oct 30 '20 at 13:02
Thanks for the tip. But there are 2 issues: a) that part in the beginning before the curly brackets b) I'd like to use the logfile in a Dash Plotly dashboard later, real time and interactive. So I'm hesitant to modify the logfile itself. I just want to be able to read from it quickly in real time. — catLuck, Oct 30 '20 at 13:03

score 2 · Answer 1 · answered Oct 30 '20 at 13:11

You have to parse the log manually to collect the relevant data:

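import json
import re

import pandas as pd

# Skip the first two space-separated fields (time and level), capture the rest
pattern = re.compile(r'.+? .+? (.+)')
logs = []
with open('data.txt') as fp:
    for line in fp:
        match = pattern.match(line)
        if match:
            try:
                data = json.loads(match.group(1))
                logs.append(data)
            except json.JSONDecodeError:
                # Ignore lines whose payload is not valid JSON
                pass

# One row per parsed dictionary, one column per key
df = pd.DataFrame(logs)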

To do it in real time, you have to watch the file for changes. See, for example, this question: How do I watch a file for changes?
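As a rough illustration of the "watch the file" idea, here is a minimal polling sketch. It assumes the log is append-only and UTF-8 encoded; the helper names `parse_line` and `read_new_records` are made up for this example and are not part of any library. It remembers a byte offset so that each call only parses lines added since the previous call:

```python
import json


def parse_line(line):
    """Return the JSON payload of one log line as a dict, or None."""
    start = line.find('{')  # everything before the first brace is skipped
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # payload is not valid JSON


def read_new_records(path, offset=0):
    """Parse complete lines after byte `offset`; return (records, new_offset)."""
    records = []
    with open(path, 'rb') as fp:
        fp.seek(offset)
        for raw in fp:
            if not raw.endswith(b'\n'):
                break  # last line is still being written; retry next time
            offset += len(raw)
            record = parse_line(raw.decode('utf-8'))
            if record is not None:
                records.append(record)
    return records, offset
```

A dashboard callback could call `read_new_records(path, offset)` on a timer, keep the returned offset, and append `pd.DataFrame(records)` to the frame it already holds. Polling is only one option; a file-system notification library would avoid the timer, at the cost of an extra dependency.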

score 1 · Accepted Answer · answered Oct 30 '20 at 13:22

Here's another way using json_normalize:

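import json
import re

import pandas as pd

# Match the JSON object between the first '{' and the last '}' on a line
pattern = re.compile('{.*}')
rows = []
with open('a.txt', 'r') as f:  # read-only access is enough
    for line in f:
        for match in pattern.finditer(line):
            data = json.loads(match.group())
            # Build a one-row frame; nested keys would be flattened into columns
            dfx = pd.json_normalize(data)
            rows.append(dfx)

# Stack the one-row frames; each keeps its own index 0
df = pd.concat(rows)
print(df)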

                   message        level logType                          timeStamp
0     Killing processes...  Information    User  2020-10-19T10:07:49.1386035+02:00
0  Opening applications...  Information    User  2020-10-19T10:07:49.4092373+02:00
