I'm trying to put pandas dataframes into a dictionary, not the other way around.

I tried to store a list of DataFrame chunks as a value in a dictionary, and Python raises an error without any explanation.

Here's what I'm trying to do:

I imported a messenger chat log CSV file into a pandas DataFrame, split it by date, and put the resulting pieces in a list.

Now I want to iterate over this list and split it further: whenever the chat paused for more than 15 minutes, it is split into a new chunk. I want to build a list of these chunks for each date and then put them in a dictionary where the key is the date and the value is the list of chunks.

At that point Python raises an error. Below is the code where I'm stuck, followed by the error.

import pandas as pd
from datetime import datetime

# Get chatlog and turn it into Pandas Dataframe
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")
df = pd.DataFrame(ktlk_csv)

# Change "Date" column from String to DateTime 
df["Date"] = pd.to_datetime(df["Date"])

# Make a column "time_diff" holding the difference in timestamps between consecutive chats.
df["time_diff"] = df["Date"].diff()
df["time_diff"] = df["time_diff"].dt.total_seconds()

# Criteria to split chat chunks 
chunk_tolerance = 900 # 900: 15min of silence splits a chat
chunk_min = 5 # a chat less than 5 min is not a chunk. 

# Split a chatlog by date. (1st split)
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)

# Iterate over the list of per-date chats and split each one into chunks
df_chunk = {}
for day in df_byDate:
    table = day[1]
    list_of_daily_chunks = []
    for group in table.groupby(lambda x: table["time_diff"][x] < chunk_tolerance ):
        list_of_daily_chunks.append(group)

    # It does NOT return any error up to this point. 

    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
    df_chunk[key] = list_of_daily_chunks

This returns an error:

> C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py
> Traceback (most recent call last):
>   File "C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py", line 32, in <module>
>     key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
>   File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
>     result = self.index.get_value(self, key)
>   File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
>     tz=getattr(series.dtype, 'tz', None))
>   File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
>   File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
>   File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
>   File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14031)
>   File "pandas\_libs\hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13975)
> KeyError: 0

What have I done wrong? At first I got an error saying that Series objects cannot be hashed, so I converted the key to a string. Now I get a different error instead.

"Series objects are mutable and cannot be hashed" error

zebralamy

1 Answer

I think that instead of:

key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")

you need to first convert the column to strings with strftime and then select the first value by position with iat:

key = table["Date"].dt.strftime("%Y-%m-%d").iat[0]

Or use iloc to select the first row, with get_loc giving the position of the Date column:

key = table.iloc[0, df.columns.get_loc("Date")].strftime("%Y-%m-%d")
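
For context, here is a minimal sketch of why `[0]` raised `KeyError: 0`. On a Series with an integer index, `[0]` looks up the row *label* 0, and a group taken from a larger frame keeps its original row labels, so label 0 only exists in the group that contains the very first row. `.iat[0]` is purely positional and always returns the first element. The three-row frame and the name `second_day` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical three-row chat log; the values are invented for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-11-22 23:50", "2017-11-23 00:05", "2017-11-23 00:07"]),
    "Message": ["a", "b", "c"],
})

# Rows belonging to the second day keep their original labels (1 and 2).
second_day = df[df["Date"].dt.day == 23]

# Label-based lookup: there is no row labelled 0 in this subset.
label_lookup_failed = False
try:
    second_day["Date"][0]
except KeyError:
    label_lookup_failed = True

# Position-based lookup: the first element, whatever its label is.
key = second_day["Date"].dt.strftime("%Y-%m-%d").iat[0]
```

In this sketch the label lookup fails while `key` ends up as the string for the subset's first row, which is the behaviour the two alternatives above rely on.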
jezrael
  • WOW. What the.... How did you do that?????? I shouldn't have used .loc? but why? and what is that .iat? – zebralamy Nov 23 '17 at 09:16
  • `loc` is not necessary here, because you are only selecting a column. – jezrael Nov 23 '17 at 09:17
  • key = table["Date"].dt.iat[0].strftime("%Y-%m-%d") --> doesn't work. The original key = table["Date"].dt.strftime("%Y-%m-%d").iat[0] --> worked. – zebralamy Nov 23 '17 at 09:18
  • Can I ask one more thing? `table` seems to be a tuple and it has a weird value at [0] and desired dataframe at [1]. What is the value at [0]? Groupby made it but why? – zebralamy Nov 23 '17 at 09:33
  • Sure, you need to add `i`, because iterating over `groupby` returns tuples of the group name and the group's table, like `for i, group in table.groupby(lambda x: table["time_diff"][x] < chunk_tolerance ):` – jezrael Nov 23 '17 at 09:35
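
To illustrate the last comment, here is a small sketch showing that iterating over a `groupby` yields `(name, group)` tuples, and how unpacking them gives the date key directly. The sample frame and the names `pairs`, `first`, and `by_date` are assumptions made for this example, not part of the original code.

```python
import pandas as pd

# Hypothetical chat log; the values are invented for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-11-22 10:00", "2017-11-22 10:01", "2017-11-23 09:00"]),
    "Message": ["hi", "hello", "morning"],
})

# Group by the date rendered as a string, so the group name is usable as a dict key.
date_key = df["Date"].dt.strftime("%Y-%m-%d")

# Each item produced by iterating over a groupby is a (name, group) tuple.
pairs = list(df.groupby(date_key))
first = pairs[0]

# Unpacking the tuple in the loop avoids indexing with [0] and [1] later.
by_date = {}
for name, group in df.groupby(date_key):
    by_date[name] = group
```

Here `first[0]` is the group name and `first[1]` is the DataFrame for that group, which matches the "weird value at [0] and desired dataframe at [1]" observed in the comments.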