1

I have a large npz file that I've loaded with NumPy's np.load. I want to convert it to a pandas DataFrame so I can apply machine learning algorithms (KNN, K-Means, DT) using scikit-learn. I am new to Python, so my experience with these libraries is very limited. Thank you for the help.

This is what I have so far:

import numpy as np

dataset = np.load('./example.npz')
test_data = dataset['data']
test_labels = dataset['labels']

print(test_data.shape) gives (17000, 78400)

print(test_labels.shape) gives (17000, 1)

Kos

2 Answers

2

I'm not sure how you want to structure your DataFrame, but this will load the npz file with the array names (the keys in npz.files) as the index:

import pandas as pd
import numpy as np

npz = np.load('/path/to/npz.npz')
df = pd.DataFrame.from_dict({item: npz[item] for item in npz.files}, orient='index')

If you want to load each array into a single cell of one column, use:

pd.DataFrame.from_dict({item: [npz[item]] for item in npz.files}, orient='index')

Just drop orient='index' if you want the array names as columns instead.
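For the layout described in the question (a 2-D `data` array plus an `(n, 1)` `labels` array), another option is to build the frame straight from the feature array and attach the labels as one extra column. The following is only a minimal sketch: the small random arrays, the temporary file, and the column name `label` are stand-ins chosen for illustration, not anything taken from the original file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the real file: 6 samples x 4 features, labels shaped (6, 1).
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(path, data=rng.random((6, 4)), labels=rng.integers(0, 3, size=(6, 1)))

# Read both arrays out of the archive, then close it.
with np.load(path) as npz:
    data = npz["data"]
    labels = npz["labels"]

# One row per sample, one column per feature.
df = pd.DataFrame(data)
# Flatten the (n, 1) labels to (n,) before attaching them as a column.
# "label" is a hypothetical column name.
df["label"] = labels.ravel()

print(df.shape)
```

With the real data this would give one column per feature (78400 of them) plus the label column, so it may be worth checking memory use before converting.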

RJ Adriaansen
-2

Please try this:

import pandas as pd
# np.load on an .npz returns a dict-like archive, so pass one 2-D array from it
df = pd.DataFrame(dataset['data'])
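Since np.load on an .npz file returns an archive keyed by array name rather than a single array, it can help to list the keys and shapes before deciding what goes into the DataFrame. A rough sketch, using a tiny in-memory archive as a stand-in for the real file (the array contents here are made up for illustration):

```python
import io

import numpy as np
import pandas as pd

# Build a small in-memory .npz with the same key names as in the question.
buf = io.BytesIO()
np.savez(buf,
         data=np.arange(12.0).reshape(3, 4),
         labels=np.array([[0], [1], [0]]))
buf.seek(0)

with np.load(buf) as dataset:
    # dataset.files lists the array names stored in the archive.
    shapes = {name: dataset[name].shape for name in dataset.files}
    # Build the frame from one 2-D array, not from the archive object.
    df = pd.DataFrame(dataset["data"])

print(shapes)
print(df.shape)
```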
Lakshmi - Intel