for i in data.columns:
    top_10 = [x for x in data.i.value_counts().sort_values(ascending=False).head(10).index]
    for label in top_10:
        data[label] = np.where(data['i'] == label, 1, 0)
    data[['i'] + top_10]

What is the mistake?

VMAtm
  • Can you elaborate? What is the expected output, and what is the observed output? – joshmeranda Aug 22 '20 at 17:10
  • My expected output is: the top 10 unique values of each column should be added to the data set as new columns. – Os Snehith Ab Aug 22 '20 at 17:15
  • and what is the behavior that you are actually seeing? – joshmeranda Aug 22 '20 at 17:18
  • At the part `x for x in data.i.value_counts()` I am getting an error: "AttributeError: 'DataFrame' object has no attribute 'i'" – Os Snehith Ab Aug 22 '20 at 17:21
  • You are using `i` as an attribute of `data` (in `data.i.value_counts()`); this may cause the problem. – Girish Dattatray Hegde Aug 22 '20 at 17:23
  • Ah, that's a simple error. You are attempting to access `i` as an attribute of `data` here: `data.i.value_counts()`. But `data` does not have an attribute named `i`. – joshmeranda Aug 22 '20 at 17:24
  • How can this problem be resolved if I want to do the process for every column in the dataset? It is difficult to write `data.<column name>.value_counts()` for every column. – Os Snehith Ab Aug 22 '20 at 17:27
  • Actually the data has attributes like x1, x2, x3 and so on up to x100. As it is difficult to write the code for every column, I kept the process inside the loop. How can this be resolved if I want to do the process for every column (attribute) of my dataset? – Os Snehith Ab Aug 22 '20 at 17:30
  • `data.i.value_counts()` treats `i` literally as a column name, not as a variable. You need `data[i]`, without `' '`, to use `i` as a variable: `data[i].value_counts()`. You have a similar problem in the other place, where you use `data['i']` with `' '`, which means literally `"i"`, not the variable `i`. – furas Aug 22 '20 at 18:11

1 Answer


If you want to use the variable i from for i in data.columns: then you should not use data.i but data[i] (without ' ').

for i in data.columns:

    top_10 = data[i].value_counts().sort_values(ascending=False).head(10).index

It may be more readable with a more descriptive name, e.g. column_name:

for column_name in data.columns:
    
    top_10 = data[column_name].value_counts().sort_values(ascending=False).head(10).index

data.i is equivalent to data["i"]: both mean the column whose name is literally i, not the column named by the variable i.
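The difference between the three spellings can be checked on a small frame. This is only a sketch: the frame, its column names, and the values below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical toy frame: one column is literally named "i".
data = pd.DataFrame({"i": [1, 2, 3], "x1": ["a", "b", "a"]})

i = "x1"  # a variable holding a column name, as in `for i in data.columns`

# Attribute access and the quoted key both look up the column named "i".
same_column = data.i.equals(data["i"])

# Bracket access with the bare variable looks up the column named by its value.
counts = data[i].value_counts()

# Without a column literally named "i", attribute access fails.
other = pd.DataFrame({"x1": [1, 2]})
try:
    other.i
    attribute_failed = False
except AttributeError:
    attribute_failed = True

print(same_column, attribute_failed)
print(counts)
```

The last part mirrors the situation in the question, where the columns are named x1, x2, ... and none of them is called i.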


I don't know what you are trying to do with the nested for-loop, but there you should also use data[i] instead of data["i"]:

    for label in top_10:
        data[label] = np.where(data[i]==label, 1, 0)

But you should probably use a better method to create the column names, and compare the column's values (not its index) with each top value:

    for number, value in enumerate(top_10):
        data[i + '_' + str(number)] = np.where(data[i]==value, 1, 0)
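One possible reason to build the new names from the column name plus a counter: if two columns share a value, naming the indicator column after the raw value makes the second column overwrite the first column's indicator. The sketch below uses made-up data and hypothetical frame names (by_value, by_name) to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: both columns contain the values "x" and "y".
data = pd.DataFrame({"A": ["x", "x", "y"], "B": ["y", "y", "x"]})

# Variant 1: name each indicator after the raw value.
by_value = data.copy()
for column_name in ["A", "B"]:
    for value in data[column_name].value_counts().head(2).index:
        by_value[value] = np.where(data[column_name] == value, 1, 0)

# Variant 2: name each indicator after the source column and a counter.
by_name = data.copy()
for column_name in ["A", "B"]:
    top_values = data[column_name].value_counts().head(2).index
    for number, value in enumerate(top_values, 1):
        by_name[f"{column_name}_{number}"] = np.where(data[column_name] == value, 1, 0)

print(by_value)  # only two indicator columns survive; B's replaced A's
print(by_name)   # four indicator columns, one per (column, value) pair
```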

It could be more readable with more descriptive names:

for column_name in data.columns:
    
    top_10 = data[column_name].value_counts().sort_values(ascending=False).head(10).index

    for number, value in enumerate(top_10):
        data[column_name + '_' + str(number)] = np.where(data[column_name]==value, 1, 0)

Without some example data it is hard to say whether this is correct.


EDIT:

Minimal working example.

I use random.seed(0) to always get the same values.

I use top_3 instead of top_10 so that all columns fit on the screen.

import pandas as pd
import random
import numpy as np

random.seed(0) #  to get the same values every time

data = pd.DataFrame({
    "A": [random.randint(0, 10) for _ in range(10)],
    "B": [random.randint(0, 10) for _ in range(10)],
})

#print(data)

for column_name in data.columns:
    #print(data[column_name].value_counts())
    top_3 = data[column_name].value_counts().sort_values(ascending=False).head(3).index
    #print(top_3)
    for number, value in enumerate(top_3, 1):
        name = column_name + '_' + str(number)
        data[name] = np.where(data[column_name]==value, 1, 0)
        
print(data)   

Result:

   A  B  A_1  A_2  A_3  B_1  B_2  B_3
0  6  9    1    0    0    0    0    0
1  6  3    1    0    0    0    0    0
2  0  8    0    0    0    0    0    1
3  4  2    0    1    0    1    0    0
4  8  4    0    0    0    0    1    0
5  7  2    0    0    1    1    0    0
6  6  1    1    0    0    0    0    0
7  4  9    0    1    0    0    0    0
8  7  4    0    0    1    0    1    0
9  5  8    0    0    0    0    0    1
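If the same step is needed for many columns (x1 ... x100 in the question), the loop above could be wrapped in a function. This is only a sketch: the function name add_top_n_indicators and the parameter top_n are hypothetical, and the small frame is made up. Because value_counts() already returns counts in descending order by default, the extra sort_values(ascending=False) is left out here.

```python
import numpy as np
import pandas as pd


def add_top_n_indicators(frame, top_n=3):
    """Return a copy of frame with 0/1 columns for each column's top_n values."""
    result = frame.copy()
    # Iterate over the original columns only, so new indicator
    # columns are never processed themselves.
    for column_name in frame.columns:
        top_values = frame[column_name].value_counts().head(top_n).index
        for number, value in enumerate(top_values, 1):
            name = f"{column_name}_{number}"
            result[name] = np.where(frame[column_name] == value, 1, 0)
    return result


# Hypothetical example data.
data = pd.DataFrame({"A": [6, 6, 0, 4], "B": [9, 3, 9, 2]})
encoded = add_top_n_indicators(data, top_n=2)
print(encoded)
```

Returning a copy leaves the original frame untouched, which makes it easier to rerun the step with a different top_n.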
furas