
Edit: fixed a misunderstanding on my part. I am getting a nested list, not an array. I'm working with a function in a for loop, bootstrapping some model predictions.

The code looks like this:

def revenue(predictions):
    # convert predicted product volume to revenue, then subtract the fixed cost
    revenue = predictions * 4500
    profit = revenue - 500000
    return profit

The loop I am feeding it into looks like this:

# set up a loop to draw 1000 random subsamples of 500 rows from our region 2 data set and predict on each

model = LinearRegression(fit_intercept = True, normalize = False)
features = r2_test.drop(['product'],axis=1)

values = []
for i in range(1000):
    subsample = r2_test.sample(500,replace=False)
    features = subsample.drop(['product'],axis=1)
    # model2 is assumed to be the fitted model; its training is not shown here
    predict = model2.predict(features)
    result = (revenue(predict))
    values.append(result)
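
As a rough sketch of how the results could be collected so that DataFrame methods are available afterwards: the list `values` can be wrapped in a DataFrame once the loop finishes, which gives one row per iteration and one column per sampled position. Everything below is made up for illustration, including the synthetic `r2_test` and `predict_stub`, a hypothetical stand-in for `model2.predict`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in for r2_test (assumption: the real frame has these columns)
r2_test = pd.DataFrame({
    "f0": rng.normal(size=2000),
    "f1": rng.normal(size=2000),
    "f2": rng.uniform(0, 5, size=2000),
    "product": rng.uniform(0, 140, size=2000),
})

def predict_stub(features):
    # Hypothetical replacement for model2.predict: returns one value per row
    return (features["f2"] * 27).to_numpy()

def revenue(predictions):
    return predictions * 4500 - 500000

rows = []
for i in range(1000):
    # random_state is set only to make this sketch reproducible
    subsample = r2_test.sample(500, replace=False, random_state=i)
    features = subsample.drop(["product"], axis=1)
    rows.append(revenue(predict_stub(features)))

# One row per iteration, one column per sampled position
values = pd.DataFrame(rows)
print(values.shape)
```

With `values` as a DataFrame, methods such as `nlargest` can be called on it directly.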

So I am running 1000 iterations of predictions, each on 500 rows sampled from this dataframe:

   id       f0          f1            f2      product
0   74613   -15.001348  -8.276000   -0.005876   3.179103
1   9753    14.272088   -3.475083   0.999183    26.953261
2   93502   6.263187    -5.948386   5.001160    134.766305
3   33405   -13.081196  -11.506057  4.999415    137.945408
4   16486   12.702195   -8.147433   5.004363    134.766305
5   27901   -3.327590   -2.205276   3.003647    84.038886
6   69620   -11.142655  -10.133399  4.002382    110.992147
7   78940   4.234715    -0.001354   2.004588    53.906522
8   56159   13.355129   -0.332068   4.998647    134.766305
9   73142   1.069227    -11.025667  4.997844    137.945408
10  12663   11.777049   -5.334084   2.003033    53.906522
11  39849   16.320755   -0.562946   -0.001783   0.000000
12  61800   7.736313    -6.093374   3.982531    107.813044
13  72213   6.695604    -0.749449   -0.007630   0.000000
14  5479    -10.985487  -5.605994   2.991130    84.038886
15  6297    -0.347599   -6.275884   -0.003448   3.179103
16  88123   12.300570   2.944454    2.005541    53.906522
17  68352   8.900460    -5.632857   4.994324    134.766305
18  99029   -13.412826  -4.729495   2.998590    84.038886
19  64238   -4.373526   -8.590017   2.995379    84.038886

Now, once I have my output, I want to select the top 200 predictions from each iteration. I'm using this loop:

# calculate the max value of each of the 500 iterations, then total them for the total profit
top_200 = []
for i in range(0,500):
    profits = values.nlargest(200,[i],keep = 'all')
    top_200.append(profits)

The problem I am running into is that when I feed values into the top_200 loop, I end up with a list of the selected 200 by column:

    [               0              1              2              3    \
 628  125790.297387  -10140.964686 -361625.210913 -243132.040492   
 32   125429.134599 -368765.455544 -249361.525792 -497190.522207   
 815  124522.095794   -1793.660411  -11410.126264  114928.508488   
 645  123891.732231  115946.193531  104048.117460 -246350.752024   
 119  123063.545808 -124032.987348 -367200.191889 -131237.863430   
 ..             ...            ...            ...            ...   

I'd like to turn it into a dataframe. However, I haven't figured out how to do that while preserving the structure where 0 has its 200 values, 1 has its 200 values, etc.
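
For context, a minimal sketch of what `nlargest` returns, assuming `values` is a DataFrame (a plain Python list has no `nlargest` method). Called on the whole frame with a column name, it returns entire rows ordered by that column; called on a single column, it returns only that column's values. The small random frame is made-up data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up stand-in for `values`: 10 rows x 4 columns
values = pd.DataFrame(rng.normal(size=(10, 4)))

# DataFrame.nlargest: the 3 rows with the largest entries in column 0,
# with every column of those rows included
top_rows = values.nlargest(3, [0])
print(top_rows.shape)

# Series.nlargest: only the 3 largest values of column 0
top_col = values[0].nlargest(3)
print(top_col.shape)
```

This would explain a list in which each element is a full multi-column frame rather than a single column of 200 values.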

I thought I could do something like:

top_200 = pd.DataFrame(top_200,columns= range(0,500))

It gives me 500 columns, but only column 0 has anything in it, and I end up with a [500,500] dataframe instead of the anticipated 200 rows by 500 columns.

I'm fairly sure there is a good way to do this, but my searching so far has not turned anything up. I'm also not sure what the thing I am looking for is called, so I don't know exactly what to search for.

Any input would be appreciated! Thanks in advance.

Further editing: now that I know I'm getting a list of lists, not an array, I thought I'd try to write to a dataframe instead:

# calculate the top 200 values of each of the 500 iterations
top_200 = pd.DataFrame(columns=['profits'])
for i in range(0,500):
    top_200.loc[i] = i
    profits = values.nlargest(200,[i],keep = 'all')
    top_200.append(profits)

top_200.head()

But I've futzed something up here, as my results are:

profits
0   0
1   1
2   2
3   3
4   4

whereas my expected results would be something like:

    col1                col2                col3
0   first n_largest     first n_largest     first n_largest
1   second n_largest    second n_largest    second n_largest
2   third n_largest     third n_largest     third n_largest
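
A hedged sketch of one way to get that layout: take each column's largest values as a Series, reset its index so the rows line up by rank, and join the pieces side by side with `pd.concat(..., axis=1)`. Note that `DataFrame.append` returned a new frame rather than modifying the original (and was removed in pandas 2.0), so calling `top_200.append(profits)` without keeping the result has no effect. The data and the column labels below are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Made-up stand-in for `values`: 50 rows x 3 columns
values = pd.DataFrame(rng.normal(size=(50, 3)))

n = 5  # would be 200 for the real data
pieces = []
for i in values.columns:
    # Top n values of column i, re-indexed 0..n-1 so rows align by rank
    top = values[i].nlargest(n).reset_index(drop=True)
    pieces.append(top.rename(f"col{i + 1}"))

# Join the per-column Series side by side
top_n = pd.concat(pieces, axis=1)
print(top_n)
```

The default `keep='first'` is used here so every column contributes exactly `n` values; with `keep='all'`, ties could make some columns longer than others.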
seuadr
  • Could this post be helpful? https://stackoverflow.com/questions/20763012/creating-a-pandas-dataframe-from-a-numpy-array-how-do-i-specify-the-index-colum – Laurent Mar 22 '21 at 07:04
  • @CygnusX thanks for the link - it helped me understand.. i don't have an array, i have a nested list. this at least gives me a direction to go :) – seuadr Mar 22 '21 at 11:26

1 Answer


So, after doing some research based on the question @CygnusX recommended, I figured out that I was laboring under the impression that my output was an array. But of course top_200 = [] is a list, which, combined with nlargest, gives me a list of lists.

Now that I understood the problem better, I converted the list of lists into a dataframe and then transposed it, which gave me the results I was looking for.

# calculate the max value of each of the 500 iterations, then total them for the total profit

top_200 = []
for i in range(0,500):
    profits = (values.nlargest(200,[i],keep = 'all')).mean()
    top_200.append(profits)

test = pd.DataFrame(top_200)
test = test.transpose()

Output (shown as a screenshot in the original post, because of the 500 columns).

There is probably a more elegant way to accomplish this, such as using a dataframe instead of a list, but I couldn't get .append to work the way I wanted on a dataframe, since I wanted to preserve the list of 200 nlargest values, not just a sum or a mean (which the append worked great for!).
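
One possibly tidier route, sketched under the assumption that `values` is a purely numeric DataFrame with no missing entries: sort every column in descending order with NumPy and keep the first 200 rows, which avoids both the loop and the list. The random frame below is made-up data standing in for the real `values`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up stand-in for `values`: 1000 iterations x 500 columns
values = pd.DataFrame(rng.normal(size=(1000, 500)))

# Sort each column ascending, flip the row order to get descending,
# then keep the first 200 rows
sorted_desc = np.sort(values.to_numpy(), axis=0)[::-1]
top_200 = pd.DataFrame(sorted_desc[:200], columns=values.columns)
print(top_200.shape)
```

Each column of `top_200` then holds that column's 200 largest values in descending order. The original row labels are not kept, since each column is sorted independently.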

seuadr
  • Indeed, you could use a dataframe, but then you can't use .append to append a list to it, only another dataframe or series, which is why you didn't get the expected result (I think). – Laurent Mar 22 '21 at 13:23
  • @CygnusX i'll take your word for it :D i am still learning, so i'm not totally sure. on the bright side, with your assistance i ended up getting where i wanted to go, even if it might be uh.. a little less elegant than it likely could be - function first, then form, right? :D thanks again for your help! – seuadr Mar 22 '21 at 14:46