1

I can't sum fee based on country, currency and product id from dfJANUARY and dfFEBRUARY. python said 'array is too big'

my file.txt as dfJANUARY has 35,6 mb

my file.txt as dfFEBRUARY has 36,3 mb

In[1]: dfJANUARY
Out[1]
  Country         PRODUCT ID    currency   fee

0  Arab Emirate    COCA COLA      USD       1000
1  Arab Emirate    COCA COLA      USD       1000
2  Arab Emirate    COCA COLA      USD       1009

86212 rows × 6 columns (unhide country: America ; PRODUCT ID: Fanta ; currency: SGD) 

In[2]: dfFEBRUARY
Out[2]:
  Country         PRODUCT ID    currency   fee

0  Arab Emirate    COCA COLA      USD       2000
1  Arab Emirate    COCA COLA      USD       2000
2  Arab Emirate    COCA COLA      USD       2000

86212 rows × 6 columns (unhide country: America ; PRODUCT ID: Fanta; currency: SGD)

I've tried made code but it's fail

df = pd.merge(dfJANUARY,dfFEBRUARY, on = "fee", how = "inner")

* when i merge ther's warning:
valueerror array is too big arr.size * arr.dtype.itemsize

#made value of total
TOTAL = dfJANUARY[fee] + dfFEBRUARY[fee] 

#made new column, it's name "TOTAL"
df["TOTAL"] = TOTAL

#made Pivot
gdf = df.pivot_table(index = ["PRODUCT ID","Country","currency"],values = ("TOTAL"), aggfunc="sum", fill_value=0)

so this is my expactation, i can sum income based on type of currency, product id, country. So i will get TOTAl

can you help me?

**expect**
 dfEXPECT
                                           TOTAL
  Country         PRODUCT ID    currency   

0  Arab Emirate    COCA COLA      USD       10000
                                  SGD       15000
1  Arab Emirate    Fanta          USD       20000
                                  SGD       30000
2  America         COCA COLA      USD       90000
                                  SGD       95000
3  America         Fanta          USD       80000
                                  SGD       75000
86212 rows × 6 columns

1 Answers1

0

In your case, you want to pd.concat the dataframes (putting the second "below" the first). I'm surprised that pd.merge failed, but it is harder to merge (because it is a more general function).
Try

df = pd.concat([df1,df2])
df.pivot_table(index = ["PRODUCT ID","Country","currency"],values = ("TOTAL"), aggfunc="sum", fill_value=0)

and see if it helps...

Itamar Mushkin
  • 2,420
  • 2
  • 12
  • 27
  • In this case, you don't even need TOTAL. You can solve your problem by summing the total for each df separately and then joining them, but that's another solution, not part of the same one. – Itamar Mushkin May 26 '19 at 06:13
  • Also, before asking others to check your code, please run it yourself and see if it works (or, what fails). – Itamar Mushkin May 26 '19 at 06:14
  • thanks sir for your advice, can i ask sir, what is different between join, merge and column? – charisma bathara May 26 '19 at 06:27
  • In a nutshell: `merge` is the most generic (and so, the heaviest), `join` joins on a common index (puts the new dataframe 'right' of the old one), and `concatenate` joins on columns (puts the one one 'under' the old one). – Itamar Mushkin May 26 '19 at 07:28
  • Read more here, for example: https://stackoverflow.com/questions/38256104/differences-between-merge-and-concat-in-pandas https://stackoverflow.com/questions/22676081/what-is-the-difference-between-join-and-merge-in-pandas?noredirect=1&lq=1 – Itamar Mushkin May 26 '19 at 07:29
  • in statement values on pivot, what should i do if i will sum up 3 or more variable in to 1? – charisma bathara May 26 '19 at 10:45
  • I'm not sure I understand your question. If it's a new one, ask a new question. If you think it's a very basic question, take the time to read a tutorial on `pandas` (we've all been there :-) ). If you're asking what to do if you have more than one dataframe you want to concatenate before pivot - just change `pd.concat([df1, df2])` to however many dfs you have, like `pd.concat([df1, df2, df3])` – Itamar Mushkin May 26 '19 at 11:11