0

I have two dataframes, dframe and dframp. Dframe has 301497 rows in it and dframep has 6080 rows in it. Both dataframes are show below. I want to merge the two such that when dframep is added to dframe the new dataframe puts Nans where dframep does not have any values for that date. I have tried this:

dfall = dframe.merge(dframep, on=['ID','date','instrument','close'], how='outer')

The two merge together but the result is 307577 rows e.g. for the dates that are not in dframep there are no Nan's.

Pulling my hair out so any help would be appreciated. Im guessing it has something to do with indexing and selecting the columns correctly.

enter image description here enter image description here

Thanks

enter image description here

user7346517
  • 25
  • 1
  • 5

1 Answers1

1

I can't replicate your problem (nor understand it given your description), but try something like this ? :

dfall = pd.merge(dframe, dframep, how = 'left' ,left_on = ['ID','date','instrument','close'], right_on =['ID','date','instrument','close']

This will keep the rows of dframe, and bring the info that matches from dframp

sophocles
  • 5,806
  • 3
  • 8
  • 25
  • Thanks, this creates dfall but has no rows of dframp in it which to me says maybe the date columns dont match each other for some reason. – user7346517 Nov 24 '20 at 16:03
  • with the above code, you are using 4 columns to perform the match, so it might mean that the closing price, or the ID or the instrument etc, don't match. Is there a unique ID to merge the 2 files? Assuming that the column 'ID' acts as an identifier, why don't you perform the merge just on 'ID'? – sophocles Nov 24 '20 at 16:06
  • The matching column will be date, the IDs will also match as they are just a sequential number. Instrument will be different and so too will close in most cases. – user7346517 Nov 24 '20 at 16:21
  • Then try the above code with using only the 'date' and the 'id', in left_on and right_on. Let me know if it works. – sophocles Nov 24 '20 at 16:24
  • Here are the two dataframes side by side for the 18th Nov 2020 – user7346517 Nov 24 '20 at 16:24
  • only the 'date' and the 'id', in left_on and right_on.--- tried this and it doesnt add any rows just two columns instrument_y which are all nan and close_y which are all nan. – user7346517 Nov 24 '20 at 16:27
  • Okay then. Are you sure that there are rows that match between the 2 dataframes? Based on both data and ID? – sophocles Nov 24 '20 at 16:29
  • Date for sure and ID also, it seems like it is matching the rows from the smaller data frame and deleting them rather than adding them as another row and also adding where they don’t have any values. – user7346517 Nov 24 '20 at 16:34
  • I think go through this answer: https://stackoverflow.com/questions/53645882/pandas-merging-101, it will definitely guide you on how to get what you need. If it doesn't, then the issue is with the underlying data itself. – sophocles Nov 24 '20 at 16:39
  • Thanks for the help. Been through that answer and tried many times so most be a data issue. – user7346517 Nov 24 '20 at 16:50
  • Sorry I couldn't help anymore. If you were able to share your data I would give it a go. – sophocles Nov 24 '20 at 16:51