0

I've a Database.csv file with one column and 3 rows, those are data exported from salesforce with simple-salesforce, I try to get the 'Name' Value from the OrderedDict cell data('Name', 'Demand').

Dataframe

Type__c
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])

with below code i read the csv file and for each row i am assigning the value that contains "OrderedDict([....) to name, and i want to print the "Name" field value,

import pandas as pd
from collections import OrderedDict
df = pd.read_csv('Database.csv')
for index, row in df.iterrows():
    name = df._get_value(index, 'Type__c')
    base=OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234500000001erAAA')])), ('Name', 'Private')])
    print("\n",name['Name'])
    print(base['Name'])

Results:

.
print("\n",name['Name'])
    TypeError: string indices must be integers

when i change print("\n",name['Name']) to print("\n",name) to test the manual added OrderedDict, i can see it works

import pandas as pd
from collections import OrderedDict
df = pd.read_csv('Database.csv')
for index, row in df.iterrows():
    name = df._get_value(index, 'Type__c')
    test=OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234500000001erAAA')])), ('Name', 'Private')])
    ***print("\n",name)***
    print(test['Name'])

Result:

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
Private

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
Private

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])
Private

I've a file with thousands of rows and there must be way to create a new column named "Name" and add the text data

at the End i want to achieve FROM

Type__c
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])

TO

Name
Demand
Demand
Stand
Pamuk
  • 3
  • 1

1 Answers1

0

Welcome to the SO-community Pamuk!

Rather than iterating through the rows of a dataframe, it is much more efficient to "apply" a particular function to an entire column (or even subset of a dataframe). That way, pandas will handle performance for you, and it is usually more readable (since you don't have to add any iterating logic).

Here is how you can obtain that "Name" column with apply and a quick lambda function (only the last line is relevant for you, the rest is for replicating your sample dataframe):

from collections import OrderedDict
import pandas as pd

data = [
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')]),
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')]),
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')]),
]

df = pd.DataFrame({"Type__c": data})
df["Name"] = df["Type__c"].apply(lambda x: x["Name"])

See Pandas Series.apply since df["Type__c"] docs for more examples and details.

For more complicated row-wise operations that involve using more columns at once, you can check the docs for DataFrame.apply and apply a function to the dataframe itself (or a subset of it).

For a more readable part of the docs that is very relevant to what you are doing, check the Pandas User Guide section on Function Application.

Dharman
  • 21,838
  • 18
  • 57
  • 107
WillMonge
  • 930
  • 10
  • 19
  • Hi, thank you for welcoming, the code sample of yours works well as script, unfortunately my problem is when i read the dataframe from a csv, i am getting the below code error with your code: at df = pd.DataFrame({"Type__c": data}) line and error is ValueError: If using all scalar values, you must pass an index. I assume it is if you write in the code OrderedDict it recoginize this and process but if i try to read the text from a dataframe it does not run the OrderedDict. – Pamuk Dec 25 '20 at 10:55
  • I just tried saving the data to a file and reading from there and it worked fine for me. By the way, you should only be using the last line of code `df["Name"] = df["Type__c"].apply(lambda x: x["Name"])`. That error sounds more like the data variable is not a list (or iterable) (see this question --> https://stackoverflow.com/questions/17839973/constructing-pandas-dataframe-from-values-in-variables-gives-valueerror-if-usi ). – WillMonge Dec 26 '20 at 20:22