
I am trying to get the following script to work. The input file consists of 3 columns: gene association type, gene name, and disease name.

cols = ['Gene type', 'Gene name', 'Disorder name']
no_headers = pd.read_csv('orphanet_infoneeded.csv', sep=',',header=None,names=cols)

gene_type = no_headers.iloc[1:,[0]]
gene_name = no_headers.iloc[1:,[1]]
disease_name = no_headers.iloc[1:,[2]]

query = 'Disease-causing germline mutation(s) in' ###add query as required

orph_dict = {}

for x in gene_name:
    if gene_name[x] in orph_dict:
        if gene_type[x] == query:
            orph_dict[gene_name[x]]=+ 1
        else:
            pass
    else:
        orph_dict[gene_name[x]] = 0

I keep getting an error that says:

Series objects are mutable and cannot be hashed

Any help would be dearly appreciated!

sophros
Sal
    show us the full traceback so we can see the line on which the error is being thrown. my guess is it's `orph_dict[gene_name[x]] = 0`. the traceback would also show us the class of error being thrown. – dbliss Apr 17 '15 at 13:50

2 Answers


In short: gene_name[x] is a mutable object, so it cannot be hashed. To use an object as a key in a dictionary, Python needs its hash value, which is why you get an error.

Further explanation:

Mutable objects are objects whose value can be changed in place. For example, a list is mutable, since you can append to it. An int is immutable, because you can't change it. When you do:

a = 5
a = 3

you don't change the value of the object 5; you create a new object and rebind the name a to it.

Built-in mutable containers such as list, dict, and set cannot be hashed, and pandas deliberately makes Series unhashable as well. See this answer.

To solve your problem, you should use immutable objects as keys in your dictionary. For example: tuple, string, int.
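As a minimal sketch of the difference, the snippet below uses a str, a tuple, and an int as dictionary keys, then tries a list. The gene name "BRCA1" and the variable names are illustrative assumptions, not taken from the question's data.

```python
# Immutable built-ins are hashable, so they work as dictionary keys.
counts = {}
counts["BRCA1"] = 1          # str key
counts[("BRCA1", 17)] = 2    # tuple key (all of its elements are hashable)
counts[42] = 3               # int key

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    counts[["BRCA1"]] = 4
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(list_key_error)
```

Note that a tuple is only hashable if everything inside it is hashable too.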

Community
Ella Sharakanski
gene_name = no_headers.iloc[1:,[1]]

This creates a DataFrame because you passed a list of columns (single, but still a list). When you later do this:

gene_name[x]

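you get back a Series, not a single value: iterating over a DataFrame yields its column labels, so x is 'Gene name' and gene_name[x] is that whole column. You can't hash the Series.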
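The difference can be checked on a small frame. This is a sketch under assumptions: the two-row DataFrame and its values are made up to stand in for the CSV from the question.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the CSV from the question.
df = pd.DataFrame({"Gene type": ["t1", "t2"], "Gene name": ["G1", "G2"]})

as_frame = df.iloc[:, [1]]   # list of positions  -> DataFrame
as_series = df.iloc[:, 1]    # single position    -> Series

# Iterating over a DataFrame yields column labels, not cell values.
labels = list(as_frame)

# Indexing the DataFrame with a label returns the whole column as a Series.
column = as_frame[labels[0]]

# Using that Series as a dictionary key raises TypeError.
lookup = {}
try:
    lookup[column] = 0
    error = None
except TypeError as exc:
    error = str(exc)

print(type(as_frame).__name__, type(as_series).__name__, labels)
print(error)
```

The exact wording of the TypeError message differs between pandas versions.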

The solution is to create Series from the start.

gene_type = no_headers.iloc[1:,0]
gene_name = no_headers.iloc[1:,1]
disease_name = no_headers.iloc[1:,2]

Also, where you have orph_dict[gene_name[x]] =+ 1, I'm guessing that's a typo and you really mean orph_dict[gene_name[x]] += 1 to increment the counter. As written, =+ 1 parses as = +1, so it assigns 1 every time instead of adding to the count.
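Putting both fixes together, here is one possible sketch of the counting loop. It makes several assumptions: the inline CSV text is invented sample data standing in for orphanet_infoneeded.csv, the file is assumed to have a header row (the question skips row 0), and the intent is assumed to be "count, per gene, the rows whose type matches query". Once the columns are Series, iterating over them yields the values themselves, so gene_name[x] would likely no longer be the lookup you want; the sketch iterates over the values directly instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for orphanet_infoneeded.csv.
csv_text = """type,name,disease
Disease-causing germline mutation(s) in,GENE_A,Disorder 1
Disease-causing germline mutation(s) in,GENE_A,Disorder 2
Candidate gene tested in,GENE_B,Disorder 3
"""

cols = ["Gene type", "Gene name", "Disorder name"]
# header=0 together with names=cols replaces the file's own header row.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=cols)

query = "Disease-causing germline mutation(s) in"

orph_dict = {}
# Walk the two columns in parallel; each item is a plain string, which is hashable.
for gene_type, gene_name in zip(df["Gene type"], df["Gene name"]):
    orph_dict.setdefault(gene_name, 0)   # make sure every gene has an entry
    if gene_type == query:
        orph_dict[gene_name] += 1        # += increments; =+ would assign +1

print(orph_dict)
```

If genes with a zero count are not needed, filtering with df["Gene type"] == query and calling value_counts() on the "Gene name" column would be a shorter alternative.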

jkitchen
    How could I apply this technique of creating the Series from the start when I am splitting into a training and testing dataset? `X_train, X_test, y_train, y_test = train_test_split(training_feature_set, training_feature_label, test_size = 0.1, random_state=42)` @http://stackoverflow.com/users/639792/jkitchen – Alvis May 03 '17 at 11:07
    @Alvis, if your function returns DataFrames, you can still select individual items from those. Read the [docs for indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html). `.loc` or `.iloc` are probably what you want. – jkitchen May 04 '17 at 16:37
    Thank you @jkitchen I'll check out the documentation :-) – Alvis May 05 '17 at 09:10