I've read similar questions before asking, but I'm still stumped. Any help is appreciated.

Input: I have a pandas dataframe with a column labeled 'radon' which has values in the range: [0.5, 13.65]

Output: I'd like to create a new column where every radon value equal to 0.5 is replaced with a random value between 0.1 and 0.5

I tried this:

df['radon_adj'] = np.where(df['radon']==0.5, random.uniform(0, 0.5), df.radon)

However, I get the same random number for every value of 0.5.


I tried this as well. It creates different random numbers, but the else branch does not copy the original values:

df['radon_adj'] = df['radon'].apply(lambda x: random.uniform(0, 0.5) if x == 0.5 else df.radon)
HolaGonzalo

1 Answer

One way would be to create all the random numbers you might need before you select them using where:

>>> df = pd.DataFrame({"radon": [0.5, 0.6, 0.5, 2, 4, 13]})
>>> df["radon_adj"] = df["radon"].where(df["radon"] != 0.5, np.random.uniform(0.1, 0.5, len(df)))
>>> df
   radon  radon_adj
0    0.5   0.428039
1    0.6   0.600000
2    0.5   0.385021
3    2.0   2.000000
4    4.0   4.000000
5   13.0  13.000000

You could be a little smarter and only generate as many random numbers as you're actually going to need, but it probably took longer for me to type this sentence than you'd save. (It takes me 9 ms to generate ~1M numbers.)
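If you did want to generate only as many numbers as needed, a minimal sketch could look like the following. It builds a boolean mask first and draws exactly `mask.sum()` values; the name `mask` is just an illustrative choice, and the frame is the same toy example as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"radon": [0.5, 0.6, 0.5, 2, 4, 13]})

# Boolean mask marking the rows whose value should be replaced
mask = df["radon"] == 0.5

# Start from a copy of the original column, then overwrite only the masked rows,
# drawing exactly one random number per row that needs one
df["radon_adj"] = df["radon"].copy()
df.loc[mask, "radon_adj"] = np.random.uniform(0.1, 0.5, mask.sum())
```

Rows outside the mask keep their original values, so the result has the same shape as the `where` version.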

Your apply approach would work too if you used x instead of df.radon:

>>> df['radon_adj'] = df['radon'].apply(lambda x: random.uniform(0.1, 0.5) if x == 0.5 else x)
>>> df
   radon  radon_adj
0    0.5   0.242991
1    0.6   0.600000
2    0.5   0.271968
3    2.0   2.000000
4    4.0   4.000000
5   13.0  13.000000
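
As for why the `np.where` attempt gave the same number everywhere: `random.uniform(...)` is evaluated once, before `np.where` is called, so `np.where` only ever sees a single scalar and broadcasts it to every matching row. A small sketch of the difference, on the same toy frame (the names `same`, `draws`, and `different` are only for illustration):

```python
import random

import numpy as np
import pandas as pd

df = pd.DataFrame({"radon": [0.5, 0.6, 0.5, 2, 4, 13]})

# random.uniform runs once here, so one scalar is broadcast
# to every row where the condition is True
same = np.where(df["radon"] == 0.5, random.uniform(0.1, 0.5), df["radon"])

# Passing an array of draws gives each row its own candidate value;
# np.where picks from it only where the condition is True
draws = np.random.uniform(0.1, 0.5, len(df))
different = np.where(df["radon"] == 0.5, draws, df["radon"])
```

In `same`, rows 0 and 2 share one value; in `different`, each takes its own entry from `draws`.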
DSM
  • Is it possible for them to differ? (Non-rhetorical, BTW-- I can't remember what methods are nan-aware and what ones aren't. I don't *think* `len` cares, but I wouldn't bet twenty bucks on it.) – DSM Nov 24 '14 at 19:59