I have a survey to analyze that was completed by participants on SurveyMonkey. Unfortunately, the way the data are organized is not ideal, in that each categorical response for each question has its own column.
Here, for example, are the first few lines of one of the responses in the dataframe:
How long have you been participating in the Garden Awards Program? \
0 One year
1 NaN
2 NaN
3 NaN
4 NaN
Unnamed: 10 Unnamed: 11 Unnamed: 12 \
0 2-3 years 4-5 years 5 or more years
1 NaN NaN NaN
2 NaN 4-5 years NaN
3 2-3 years NaN NaN
4 NaN NaN 5 or more years
How did you initially learn of the Garden Awards Program? \
0 I nominated my garden to be evaluated
1 NaN
2 I nominated my garden to be evaluated
3 NaN
4 NaN
Unnamed: 14 etc...
0 A friend or family member nominated my garden ...
1 A friend or family member nominated my garden ...
2 NaN
3 NaN
4 NaN
This question, How long have you been participating in the Garden Awards Program?
, has valid responses: one year
, 2-3 years
, etc., and are all found on the first row as a key to which column holds which value. This is the first problem. (Similarly for How did you initially learn of the Garden Awards Program?
, where valid responses are: I nominated my garden to be evaluated
, A friend or family member nominated my garden
, etc.).
The second problem is that the attached columns for each categorical response all are Unnamed: N
, where N is as many columns as there are categories associated with all questions.
Before I start remapping and flattening/collapsing the columns into a single one per question, I was wondering if there was any other way of dealing with survey data presented like this using Pandas. All my searches pointed to the SurveyMonkey API, but I don't see how that would be useful.
I am guessing that I will need to flatten the columns, and thus, if anyone could suggest a method, that would be great. I'm thinking that there is a way to keep grabbing all columns belonging to a categorical response by grabbing an adjacent column until Unnamed
is no longer in the column name, but I am clueless how to do this.