I have the following code, which takes the key terms listed in 'job_titles' and keeps only the rows whose 'Jobtitle' column contains at least one of those terms; every other row is filtered out.
The code was working previously, but it now raises this error:
ValueError: Cannot mask with non-boolean array containing NA / NaN values
Could anyone provide guidance on how to troubleshoot this?
Note that the DataFrame is called glassdoor.
import re
import numpy as np

job_titles = ['data', 'analytics', 'machine learning']
# Creating masks for each job title to identify where they appear
job_masks = [glassdoor.Jobtitle.str.contains(term, flags=re.IGNORECASE, regex=True) for term in job_titles]
# Combining the masks: a row is True if any of the terms matched
combined_mask = np.vstack(job_masks).any(axis=0)
combined_mask
# Applying the mask to the dataset
glassdoor = glassdoor[combined_mask].reset_index(drop=True)
listings_after = glassdoor.shape[0]
print(f'After refining job titles there were {listings_after} job listings.')
glassdoor.head(20)
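One possible cause, sketched here under the assumption that the 'Jobtitle' column has picked up missing values (NaN/None) since the code last worked: `str.contains` returns NaN for those rows, so the combined mask is no longer purely boolean. Passing `na=False` treats missing titles as non-matches. The small DataFrame below is a hypothetical stand-in for the real glassdoor data, and the names `filtered`, `pattern`, and `single_mask` are illustrative only.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real glassdoor data, with one missing job title
glassdoor = pd.DataFrame(
    {'Jobtitle': ['Data Scientist', None, 'Chef', 'Machine Learning Engineer']}
)

job_titles = ['data', 'analytics', 'machine learning']

# na=False makes rows with a missing Jobtitle evaluate to False instead of NaN
job_masks = [
    glassdoor.Jobtitle.str.contains(term, flags=re.IGNORECASE, regex=True, na=False)
    for term in job_titles
]

# A row is kept if any of the per-term masks is True
combined_mask = np.vstack(job_masks).any(axis=0)
filtered = glassdoor[combined_mask].reset_index(drop=True)

# Alternative: one regex that matches any of the terms, with no stacking needed
pattern = '|'.join(re.escape(term) for term in job_titles)
single_mask = glassdoor.Jobtitle.str.contains(pattern, case=False, regex=True, na=False)

print(filtered)
```

To check whether this assumption holds for the real data, `glassdoor.Jobtitle.isna().sum()` reports how many titles are missing.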