
I can't quite figure out what's wrong, perhaps because I don't completely understand text mining in the first place. The syntax tends to confuse me... but I digress. I'm following a tutorial, yet I've run into an error, this one in particular:

Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value

As the title implies, I'm trying to use wordcloud, but I get this error when I call it. If you're curious about the data, I'm using the same database referenced in this post, where the kind lads there helped me with the SQL. Now I'm using R. Here is the code in question:

```r
# Word cloud
# Install packages and load libraries
install.packages("SnowballC")
library("tm")
library("wordcloud")
library("SnowballC")

# Load the negative tweets into a corpus
# (mycon is a database connection opened earlier, not shown here)
negative <- idadf(mycon, "SELECT TEXT
                          FROM GOP_DEBATE
                          WHERE GOP_DEBATE.SENTIMENT = 'negative'")

docs <- VectorSource(negative$TEXT)
docs <- Corpus(docs)

# Preprocessing (cleaning)
docs <- tm_map(docs, stripWhitespace)

# Remove non-ASCII characters and URLs
removeInvalid <- function(x) gsub("[^\x01-\x7F]", "", x)
docs <- tm_map(docs, content_transformer(removeInvalid))

removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
docs <- tm_map(docs, content_transformer(removeURL))

# Turn @, / and | into spaces first, then strip the remaining punctuation
# (removePunctuation would otherwise delete them before toSpace runs)
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "@")    # Remove @
docs <- tm_map(docs, toSpace, "/")    # Remove /
docs <- tm_map(docs, toSpace, "\\|")  # Remove |
docs <- tm_map(docs, removePunctuation)

# Remove numbers
docs <- tm_map(docs, removeNumbers)

# Lowercase the text; content_transformer() keeps the documents as documents
docs <- tm_map(docs, content_transformer(tolower))

# Stop words
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removeWords, stopwords("SMART"))

# Collapse the extra whitespace left behind
docs <- tm_map(docs, stripWhitespace)

# Stemming
docs <- tm_map(docs, stemDocument)

# Document-term matrix
dtm <- DocumentTermMatrix(docs)

# Create the cloud
# (dtms is not used below; freq is computed from the full dtm)
dtms <- removeSparseTerms(dtm, 0.6)
freq <- colSums(as.matrix(dtm))
dark2 <- brewer.pal(6, "Dark2")
wordcloud(names(freq), freq, min.freq = 35, max.words = 100, rot.per = 0.2,
          scale = c(0.9, 0.9), colors = dark2)
```
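Before the `wordcloud()` call, a quick sanity check on what actually reaches it can narrow this down. The sketch below is a hypothetical base-R helper, `check_freq()`, not part of tm or wordcloud; it assumes `freq` is the named numeric vector built with `colSums()` as in the script. An empty `freq` is one plausible culprit: if the query returns no rows (for example, if the stored SENTIMENT values don't match `'negative'` exactly, including case), the corpus and the document-term matrix end up with no terms.

```r
# Hypothetical helper (not part of tm or wordcloud): stop early with a readable
# message instead of letting wordcloud() fail inside strwidth().
check_freq <- function(freq, min.freq = 1) {
  if (!is.numeric(freq)) stop("freq is not numeric")
  if (length(freq) == 0) {
    stop("freq is empty: the document-term matrix has no terms")
  }
  if (anyNA(freq)) {
    stop("freq contains NA for: ", paste(names(freq)[is.na(freq)], collapse = ", "))
  }
  if (is.null(names(freq))) stop("freq has no names, so there are no words to plot")
  # Report how many terms would pass the min.freq filter
  kept <- sum(freq >= min.freq)
  message(length(freq), " terms; ", kept, " with frequency >= ", min.freq)
  invisible(TRUE)
}

# Example calls on toy vectors
check_freq(c(a = 40, b = 12), min.freq = 35)
try(check_freq(numeric(0)))
```

In the script itself, running `nrow(negative)` and then `check_freq(freq, min.freq = 35)` just before `wordcloud()` would show whether the problem starts upstream of the plot.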

Any help is appreciated!

Mia P
  • `cex` is a "number indicating the amount by which plotting text and symbols should be scaled relative to the default. 1=default, 1.5 is 50% larger, 0.5 is 50% smaller, etc.", so it needs to be a number; here it is probably NA. If you apply this [test for columns containing NA](https://stackoverflow.com/questions/20364450/find-names-of-columns-which-contain-missing-values), `colnames(mymatrix)[colSums(is.na(mymatrix)) > 0]`, to your `dtm`, it will identify the NA offenders that cex can't handle. HTH – Chris Jul 22 '20 at 14:22
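To see how `cex` can end up NA, here is a base-R sketch of the size calculation. It assumes wordcloud scales each word as `(scale[1] - scale[2]) * freq / max(freq) + scale[2]` and passes `size[i]` to `strwidth()` as `cex`, looping with `1:length(words)`; this is based on the wordcloud source and is worth confirming against your installed version. `word_sizes()` is a hypothetical helper that mirrors only that one line.

```r
# Hypothetical helper mirroring the assumed wordcloud size formula:
#   size = (scale[1] - scale[2]) * freq / max(freq) + scale[2]
word_sizes <- function(freq, scale = c(4, 0.5)) {
  normed <- freq / max(freq)
  (scale[1] - scale[2]) * normed + scale[2]
}

# Normal case: with scale = c(0.9, 0.9) every word gets the same cex, 0.9
word_sizes(c(a = 50, b = 20), scale = c(0.9, 0.9))

# A single NA frequency makes max() NA, so every size becomes NA
word_sizes(c(a = 50, b = NA), scale = c(0.9, 0.9))

# An empty frequency vector: 1:length() still yields 1 and 0, so a loop
# written as for (i in 1:length(words)) looks up element 1, which is NA
empty <- numeric(0)
1:length(empty)
suppressWarnings(word_sizes(empty))[1]
```

Both the NA case and the empty case produce an NA `cex`, which is what the error above complains about. Separately, `scale = c(0.9, 0.9)` draws every word at the same size because the first term is multiplied by zero; wordcloud's default `scale = c(4, 0.5)` gives a size gradient.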
