So I'm rather new to R, and I'm learning how to mine text from this handy website: https://eight2late.wordpress.com/2015/05/27/a-gentle-introduction-to-text-mining-using-r/

I have my own set of .doc, .docx, and .xlsx files that I'm trying to mine. They're located in a folder called 'files' in my working directory, but I hit an error after writing only a few lines of code.

The code I have so far is:

library(tm)
library(readtext)

data = readtext('files')

At this point, after waiting for 25 seconds or so, I get the error:

Error: System call to 'antiword' failed (1): The Big Block Depot is damaged

and the code stops running there.
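One way to narrow this down, assuming the error comes from a single unreadable file rather than from the whole folder, is to read the files one at a time and catch the error per file. The sketch below is only an illustration: read_each_safely is a hypothetical helper name, and the reader argument defaults to readtext::readtext but accepts any function that takes a path.

```r
# Read each file separately so that one unreadable file does not abort the run.
# `read_each_safely` is a hypothetical helper, not part of readtext or antiword.
read_each_safely <- function(paths, reader = readtext::readtext) {
  results <- list()
  failed <- character(0)
  for (p in paths) {
    out <- tryCatch(
      reader(p),
      error = function(e) {
        # Report which file failed and why, then carry on with the next one.
        message("Could not read ", p, ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(out)) {
      failed <- c(failed, p)
    } else {
      results[[p]] <- out
    }
  }
  list(results = results, failed = failed)
}

# Usage (the folder name 'files' is taken from the question):
# paths <- list.files("files", full.names = TRUE)
# res <- read_each_safely(paths)
# res$failed    # paths that raised an error
```

res$results then holds one entry per readable file, and res$failed lists the files worth inspecting or converting by hand.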

I have searched online for solutions, but this seems to be a fairly rare error. I found only one possible solution, at https://github.com/ropensci/antiword/issues/1, and it did not work for me.

That thread suggested that one of my files was corrupt, and recommended using the code

fixInNamespace(antiword, pos="package:antiword")

to change the error to a warning so that it does not interrupt the reading of the files. I tried that, and at first it raised the error

Error in as.environment(pos):
    no item called "package:antiword" on the search list

After that, I loaded the antiword package with library(antiword) and changed the stop( to a warning(. However, when I ran the data = readtext('files') line again, it immediately raised the error

Error in is_windows() : could not find function "is_windows"

I'm at a loss here! Any help would be appreciated. Should I be using another package in this case?

Zac
  • It seems the missing function can be found in goodmansasha's post on the GitHub issue you linked: `is_windows` – jay.sf Jul 17 '18 at 07:33
  • Oh yes, thank you! That helps, but now when I run the code I get `Error: Failed to execute 'C:\.......library\3.5\antiword\bin\antiword' (The system cannot find the file specified)`. I checked that folder and found two files, antiword32.exe and antiword64.exe. – Zac Jul 18 '18 at 05:13

1 Answer

I had the same problem when I tried to read a .doc file into R, also using the readtext library. What helped me was converting the Word documents from .doc to .docx. When I ran the same code afterwards, it worked.
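To see which files would need converting, a small sketch along these lines may help. It assumes that only the legacy .doc files go through antiword (the error message in the question names antiword, but this has not been verified against the readtext source), and find_legacy_doc is a hypothetical helper name.

```r
# Return the paths that end in ".doc" (legacy Word format), ignoring case.
# ".docx" files do not match because the pattern is anchored at the end.
# `find_legacy_doc` is a hypothetical helper, not part of any package.
find_legacy_doc <- function(paths) {
  paths[grepl("\\.doc$", paths, ignore.case = TRUE)]
}

# Usage (the folder name 'files' is taken from the question):
# legacy <- find_legacy_doc(list.files("files", full.names = TRUE))
# legacy    # candidates to re-save as .docx before calling readtext()
```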

Dharman
  • Take a look again at [answer] to see how you could make this more of a specific answer than a general comment – camille May 13 '21 at 14:30