Questions tagged [stringi]

stringi is THE R package for fast, correct, consistent and convenient string/text processing in each locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers.

R's stringi package provides a platform-independent way of manipulating strings. It is built on the ICU library and has a syntax inspired by the stringr package.

237 questions
0 votes, 0 answers

R: How does the regex "\\b"%s+%c("character","...")%s+%"\\b" work?

I was looking for an option to replace multiple patterns and found some answers in the first of the links below. One of the suggested answers uses the stringr package. I was interested to check options with stringi and found one in the documentation…
Manuel Bickel
0 votes, 0 answers

Filter text column based on keywords vector

Here is the dput() info structure(list(Text = c("bandwidth issues. issues with vpn", "be more customer focussed reduce prices and offer same deas to existing customers that they use to attract new ones", "be more helpful and provide a better…
Shery
0 votes, 0 answers

string_count and regex in R

I want to use str_count from the stringi package to count special symbols in a string. Something like this: library(stringi) data$var1 <- stri_count(data$var, pattern="[[:punct:]]") I'm getting the following error. Error in stri_count(data$var,…
Prometheus
0 votes, 0 answers

Installing readtext package

I cannot install the readtext package, and I have tried two ways: when trying 'install.packages (readtext)', a message is displayed saying that this is not available for R version 3.3.3. When trying to install it from GitHub through devtools, it…
Ria
0 votes, 0 answers

Upper Middlename using stringi and data.table

I have a data.table that looks like this: require(data.table) require(stringi) DT <- data.table(ID = c(1,2,3), Name =c("john peter", "joe", "Ann cathrine")) I would like to manipulate the Name column such that I upper the first character of names…
HannesZ
0 votes, 1 answer

Extracting all the information from an uncommon JSON structure in R

In a previous post (How do I read multiple JSON structures contained in one file?) I asked about an uncommon data structure (or at least uncommon for R). I have a txt file with this structure: identifier ### part A ### part B A simplification…
pachamaltese
0 votes, 1 answer

Strings in R - insert space between selected alphabet character and numeric characters

I have hospital ward data that needs to be consistent. The first numeric character is the floor number, the alphabet characters that follow are the ward acronym, and the final two numeric characters are the bed number. So 2EA 28 would be floor 2,…
monkeyshines
0 votes, 1 answer

R lapply using stringi and rbind

I'd like to split out some data within a data frame by a specific string and count the frequency. After toying with a few methods I've come up with a method, but there's a slight error in my results. Example: Data frame data file: data abc…
Oli Paul
0 votes, 1 answer

stri_replace_all_regex won't accept results from imported pattern replacement file

I have an AppleScript that finds and replaces about a hundred terms using regular expressions. I'd like to import these find-and-replace functions into R. So, in ScriptEditor, I've saved the AppleScript as a text file and imported this into R via…
spindoctor
0 votes, 0 answers

Remove LineFeed with gsub in R

I have a 1 million lines file, which once read with readLines can be condensed to: prob <- readLines("offendingFile.txt") dput(prob) c("000005928484|Name Nmee Leonel |YUMBO |El Placer de El Cerrito ALG 76248 …
PavoDive
0 votes, 0 answers

data.table - key vs. concatenated list in "by"

I have read the FAQ, but it is still not clear what the implications are of using a key vs. using that key in a concatenated list in "by" for a reasonably large data.table. From my experiment, I see only a performance difference, but I am not sure if there is anything else. #…
0 votes, 2 answers

Convert a large scale characters to date-format-like characters in r

I have a data frame df with 10 million rows. I want to convert the character format of "birthday" column from "xxxxxxxx" to "xxxx-xx-xx". eg. from "20051023" to "2005-10-23". I can use df$birthday <- lapply(df$birthday, as.Date, "%Y%m%d") to do…
Eric Chang
0 votes, 1 answer

R - How to count the occurrence of a specific string for large text files

I am trying to find the occurrence of ~10.000 different locations in a list of emails. What I need is one vector with the most frequently mentioned location per email, one with the second most frequent, and one with the third! Since my dataset is…
Clemens
0 votes, 1 answer

How to use regex to strip punctuation without tainting UTF-8 or UTF-16 encoded text like Chinese?

How do I strip punctuation from ASCII and UTF-8 encoded strings without messing up the UTF-8 original characters, specifically Chinese, in R? text <- "Longchamp Le Pliage 肩背包 (小)" stri_replace_all_regex(text, '\\p{P}', '') results in: Longchamp Le…
Zeke
0 votes, 0 answers

Error when installing ggplot2, need c compiler

I'm trying to install ggplot2 on a collaborator's Mac computer. I get this error: configure: error: no acceptable C compiler found in $PATH See `config.log' for more details ERROR: configuration failed for package ‘stringi’ It looks like I need to…
blakeoft