
I have this sample grepl() code that differentiates between male and female names. The code below was given to me and it works, but I don't understand how it works.

name = c("Braund, Mr. Owen Harris",
         "Cumings, Mrs. John Bradley (Florence Briggs Thayer)")

grepl("\\(.*?\\)", name)
# [1] FALSE  TRUE
KenHBS
I mean... _Aren't they all duplicates_, wicter? Just mark them all as dups. – Jul 24 '17 at 00:24

2 Answers


The match is based on the presence of an opening parenthesis "(", followed by zero or more characters (".*?", where the "?" makes the repetition non-greedy), followed by a closing ")". In other words, it assumes that female names contain parentheses. We can also match on the title "Mrs.":

grepl("\\bMrs\\.", name)
#[1] FALSE  TRUE
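To see what each pattern actually matched (rather than just TRUE/FALSE), regexpr() can locate the first match in each element and regmatches() can extract it. This is a small illustrative sketch reusing the name vector from the question; elements without a match are simply dropped from the result.

```r
name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)")

# regexpr() gives the position of the first match in each element (-1 if none);
# regmatches() then returns only the matched substrings.
paren_match <- regmatches(name, regexpr("\\(.*?\\)", name))
print(paren_match)
# [1] "(Florence Briggs Thayer)"

# Same idea for the title-based pattern.
title_match <- regmatches(name, regexpr("\\bMrs\\.", name))
print(title_match)
# [1] "Mrs."
```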
akrun

Your code doesn't differentiate between male and female names.

"\\(.*?\\)" is a regular expression: a powerful way of searching for patterns in text, like a more flexible CTRL + F.
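The CTRL + F comparison can be made concrete: grepl() with fixed = TRUE does a plain literal search, while the regular expression also requires a closing bracket somewhere after the opening one. The two strings below are hypothetical, chosen only to show where the results differ.

```r
# Hypothetical strings (not from the question) chosen to show the difference.
x <- c("open ( only", "closed (pair)")

# fixed = TRUE: literal search for "(", much like CTRL + F.
literal <- grepl("(", x, fixed = TRUE)
print(literal)
# [1] TRUE TRUE

# Regular expression: "(" must be followed, somewhere later, by ")".
pattern <- grepl("\\(.*?\\)", x)
print(pattern)
# [1] FALSE  TRUE
```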

grepl("\\(.*?\\)", name) checks each element of name for an opening bracket (, followed by any number of characters (possibly none), followed by a closing bracket ). It returns TRUE for elements that contain this pattern and FALSE for the rest.
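Piece by piece: "\\(" is the regex \( (a literal opening bracket, escaped because a bare ( starts a group), ".*?" is any characters repeated as few times as possible, and "\\)" is a literal closing bracket. The sketch below, on a hypothetical string with two bracketed parts, contrasts the non-greedy ".*?" with the greedy ".*". perl = TRUE is an assumption of this example, used so the lazy/greedy behaviour follows the familiar Perl rules.

```r
# Hypothetical string with two bracketed parts.
x <- "Smith (first) and (second)"

# Non-greedy: stop at the first ")" after the "(".
lazy <- regmatches(x, regexpr("\\(.*?\\)", x, perl = TRUE))
print(lazy)
# [1] "(first)"

# Greedy: run on to the last ")" in the string.
greedy <- regmatches(x, regexpr("\\(.*\\)", x, perl = TRUE))
print(greedy)
# [1] "(first) and (second)"

# grepl() only asks whether some match exists, so both patterns give TRUE here.
print(c(grepl("\\(.*?\\)", x), grepl("\\(.*\\)", x)))
# [1] TRUE TRUE
```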

So this regular expression does not distinguish between male and female names; it distinguishes between elements that contain ( .. something something .. ) and elements that don't.
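A quick counterexample makes the point. The two names below are hypothetical (not from the question): a man with a nickname in brackets and a woman listed without a maiden name. The bracket pattern gets both "wrong", while the title-based pattern from the other answer does not.

```r
# Hypothetical names, not from the original data.
more_names <- c("Doe, Mr. John (Jack)",
                "Roe, Mrs. Jane")

# Bracket-based check: follows the brackets, not gender.
by_brackets <- grepl("\\(.*?\\)", more_names)
print(by_brackets)
# [1]  TRUE FALSE

# Title-based check: looks for the word "Mrs." instead.
by_title <- grepl("\\bMrs\\.", more_names)
print(by_title)
# [1] FALSE  TRUE
```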

KenHBS