71

I need a regular expression that I can use in VBScript and .NET that will return only the numbers that are found in a string.

For example, any of the following strings should return only 1231231234:

  • 123 123 1234
  • (123) 123-1234
  • 123-123-1234
  • (123)123-1234
  • 123.123.1234
  • 123 123 1234
  • 1 2 3 1 2 3 1 2 3 4

This will be used in an email parser to find telephone numbers that customers may provide in the email and do a database search.

I may have missed a similar regex but I did search on regexlib.com.

[EDIT] - Added code generated by RegexBuddy after setting up musicfreak's answer

VBScript Code

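Dim myRegExp, ResultString
Set myRegExp = New RegExp
' Global = True replaces every match, not just the first one
myRegExp.Global = True
' [^\d] matches any single non-digit character (equivalent to \D)
myRegExp.Pattern = "[^\d]"
ResultString = myRegExp.Replace(SubjectString, "")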

VB.NET

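' Requires: Imports System.Text.RegularExpressions
Dim ResultString As String = Nothing
Try
      ' [^\d] matches any single non-digit character (equivalent to \D)
      Dim RegexObj As New Regex("[^\d]")
      ResultString = RegexObj.Replace(SubjectString, "")
Catch ex As ArgumentException
      'Syntax error in the regular expression
End Try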

C#

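// Requires: using System.Text.RegularExpressions;
string resultString = null;
try {
    // [^\d] matches any single non-digit character (equivalent to \D)
    Regex regexObj = new Regex(@"[^\d]");
    resultString = regexObj.Replace(subjectString, "");
} catch (ArgumentException) {
    // Syntax error in the regular expression
}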
IAdapter
Brian Boatright

8 Answers

193

In .NET, you could extract just the digits from the string. Like this:

string justNumbers = new String(text.Where(Char.IsDigit).ToArray());
Richard Garside
Matt Hamilton
  • 2
    ps. I know I've answered a VB question with C#, but since it's .NET I figured it's worth putting the idea out there. RegEx seems like overkill for something this simple. – Matt Hamilton May 10 '09 at 01:36
  • I actually needed VBScript to use in a Classic ASP page but I appreciate your answer. – Brian Boatright May 10 '09 at 01:45
  • 5
    I was about to post a comment along the lines of, "/Clearly/, regex would be faster for this", but I ran an (unscientific) benchmark in Mono, and Linq won (about half the duration the regex took). :) So my hat is off to you. – Matthew Flaschen May 10 '09 at 02:11
  • 9
    +10. Just a heads up for everyone out there, don't forget `using System.Linq;` for this. For me, VS2010 just said there's no such method "Where" for strings, and IntelliSense wouldn't give me the auto-add for the using statement. – DanM7 May 15 '13 at 17:00
  • You will also need using System.Linq.Expressions: using System.Linq; using System.Linq.Expressions; – WoodsLink Jul 01 '16 at 16:45
  • thanks @MattHamilton - what if you just want it in number format, not a string? – BKSpurgeon Nov 09 '16 at 23:32
14

As an alternative to the main .NET solution, adapted from a similar question's answer:

string justNumbers = string.Concat(text.Where(char.IsDigit));
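
One thing to watch with the character-filter approach: Char.IsDigit accepts any Unicode decimal digit, not only 0-9, so non-ASCII digits would pass through to the database search. Below is a minimal sketch of the same filter restricted to ASCII digits, written in Python rather than C# purely for illustration. The names `ASCII_DIGITS` and `digits_only` are hypothetical and not from the answers above.

```python
import string

# Only the ten ASCII digits; Unicode digits from other scripts are excluded.
ASCII_DIGITS = frozenset(string.digits)


def digits_only(text: str) -> str:
    """Keep only the ASCII digits 0-9, in their original order."""
    return "".join(ch for ch in text if ch in ASCII_DIGITS)


if __name__ == "__main__":
    for sample in ["(123) 123-1234", "123.123.1234", "1 2 3 1 2 3 1 2 3 4"]:
        print(sample, "->", digits_only(sample))
```

Whether the ASCII restriction matters depends on what the emails contain; for plain US numbers either filter should give the same result.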
Community
Teodor Tite
14

I don't know if VBScript has some kind of a "regular expression replace" function, but if it does, then you could do something like this pseudocode:

reg_replace(/\D+/g, '', your_string)

I don't know VBScript, so I can't give you the exact code, but this would remove anything that is not a digit.

EDIT: Make sure to set the global flag (the "g" at the end of the regexp); otherwise it will only replace the first run of non-digits in your string.
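
To make the pseudocode and the global-flag point concrete, here is a small sketch in Python (chosen only for illustration; the question itself targets VBScript and .NET). Python's `re.sub` replaces every match by default, which corresponds to the "g" flag, and passing `count=1` imitates what happens without it. The names `NON_DIGITS` and `strip_non_digits` are made up for this example.

```python
import re

# Compile once and reuse, rather than rebuilding the pattern on every call.
NON_DIGITS = re.compile(r"\D+")


def strip_non_digits(text: str) -> str:
    """Remove every run of non-digit characters (the "global" behaviour)."""
    return NON_DIGITS.sub("", text)


if __name__ == "__main__":
    print(strip_non_digits("(123) 123-1234"))
    # Without the global behaviour only the first run of non-digits is removed:
    print(NON_DIGITS.sub("", "123-123-1234", count=1))
```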

Sasha Chedygov
  • Thanks! That's exactly what I was looking to do. I knew it had to be somewhat simple. I'm using RegExBuddy and will try to test it and then post the VBScript code. I believe VBScript will do a replace. – Brian Boatright May 10 '09 at 01:09
  • 2
    If you want to do it with .NET classes, it's basically re = Regex("\D"); re.Replace("123 123 1234", ""). Remember to cache your Regex objects (don't compile them every time the method is called). – Matthew Flaschen May 10 '09 at 01:11
7

Note: you've only solved half the problem here.

For US phone numbers entered "in the wild", you may have:

  • Phone numbers with or without the "1" prefix
  • Phone numbers with or without the area code
  • Phone numbers with extension numbers (if you blindly remove all non-digits, you'll miss the "x" or "Ext." or whatever also on the line).
  • Possibly, numbers encoded with mnemonic letters (800-BUY-THIS or whatever)

You'll need to add some smarts to your code to conform the resulting list of digits to a single standard that you actually search against in your database.

Some simple things you could do to fix this:

  • Before the RegEx removal of non-digits, see if there's an "x" in the string. If there is, chop everything off after it (will handle most versions of writing an extension number).

  • For any number with 10+ digits beginning with a "1", chop off the 1. It's not part of the area code; US area codes start in the 2xx range.

  • For any number still exceeding 10 digits, assume the remainder is an extension of some sort, and chop it off.

  • Do your database search using an "ends-with" pattern search (SELECT * FROM mytable WHERE phonenumber LIKE '%blah'). This will handle situations (although with the possibility of error) where the area code is not provided, but your database has the number with the area code.
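
The four steps above can be sketched roughly as follows. This is an illustration in Python, not a tested production routine, and it makes several assumptions: the function names `normalize_us_phone` and `find_by_suffix` are hypothetical, the table and column names are borrowed from the SELECT in the last bullet, and SQLite stands in for whatever database is actually used. An ends-with LIKE pattern puts the wildcard before the digits. Cutting at the first "x" is deliberately crude, so a word such as "fax" in front of the number would remove the number as well.

```python
import re
import sqlite3


def normalize_us_phone(raw: str) -> str:
    """Reduce a free-form US phone number to at most 10 digits."""
    # Cut at the first "x" so "x 405", "ext 405" and "extension 405" are dropped.
    head = re.split("x", raw, maxsplit=1, flags=re.IGNORECASE)[0]
    digits = re.sub(r"\D", "", head)
    # A leading 1 on a number with 10 or more digits is treated as the country code.
    if len(digits) >= 10 and digits.startswith("1"):
        digits = digits[1:]
    # Anything beyond 10 digits is assumed to be an unmarked extension.
    return digits[:10]


def find_by_suffix(conn: sqlite3.Connection, digits: str) -> list:
    """Return stored numbers that end with the given digits."""
    rows = conn.execute(
        "SELECT phonenumber FROM mytable WHERE phonenumber LIKE '%' || ?",
        (digits,),
    )
    return [row[0] for row in rows]
```

A seven-digit number from an email would then still match a stored ten-digit number, at the cost of possible false positives across area codes.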

richardtallent
  • 1
    True. I did add something after the regex that returned the entire string if it was 10 digits or right(string,10) if it was longer. Your last suggestion is a good one and something I will add. Thanks! +1 – Brian Boatright May 10 '09 at 17:32
  • Great points! I added my submission down below to solve this problem. –  Dec 20 '15 at 21:34
1

By the looks of things, you're trying to catch any 10-digit phone number.

Why not first do a string replace on the text to remove any of the following characters?

<SPACE> , . ( ) - [ ] 

Afterwards, you can just do a regex search for a 10-digit number.

\d{10}
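
A rough sketch of this two-step idea, in Python for illustration only (the function name `find_ten_digit_number` and the `_SEPARATORS` table are hypothetical). It removes the separator characters listed above and then searches for the first run of ten digits. Because spaces are removed too, two unrelated numbers that sit next to each other in the text could be fused into one match, so treat the result as a candidate rather than a confirmed phone number.

```python
import re

# Translation table that deletes: space , . ( ) - [ ]
_SEPARATORS = str.maketrans("", "", " ,.()-[]")


def find_ten_digit_number(text: str):
    """Strip common separators, then return the first 10-digit run or None."""
    stripped = text.translate(_SEPARATORS)
    match = re.search(r"\d{10}", stripped)
    return match.group(0) if match else None
```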
Eoin Campbell
0

With respect to the points made by richardtallent, this code handles most of the issues around extension numbers and the US country code (+1) being prepended.

Not the most elegant solution, but I had to quickly solve the problem so I could move on with what I'm doing.

I hope it helps someone.

 Public Shared Function JustNumbers(inputString As String) As String
        Dim outString As String = ""
        Dim nEnds As Integer = -1

        ' Cycle through and test the ASCII character code of each character in the string. Remove everything non-numeric except "x" (in the event an extension is in the string as follows):
        '    331-123-3451 extension 405  becomes 3311233451x405
        '    226-123-4567 ext 405        becomes 2261234567x405
        '    226-123-4567 x 405          becomes 2261234567x405
        For l = 1 To inputString.Length
            Dim tmp As String = Mid(inputString, l, 1)
            If (Asc(tmp) >= 48 And Asc(tmp) <= 57) Then
                outString &= tmp
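            ElseIf Asc(tmp.ToLower) = 120 Then
                ' Record how many digits precede the extension marker (not the position in inputString)
                nEnds = outString.Length
                outString &= tmp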
            End If
        Next


        ' Remove the leading US country code 1 after doing some validation
        If outString.Length > 0 Then
            If Strings.Left(outString, 1) = "1" Then

                ' If the nEnds flag is still -1, that means no extension was added above, set it to the full length of the string
                ' otherwise, an extension number was detected, and that should be the nEnds (number ends) position.
                If nEnds = -1 Then nEnds = outString.Length

                ' We hit a number with more than 10 digits, which means an area code is prefixed;
                ' remove the leading 1 in case someone put in the US country code.
                ' This is technically safe, since there are no US area codes that start with a 1. The start digits are 2-9
                If nEnds > 10 Then
                    outString = Right(outString, outString.Length - 1)
                End If
            End If
        End If

        Debug.Print(inputString + "          : became : " + outString)

        Return outString
    End Function
0

The simplest solution, without a regular expression:

public string DigitsOnly(string s)
{
    // StringBuilder avoids allocating a new string for every appended digit
    var res = new System.Text.StringBuilder();
    foreach (char c in s)
    {
        if (Char.IsDigit(c))
            res.Append(c);
    }
    return res.ToString();
}
Nur.B
0

Have you gone through the phone number category on regexlib? Quite a few of the patterns there seem to do what you need.

Ólafur Waage