
I have numbers such as 2.300 or 2.300.456 that I want to convert to 2300 and 2300456 in my dataframe. I tried to use a regex, but it is not working. I used these expressions:

  • `\d+{1-3}.\d+{3}` for 2.300
  • `\d+{1-3}.\d+{3}.\d+{3}` for 2.300.456

Does anyone have a better solution for my problem? Thank you.
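A minimal sketch of what the two patterns above could look like once fixed, assuming Python's `re` module and that each value is a whole string such as `'2.300'`. It merges both cases into one pattern (1 to 3 digits followed by one or more `.ddd` groups); `THOUSANDS` and `to_int` are hypothetical names used only for illustration.

```python
import re

# Hypothetical combined pattern: 1-3 digits, then one or more groups of
# a literal dot followed by exactly three digits (e.g. 2.300, 2.300.456).
THOUSANDS = re.compile(r'\d{1,3}(?:\.\d{3})+')


def to_int(s):
    """Convert a dotted thousands string such as '2.300.456' to an int."""
    if THOUSANDS.fullmatch(s):
        # The whole string is digits and separator dots, so drop the dots.
        return int(s.replace('.', ''))
    raise ValueError(f'not a dotted thousands number: {s!r}')


print(to_int('2.300'), to_int('2.300.456'))
```

Using `fullmatch` makes the pattern apply to the entire string, so a value like `'2.30'` is rejected instead of being partially matched.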

Wiktor Stribiżew
Baala
  • FYI, in regular expressions the dot/period character by itself acts as a wildcard and matches any single character (except a newline). So your regex actually matches `2.300`, `2a300`, `2!300`, etc. To match an actual dot you need to escape it with a backslash (`\.`); then `2a300` and `2!300` no longer match. Your first regex should be `\d{1,3}\.\d{3}` if you were to use it (a 1-to-3 repeat is written `{1,3}`, not `{1-3}`, and `\d{3}` already means exactly three digits, so no `+` is needed), but as others have said, you don't need regex for this problem at all. – BadHorsie Jan 13 '20 at 12:50
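A small check of the wildcard point in the comment above, using Python's `re` module and the `{1,3}` repeat syntax (a sketch; the patterns are illustrative only):

```python
import re

# Unescaped '.' is a wildcard: any single character except a newline.
wildcard = re.compile(r'\d{1,3}.\d{3}')
# Escaped '\.' matches only a literal dot.
literal = re.compile(r'\d{1,3}\.\d{3}')

for s in ['2.300', '2a300', '2!300']:
    print(s, bool(wildcard.fullmatch(s)), bool(literal.fullmatch(s)))

# '{1-3}' is not a range quantifier; Python's re treats it as literal text,
# so this pattern means "one digit followed by the characters {1-3}".
print(re.fullmatch(r'\d{1-3}', '12'))
```

Only `2.300` matches the escaped pattern, while all three samples match the wildcard one.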

2 Answers


If you are using pandas, you can just use this:

df['col'] = df['col'].str.replace('.', '', regex=False)  # regex=False: treat '.' as a literal dot

More generally, for any Python string you can use the built-in `str.replace` method. No regex is needed in this case.
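Depending on the pandas version, `Series.str.replace` may interpret its pattern as a regular expression (its `regex` parameter defaulted to `True` before pandas 2.0), and as a regex `.` matches every character. A standard-library sketch of the difference between literal and regex replacement:

```python
import re

s = '2.300.456'
# Plain string replacement: '.' is taken literally.
plain = s.replace('.', '')
# Regex replacement: '.' is a wildcard, so every character is removed.
as_regex = re.sub('.', '', s)
# Escaping the dot restores the literal meaning in a regex.
escaped = re.sub(r'\.', '', s)

print(plain, repr(as_regex), escaped)
```

In pandas, passing `regex=False` explicitly, as in `df['col'].str.replace('.', '', regex=False)`, keeps the literal behaviour regardless of version.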

3li

I assume your numbers with dots are stored as strings. In that case you don't need a regex. Consider this code:

number = '2.300.456'
number = number.replace('.', '') # number is now the string '2300456'
number = int(number) # number is now the integer 2300456
jpaul
  • Yes, but I have a lot of these kinds of numbers in my dataframe (a dataframe of newspaper articles: the first column is the title and the second column is the full text). So I have to "detect" all number expressions and remove their dots, but my dataset also contains other kinds of strings with a dot that I can't remove... – Baala Jan 13 '20 at 14:44
  • Ah, OK, I see: you have strings with dots that you want to keep (in an e-mail address, for instance) and strings with dots that you want to delete (in numbers). – jpaul Jan 13 '20 at 15:21
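For the case described in these comments (numbers inside free text, with other dots that must stay), one possible approach is a regex that removes a dot only when a digit precedes it and exactly three digits follow it. This is a heuristic sketch, not a general rule: a decimal such as `3.141` would also be altered. `THOUSANDS_DOT` and `strip_thousands_dots` are hypothetical names.

```python
import re

# Heuristic (an assumption, not a universal rule): a dot is a thousands
# separator if a digit precedes it and exactly three digits follow it
# (the negative lookahead rejects a fourth digit, e.g. '12.3456').
THOUSANDS_DOT = re.compile(r'(?<=\d)\.(?=\d{3}(?!\d))')


def strip_thousands_dots(text):
    """Remove thousands-separator dots from numbers inside free text."""
    return THOUSANDS_DOT.sub('', text)


sample = 'Prices rose to 2.300 and then 2.300.456 (contact: john.doe@example.com, v3.14).'
print(strip_thousands_dots(sample))
```

Dots in e-mail addresses, at the end of sentences, and in short decimals like `3.14` are left alone. In pandas the same pattern could be applied to a text column with `df['text'].str.replace(r'(?<=\d)\.(?=\d{3}(?!\d))', '', regex=True)`, where `'text'` is a hypothetical column name.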