39

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:

A Bitcoin address, or simply address, is an identifier of 27-34 alphanumeric characters, beginning with the number 1 or 3 [...]

I figured it would look something like this

/^[13][a-zA-Z0-9]{27,34}/

Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.

I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.

federico-t
  • 7
    Your referenced page has a section "Address validation". Why not use the technique provided in the link over there? (Quote: "[...] it is advisable to use a method from [this thread](https://bitcointalk.org/index.php?topic=1026.0) rather than to just check for string length, allowed characters, or that the address starts with a 1 or 3") – phimuemue Feb 10 '14 at 17:17
  • 1
    @phimuemue If _all_ bitcoin addresses have that format, then I don't see a reason why this wouldn't work. Besides, I'm not looking for a rigorous validation (after all, it could be a valid address and not yet exist) but rather something that discards addresses that are clearly invalid. – federico-t Feb 10 '14 at 17:28
  • 4
    @fedeetz: bitcoin addresses do contain a checksum. You can't validate a bitcoin address using a regexp because *all* bitcoin addresses have that checksum. It is true that your regexp will discard many addresses which are clearly invalid... But your regexp will also accept an insane number of invalid ones. The very purpose of that checksum **is** to prevent people from using invalid addresses and I'd tend to think that the author(s) of bitcoins are very smart people and knew what they were doing. Doing "validation" without verifying the checksum whose very purpose is validation makes no sense. – TacticalCoder Feb 10 '14 at 22:58
  • @TacticalCoder That's not a problem for me, as I said, as long as it discards clearly invalid addresses and it doesn't generate false negatives, it's enough. This is not for an application open to the public, only to a couple developers. The whole point is that if they have a typo or copy only half of the address, for the app to warn them. – federico-t Feb 11 '14 at 00:10
  • @fedeetz your regex *will* match invalid Bitcoin addresses, as the characters `O`, `I` and `l` are not valid characters in a Bitcoin address. – runeks Jun 13 '14 at 12:47
  • For testnet: /^[mn2][a-zA-Z0-9]{27,34}/ – Felipe Sep 01 '15 at 18:44
  • 2
    https://rosettacode.org/wiki/Bitcoin/address_validation – Al Po Aug 02 '17 at 22:10

8 Answers

62
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

will match a string that starts with either 1 or 3, followed by 25 to 34 characters drawn from a-z, A-Z and 0-9, excluding l, I, O and 0 (which are not valid characters in a Bitcoin address).
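A minimal Python sketch of how this pattern could be used as a quick sanity check for typos and truncated copies. The function name `looks_like_legacy_address` is hypothetical, and the check only screens length and character set; it does not verify the checksum.

```python
import re

# Pattern from this answer: leading 1 or 3, then 25-34 Base58 characters
# (no 0, O, I or l).
LEGACY_ADDRESS = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")


def looks_like_legacy_address(candidate: str) -> bool:
    """Return True if the string has the shape of a legacy address.

    This is a format screen only: it cannot tell whether the checksum
    embedded in the address is correct.
    """
    return LEGACY_ADDRESS.fullmatch(candidate) is not None


# Sample address taken from another answer in this thread.
print(looks_like_legacy_address("1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4"))  # True
# Half-copied address: too short.
print(looks_like_legacy_address("1JHwenDp9A98XdjfY"))  # False
# Typo introducing the excluded letter O.
print(looks_like_legacy_address("1JHwenDp9A98XdjfYkHKyiE3R99Q72K9XO"))  # False
```

`fullmatch` is used so that a trailing newline after the address is rejected as well.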

runeks
  • 9
    Thanks for providing an at least partially correct solution rather than whining about what can't be done like the rest of the posters. – nikib3ro Aug 16 '14 at 19:36
  • 3
    Since a valid Bitcoin address must be between 26 and 35 characters long, the interval should be `{25,34}`, because the `^[13]` at the start takes one character away from the count. See specs: https://en.bitcoin.it/wiki/Address – mokagio Nov 22 '14 at 09:19
  • exception that the uppercase letter "O", uppercase letter "I", lowercase letter "l", and the number "0" are never used to prevent visual ambiguity. –  Oct 26 '17 at 14:33
  • bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf comes out as `invalid` - but it is a valid address... – Oscar Chambers Feb 07 '21 at 10:17
16
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

A Bitcoin address is

  • an identifier of 26-35 alphanumeric characters
  • beginning with the number 1 or 3
  • made up of random digits and uppercase and lowercase letters
  • with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used, to prevent visual ambiguity.
FranciscoA
14

[^OIl] matches any character that's not O, I or l. The problems in your regex are:

  • You don't have a $ at the end, so it'd match any string beginning with a BC address.
  • You didn't count the first character in your {27,34} - that should be {26,33}

However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
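For readers who want the checksum verification the comments refer to, here is a hedged Python sketch of a Base58Check test for legacy addresses. It assumes the usual layout of 25 decoded bytes (1 version byte, 20 payload bytes, 4 checksum bytes, where the checksum is the first 4 bytes of a double SHA-256). The function name `base58check_ok` is hypothetical, and the sketch does not check the version byte or cover Bech32 addresses.

```python
import hashlib

# Base58 alphabet used by Bitcoin: no 0, O, I or l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_ok(address: str) -> bool:
    """Return True if the address decodes to 25 bytes with a valid checksum."""
    number = 0
    for char in address:
        index = BASE58_ALPHABET.find(char)
        if index < 0:
            return False  # character outside the Base58 alphabet
        number = number * 58 + index
    try:
        raw = number.to_bytes(25, "big")
    except OverflowError:
        return False  # decodes to more than 25 bytes
    body, checksum = raw[:21], raw[21:]
    digest = hashlib.sha256(hashlib.sha256(body).digest()).digest()
    return digest[:4] == checksum


# The widely published genesis-block address (not from this thread).
print(base58check_ok("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
# Same address with the last character changed.
print(base58check_ok("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))
```

A regex can still serve as a cheap first filter, with a check like this run only on the strings that pass it.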

ThiefMaster
  • 5
    It seems to me the purpose of the regex is finding *potential* bitcoin addresses, not necessarily valid ones. – runeks Jun 13 '14 at 12:24
  • 1
    A regex would be a good fit for something lightweight, like a browser plugin or a web crawler. – jonnyjandles Feb 05 '15 at 20:44
  • Or find valid addresses, not necessarily existing addresses. Whether or not addresses exist in your block chain depends on when and how often you synced. Figuring out if the address is valid or not is a completely different exercise. – JMS Jan 23 '19 at 16:42
10
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$

Based on the new address type Bech32
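A small Python sketch comparing this pattern with the legacy-only pattern from the earlier answers. The helper name `classify` is hypothetical; the Bech32 sample is the address quoted in a comment on the first answer.

```python
import re

# Legacy-only pattern from the earlier answers.
LEGACY_ONLY = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")
# Pattern from this answer, which also admits the bc1 prefix.
LEGACY_OR_BECH32 = re.compile(r"^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$")


def classify(address: str) -> dict:
    """Report which of the two patterns accept the given string."""
    return {
        "legacy_only": LEGACY_ONLY.fullmatch(address) is not None,
        "legacy_or_bech32": LEGACY_OR_BECH32.fullmatch(address) is not None,
    }


# Bech32 address: rejected by the first pattern, accepted by the second.
print(classify("bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf"))
# Legacy address: accepted by both.
print(classify("1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4"))
# The shared character class also admits l and 0 after a leading 1 or 3.
print(classify("1" + "0" * 30))
```

The last case shows the trade-off of sharing one character class between both address types: it is looser for legacy addresses than the legacy-only pattern.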

Victor
3

Based on the answers by runeks and Erhard Dinhobl, I came up with this, which accepts both Bech32 and legacy addresses:

\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b

Including testnet addresses:

\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b

Only testnet:

\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
Felipe
2

Based on the description at https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki, I would say the regex for a Bech32 Bitcoin address for Version 1 and Version 0 (mainnet only) is:

\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b

Here are some other links where I found information:

Erhard Dinhobl
1

As the OP didn't provide a specific use case (only matching criteria), and I came across this question while researching methods to detect Bitcoin addresses, I wanted to post back and share with the community.

The regexes in the other answers only find Bitcoin addresses anchored at the start and/or end of a line. My use case was finding Bitcoin addresses in the body of an email, given the rise of blackmail/sextortion (reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/), so they weren't effective solutions for me (as outlined later). The proposed regexes also catch many false positives (FPs) in email, because of filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).

Here are my test cases:

--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67

Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5

--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line?  Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
    Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
    1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
    Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.

- Standalone address:
    1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72

--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"

"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah

src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg" 

src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"

href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah

Some regex samples I tested (I won't list the ones I'd fault for greedy matching with heavy backtracking):

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
    (Too narrow and misses BitCoin addresses within a paragraph)

(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
    (Still misses addresses followed by text on the same line, and triples execution time)

\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
    (Too broad and catches URL formats)

The regex I am currently evaluating catches all my known and crafted sample cases and eliminates the known FPs (specifically, it does not accept a trailing period, which avoids the URL filename FPs):

[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
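As a rough illustration, the pattern above could be applied to message text like this in Python. The capture group and the name `find_candidates` are additions for this sketch, used so that the trailing whitespace is not returned with the address.

```python
import re

# The pattern above, with a capture group added around the address part.
CANDIDATE = re.compile(r"([13][a-km-zA-HJ-NP-Z1-9]{25,34})\s")


def find_candidates(text: str) -> list:
    """Return address-shaped strings that are followed by whitespace."""
    return CANDIDATE.findall(text)


body = "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe."
print(find_candidates(body))  # ['1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5']

# URL filename from the FP samples: no whitespace follows, so nothing matches.
spam = 'src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"'
print(find_candidates(spam))  # []
```

Because the pattern requires trailing whitespace, an address at the very end of the text with no newline after it, or one followed directly by a period, would not be reported.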

One reference point for execution times (shows cost in steps and time): https://regex101.com/

Please feel free to weigh in or suggest improvements (I am by no means a regex master). As I vet it further against email body content, I will update this answer if other FP cases are observed or a more efficient regex is derived.

Seth

Seth
0

I am not into complicated solutions, and this regex served the purpose for the simplest validation, when you just don't want to receive complete nonsense.

\w{25,}
ssamko