8

I would like to extract an IP and port from a returned list. I am currently using str(var).replace() to remove the extra characters. This will cause (and has caused) problems when the string format changes, making the .replace() command throw an error.

def discover_device():
    """ This function will look for available device on the local network and extract the IP from the result"""
    discover_device = '[<Device: 192.168.222.123:8075>]' # Actually: call to broadcasting device
    device_ip = str(discover_device).replace('[<Device: ', '').replace(':8075>]', '')

So the problem would come if: [<Device: xxx.xxx.xxx.xxx:xxxx>]

Changed to this: [<now_what: xxx.xxx.xxx.xxx:xxxx>]

Then discover_device() would throw an error.

What is the best practice to identify an IP/port pattern and extract the IP and port without having to rely on the integrity of the surrounding characters?

From this: [<Device: 192.168.222.123:8075>]

To this: 192.168.222.123:8075

and preferably: [192.168.222.123, 8075]

This should take into account that each dot-separated block of the IP can vary in length, and that the port is a 16-bit number (normally four digits after the colon, at most five).

Enrique Bruzual

4 Answers

5

Assuming an IPv4 address, try extracting the digits and the significant punctuation (periods and colons), then slice the result when necessary. Validating the IP address is also a safer approach.

In Python 3:

Code

import string
import ipaddress


def validate_port(func):
    """Return the results or raise an exception for invalid ports."""
    def wrapper(arg):
        result = func(arg)
        if len(result) == 2 and not result[-1].isdigit():
            raise ValueError("Invalid port number.")
        return result
    return wrapper


@validate_port
def discover_device(device):
    """Return a list of ip and optional port number.  Raise exception for invalid ip."""
    result = "".join(i for i in device if i in (string.digits +".:")).strip(":").split(":")

    try:
        ipaddress.ip_address(result[0])
    except ValueError as e:
        # Numbers in the device name (index 0) or invalid ip
        try:
            ipaddress.ip_address(result[1])
        except IndexError:
            raise e
        else:
            return result[1:]
    else:
        return result

Demo

discover_device("[<Device: 192.168.222.123>]")
# ['192.168.222.123']

discover_device("[<Device: 192.168.222.123:8075>]")
# ['192.168.222.123', '8075']

discover_device("[<Device.34: 192.168.222.123:8080>]")
# ['192.168.222.123', '8080']

discover_device("[<Device: 192.168.222123>]")
# ValueError: '192.168.222123' does not appear to be an IPv4 or IPv6 address

discover_device("[<Device21: 192.168.222123>]")
# ValueError: '192.168.222123' does not appear to be an IPv4 or IPv6 address

discover_device("[<device.451: 192.168.222.123:80.805>]")
# ValueError: Invalid port number.

Features

  • insensitive to surrounding characters
  • ip address validation (not IPv6) and exception handling
  • safeguard against numbers in the device name
  • validate port numbers (optional)

Details

Typically result is a list comprising the ip and an optional port number. However, in cases where numbers are in the device name, the first index of the result will include unwanted numbers. Here are examples of result:

    # ['192.168.222.123']                                  ip   
    # ['192.168.222.123', '8075']                          ip, port
    # ['192.168.222123']                                   invalid ip
    # ['.34', '192.168.222.123', '8080']                   device #, ip, port
    # ['192.168.222.123', '80.805']                        invalid port

The exception handling tests for numbers in the device name and validates ip addresses in the first or second indices. If none are found, an exception is raised.

Although validating port numbers is outside the scope of the question, ports are assumed to be a number. A simple test was added to the validate_port decorator, which can be applied or updated as desired. The decorator screens the output from discover_device(). If the port is not a pure number, an exception is raised. See this post for modifying restrictions. See this blog for a great tutorial on Python decorators.
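The decorator only checks that the port is made of digits. A minimal sketch of a stricter variant, assuming the port must also fit in 16 bits (0 to 65535) as the question mentions; `fake_discover` is a hypothetical stand-in for discover_device() used only to exercise the decorator:

```python
import functools


def validate_port(func):
    """Screen the decorated function's output; raise ValueError for a bad port."""
    @functools.wraps(func)
    def wrapper(arg):
        result = func(arg)
        if len(result) == 2:
            port = result[-1]
            # isdigit() rejects values such as '80.805'; the range check
            # rejects numbers that do not fit in 16 bits.
            if not port.isdigit() or not 0 <= int(port) <= 65535:
                raise ValueError("Invalid port number.")
        return result
    return wrapper


@validate_port
def fake_discover(result):
    # Hypothetical stand-in: returns its argument unchanged so the
    # decorator can be tried without a real device.
    return result
```

With this in place, `fake_discover(['192.168.222.123', '8075'])` passes through unchanged, while a port of `'70000'` or `'80.805'` raises ValueError.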

Options

If validation is not a concern, the following code should suffice, provided "." is absent from the device name:

def discover_device(device):
    result = "".join(i for i in device if i in (string.digits +".:")).strip(":").split(":")
    if "." not in result[0]:
        return result[1:]
    return result

If a non-decorator solution is preferred, define the following function:

def validate_port(result):
    """Return the results or raise an exception for invalid ports."""
    if len(result) == 2 and not result[-1].isdigit():
        raise ValueError("Invalid port number.")
    return result

Now pass the return values of discover_device() into the latter function, i.e. return validate_port(result[1:]) and return validate_port(result).
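Put together, the non-decorator variant might look like the sketch below. It only combines the pieces already shown in this answer and assumes Python 3 with the standard `ipaddress` module:

```python
import ipaddress
import string


def validate_port(result):
    """Return the results or raise an exception for invalid ports."""
    if len(result) == 2 and not result[-1].isdigit():
        raise ValueError("Invalid port number.")
    return result


def discover_device(device):
    """Return a list of ip and optional port number. Raise ValueError for an invalid ip."""
    # Keep only digits, periods and colons, then split on the colons.
    kept = "".join(ch for ch in device if ch in string.digits + ".:")
    result = kept.strip(":").split(":")

    try:
        ipaddress.ip_address(result[0])
    except ValueError as e:
        # Index 0 holds numbers from the device name, or an invalid ip.
        try:
            ipaddress.ip_address(result[1])
        except IndexError:
            raise e
        else:
            return validate_port(result[1:])
    else:
        return validate_port(result)
```

The calls from the Demo section should behave the same way with this version, since only the place where validate_port() is applied has changed.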

Regards to @coder for suggestions.

pylang
  • Thx @coder. The post has been improved given your edge case. – pylang Sep 15 '17 at 20:53
  • I like how you think @coder. Added ip validation using Python 3's `ipaddress` module. – pylang Sep 15 '17 at 21:36
  • thx for considering my suggestions, I wish I could upvote more than once. – coder Sep 15 '17 at 21:43
  • @pylang Thank you for the comprehensive answer. I have a question: should it not throw an error if I feed it `discover_device("[]")`? Instead of an error I get `['192.168.222.123', '80.805']`. If I add a **comma** among the port numbers it is recognized, but not a **period**. Is IP validation a function of `ipaddress`? – Enrique Bruzual Sep 15 '17 at 22:07
  • An interesting point. `ip_address` validates the ip host alone. I will look into validating port numbers. – pylang Sep 15 '17 at 22:11
  • @pylang Some of this code (decorators & wrappers) is above my pay grade ($0.00), Based from what I understand reading Mark Lutz, and tell me if I am wrong. By using the decorator `@validate_port` you are basically “**siphoning**” `def discover_device(device):` through `def validate_port(func):` Thank you again for your answer. – Enrique Bruzual Sep 16 '17 at 04:00
  • You have the right idea - decorators can "screen" the output of a function. Here the output is screened with a test that raises an exception for invalid ports e.g. `['192.168.222.123', '80.805'] `, otherwise it passes through unmodified. See the edit and recommended blog post for more on decorators. – pylang Sep 17 '17 at 02:56
4

No regex is needed for this. Use str's builtin method split.

>>> device = '[<Device: 192.168.222.123:8075>]'
>>> _, ip, port = device.strip('[<>]').split(':')
>>> print((ip.strip(), port))
('192.168.222.123', '8075')

If you really want to use a regex, I would use a simple one:

>>> import re
>>> ip, port = re.findall(r'([\d.]+)', device)
>>> print((ip, port))
('192.168.222.123', '8075')
Zach Gates
  • This relies on the first colon being present, though. If the input were changed more drastically than in OP's example, you might have input like `'[]'` or `'[]'` or `'[(Device: 192.168.222.123:8075)]'` which would break in your example. – RagingRoosevelt Sep 15 '17 at 20:16
  • @RagingRoosevelt: OP mentioned that the device name might change to the form `[]`. My answer works for this as well. An answer does not need to work for every case imaginable; it's for this specific question. If the OP expects to have a device name in another form, I'll change my answer. – Zach Gates Sep 15 '17 at 20:18
  • If you read the question again, OP never specified that device name was what changed about that string (in fact, that was an assumption you introduced). OP only specified "This will/has cause problems when the string format changes making the .replace command through an error". OP provided one example of **a** way the format could change. – RagingRoosevelt Sep 15 '17 at 20:25
  • @RagingRoosevelt: According to OP: "So the problem would come if: [] Changed to this: []" (which I would consider a change in the device's name). Like I said, if OP has concessions about a specific form, he should state that in the question. – Zach Gates Sep 15 '17 at 20:26
  • Certainly, if OP meant specifically that the only change in format he expected was a change in device name, he should have specified that was the case. OP left the expected change in format open, though, and just provided an example of one way the change might occur. You're assuming the change OP mentioned is a change in device name, but OP never even specified that's what he's looking at. – RagingRoosevelt Sep 15 '17 at 20:28
  • @zachGates I do appreciate your answer; it is compact but doesn't actually answer my question. The one in bold specifically asks for a solution "without having to rely on the integrity of surrounding characters", so I am more inclined to use regex as some of the other users have suggested. – Enrique Bruzual Sep 15 '17 at 20:30
  • @EnriqueBruzual: Could you further specify what you mean by "integrity of surrounding characters", in your question? Can you also provide some examples of this? As my answer is currently written, an example like `<<>>` would work, as would something like `[DEVICE:0.0.0.0:9999]`. – Zach Gates Sep 15 '17 at 20:31
  • @ZachGates I am looking at the regex portion you posted. When I say the "integrity of surrounding characters" I mean any character that is not part of an IP address. I apologize, I tried to be as clear as possible. So the idea is, Zach, I don't have control over what the device broadcasts, but there will always be an IP. In this case there is "key: value", but who knows if they decide to replace it with a "key ~ value" format. So it is better to just count on an established internet protocol format, since we will all be informed when that changes, e.g. IPv6. – Enrique Bruzual Sep 15 '17 at 20:49
2

You can simply use a regex to find the IP address, independently from what's before.

For example this one :

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

As a test :

>>> import re
>>> re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', '[<Device: 192.168.222.123:8075>]')
['192.168.222.123']
>>> re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', '[<SomethingElse: 192.168.222.123:8075>]')
['192.168.222.123']
>>> re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}', '[<SomethingElse: 192.168.222.123:8075>]')
['192.168.222.123:8075']
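These patterns accept any block of one to three digits, so something like 999.168.222.123 would also match. A minimal sketch, assuming IPv4 only, that lets the regex find candidates and then asks the standard `ipaddress` module to reject impossible ones; `CANDIDATE` and `find_ip_port` are hypothetical names chosen for illustration:

```python
import ipaddress
import re

# Four dot-separated blocks of 1 to 3 digits, optionally followed by
# a colon and a port of 1 to 5 digits.
CANDIDATE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3})(?::(\d{1,5}))?\b')


def find_ip_port(text):
    """Return [ip] or [ip, port] for the first valid candidate, else None."""
    for match in CANDIDATE.finditer(text):
        ip, port = match.groups()
        try:
            # Raises a ValueError subclass for blocks above 255.
            ipaddress.IPv4Address(ip)
        except ValueError:
            continue
        if port is None:
            return [ip]
        if int(port) > 65535:
            # Does not fit in 16 bits, so skip this candidate.
            continue
        return [ip, int(port)]
    return None
```

Returning the port as an `int` is a design choice of this sketch; it gives the `[ip, port]` shape the question asks for without depending on the text around the address.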
Eric Duminil
-1

I think your best bet is to use regular expressions:

import re

def discover_device(in_str):
    m = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:\:\d{1,5})?)', in_str)
    if m:
        return m.group(0)
    else:
        return None

If your regex string is (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:\:\d{1,5})?), then the breakdown is:

  • \d{1,3}\. looks for between 1 and 3 digits followed by a period
  • (?:\:\d{1,5})? looks for one or zero occurrences of a colon followed by between 1 and 5 digits (the ?: specifies that it's a non-capturing group so that it won't be present by itself in your result)
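The same breakdown can be written into the pattern itself with `re.VERBOSE`, which ignores whitespace and allows comments inside the regex. This is only a sketch; `IP_PORT` and `split_ip_port` are hypothetical names, and the pattern captures the IP and the port as two separate groups:

```python
import re

IP_PORT = re.compile(r"""
    (                   # group 1: the IPv4-looking address
        \d{1,3}\.       # 1 to 3 digits followed by a period
        \d{1,3}\.
        \d{1,3}\.
        \d{1,3}         # the last block has no trailing period
    )
    (?:                 # non-capturing group around the optional port
        :(\d{1,5})      # a colon, then group 2: 1 to 5 digits
    )?                  # the whole port part may be absent
""", re.VERBOSE)


def split_ip_port(text):
    """Return [ip, port] (port is None when absent), or None if no match."""
    m = IP_PORT.search(text)
    if m is None:
        return None
    return [m.group(1), m.group(2)]
```

For example, `split_ip_port('[<Device: 192.168.222.123:8075>]')` gives `['192.168.222.123', '8075']`, and an input without a port gives `None` in the second position.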

If you wanted it to capture the port and IP separately, you could do

def discover_device(in_str):
    m = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?:\:(\d{1,5}))?', in_str)
    if m:
        return (m.group(1), m.group(2))
    else:
        return None

Here's the regex if you want to play with it.

RagingRoosevelt