
I am trying to split strings similar to this using Regex.Split:

https://www.linkedin.com/in/someone

To return this:

https://www.linkedin.com
in
someone

In other words, I want to ignore the double forward slash and split only on single forward slashes.

I know I should be using a negative lookahead, something like /(?!/), but I can't get it to work.
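Here is a C# sketch of why the lookahead alone falls short. It assumes the sample input is the LinkedIn URL used throughout this thread, and the names `LookaheadOnlyDemo` and `SplitLookaheadOnly` are hypothetical. The second slash of `//` is followed by `w`, not `/`, so `/(?!/)` still matches it and the scheme gets cut in half.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadOnlyDemo
{
    // Hypothetical helper: splits with the lookahead-only pattern from the question.
    public static string[] SplitLookaheadOnly(string input) =>
        Regex.Split(input, "/(?!/)");

    public static void Main()
    {
        // The second '/' of "//" is followed by 'w', so the lookahead succeeds
        // there and the split happens inside "https://".
        foreach (var token in SplitLookaheadOnly("https://www.linkedin.com/in/someone"))
        {
            Console.WriteLine(token);
        }
    }
}
```

With this input the tokens come out as `https:/`, `www.linkedin.com`, `in` and `someone`. A lookbehind is therefore also needed to reject the second slash of the pair.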

This is not a duplicate of the similar question, because running that regular expression through Regex.Split does not give the required result.

c0D3l0g1c
  • This could be helpful: https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url – Praveen May 21 '18 at 06:54
    @Praveen, while I appreciate your answer - perhaps you should read my question again. – c0D3l0g1c May 21 '18 at 07:00
    Possible duplicate of [Split string on single forward slashes with RegExp](https://stackoverflow.com/questions/32586057/split-string-on-single-forward-slashes-with-regexp) – l'L'l May 21 '18 at 07:12
  • There is a solution at the above duplicate that produces the exact desired result using the following pattern (`[^/]+(?://[^/]*)*`)... – l'L'l May 21 '18 at 07:13
    Why not use the [Uri](https://msdn.microsoft.com/en-us/library/system.uri(v=vs.110).aspx) class instead? `Uri.Segments` will return all segments of the path, and `Host` will return `www.linkedin.com`. Unlike a regex, it won't be affected by query parameters and fragments either. `new Uri(@"https://www.linkedin.com/in/someone?someparam=foo#tab1").Segments` will return `/`, `in/`, `someone` – Panagiotis Kanavos May 21 '18 at 07:42
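Below is a minimal C# sketch of the pattern from l'L'l's comment, `[^/]+(?://[^/]*)*`. It assumes the pattern is applied with Regex.Matches: it describes the tokens rather than the separators, so it is not meant for Regex.Split. `MatchTokensDemo` and `Tokens` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchTokensDemo
{
    // A token is a run of non-slash characters, optionally followed by any number
    // of "//..." continuations, so "https://host" stays in one piece.
    private static readonly Regex TokenPattern = new Regex("[^/]+(?://[^/]*)*");

    // Hypothetical helper: collects the value of every match.
    public static List<string> Tokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokens("https://www.linkedin.com/in/someone"))
        {
            Console.WriteLine(token);
        }
    }
}
```

For the LinkedIn URL this yields `https://www.linkedin.com`, `in` and `someone`. The `(?://[^/]*)*` group can repeat, so an input such as `a//b//c` stays a single token.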

3 Answers


How about this: (?<!/)/(?!/)

Breaking it down:

  • (?<!/): negative lookbehind; the preceding character must not be a /
  • /: match a single / character
  • (?!/): negative lookahead; the following character must not be a /

Taken together, this matches a / character that has no / immediately before it and no / immediately after it.
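This sketch shows which slashes satisfy both assertions by reporting match positions instead of splitting. The `SingleSlashPositions` class and its `Positions` method are hypothetical helpers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleSlashPositions
{
    // A '/' with no '/' immediately before it (lookbehind)
    // and no '/' immediately after it (lookahead).
    private static readonly Regex SingleSlash = new Regex("(?<!/)/(?!/)");

    // Hypothetical helper: returns the index of every matched slash.
    public static int[] Positions(string input)
    {
        MatchCollection matches = SingleSlash.Matches(input);
        var positions = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            positions[i] = matches[i].Index;
        }
        return positions;
    }

    public static void Main()
    {
        // Both slashes in "https://" are rejected: the first by the lookahead,
        // the second by the lookbehind.
        foreach (int index in Positions("https://www.linkedin.com/in/someone"))
        {
            Console.WriteLine($"Single slash at index {index}");
        }
    }
}
```

For the sample URL only indices 24 and 27 are reported. Those are the slashes before `in` and before `someone`.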

Example usage:

using System;
using System.Text.RegularExpressions;

string text = "https://www.linkedin.com/in/someone";
string[] tokens = Regex.Split(text, "(?<!/)/(?!/)");
foreach (var token in tokens)
{
    Console.WriteLine($"Token: {token}");
}

Output:

Token: https://www.linkedin.com
Token: in
Token: someone

41686d6564
Samantha

You can also do it by matching the tokens with Regex.Matches instead of splitting:

using System;
using System.Text.RegularExpressions;

string pattern = @"([^\/]+(\/{2,}[^\/]+)?)";
string input = @"https://www.linkedin.com/in/someone";
foreach(Match match in Regex.Matches(input, pattern)) {
    Console.WriteLine(match);
}

Output:

https://www.linkedin.com
in
someone
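The same pattern should behave identically without the escapes and capturing groups, because `/` is not a metacharacter in .NET regular expressions and only each match's value is used. Here is a sketch with the hypothetical helper `SimplifiedTokenPattern.Tokens`:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimplifiedTokenPattern
{
    // Same idea as @"([^\/]+(\/{2,}[^\/]+)?)": a run of non-slash characters,
    // optionally followed by ONE run of two or more slashes plus more non-slash text.
    private static readonly Regex Token = new Regex("[^/]+(?:/{2,}[^/]+)?");

    // Hypothetical helper: collects the value of every match.
    public static List<string> Tokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match match in Token.Matches(input))
        {
            tokens.Add(match.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokens("https://www.linkedin.com/in/someone"))
        {
            Console.WriteLine(token);
        }
    }
}
```

The optional group allows only one double-slash continuation per token, so `a//b//c` comes out as `a//b` and `c`. The `[^/]+(?://[^/]*)*` pattern from the comments would keep that input whole.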
leofun01

As @Panagiotis Kanavos mentioned in the comments above, why make things complicated when you can use the Uri class?

Provides an object representation of a uniform resource identifier (URI) and easy access to the parts of the URI.

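using System;

public class Program
{
    public static void Main()
    {
        Uri myUri = new Uri("https://www.linkedin.com/in/someone");
        // Rebuild "scheme://host" from the parsed parts of the URI.
        string host = myUri.Scheme + Uri.SchemeDelimiter + myUri.Host;
        Console.WriteLine(host);
    }
}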

Output:

https://www.linkedin.com
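The snippet above prints only the scheme and host. This sketch also pulls the path parts from `Uri.Segments`. Every segment except the last keeps its trailing `/`, so each one is trimmed. `UriTokensDemo.Tokens` is a hypothetical helper, and the sketch assumes absolute URLs.

```csharp
using System;
using System.Collections.Generic;

public static class UriTokensDemo
{
    // Hypothetical helper: "scheme://host" followed by each non-empty path segment.
    // The query string and fragment are not part of Uri.Segments, so they are ignored.
    public static List<string> Tokens(string url)
    {
        var uri = new Uri(url);
        var tokens = new List<string> { uri.Scheme + Uri.SchemeDelimiter + uri.Host };
        foreach (string segment in uri.Segments)
        {
            string trimmed = segment.Trim('/');
            if (trimmed.Length > 0) // skips the root segment "/"
            {
                tokens.Add(trimmed);
            }
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokens("https://www.linkedin.com/in/someone?someparam=foo#tab1"))
        {
            Console.WriteLine(token);
        }
    }
}
```

`Host` drops any non-default port. If that matters, `Uri.Authority` keeps it.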


DirtyBit