17

Take a string such as:

In C#: How do I add "Quotes" around string in a comma delimited list of strings?

and convert it to:

in-c-how-do-i-add-quotes-around-string-in-a-comma-delimited-list-of-strings

Requirements:

  • Separate each word by a dash and remove all punctuation (taking into account not all words are separated by spaces.)
  • Function takes in a max length, and gets all tokens below that max length. Example: ToSeoFriendly("hello world hello world", 14) returns "hello-world"
  • All words are converted to lower case.

On a separate note, should there be a minimum length?

Michael Myers
Shawn
  • Here's some info on URL length: http://www.boutell.com/newfaq/misc/urllength.html – bchhun Jan 21 '09 at 16:38
  • Maybe replacing some special characters with their "English pronunciation", e.g. "#" => "sharp", would make for better URLs and differentiate C from C# (which is good, right? ;) – Wookai Jan 21 '09 at 17:17
  • Yeah, definitely, except # isn't sharp; that's a different symbol ;p – Shawn Jan 22 '09 at 03:53
  • Right... But I think you got my point ;)... – Wookai Jan 22 '09 at 15:37

12 Answers

10

Here is my solution in C#

// Requires: using System.Text; using System.Text.RegularExpressions;
private string ToSeoFriendly(string title, int maxLength) {
    // Each match is one run of word characters, i.e. one token.
    var match = Regex.Match(title.ToLower(), "[\\w]+");
    StringBuilder result = new StringBuilder("");
    bool maxLengthHit = false;
    while (match.Success && !maxLengthHit) {
        if (result.Length + match.Value.Length <= maxLength) {
            result.Append(match.Value + "-");
        } else {
            maxLengthHit = true;
            // Handle a situation where there is only one word and it is greater than the max length.
            if (result.Length == 0) result.Append(match.Value.Substring(0, maxLength));
        }
        match = match.NextMatch();
    }
    // Remove trailing '-' (result is empty when the title contains no word characters)
    if (result.Length > 0 && result[result.Length - 1] == '-') result.Remove(result.Length - 1, 1);
    return result.ToString();
}
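For comparison, here is a minimal Python sketch of the same token-by-token idea. It is not a port of the C# above: the function name `to_seo_friendly` is made up for illustration, and it assumes only ASCII letters and digits count as word characters. It tracks the joined length including hyphens, so the result never exceeds the maximum.

```python
import re


def to_seo_friendly(title: str, max_length: int) -> str:
    """Join whole words with hyphens without exceeding max_length."""
    # Assumption: only ASCII letters and digits form tokens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        return ""
    kept = []
    length = 0  # length of "-".join(kept) so far
    for word in words:
        # Every word except the first needs a hyphen in front of it.
        needed = len(word) + (1 if kept else 0)
        if length + needed > max_length:
            break
        kept.append(word)
        length += needed
    if not kept:
        # The first word alone is too long: fall back to a hard cut.
        return words[0][:max_length]
    return "-".join(kept)


print(to_seo_friendly("hello world hello world", 14))
```

With the example from the question, this prints `hello-world`.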
Shawn
7

I would follow these steps:

  1. convert string to lower case
  2. replace unwanted characters by hyphens
  3. replace multiple hyphens by one hyphen (not necessary as the preg_replace() function call already prevents multiple hyphens)
  4. remove hyphens at the beginning and end if necessary
  5. if needed, truncate at the last hyphen before position x

So, all together in a function (PHP):

function generateUrlSlug($string, $maxlen=0)
{
    $string = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
    if ($maxlen && strlen($string) > $maxlen) {
        $string = substr($string, 0, $maxlen);
        $pos = strrpos($string, '-');
        if ($pos > 0) {
            $string = substr($string, 0, $pos);
        }
    }
    return $string;
}
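The same steps can also be sketched in Python. This is an illustrative sketch rather than an exact port: the name `generate_url_slug` simply mirrors the PHP function, and it deliberately differs in one place by keeping the last word when the cut falls exactly on a word boundary.

```python
import re


def generate_url_slug(text: str, max_len: int = 0) -> str:
    # Steps 1-4: lowercase, collapse runs of unwanted characters into one
    # hyphen, and strip hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Step 5: shorten without cutting a word in half where possible.
    if max_len and len(slug) > max_len:
        cut = slug[:max_len]
        # Deviation from the PHP above: if the cut lands exactly at the end
        # of a word, keep that word instead of backing up one more hyphen.
        if slug[max_len] != "-":
            pos = cut.rfind("-")
            if pos > 0:
                cut = cut[:pos]
        slug = cut.rstrip("-")
    return slug


print(generate_url_slug("hello world hello world", 14))
```

For the question's example this prints `hello-world`. A single word longer than the limit is cut hard, as in the PHP version.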
Gumbo
  • I like this solution. I was trying to do this by matching all non-alphanumerics, splitting on them, and joining on "-". I kept trying to match only if they weren't at the start or end of the string, but never got it working. In the end I settled on matching the words and appending. – Shawn Jan 21 '09 at 16:42
  • What if the first word is longer than the maximum length? – Wookai Jan 21 '09 at 17:12
  • I returned a substring in that situation. – Shawn Jan 22 '09 at 03:53
4

C#

public string toFriendly(string subject)
{
    subject = subject.Trim().ToLower();
    subject = Regex.Replace(subject, @"\s+", "-");
    subject = Regex.Replace(subject, @"[^A-Za-z0-9_-]", "");
    return subject;
}
annakata
  • I think this has a few issues. Consider this input: "(string)someObject fails." It becomes: stringsomeobject-fails – Shawn Jan 21 '09 at 16:45
  • That did occur to me, but frankly I'm not sure how I'd want to handle it. In the past I've gone with nuking everything between and including the parens, but I suspect it's implementation specific. Whatever you want though, it's trivial to add to the above template. – annakata Jan 21 '09 at 16:59
2

A better version:

function Slugify($string)
{
    return strtolower(trim(preg_replace(array('~[^0-9a-z]~i', '~-+~'), '-', $string), '-'));
}
Alix Axel
2

Here's a solution for php:

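function make_uri($input, $max_length) {
  // Transliterate accented characters to ASCII where iconv is available
  if (function_exists('iconv')) {
    $input = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
  }

  $lower = strtolower($input);

  // Drop everything except letters, digits and spaces, then split into words
  $without_special = preg_replace('/[^a-z0-9 ]/', '', $lower);
  $tokens = preg_split('/ +/', $without_special, -1, PREG_SPLIT_NO_EMPTY);

  $result = '';

  foreach ($tokens as $token) {
    // $result carries a leading '-', hence the +1
    if (strlen($result.'-'.$token) > $max_length+1) {
      break;
    }

    $result .= '-'.$token;
  }

  // Strip the leading '-'
  return substr($result, 1);
}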

Usage:

echo make_uri('In C#: How do I add "Quotes" around string in a ...', 500);

Unless you need the URIs to be typable, they don't need to be short. But you should specify a maximum length so that the URLs work well with proxies etc.

Allain Lalonde
1

Solution in Perl:

my $input = 'In C#: How do I add "Quotes" around string in a comma delimited list of strings?';

my $length = 20;
$input =~ s/[^a-z0-9]+/-/gi;
$input =~ s/^(.{1,$length}).*/\L$1/;

print "$input\n";

done.

1

Solution in shell:

echo 'In C#: How do I add "Quotes" around string in a comma delimited list of strings?' | \
    tr A-Z a-z | \
    sed 's/[^a-z0-9]\+/-/g;s/^\(.\{1,20\}\).*/\1/'
1

This is close to how Stack Overflow generates slugs:

public static string GenerateSlug(string title)
{
    string slug = title.ToLower();
    if (slug.Length > 81)
      slug = slug.Substring(0, 81);
    slug = Regex.Replace(slug, @"[^a-z0-9\-_\./\\ ]+", "");
    slug = Regex.Replace(slug, @"[^a-z0-9]+", "-");

    // Guard against an empty slug before checking for a trailing hyphen
    if (slug.Length > 0 && slug[slug.Length - 1] == '-')
      slug = slug.Remove(slug.Length - 1, 1);
    return slug;
}
Pavel Chuchuva
0

Another season, another reason, for choosing Ruby :)

def seo_friendly(str)
  str.strip.downcase.gsub /\W+/, '-'
end

That's all.

edgerunner
0

In Python (if Django is installed, even if you are using another framework):

from django.template.defaultfilters import slugify
slugify('In C#: How do I add "Quotes" around string in a comma delimited list of strings?')
SingleNegationElimination
0

A slightly cleaner way of doing this in PHP at least is:

function CleanForUrl($urlPart, $maxLength = null) {
    $url = strtolower(preg_replace(array('/[^a-z0-9\- ]/i', '/[ \-]+/'), array('', '-'), trim($urlPart)));
    if ($maxLength) $url = substr($url, 0, $maxLength);
    return $url;
}

Might as well do the trim() at the start so there is less to process later, and the full replacement is done within the preg_replace().

Thanks to cg for coming up with most of this: What is the best way to clean a string for placement in a URL, like the question name on SO?

Community
Darryl Hein
0

To do this we need to:

  1. Normalize the text
  2. Remove all diacritics
  3. Replace international characters
  4. Be able to shorten the text to match SEO thresholds

I wanted a function that generates the entire string and also accepts an optional max length. This was the result.

public static class StringHelper
{
/// <summary>
/// Creates a URL And SEO friendly slug
/// </summary>
/// <param name="text">Text to slugify</param>
/// <param name="maxLength">Max length of slug</param>
/// <returns>URL and SEO friendly string</returns>
public static string UrlFriendly(string text, int maxLength = 0)
{
    // Return empty value if text is null
    if (text == null) return "";

    var normalizedString = text
        // Make lowercase
        .ToLowerInvariant()
        // Normalize the text
        .Normalize(NormalizationForm.FormD);

    var stringBuilder = new StringBuilder();
    var stringLength = normalizedString.Length;
    var prevdash = false;
    var trueLength = 0;

    char c;

    for (int i = 0; i < stringLength; i++)
    {
        c = normalizedString[i];

        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            // If the character is a letter or a digit, keep it; if it is an
            // international character, remap it to a valid ASCII character
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.DecimalDigitNumber:
                if (c < 128)
                    stringBuilder.Append(c);
                else
                    stringBuilder.Append(ConstHelper.RemapInternationalCharToAscii(c));

                prevdash = false;
                trueLength = stringBuilder.Length;
                break;

            // Replace the character with a hyphen, but only if the previous character wasn't one
            case UnicodeCategory.SpaceSeparator:
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.OtherPunctuation:
            case UnicodeCategory.MathSymbol:
                if (!prevdash)
                {
                    stringBuilder.Append('-');
                    prevdash = true;
                    trueLength = stringBuilder.Length;
                }
                break;
        }

        // If we are at max length, stop parsing
        if (maxLength > 0 && trueLength >= maxLength)
            break;
    }

    // Trim excess hyphens
    var result = stringBuilder.ToString().Trim('-');

    // Remove any excess character to meet maxlength criteria
    return maxLength <= 0 || result.Length <= maxLength ? result : result.Substring(0, maxLength);
}
}

This helper remaps some international characters to readable ASCII equivalents.

public static class ConstHelper
{
/// <summary>
/// Remaps international characters to ASCII-compatible ones
/// based on: https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
/// </summary>
/// <param name="c">Character to remap</param>
/// <returns>Remapped character</returns>
public static string RemapInternationalCharToAscii(char c)
{
    string s = c.ToString().ToLowerInvariant();
    if ("àåáâäãåą".Contains(s))
    {
        return "a";
    }
    else if ("èéêëę".Contains(s))
    {
        return "e";
    }
    else if ("ìíîïı".Contains(s))
    {
        return "i";
    }
    else if ("òóôõöøőð".Contains(s))
    {
        return "o";
    }
    else if ("ùúûüŭů".Contains(s))
    {
        return "u";
    }
    else if ("çćčĉ".Contains(s))
    {
        return "c";
    }
    else if ("żźž".Contains(s))
    {
        return "z";
    }
    else if ("śşšŝ".Contains(s))
    {
        return "s";
    }
    else if ("ñń".Contains(s))
    {
        return "n";
    }
    else if ("ýÿ".Contains(s))
    {
        return "y";
    }
    else if ("ğĝ".Contains(s))
    {
        return "g";
    }
    else if (c == 'ř')
    {
        return "r";
    }
    else if (c == 'ł')
    {
        return "l";
    }
    else if (c == 'đ')
    {
        return "d";
    }
    else if (c == 'ß')
    {
        return "ss";
    }
    else if (c == 'þ')
    {
        return "th";
    }
    else if (c == 'ĥ')
    {
        return "h";
    }
    else if (c == 'ĵ')
    {
        return "j";
    }
    else
    {
        return "";
    }
}
}

The function would be used something like this:

const string text = "ICH MUß EINIGE CRÈME BRÛLÉE HABEN";
Console.WriteLine(StringHelper.UrlFriendly(text));
// Output: 
// ich-muss-einige-creme-brulee-haben
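
The normalize-then-filter idea is not specific to .NET. As a rough illustration, here is a minimal Python sketch using the standard `unicodedata` module. The name `url_friendly` and the small `_REMAP` table are assumptions made for this example and cover only a handful of characters that do not decompose under NFD.

```python
import unicodedata

# Hypothetical, deliberately small remap table for letters that NFD
# normalization does not split into an ASCII base letter plus a mark.
_REMAP = {"ß": "ss", "þ": "th", "ø": "o", "ð": "o", "ł": "l", "đ": "d", "ı": "i"}


def url_friendly(text, max_length=0):
    if text is None:
        return ""
    # Lowercase, then decompose accented letters (e.g. "è" -> "e" + mark).
    normalized = unicodedata.normalize("NFD", text.lower())
    out = []
    prev_dash = False
    for ch in normalized:
        cat = unicodedata.category(ch)
        if cat in ("Ll", "Lu", "Nd"):
            # Keep ASCII letters and digits; remap or drop everything else.
            out.append(ch if ord(ch) < 128 else _REMAP.get(ch, ""))
            prev_dash = False
        elif cat in ("Zs", "Pc", "Pd", "Po", "Sm"):
            # Separators and punctuation collapse into a single hyphen.
            if not prev_dash:
                out.append("-")
                prev_dash = True
        # Combining marks (category Mn) and all other characters are dropped.
    slug = "".join(out).strip("-")
    if max_length > 0:
        slug = slug[:max_length].rstrip("-")
    return slug


print(url_friendly("ICH MUß EINIGE CRÈME BRÛLÉE HABEN"))
```

This prints `ich-muss-einige-creme-brulee-haben`, matching the C# output above.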

This question has already been answered many times here, but none of the answers was optimized. You can find the entire source code here on GitHub, with some samples, and read more on Johan Boström's blog. The code is compatible with .NET 4.5+ and .NET Core.

Mujtaba