I've created an HTML helper that encodes email addresses in order to deter spam harvesters. This is the same technique the MarkdownSharp library uses when auto-generating email links.

The problem is that TagBuilder.MergeAttribute encodes the attribute text, which breaks the link. Is it possible to override this behaviour, or at least to specify the attribute another way? I know I can fall back to plain string concatenation or a StringBuilder, but TagBuilder does offer a number of benefits, such as easily merging other HTML attributes.

    /// <summary>
    /// Creates an encoded email link in the hopes of foiling most SPAM bots
    /// </summary>
    public static IHtmlString EmailLink(this HtmlHelper html, string email, string text = null, object htmlAttributes = null)
    {
        Ensure.Argument.NotNullOrEmpty(email, "email");

        var encodedEmail = EncodeEmailAddress(email);

        var tb = new TagBuilder("a");
        tb.MergeAttribute("href", "mailto:" + encodedEmail);

        tb.InnerHtml = text ?? encodedEmail;

        if (htmlAttributes != null)
        {
            tb.MergeAttributes(new RouteValueDictionary(htmlAttributes));
        }

        return new HtmlString(tb.ToString());
    }

    /// <summary>
    /// encodes email address randomly  
    /// roughly 10% raw, 45% hex, 45% dec 
    /// note that @ is always encoded and : never is
    /// </summary>
    private static string EncodeEmailAddress(string addr)
    {
        var sb = new StringBuilder(addr.Length * 5);
        var rand = new Random();
        int r;
        foreach (char c in addr)
        {
            r = rand.Next(1, 100);
            if ((r > 90 || c == ':') && c != '@')
                sb.Append(c);                         // m
            else if (r < 45)
                sb.AppendFormat("&#x{0:x};", (int)c); // &#x6D
            else
                sb.AppendFormat("&#{0};", (int)c);    // &#109
        }
        return sb.ToString();
    }
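
To illustrate the breakage, here is a minimal sketch. It uses `WebUtility.HtmlEncode` as a stand-in for the attribute encoding that TagBuilder applies (an assumption, made so the snippet runs without a reference to System.Web.Mvc). The class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class DoubleEncodingDemo
{
    // Encodes an already entity-encoded href a second time, the way an
    // attribute encoder would. WebUtility.HtmlEncode is only a stand-in
    // for TagBuilder's internal encoding (assumption).
    public static string EncodeAgain(string alreadyEncodedHref)
    {
        if (alreadyEncodedHref == null)
            throw new ArgumentNullException("alreadyEncodedHref");

        // Each '&' that starts a character reference is rewritten as "&amp;",
        // so the browser no longer treats "&#109;" as a reference to 'm'.
        return WebUtility.HtmlEncode(alreadyEncodedHref);
    }
}
```

Passing `"mailto:&#109;&#x65;"` to `EncodeAgain` yields `"mailto:&amp;#109;&amp;#x65;"`, which is the kind of broken href I end up with.
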
Ben Foster

1 Answer

I do not believe your helper will do anything meaningful to help reduce spam. When crawlers use HTML parsers, they're seeing the decoded strings, not the encoded ones. It is the same logic as in the browser itself. So all they need to do is strip the mailto: prefix, and they now have the original email address.
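
As a rough illustration of that point, the sketch below does what a parser-based crawler effectively does: decode the attribute value, then strip the prefix. `WebUtility.HtmlDecode` stands in for a real HTML parser, and the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class CrawlerSketch
{
    // Takes the href exactly as it is written in the page source and
    // returns the plain address a parser-based crawler would end up with.
    public static string ExtractAddress(string hrefAsWrittenInSource)
    {
        if (hrefAsWrittenInSource == null)
            throw new ArgumentNullException("hrefAsWrittenInSource");

        // Resolve hex and decimal character references, as any HTML parser does.
        string decoded = WebUtility.HtmlDecode(hrefAsWrittenInSource);

        const string prefix = "mailto:";
        return decoded.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? decoded.Substring(prefix.Length)
            : decoded;
    }
}
```

For example, `ExtractAddress("mailto:&#x6d;&#101;&#64;example.com")` returns `me@example.com`, regardless of which characters were written as hex, decimal, or raw.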

If you still wish to pursue this, you must use string concatenation. TagBuilder isn't designed to work with input that is already encoded. Make sure that you encode the &, ', and " characters if you go this route.
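
A minimal sketch of the concatenation route might look like the following. It is not a drop-in replacement for your helper: the names are hypothetical, `EncodeAllAsDecimal` is a deterministic stand-in for your randomized encoder (it encodes every character, so &, ', and " never appear raw), and `WebUtility.HtmlEncode` is used for the remaining attribute values and the link text. Attribute names are assumed to come from trusted code.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class EmailLinkSketch
{
    // Stand-in encoder (assumption): every character becomes a decimal
    // character reference, which also covers &, ' and ".
    public static string EncodeAllAsDecimal(string addr)
    {
        if (addr == null) throw new ArgumentNullException("addr");
        var sb = new StringBuilder(addr.Length * 6);
        foreach (char c in addr)
            sb.Append("&#").Append((int)c).Append(';');
        return sb.ToString();
    }

    // Builds the anchor by hand so the pre-encoded address is written as-is.
    public static string BuildEmailLink(string encodedEmail, string text = null,
        IDictionary<string, string> extraAttributes = null)
    {
        if (string.IsNullOrEmpty(encodedEmail))
            throw new ArgumentException("An encoded email is required.", "encodedEmail");

        var sb = new StringBuilder();
        sb.Append("<a href=\"mailto:").Append(encodedEmail).Append('"');

        if (extraAttributes != null)
        {
            foreach (var pair in extraAttributes)
            {
                // Never let a caller-supplied href replace the encoded one.
                if (string.Equals(pair.Key, "href", StringComparison.OrdinalIgnoreCase))
                    continue;

                // Values are encoded here; names are assumed to be safe.
                sb.Append(' ').Append(pair.Key).Append("=\"")
                  .Append(WebUtility.HtmlEncode(pair.Value)).Append('"');
            }
        }

        sb.Append('>');
        // Custom link text is encoded; the pre-encoded address is not.
        sb.Append(text != null ? WebUtility.HtmlEncode(text) : encodedEmail);
        sb.Append("</a>");
        return sb.ToString();
    }
}
```

Unlike the code in the question, this sketch HTML-encodes custom link text; drop that call if you intend to pass markup as the text.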

Levi
  • When I view the source of the link in my browser, I get the encoded string, *not* the decoded one, so I'm assuming this would be the same if the crawler was not using a parser? – Ben Foster Mar 10 '13 at 12:26