
I want a user to be able to submit a url, and then display that url to other users as a link.

If I naively redisplay what the user submitted, I leave myself open to urls like

http://somesite.com' ><script>[any javascript in here]</script>

that when I redisplay it to other users will do something nasty, or at least something that makes me look unprofessional for not preventing it.

Is there a library, preferably in java, that will clean a url so that it retains all valid urls but weeds out any exploits/tomfoolery?

Thanks!

Bruce
  • Have a look [here](http://stackoverflow.com/a/4605816/1225328). – sp00m Aug 30 '12 at 10:05
  • Do you want URL encoding, or do you not want to display the whole URL? – Harmeet Singh Aug 30 '12 at 10:06
  • This question asks for a valid URL regular expression: http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url . The answers are helpful. You'd simply need to create a regular expression matcher in Java. – Redandwhite Aug 30 '12 at 11:01

3 Answers


I think what you are looking for is output encoding. Have a look at OWASP ESAPI, which is a tried-and-tested way to perform encoding in Java.

Also, as a suggestion: if you want to check whether a user is submitting a malicious URL, you can check it against Google's malware database using the Safe Browsing API.

josh-cain
gauravphoenix

URLs containing ' are perfectly valid. If you are outputting them to an HTML document without escaping, then the problem lies in your lack of HTML-escaping, not in the input checking. You need to ensure that you are calling an HTML encoding method every time you output any variable text (including URLs) into an HTML document.

Java does not have a built-in HTML encoder (poor show!) but most web libraries do (take your pick, or write it yourself with a few string replaces). If you use JSTL tags, you get escapeXml to do it for free by default:

<a href="<c:out value="${link}"/>">ok</a>
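
If you are not using JSTL, the "few string replaces" route might look like the sketch below. The class and method names (`HtmlEscape`, `escapeHtml`, `link`) are hypothetical. It escapes the five characters that can break out of HTML text or a quoted attribute value, and it is meant as an illustration rather than a replacement for a maintained encoder from your web library.

```java
public class HtmlEscape {

    // Hypothetical helper: escapes &, <, >, " and ' in a single pass,
    // so an entity produced for one character is never re-escaped.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: escapes the URL at output time, both in the
    // href attribute and in the visible link text.
    static String link(String url) {
        String safe = escapeHtml(url);
        return "<a href=\"" + safe + "\">" + safe + "</a>";
    }

    public static void main(String[] args) {
        String input = "http://somesite.com' ><script>alert(1)</script>";
        // The quote and angle brackets come out as entities, so no <script> tag is emitted.
        System.out.println(link(input));
    }
}
```

This only stops the markup injection from the question; a `javascript:` URL contains none of these characters and passes through unchanged, which is why the protocol check in this answer still matters.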

Whilst your main problem is HTML-escaping, it is still potentially beneficial to validate that an input URL is valid to catch mistakes - you can do that by parsing it with new URL(...) and seeing if you get a MalformedURLException.
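
A minimal sketch of that parse-and-catch check. The class and method names are hypothetical. On Java 20 and later the `URL(String)` constructor is deprecated in favour of `URI.create(...).toURL()`, but it still works, hence the warning suppression.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCheck {

    // Returns true if java.net.URL can parse the string. This catches
    // typos and unknown protocols; it is a structural check only, not an
    // XSS filter, so output escaping is still required.
    @SuppressWarnings("deprecation")
    static boolean isParseableUrl(String input) {
        try {
            new URL(input);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isParseableUrl("http://somesite.com")); // true
        System.out.println(isParseableUrl("htp://somesite.com"));  // false: unknown protocol
        System.out.println(isParseableUrl("not a url"));           // false: no protocol
    }
}
```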

You should also check that the URL begins with a known-good protocol such as http:// or https://. This will prevent anyone from using dangerous URL protocols like javascript:, which can lead to cross-site scripting as easily as HTML injection can.
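
One way to do that allow-list check is sketched below. It swaps in `java.net.URI`, which exposes the scheme without needing a registered protocol handler. The class name and the `http`/`https` allow-list are assumptions, and `Set.of` needs Java 9+. `URI` is stricter than browsers, so it will also reject pasted URLs with unencoded spaces.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class SchemeCheck {

    // Assumed allow-list: only these schemes are rendered as links.
    // Anything else (javascript:, data:, vbscript:, ...) is rejected.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    static boolean hasAllowedScheme(String input) {
        try {
            String scheme = new URI(input).getScheme();
            // Relative references such as "//evil.example" have no scheme.
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Not a syntactically valid URI at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("https://somesite.com/page")); // true
        System.out.println(hasAllowedScheme("JavaScript:alert(1)"));      // false
        System.out.println(hasAllowedScheme("//evil.example/x"));         // false
    }
}
```

Comparing the lower-cased scheme matters because schemes are case-insensitive, so `JavaScript:` is just as dangerous as `javascript:`.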

Community
bobince

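You can use Apache Commons Validator's UrlValidator:

import org.apache.commons.validator.routines.UrlValidator;

String[] schemes = {"http", "https"}; // only accept http and https links
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("http://somesite.com")) {
   // valid
}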
Eduard