20

I am using Spring MVC and have a problem with jsessionid. I found that jsessionid is injected into the URL when cookies are not enabled in the browser, producing a URL like this:

http://localhost/categories;jsessionid=Bsls4aQFXA5RUDcmZKV5iw?cid=13001

This causes no problem in browsers. However, when Google crawls my site (Google's crawlers apparently don't use cookies :)), it stores my site's URLs in that form, so my site appears in search results with URLs containing jsessionid.

The site works without any problems, but I would prefer the URLs in Google search results to be clean, without jsessionid.

Any help?

kamaci
mmohab

5 Answers

19

To the point: simply don't let your app create sessions as long as users do not log in or perform POST actions. Do not call request.getSession() or request.getSession(true). Do not create or manage session-scoped beans for non-logged-in users. Ensure that the frameworks you're using do not create sessions unless you tell them to.

If this is really impossible due to the way your application is designed or due to limitations/bugs of the (MVC) frameworks used, then your best bet is to redirect Googlebot requests to URLs without the jsessionid identifier. You can use Tuckey's URL rewrite filter for this (which is, say, the Java variant of Apache HTTPD's well-known mod_rewrite). Here's the relevant extract from its configuration examples page.

Hide jsessionid for requests from googlebot.


<outbound-rule>
     <name>Strip URL Session ID's</name>
     <note>
         Strip ;jsessionid=XXX from urls passed through response.encodeURL().
         The characters ? and # are the only things we can use to find out where the jsessionid ends.
         The expression in 'from' below contains three capture groups, the last two being optional.
             1, everything before ;jsessionid
             2, everything after ;jsessionid=XXX starting with a ? (to get the query string) up to #
             3, everything after ;jsessionid=XXX and optionally ?XXX starting with a # (to get the target)
         eg,
         from index.jsp;jsessionid=sss?qqq to index.jsp?qqq
         from index.jsp;jsessionid=sss?qqq#ttt to index.jsp?qqq#ttt
         from index.jsp;jsessionid=asdasdasdsadsadasd#dfds - index.jsp#dfds
         from u.jsp;jsessionid=wert.hg - u.jsp
         from /;jsessionid=tyu - /
     </note>
     <condition name="user-agent">googlebot</condition>
     <from>^(.*?)(?:\;jsessionid=[^\?#]*)?(\?[^#]*)?(#.*)?$</from>
     <to>$1$2$3</to>
 </outbound-rule>
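
To see what the `from`/`to` pair does, the same expression can be tried in plain Java. This is only a minimal sketch for checking the regex against the examples in the note: the class name `JsessionIdStripper` and the method `strip` are made up for illustration and are not part of the URL rewrite filter, which applies the rule for you inside response.encodeURL().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tuckey's filter: it only replays the rule's regex.
public class JsessionIdStripper {

    // Same expression as the <from> element, written with Java string escaping.
    private static final Pattern SESSION_URL =
            Pattern.compile("^(.*?)(?:;jsessionid=[^?#]*)?(\\?[^#]*)?(#.*)?$");

    // Equivalent of <to>$1$2$3</to>: keep path, query string and fragment, drop the session id.
    static String strip(String url) {
        Matcher m = SESSION_URL.matcher(url);
        if (!m.matches()) {
            return url; // leave anything unexpected untouched
        }
        StringBuilder out = new StringBuilder(m.group(1));
        if (m.group(2) != null) {
            out.append(m.group(2)); // optional query string, starts with '?'
        }
        if (m.group(3) != null) {
            out.append(m.group(3)); // optional fragment, starts with '#'
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("index.jsp;jsessionid=sss?qqq#ttt"));
        // prints index.jsp?qqq#ttt
        System.out.println(strip("/categories;jsessionid=Bsls4aQFXA5RUDcmZKV5iw?cid=13001"));
        // prints /categories?cid=13001
    }
}
```

URLs that carry no jsessionid pass through unchanged, because the session group is optional.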
BalusC
11

Spring can be configured to not do that: Why jsessionid is appended to each url?

Web applications can be configured to block it: http://randomcoder.org/articles/jsessionid-considered-harmful

Community
ykaganovich
1

If you don't use the Spring Security http tag:
Go to the applicationFilterChain bean that defines your Spring filter chains.
Normally you will have a filter called httpSessionContextIntegrationFilter or something very close, which is based on the class org.springframework.security.web.context.HttpSessionContextIntegrationFilter or inherits from it.
Add the property:

<property name="securityContextRepository" ref="securityContextRepositoryNoJSession"/>

And add the bean:

<bean id="securityContextRepositoryNoJSession" class="org.springframework.security.web.context.HttpSessionSecurityContextRepository">
    <property name="disableUrlRewriting" value="true"/>
</bean>


This should be equivalent to setting disable-url-rewriting to true.

roizaig
0

The easiest way to get rid of jsessionid in your URL is to change the tag on the login page that calls j_spring_security_check to:

<c:url var="authUrl" value="/static/j_spring_security_check" />
    <form action="${authUrl}" method="post">
mmohab
0

I would insert a filter that, if it detects a bot (like Googlebot), wraps the response in a custom HttpServletResponse which overrides the encodeURL methods to simply return the raw URL. If the filter does not detect a bot, it would simply let the chain continue, so URL encoding etc. proceeds as per the default.
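
The decision such a filter would make can be sketched without the servlet API. Everything here is an assumption for illustration: the class `BotAwareUrlEncoder`, its methods, and the `containerEncoder` function that stands in for the container's own encodeURL. In a real filter this logic would presumably live in an HttpServletResponseWrapper subclass overriding encodeURL and encodeRedirectURL, with the User-Agent request header as input.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the idea above; not a drop-in servlet filter.
public class BotAwareUrlEncoder {

    // Crude bot check on the User-Agent header value (assumed detection rule).
    static boolean isBot(String userAgent) {
        return userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
    }

    // Bots get the raw URL back; everyone else gets the container's default encoding.
    static String encode(String url, String userAgent, UnaryOperator<String> containerEncoder) {
        return isBot(userAgent) ? url : containerEncoder.apply(url);
    }

    public static void main(String[] args) {
        // Stand-in for what a servlet container does when cookies are unavailable.
        UnaryOperator<String> fakeContainer = u -> u + ";jsessionid=abc123";

        String googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        System.out.println(encode("/categories", googlebot, fakeContainer));
        // prints /categories
        System.out.println(encode("/categories", "Mozilla/5.0 (X11; Linux x86_64)", fakeContainer));
        // prints /categories;jsessionid=abc123
    }
}
```

Matching on a substring of the User-Agent is the simplest possible check; a longer list of crawler names could be added in the same place.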

mP.