To the point: simply don't let your app create sessions until the user logs in or performs a POST action. Do not call request.getSession() or request.getSession(true). Do not create or manage session-scoped beans for users who are not logged in. Ensure that the frameworks you're using do not create sessions unless you explicitly tell them to.
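As a minimal sketch of the first point: when code only needs to read from a session that may or may not exist, request.getSession(false) returns the existing session or null and never creates one. The class name, method name and the "user" attribute below are hypothetical, and the fragment assumes the javax.servlet API is on the classpath.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

// Hypothetical helper; the names and the "user" attribute are illustrative only.
public final class SessionLookup {

    // Returns the logged-in user object, or null for an anonymous visitor.
    public static Object currentUser(HttpServletRequest request) {
        // getSession(false) returns the existing session or null.
        // Unlike getSession() and getSession(true), it never creates a session.
        HttpSession session = request.getSession(false);
        return (session != null) ? session.getAttribute("user") : null;
    }
}
```

Also keep in mind that a JSP page creates a session implicitly unless it declares <%@ page session="false" %>, so the check above does not help if the view layer creates the session anyway.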
If this is really impossible due to the way your application is designed or due to limitations or bugs in the (MVC) frameworks used, then your best bet is to redirect Googlebot requests to URLs without the JSESSIONID identifier. You can use Tuckey's URL rewrite filter for this (which is, say, the Java variant of Apache HTTPD's well-known mod_rewrite). Here's a relevant extract from its configuration examples page.
Hide jsessionid for requests from googlebot.
<outbound-rule>
<name>Strip URL Session ID's</name>
<note>
Strip ;jsessionid=XXX from urls passed through response.encodeURL().
The characters ? and # are the only things we can use to find out where the jsessionid ends.
The expression in 'from' below contains three capture groups, the last two being optional.
1, everything before ;jsessionid
2, everything after ;jsessionid=XXX starting with a ? (to get the query string) up to #
3, everything after ;jsessionid=XXX and optionally ?XXX starting with a # (to get the target)
eg,
from index.jsp;jsessionid=sss?qqq to index.jsp?qqq
from index.jsp;jsessionid=sss?qqq#ttt to index.jsp?qqq#ttt
from index.jsp;jsessionid=asdasdasdsadsadasd#dfds - index.jsp#dfds
from u.jsp;jsessionid=wert.hg - u.jsp
from /;jsessionid=tyu - /
</note>
<condition name="user-agent">googlebot</condition>
<from>^(.*?)(?:\;jsessionid=[^\?#]*)?(\?[^#]*)?(#.*)?$</from>
<to>$1$2$3</to>
</outbound-rule>
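To check what the 'from' expression does before deploying the rule, the same pattern can be tried out with plain java.util.regex. This is only a sketch of the substitution itself: the class and method names are hypothetical, it is not how UrlRewriteFilter is implemented internally, and it leaves out the user-agent condition.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative and not part of UrlRewriteFilter.
final class JsessionidStripper {

    // Same expression as the <from> element, minus redundant escapes,
    // with the remaining backslashes doubled for a Java string literal.
    private static final Pattern FROM =
            Pattern.compile("^(.*?)(?:;jsessionid=[^?#]*)?(\\?[^#]*)?(#.*)?$");

    // Rebuilds the URL from groups 1 to 3, like <to>$1$2$3</to>.
    // Optional groups that did not match contribute nothing.
    static String strip(String url) {
        return FROM.matcher(url).replaceFirst("$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(strip("index.jsp;jsessionid=sss?qqq#ttt"));
        System.out.println(strip("/;jsessionid=tyu"));
    }
}
```

Running it against the sample URLs listed in the note is a quick way to confirm that the query string and the fragment survive while the session id is dropped.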