
I am trying to access http://myanimelist.net with the following Java code, which I found in this GitHub repository: https://github.com/Autumn/javaMAL

String result = "";
try {
    URL url = new URL(sURL);
    URLConnection urlc = url.openConnection();
    BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
    StringBuilder sb = new StringBuilder();
    String str;
    while ((str = buffer.readLine()) != null) {
        sb.append(str);
    }
    result = sb.toString();

} catch (Exception e) {
    e.printStackTrace();
}
return result;

This is what I get back from the site:

<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><iframe src="/_Incapsula_Resource?CWUDNSAI=9&incident_id=163000490097146940-336953331276776217&edet=12&cinfo=464f095fc753818104000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 163000490097146940-336953331276776217</iframe></html>

That is not the real source of the page. I should be getting something like this instead:

<div id="header">
    <a href="/">MyAnimeList.net</a>
</div>

<div id="menu">

<div id="menu_right">

    <div id="searchBar">

    <input type="text" class="inputtext" id="topSearchText" value="Search" onkeydown="return ts_checkEnter(event);" size="30" /> 
    <select id="topSearchValue" class="inputtext" onchange="ts_selection();">
    <option value="0">Anime</option>
    <option value="1">Manga</option>
    <option value="2">Characters</option>
    <option value="6">People</option>
    <option value="3">Fansub Groups</option>
    <option value="4">Clubs</option>
    <option value="5">Users</option>
    </select>

    <input type="image" src="http://cdn.myanimelist.net/images/magnify.gif" value="Search" onclick="ts_subSearch(5);" />

    </div>

</div>

<div id="menu_left">
    <ul id="nav">
                    <li class="small"><a href="/anime.php">Anime</a>
            <ul class="wider">
                                    <li><a href="/reviews.php?t=anime">Reviews</a></li>
                <li><a href="/recommendations.php?s=recentrecs&t=anime">Recommendations</a></li>
                <li><a href="/topanime.php">Top Anime</a></li>
                <li><a href="/fansub-groups.php">Fansub Groups</a></li>
            </ul>
        </li>
        <li class="small"><a href="/manga.php">Manga</a>
            <ul class="wider">
                                    <li><a href="/reviews.php?t=manga">Reviews</a></li>
                <li><a href="/recommendations.php?s=recentrecs&t=manga">Recommendations</a></li>
                <li><a href="/topmanga.php">Top Manga</a></li>
            </ul>
        </li>
        <li><a href="#">Community</a>
            <ul>
                <li><a href="/forum/">Forums</a></li>
                <li><a href="/clubs.php">Clubs</a></li>
                <li><a href="/blog.php">Blogs</a></li>
                <li><a href="/users.php">Users</a></li>
                <li><a href="/about.php?go=team">Staff</a></li>
                <li><a href="/about.php?go=support">Help</a></li>
            </ul>
        </li>
        <li class="medium"><a href="#">Industry</a>
            <ul class="wide">
                <li><a href="/people.php">People</a></li>
                <li><a href="/character.php">Characters</a></li>
                <li><a href="/news.php">News</a></li>
                <li><a href="/favorites.php">Top Favorites</a></li>
            </ul>
        </li>

        <li class="tiny"><a href="/register.php">Join</a></li>
        <li class="smaller"><a href="/ajaxtb.php?login" id="malLogin">Login</a></li>
                    </ul>
</div>
</div>

(I copied this from Google Chrome using the View Page Source feature.)

For some reason I seem to be identified as a bot, but no one else using the repository at https://github.com/Autumn/javaMAL appears to have had that problem. Can someone explain what went wrong and how I can fix it?

EDIT: Opening the site with the demo JavaFX browser works fine, but a JEditorPane in my program does not: it receives the same blocked response as my code above.

XQEWR

2 Answers


I use the MyAnimeList API for access.

It requires a username and password (this question might be helpful), so you need to write something like this:

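URL url = new URL(location);
java.net.URLConnection connection = url.openConnection();
// HTTP Basic auth: Base64-encode "username:password" and send it in the Authorization header.
// java.util.Base64 (Java 8+) is used because javax.xml.bind.DatatypeConverter was removed from the JDK in Java 11.
String userpass = "username:passwd";
String basicAuth = "Basic " + java.util.Base64.getEncoder().encodeToString(userpass.getBytes(java.nio.charset.StandardCharsets.UTF_8));
connection.setRequestProperty("Authorization", basicAuth);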
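
For reference, the same idea as a small self-contained helper. This is only a sketch: the class and method names (BasicAuthSketch, basicAuthHeader, openWithBasicAuth) are made up for illustration, and it assumes the endpoint accepts standard HTTP Basic authentication with UTF-8 encoded credentials.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; the names are illustrative, not part of any library.
final class BasicAuthSketch {

    // Builds the value of the Authorization header for HTTP Basic auth:
    // "Basic " followed by Base64("username:password").
    static String basicAuthHeader(String username, String password) {
        String userpass = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(userpass.getBytes(StandardCharsets.UTF_8));
    }

    // Opens a connection with the header attached. No request is made here;
    // that happens when the caller reads from the connection.
    static URLConnection openWithBasicAuth(String location, String username, String password)
            throws IOException {
        URLConnection connection = new URL(location).openConnection();
        connection.setRequestProperty("Authorization", basicAuthHeader(username, password));
        return connection;
    }
}
```

Calling BasicAuthSketch.openWithBasicAuth(location, "username", "passwd") and then reading getInputStream() would send the authenticated request. Keeping the header construction in its own method makes it easy to check without touching the network.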

By the way, I ran into a DTD problem when I tried an XML parser, so I use jsoup to do the parsing instead.

cshu

What you got back is a block page from Incapsula, a DDoS protection service. Try setting request headers like this to get around it:

String result = "";
try {
    URL url = new URL(sURL);
    URLConnection urlc = url.openConnection();
    urlc.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    urlc.setRequestProperty("Accept-Language", "en");
    urlc.setRequestProperty("User-Agent", "Scrapy/0.24.2 (+http://scrapy.org)");
    BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
    StringBuilder sb = new StringBuilder();
    String str;
    while ((str = buffer.readLine()) != null) {
        sb.append(str);
    }
    result = sb.toString();

} catch (Exception e) {
    e.printStackTrace();
}
return result;

You can use a different user agent. Visit http://www.whatsmyuseragent.com/ to find out which user agent your current browser sends.
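
To tell whether a given user agent got through, you can check the response for the markers that appear in the blocked page from the question. A rough sketch follows; the class and method names (FetchSketch, fetch, looksLikeIncapsulaBlock) are invented for illustration, the two marker strings are taken only from the response shown in the question and may differ for other block pages, and decoding the body as UTF-8 is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class FetchSketch {

    // Heuristic: true if the body contains either marker seen in the
    // blocked response quoted in the question.
    static boolean looksLikeIncapsulaBlock(String html) {
        return html.contains("_Incapsula_Resource")
                || html.contains("Incapsula incident ID");
    }

    // Fetches a page with the same headers as above and a caller-supplied User-Agent.
    static String fetch(String sURL, String userAgent) throws IOException {
        URLConnection urlc = new URL(sURL).openConnection();
        urlc.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        urlc.setRequestProperty("Accept-Language", "en");
        urlc.setRequestProperty("User-Agent", userAgent);

        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader buffer = new BufferedReader(
                new InputStreamReader(urlc.getInputStream(), StandardCharsets.UTF_8))) {
            String str;
            while ((str = buffer.readLine()) != null) {
                sb.append(str).append('\n'); // keep line breaks for later parsing
            }
        }
        return sb.toString();
    }
}
```

With this, you could call FetchSketch.fetch(sURL, someUserAgent) for each user agent you want to try and keep the first result for which looksLikeIncapsulaBlock returns false.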

Yukio Fukuzawa