I am trying to log in to a website to scrape some data. It works when using

scrapy.FormRequest.from_response(...)

and specifying the form element with XPath. However, I have to use the requests library instead.

I have not been able to submit the form correctly using

# Login Information
payload = {'login': USER_NAME, 'pwd': PASSWORD}
# Login
login_response = request_session.post(LOGIN_URL, data=payload)

where the LOGIN_URL ends with

LOGIN_URL = ".../authentication?action=2"

I am very new to this and have tried many things. I have read that I might have to specify the URL more precisely. Here is the fragment of the page's HTML that contains the form.

<form name="form" method="post" action="authentication"
onsubmit="return onSubmit()">
<input type="hidden" name="action"
    value="1" />

<div>
    <div class="floatLeft">
        <div class="formElement">
            <label class="text" for="inputLogin">Login</label><br />
            <span class="inputtext"><input type="text"
                name="login"
                maxlength="80"
                id="inputLogin" class="text" /></span>
        </div>
        <div class="formElement">
            <label class="text" for="inputPassword">Password</label><br />
            <span class="inputtext"><input type="password"
                name="pwd"
                maxlength="35"
                id="inputPassword" class="text" /></span>
        </div>
        <br />
        <div class="formElement">
            <span class="inputsubmit"><input type="submit" name="submit"
                value="Log in"
                class="btn" /></span>
        </div>
    </div>
</div>

This is my first post, so I hope it is clear enough and that somebody can help me. Thanks!

sanwall
  • The ?action=2 part is a query string of the kind used with GET requests, which should not be used for sensitive data. Maybe the URL is wrong, or maybe it's just used to get the form. – johnashu Feb 10 '18 at 20:44

1 Answer

In case somebody comes across a similar problem:

Be sure to submit all of the form's inputs. When you look at the HTML, check for the hidden ones as well. For example, in the HTML fragment above:

<input type="hidden" name="action" value="1" />

The solution was to include this "action" input in the payload:

# Login Information
payload = {'login': USER_NAME, 'pwd': PASSWORD, 'action': 1}
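
Instead of copying hidden inputs by hand, they could also be collected from the page automatically. The following is only a minimal sketch using Python's standard-library html.parser. FORM_HTML is a shortened copy of the form from the question, and HiddenInputCollector and hidden_fields are illustrative names, not part of requests or scrapy. It gathers hidden inputs from the whole document, so a page with several forms would need extra filtering.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.hidden[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.hidden


# Shortened copy of the form from the question (assumption: the real
# page contains no other hidden inputs that matter for the login).
FORM_HTML = """
<form name="form" method="post" action="authentication">
<input type="hidden" name="action"
    value="1" />
<input type="text" name="login" />
<input type="password" name="pwd" />
</form>
"""

# Start from the hidden fields, then add the visible ones.
payload = hidden_fields(FORM_HTML)
payload.update({"login": "USER_NAME", "pwd": "PASSWORD"})  # placeholder credentials
print(payload)  # {'action': '1', 'login': 'USER_NAME', 'pwd': 'PASSWORD'}
```

In practice the HTML would come from a GET request on the login page made with the same session that later posts the form.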

Further, as johnashu pointed out, I had to trim the URL to

LOGIN_URL = '.../authentication'    # i.e. without the query string '?action=2'

(To get the exact URL you could use the DevTools in Chrome.)
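
The trimmed URL can also be derived from the form itself: its action attribute is the relative path "authentication", which a browser resolves against the URL of the page that served the form. A rough sketch of that resolution with urllib.parse.urljoin follows. The example.com address is hypothetical, since the real URL is elided above, and resolve_form_url and log_in are illustrative helper names. log_in assumes it is given a requests.Session (or any object with a compatible post method).

```python
from urllib.parse import urljoin


def resolve_form_url(page_url, form_action):
    """Resolve a form's (possibly relative) action against the page URL."""
    return urljoin(page_url, form_action)


def log_in(session, page_url, form_action, payload):
    """POST the payload to the form's resolved action URL.

    `session` is assumed to behave like requests.Session.
    """
    login_url = resolve_form_url(page_url, form_action)
    response = session.post(login_url, data=payload)
    response.raise_for_status()  # fail loudly on HTTP error codes
    return response


# Hypothetical page URL; the relative action replaces the last path
# segment and the query string '?action=2' is dropped.
print(resolve_form_url("https://example.com/app/authentication?action=2",
                       "authentication"))
# https://example.com/app/authentication
```

With the real library this would be called as log_in(requests.Session(), page_url, "authentication", payload). A 200 status alone does not prove the login worked, so it is worth checking the response body or a later page for signs of being logged in.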

sanwall