Questions tagged [diffbot]

A licensed visual learning robot that identifies and extracts the important parts of any web page

For more info, see http://www.diffbot.com

13 questions
8
votes
3 answers

Best way to extract text (e.g. articles) from web page

So I am trying to write a program which can collect certain information from different articles and combine them. The step in which I am having trouble is extracting the article from the web page. I was wondering whether you could provide any…
Saad Attieh
5
votes
4 answers

How do I submit a JSON array to this API?

I'm trying to use Diffbot to parse some URLs into the relevant article portion. They have an "Article API" that allows you to submit one link at a time and receive it back, but for speed I'd prefer to use the Batch API which basically allows you to…
Doug Smith
2
votes
2 answers

How to change the API token when the limit is exceeded in Python?

I have written code against the Diffbot API. The token is limited to 10,000 calls at 1 call per second. What should I do when the limit is exceeded?
blackmamba
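
One possible way to approach the rate-limit question above, as a minimal Python sketch. It assumes you legitimately hold more than one token and that your plan permits switching between them; `TokenRotator` and all of its parameters are hypothetical names, not part of any Diffbot client. The sketch only decides which token to use and how long to wait; it makes no network calls.

```python
import time
from typing import Callable, List, Optional


class TokenRotator:
    """Hands out API tokens, spacing calls apart and rotating when a quota is used up."""

    def __init__(
        self,
        tokens: List[str],
        max_calls: int = 10_000,       # assumed per-token quota, taken from the question
        min_interval: float = 1.0,     # assumed minimum seconds between calls
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if not tokens:
            raise ValueError("at least one token is required")
        self._tokens = list(tokens)
        self._max_calls = max_calls
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._index = 0                             # token currently in use
        self._used = 0                              # calls made with that token
        self._last_call: Optional[float] = None     # time of the previous call

    def acquire(self) -> str:
        """Return the token to use for the next call, waiting or rotating as needed."""
        if self._used >= self._max_calls:
            # Current token is exhausted: move on to the next one.
            self._index += 1
            self._used = 0
            if self._index >= len(self._tokens):
                raise RuntimeError("all tokens have exhausted their quota")
        if self._last_call is not None:
            # Sleep for whatever remains of the minimum interval.
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        self._used += 1
        return self._tokens[self._index]
```

Call `acquire()` immediately before each request and pass the returned token along. The throttle is kept across rotations, which is conservative if the per-second limit is actually tracked per token.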
2
votes
1 answer

HTML content extraction using Diffbot

Can someone help me? I want to extract HTML data from http://www.quranexplorer.com/Hadith/English/Index.html. I have found a service that does exactly that, http://diffbot.com/dev/docs/. They support data extraction via a simple API; the problem it…
user5601
1
vote
2 answers

Python: No module found even though it exists in directory

Consider I have the following project structure: +PROJECT | +models | | =__init__.py | | =client.py | | =config.py | | +tests | | =__init__.py | | =example.py | | =example_two.py | | README.md | requirements.txt When I try to import a…
user9975313
1
vote
0 answers

Diffbot Product API version 3 returns images, but a Product API crawl job doesn't. How can I get images in a Product API crawl job?

Diffbot Product API version 3 returns images, but when I create a Product API crawl job, it doesn't return any images. How can I get images in a Product API crawl job?
Rachita
1
vote
1 answer

Diffbot API: How can I get multiple images using Diffbot in Node.js?

I am using Diffbot's Article API to scrape articles from any site. Currently I am getting articles with a single image, but I want to scrape all the images for a particular article. Any suggestions will be appreciated.
abdulbarik
1
vote
1 answer

Does Diffbot execute JavaScript?

When using Diffbot API, do the APIs grab the content that's added via JS after the HTML has been loaded, or does Diffbot only see the immediately available HTML?
Swader
0
votes
1 answer

Sending cookies with Diffbot

Diffbot docs suggest that to set custom headers, including cookies, I simply add the X-Forward prefix to the header. For example, I do the following: cookie='SportsDirect_AnonymousUserCurrency=CNY' user_agent = 'Mozilla/5.0 (X11; Linux x86_64)…
fpghost
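
The header-prefixing idea described in the question above can be sketched in Python as follows. This is only an illustration: `forward_headers` is a hypothetical helper, the exact prefix string `X-Forward-` is an assumption based on the question's wording, and the user agent value is a placeholder because the original is truncated.

```python
from typing import Dict


def forward_headers(headers: Dict[str, str], prefix: str = "X-Forward-") -> Dict[str, str]:
    """Return a copy of `headers` with `prefix` prepended to every header name."""
    return {f"{prefix}{name}": value for name, value in headers.items()}


# Headers we would like the remote fetch to use (user agent is a placeholder).
custom = {
    "Cookie": "SportsDirect_AnonymousUserCurrency=CNY",
    "User-Agent": "ExampleAgent/1.0",
}
prefixed = forward_headers(custom)
```

The resulting dictionary can be passed as the request headers to whichever HTTP client you use. The header values themselves are left untouched.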
0
votes
1 answer

Why doesn't diffbot see the price here?

I'm using diffbot to scrape products. It gets things right on most sites, and if it doesn't the custom API usually allows me to easily tweak until correct. However there are a few cases that are baffling me. I know diffbot doesn't execute javascript…
fpghost
0
votes
1 answer

Diffbot: Where can I pass the stats argument in the Analyze API?

I am using the Diffbot Analyze API to detect the page type, and I want a result like this: {"stats":{"times":…
abdulbarik
0
votes
3 answers

Regex to tokenize and then get arbitrary tokens

I am not very familiar with regular expressions and ran into a problem which is beyond me. I would like help with coming up with an expression which tokenizes a string and then gets me everything BUT arbitrary tokens counting from the end. For…
Rabee
-1
votes
1 answer

Diffbot URL encode

I have a problem with URL encoding in Diffbot. I have a URL that I pass when I call the Diffbot API like this: //JsonNode json= (JsonNode)client.analyze(DiffbotClient.ResponseType.Jackson,url); but I get an error message about URL encoding. This is…
sai aung myint soe
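
URL-encoding errors like the one described in the last question usually come up when a page URL is embedded as a query parameter of another URL without being percent-encoded first. A minimal Python sketch of the general technique follows. The endpoint `https://api.example.com/analyze`, the parameter names `token` and `url`, and the helper `build_request_url` are all placeholders for illustration, not a confirmed Diffbot interface.

```python
from urllib.parse import urlencode


def build_request_url(endpoint: str, token: str, page_url: str) -> str:
    """Build a request URL whose query string carries `page_url` percent-encoded."""
    # urlencode escapes reserved characters (":", "/", "?", "&", "=", spaces)
    # in each value so the embedded URL cannot be confused with the outer query.
    query = urlencode({"token": token, "url": page_url})
    return f"{endpoint}?{query}"


request_url = build_request_url(
    "https://api.example.com/analyze",           # placeholder endpoint
    "TOKEN",                                     # placeholder token
    "http://example.com/a b?x=1&y=2",            # page URL with reserved characters
)
```

Without the encoding step, the `&y=2` part of the page URL would be read as a separate parameter of the outer request. If a client library already encodes its arguments, encoding again would double-encode the value, so check which side is responsible before adding this step.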