Questions tagged [html-parsing]

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically, e.g., in order to extract data from it. HTML parsing typically involves converting the document into a tree-based Document Object Model (DOM).

The HTML specification defines a standard algorithm for parsing HTML (https://html.spec.whatwg.org/multipage/parsing.html#parsing), which is implemented in all major browsers.

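To make "converting a document into a tree" concrete, here is a minimal sketch using Python's standard-library `html.parser`, which reports start tags, end tags and text as events. The `Node` class, the `parse` and `outline` helpers and the void-element list are illustrative assumptions. This is not the WHATWG tree-construction algorithm: it has no implied end tags and no error recovery beyond ignoring stray end tags.

```python
from html.parser import HTMLParser

# Elements that never have children or end tags in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # Node objects and text strings


class TreeBuilder(HTMLParser):
    """Turns the parser's tag/text events into a simple tree of Nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # element that new children are added to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:  # void elements never get children
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # <x/> is treated as an empty element here (the spec would ignore
        # the slash on non-void elements).
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text for a compact tree
            self.current.children.append(data)


def parse(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return an indented outline of the tree, one line per node."""
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * depth + repr(child))
        else:
            lines.append("  " * depth + child.tag)
            lines.extend(outline(child, depth + 1))
    return lines
```

For example, `outline(parse("<p>Hi <b>there</b><br></p>"))` yields one line each for `p`, the text `'Hi '`, `b`, `'there'` and `br`, indented by nesting depth.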

5674 questions
2 votes, 1 answer

Why are single quotes being shown as double quotes?

The output is: 'b.gif' is being shown as "b.gif" Viewing the source code in FireBug also shows double quotes. Why is this…
gom
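A likely explanation is that once markup has been parsed, the original quote character is gone: the DOM keeps only the attribute's name and value, and any tool that displays or serializes the DOM picks its own quoting (the HTML serialization algorithm uses double quotes). The sketch below reproduces that effect with Python's standard-library `html.parser` as a stand-in for a browser; the `reserialize_start_tags` helper is hypothetical and rebuilds start tags only.

```python
from html import escape
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records (tag, attrs) for every start tag; attr values arrive unquoted."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def reserialize_start_tags(source):
    """Rebuild start tags from parsed attributes, always using double quotes."""
    collector = StartTagCollector()
    collector.feed(source)
    collector.close()
    rebuilt = []
    for tag, attrs in collector.tags:
        parts = [tag]
        for name, value in attrs:
            # A bare attribute such as `disabled` has the value None.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        rebuilt.append("<" + " ".join(parts) + ">")
    return "".join(rebuilt)
```

Whether the source used `'b.gif'`, `"b.gif"` or an unquoted `b.gif`, the parsed value is the same string, so the rebuilt tag comes out as `<img src="b.gif">`.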
2 votes, 1 answer

Temporary removal of HTML from string for Google Translate API to reduce cost

I have to translate some details using a Google API which we're paying for. The details contain HTML, and Google charges for each character. I don't want to send the complete content, but only the English text instead, with the HTML removed. I can…
Atif Ali
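One way to send only the text is to split the HTML into markup pieces and text pieces, translate just the text, and stitch the pieces back together. Below is a minimal Python sketch using the standard-library `html.parser`; `translate` is a hypothetical callback standing in for the paid API call (`str.upper` is used in the usage line only as a placeholder). Assumptions: comments and doctypes are dropped, end tags are rebuilt in lowercase, and `<script>`/`<style>` contents would be treated as text, so strip those first if present.

```python
from html import escape
from html.parser import HTMLParser


class TextSegmenter(HTMLParser):
    """Splits HTML into ("markup", raw) and ("text", decoded) pieces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # Keep the start tag exactly as written, attributes included.
        self.pieces.append(("markup", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> stays a single piece.
        self.pieces.append(("markup", self.get_starttag_text()))

    def handle_endtag(self, tag):
        # End tags are rebuilt; the parser reports tag names in lowercase.
        self.pieces.append(("markup", f"</{tag}>"))

    def handle_data(self, data):
        self.pieces.append(("text", data))


def translate_text_only(source, translate):
    """Pass only non-blank text to `translate`; return (new_html, chars_sent)."""
    parser = TextSegmenter()
    parser.feed(source)
    parser.close()
    out, sent = [], 0
    for kind, value in parser.pieces:
        if kind == "text" and value.strip():
            sent += len(value)
            # Re-escape, because the parser decoded references such as &amp;.
            out.append(escape(translate(value), quote=False))
        else:
            out.append(value)  # markup and whitespace-only text pass through
    return "".join(out), sent
```

Usage: `translate_text_only("<p>Hello <b>world</b></p>", str.upper)` returns the markup unchanged around the transformed text, plus the count of characters that would have been billed. In practice you would probably batch the text pieces into one request rather than calling the API per piece.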
2 votes, 1 answer

How to remove all style sheet using ganon DOM parser

I am using the ganon (http://code.google.com/p/ganon/) DOM parser to manipulate HTML content. I need to manipulate the given HTML page. For that, I first need to remove all stylesheets (link tags) from the DOM. But I didn't find any function to remove all…
Rahul PK
2 votes, 2 answers

UnknownHostException while accessing HTTPS url using jsoup

I am trying to parse and manipulate HTML using jsoup. It works fine for HTTP URLs, but it throws an UnknownHostException if an HTTPS URL is used. Following is my code: System.setProperty("http.proxyHost",…
Umer Hayat
2 votes, 1 answer

jsf user html input security

In my application I have some TinyMCE editors and the user input is shown with, but how can I prevent malicious input, like JavaScript or iframes? Is there any library which can filter the input strings? UPDATE: I found…
wutzebaer
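The usual approach is allowlist sanitizing: parse the input, keep only known-safe tags and attributes, drop everything else, and re-escape the text. The sketch below illustrates the idea in Python with the standard-library `html.parser`; the tag and attribute allowlists are illustrative assumptions, not a vetted policy. For a JSF application, a maintained Java sanitizer such as jsoup's `Jsoup.clean` with a `Safelist`, or the OWASP Java HTML Sanitizer, is likely the safer choice than hand-rolled code.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative policy only; adjust to what the editor is allowed to produce.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style", "iframe"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are removed, their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, etc.
            if name == "href" and not value.strip().lower().startswith(("http://", "https://")):
                continue  # blocks javascript: and other schemes
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(source):
    sanitizer = Sanitizer()
    sanitizer.feed(source)
    sanitizer.close()
    return "".join(sanitizer.out)
```

An unclosed `<script>` or `<iframe>` causes the rest of the input to be dropped, which fails safe rather than letting content through.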
2 votes, 3 answers

BeautifulSoup: parse only part of the page

I want to parse part of an HTML page, say my_string = """

Some text. Some text. Some text. Some text. Some text. Some text. Link1 Link2

One more paragraph

""" I pass this…
Vlad T.
2 votes, 4 answers

How can I selectively modify the src attributes of script tags in an HTML document using Perl?

I need to write a regular expression in Perl that will prefix all srcs with [perl]texthere[/perl], like such: Any help? Thanks!
2 votes, 2 answers

Displaying multilingual characters in an Android app

I am getting multilingual characters in my response. The HTML codes for some of them are 中国 (Chinese), 日本 (Japanese), हिंदी (Hindi [Indian]) and 한국의 (Korean). How do I display these characters in my app?
Dhrumil Shah - dhuma1981
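The listing shows the characters already decoded, so this is an assumption: if the response carries them as numeric character references such as `&#20013;&#22269;`, they must be decoded before display, and the response bytes must first be decoded with the charset the server declares. A minimal Python sketch of both steps follows; `decode_html_text` is a hypothetical helper, and UTF-8 is assumed as the default charset. On Android, `android.text.Html.fromHtml` performs a comparable reference decoding when rendering into a `TextView`.

```python
import html


def decode_html_text(raw: bytes, charset: str = "utf-8") -> str:
    """Decode response bytes with the declared charset, then resolve references."""
    text = raw.decode(charset)  # a wrong charset here produces mojibake
    return html.unescape(text)  # handles &#decimal;, &#xhex; and named refs
```

For example, `decode_html_text(b"&#20013;&#22269;")` gives `中国`, and hexadecimal references like `&#x65E5;&#x672C;` decode to `日本`.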
2 votes, 2 answers

Dual select box not POSTing correctly

I'm still trying to learn jQuery, so bear with me. I have a dual select box that only works if I select all the results in the second select box after I move them there. What I want is when the first box transfers values to the second select…
john
2 votes, 1 answer

Parsing XHTML string with Regex in Javascript and converting it to DOM

Disclaimer: before the you-can't-parse-html-with-regex blind mantra begins - please give me the benefit of the doubt and read this question to the end (+ assume I already know about That RegEx-ing the HTML will drive you crazy and Parsing Html The…
Michael
2 votes, 1 answer

Extract information from HTML document with C

In my quest to learn C (plain C, not C#, nor C++; I have my reasons), I have come across the need to extract some information from an HTML document fetched from a URL. Namely, I want all href attributes from the links residing in a certain…
mkaito
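The extraction logic is easier to see in a high-level language before porting it to C (where libxml2's HTML parser is a common choice). Below is a minimal Python sketch using the standard-library `html.parser`. The question is cut off before it says which element holds the links, so the sketch assumes, purely for illustration, that the container is identified by an `id` attribute; the `links_in` helper and the id `nav` are hypothetical. It counts nesting depth by matching start and end tags, so it assumes well-nested markup with explicit end tags (void elements excepted).

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ScopedLinkExtractor(HTMLParser):
    """Collects href values of <a> elements inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # open elements inside the container (0 = outside)
        self.hrefs = []

    def _record(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self.hrefs.append(href)

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self._record(tag, attrs)
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif dict(attrs).get("id") == self.container_id and tag not in VOID_ELEMENTS:
            self.depth = 1  # entering the container

    def handle_startendtag(self, tag, attrs):
        # <x/> opens and closes in one step, so the depth is unchanged.
        if self.depth:
            self._record(tag, attrs)

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1


def links_in(source, container_id):
    parser = ScopedLinkExtractor(container_id)
    parser.feed(source)
    parser.close()
    return parser.hrefs
```

For example, with `<div id="nav">` wrapping two links and a third link after the `</div>`, `links_in(page, "nav")` returns only the two inner `href` values.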
2 votes, 1 answer

Using Html Agility Pack for parsing Html

I have this HTML:
BILL
2 votes, 2 answers

How to get HTML content of a website

In viewDidLoad, I'm using NSURLRequest and NSURLConnection: NSURLRequest *site_request = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://www.google.com/"] cachePolicy:NSURLRequestUseProtocolCachePolicy …
invader7
2 votes, 3 answers

Tool to write XPATH automatically for web parser?

Currently I need to extract data from websites. I tried using HTML Agility Pack, which uses XPath to extract data. Is there a tool available that automates writing XPath so that even a naive user can configure the parsing tool without…
Madhana Kumar