Questions tagged [html-parsing]

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically — e.g., in order to extract data from it. The HTML specification defines a standard algorithm for parsing HTML, which is implemented in all major browsers.

HTML parsing typically involves converting an HTML document to a tree-based Document Object Model (DOM).

The standard parsing algorithm is specified at https://html.spec.whatwg.org/multipage/parsing.html#parsing.
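As a rough illustration of what "producing a tree you can work with" means, here is a minimal sketch using Python's standard-library `html.parser`. Note the assumptions: `Node`, `TreeBuilder`, `parse`, and `find_all` are hypothetical names invented for this example, and `html.parser` is a lenient tokenizer, not an implementation of the specification's parsing algorithm, so its error recovery will differ from what browsers do.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element node: tag name, attributes, and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects or text strings


class TreeBuilder(HTMLParser):
    """Builds a simple tree from start tags, end tags, and text."""

    # A few common void elements that never have an end tag (not exhaustive).
    VOID = {"br", "img", "hr", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(text)


def parse(html):
    """Parse an HTML string and return the root of the simplified tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Return all descendant elements with the given tag, in document order."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                found.append(child)
            found.extend(find_all(child, tag))
    return found


if __name__ == "__main__":
    doc = parse('<ul><li><a href="a.html">A</a></li><li><a href="b.html">B</a></li></ul>')
    print([a.attrs["href"] for a in find_all(doc, "a")])  # prints ['a.html', 'b.html']
```

For real-world documents, a dedicated parsing library is usually a better choice than a hand-rolled tree builder like this one.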

Why are single quotes being shown as double quotes?

The output is:

'b.gif' is being shown as "b.gif". Viewing the source code in Firebug also shows double quotes. Why is this…

html parsing html-parsing dom

asked Jul 18 '12 at 19:08

gom

1 answer

Temporary removal of HTML from string for Google Translate API to reduce cost

I have to translate some details using a Google API which we're paying for. The details contain HTML, and Google charges for each character. I don't want to send the complete content, but only the English text instead, with the HTML removed. I can…

php api html-parsing translation google-translate

asked Jul 18 '12 at 12:55

Atif Ali

1 answer

How to remove all style sheets using the ganon DOM parser

I am using the ganon (http://code.google.com/p/ganon/) DOM parser to manipulate HTML content. I need to manipulate a given HTML page. For that, I first need to remove all stylesheets (link tags) from the DOM. But I didn't find any function to remove all…

php dom html-parsing ganon

asked Jul 12 '12 at 09:07

Rahul PK

2 answers

UnknownHostException while accessing HTTPS URL using jsoup

I am trying to parse and manipulate HTML using jsoup. It works perfectly fine for HTTP URLs, but it throws UnknownHostException if an HTTPS URL is used. Following is my code: System.setProperty("http.proxyHost",…

java https html-parsing jsoup

asked Jul 09 '12 at 09:47

Umer Hayat

1 answer

JSF user HTML input security

In my application I have some TinyMCE editors and the user input is shown with but how can I prevent malicious input, like JavaScript or iframes? Is there any lib which can filter the input strings? UPDATE: I found…

security jsf html-parsing wysiwyg

asked Jul 03 '12 at 13:38

wutzebaer

3 answers

BeautifulSoup: parse only part of the page

I want to parse part of an HTML page, say my_string = """

Some text. Some text. Some text. Some text. Some text. Some text. Link1 Link2

One more paragraph

""" I pass this…

html-parsing beautifulsoup

asked Jun 30 '12 at 23:56

Vlad T.

4 answers

How can I selectively modify the src attributes of script tags in an HTML document using Perl?

I need to write a regular expression in Perl that will prefix all srcs with [perl]texthere[/perl], like such: Any help? Thanks!

regex perl html-parsing html-parser

asked Jun 28 '12 at 20:24

eggplantkiller

2 answers

Displaying multilingual characters in an Android app

I am getting multilingual characters in my response. The HTML codes of some of them are 中国 (Chinese), 日本 (Japanese), हिंदी (Hindi [Indian]), 한국의 (Korean). How do I display these characters in my app?

android html-parsing json multilingual

asked Jun 27 '12 at 13:34

Dhrumil Shah - dhuma1981

2 answers

Dual select box not POSTing correctly

I'm still trying to learn jQuery, so bear with me. I have a dual select box that only works if I select all the results of the second select box after I move them there. What I want is when the first box transfers values to the second select…

php jquery html forms html-parsing

asked Jun 25 '12 at 14:20

john

1 answer

Parsing XHTML string with regex in JavaScript and converting it to DOM

Disclaimer: before the you-can't-parse-html-with-regex blind mantra begins - please give me the benefit of the doubt and read this question to the end (+ assume I already know about That RegEx-ing the HTML will drive you crazy and Parsing Html The…

javascript regex dom xhtml html-parsing

asked Jun 21 '12 at 21:44

Michael

2 answers

How to output exact character string "

I'm new to PHP and have not been able to figure out how to output the exact string "

php html-parsing

asked Jun 19 '12 at 15:20

new2php

1 answer

Extract information from HTML document with C

In my quest to learn C (plain C, not C# or C++; I have my reasons), I have come across the need to extract some information from an HTML document fetched from a URL. Namely, I want all href attributes from the links residing in a certain…

c html-parsing

asked Jun 18 '12 at 14:44

mkaito

1 answer

Using Html Agility Pack for parsing HTML

I have this HTML

c# c#-4.0 html-parsing html-agility-pack

asked Jun 14 '12 at 12:15

BILL

2 answers

How to get HTML content of a website

In viewDidLoad, I'm using NSURLRequest and NSURLConnection: NSURLRequest *site_request = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://www.google.com/"] cachePolicy:NSURLRequestUseProtocolCachePolicy …

ios uiwebview html-parsing xcode4.3

asked Jun 07 '12 at 01:59

invader7

3 answers

Tool to write XPath automatically for web parser?

Currently I need to extract data from websites. I tried using HTML Agility Pack, which uses XPath to extract data. Is there a tool available which automates writing XPath so that even a naive user can configure the parsing tool without…

xpath html-parsing html-agility-pack

asked May 30 '12 at 08:02

Madhana Kumar
