HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically — e.g., in order to extract data from it. The HTML specification defines a standard algorithm for parsing HTML, which is implemented in all major browsers.
HTML parsing typically involves converting an HTML document to a tree-based Document Object Model (DOM)
https://html.spec.whatwg.org/multipage/parsing.html#parsing has the standard algorithm for parsing HTML, which is implemented in all major browsers.
See also parsing.