0

I am trying to get information from this website that my company uses I have the html file and the div tags that they use along with all the id information. I want to pull the different ids so that my company can use the same ids when identifying different suppliers.

<div id="Vend_Sel_372129" style="overflow:hidden;" class="Sel_Disp" data_sel="0" data_value="Supplierxx" data_id="372129" data_codec="729" data_prefix="0183011" data_content="" onclick="Test_Init_Supplier_Change(this,0);">
<a class="s" title="Suplierxx" href="javascript:void(0)">Suplierxx</a>

I have a large amount of this data and I want to be able to pull out the id, data_id and the data_value and put it all in a table. Preferably not doing it all one at a time because I have around a thousand div tags like these.

Marcucciboy2
  • 3,053
  • 3
  • 15
  • 33
Jamie
  • 3
  • 1

2 Answers2

1

Depending on how well formed the html file is this might work. See XPath tutorial for more search options. Parsing 10,000 tags took 1 seconds for me.

Option Explicit
Sub extract()

    Const HTML_FILE = "C:\temp\test10000.html"

    Dim obj, ws As Worksheet, iRow As Long, tags As Variant, t0 As Single
    tags = Array("id", "data_id", "data_value")

    ' set up results sheet
    t0 = Timer
    Set ws = ThisWorkbook.Sheets("Sheet1")
    ws.Cells.Clear
    ws.Range("A1:C1") = Array("id", "data_id", "data_value")
    iRow = 1

    ' create xml parser
    Set obj = CreateObject("MSXML2.DOMDocument.6.0")
    With obj
        .setProperty "SelectionLanguage", "XPath"
        .validateOnParse = True
        .Load HTML_FILE
    End With

    ' tags to search for
    Dim xpath As String
    xpath = "//div[@class ='Sel_Disp']"

    ' search
    Dim nodes As Object, node As Object, i As Long
    Set nodes = obj.SelectNodes(xpath)

    ' output to sheet1
    For Each node In nodes
        iRow = iRow + 1
        'Debug.Print iRow, node.XML
        For i = 0 To UBound(tags)
           ws.Cells(iRow, i + 1) = node.getAttribute(tags(i))
        Next
    Next

    ' end
    MsgBox iRow - 1 & " rows written", vbInformation, "Completed in " & Int(Timer - t0) & " secs"
End Sub
Marcucciboy2
  • 3,053
  • 3
  • 15
  • 33
CDP1802
  • 4,364
  • 2
  • 3
  • 12
0

With Javascript, you can loop over the whole dataset, extracting the data with element.getAttribute(attributeName). You can then output it in csv format, which excel can read.

More Info:

EDIT: I just saw that the data doesn't come from a table but instead many divs. Here's another link that will help

lukstru
  • 26
  • 1
  • 7
  • I tried out this but it didn't work to find and print the Vend_Sel ids – Jamie Mar 03 '20 at 14:02
  • you need to loop over the parent of the divs, not the divs themselves. E.g. the id of the body tag or whatever tag is around your data. It would look like this `$('#parent_id').find('div').each(() => {let csvRow = $(this).attr('id') + csvSeparator + $(this).attr('data_id')...})` – lukstru Mar 03 '20 at 14:16