I have an MHTML file in a local folder, stored at file_path, which contains user event logs. I am able to open it using the following code:

import email
from bs4 import BeautifulSoup

with open(file_path, 'r') as fp:
    message = email.message_from_file(fp)
    # walk every MIME part and pretty-print the HTML ones
    for part in message.walk():
        if part.get_content_type() == "text/html":
            soup = BeautifulSoup(part.get_payload(decode=False), 'html.parser')
            parsed_data = soup.prettify()
            print(parsed_data)

Given below is a portion of the output I got using the above code:

<!DOCTYPE html>
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="main.css" rel="stylesheet" type="text/css"/>
  <script type="text/javascript">
   function zoomToggle(num)
{
  var img = document.getElementById("ss-" + num);

  if (img.className == "screenshot") {
    img.className = "screenshot-thumb";
  }
  else {
    img.className = "screenshot";
  }

  return false;
}
  </script>
  <title>
   Recorded Steps
  </title>
 </head>
 <body>
  <!-- This is the recorded XML data that was used in generating this page. -->
  <xml id="recordeddata">
   <script id="myXML" type="text/xml">
    <?xml version="1.0" encoding="UTF-8"?>
<Report>
  <System MajorVersion="10" MinorVersion="0" ServicePackMajor="0" ServicePackMinor="0" BuildNumber="18362" Sku="101" Platform="2" />
  <UserActionData>
    <RecordSession SessionCount="1" StartTime="11:51:58 AM" StopTime="11:52:39 AM" ActionCount="11" MissedActionCount="0">
      <EachAction ActionNumber="1" Time="11:52:01 AM" Pid="8316" ProgramId="0000f519feec486de87ed73cb92d3cac802400000000" FileId="0000c07130e269ebfeefcb7d893f01498bc96aa5ab24" FileVersion="10.0.18362.1 (WinBuild.160101.0800)" FileDescription="Windows Explorer" FileCompany="Microsoft Corporation" FileName="EXPLORER.EXE" CommandLine="EXPLORER.EXE">
        <Description>User left click on "Word - 1 running window (button)"</Description>
        <Action>Mouse Left Click</Action>
        <CursorCoordsXY>1383,1061</CursorCoordsXY>
        <ScreenCoordsXYWH>0,0,1920,1080</ScreenCoordsXYWH>
        <UIAStack>
          <Level AutomationId="Microsoft.Office.WINWORD.EXE.15" BoundingRectangle="1358,1030,62,50" ControlType="50000" Name="Word - 1 running window" LocalizedControlType="button" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskListWClass" ControlType="50021" FrameworkId="Win32" Name="Running applications" LocalizedControlType="tool bar" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskSwWClass" ControlType="50033" FrameworkId="Win32" Name="Running applications" LocalizedControlType="pane" />
          <Level AutomationId="40965" BoundingRectangle="608,1030,952,50" ClassName="ReBarWindow32" ControlType="50033" FrameworkId="Win32" LocalizedControlType="pane" />
          <Level BoundingRectangle="0,1030,1920,50" ClassName="Shell_TrayWnd" ControlType="50033" FrameworkId="Win32" Name="Taskbar" LocalizedControlType="pane" />
        </UIAStack>
        <ScreenshotFileName>screenshot0001.JPEG</ScreenshotFileName>
      </EachAction>
      <EachAction ActionNumber="2" Time="11:52:03 AM" Pid="17892" ProgramId="00062dde378ad4a45da1a2e77a67bb6a717100000000" 

Also there are multiple ... following after that.

I am trying to retrieve the features in each EachAction tag using Python, but I have not been able to find a solution. Could someone please help me with this?

Thanks

Swap2019
  • What features do you want? Can you post expected output? – Andrej Kesely Jun 03 '20 at 07:08
  • I am trying to fetch the 'ActionNumber', 'Time', and 'FileDescription' values from every 'EachAction' tag. I also need the value of the 'Action' tag. I need to save all this data in a tabular format, with one row per 'EachAction' tag. – Swap2019 Jun 03 '20 at 16:47

1 Answer

As stated in the comments, you can get the necessary data with this script (for the txt variable I used the HTML shown below; you can use your own HTML payload instead):

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

# locate the <script> that holds the XML data
script = soup.select_one('script#myXML')

# parse the XML data; html.parser lowercases tag and attribute names,
# which is why the lookups below use lowercase keys
xml_soup = BeautifulSoup(script.contents[0], 'html.parser')

# collect the wanted fields from every <EachAction>
all_data = []
for each_action in xml_soup.select('EachAction'):
    all_data.append({'ActionNumber': each_action['actionnumber'],
                     'Time': each_action['time'],
                     'FileDescription': each_action['filedescription'],
                     'Action': each_action.find('action').get_text(strip=True)})

# print data:
for line in all_data:
    print('{:<30}{:<30}{:<30}{:<30}'.format(*line.values()))

Prints:

1                             11:52:01 AM                   Windows Explorer              Mouse Left Click              
2                             12:52:01 AM                   Windows Explorer 1            Mouse Left Click 2            
3                             10:52:01 AM                   Windows Explorer 2            Mouse Left Click 3            
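
Since the goal mentioned in the comments is to save the rows in a tabular format, here is a minimal sketch that writes a list of dicts shaped like all_data to CSV using only the standard library. The helper name rows_to_csv and the sample rows are assumptions for illustration, not part of the script above.

```python
import csv
import io

# Sample rows in the same shape as all_data (illustrative values).
all_data = [
    {'ActionNumber': '1', 'Time': '11:52:01 AM',
     'FileDescription': 'Windows Explorer', 'Action': 'Mouse Left Click'},
    {'ActionNumber': '2', 'Time': '12:52:01 AM',
     'FileDescription': 'Windows Explorer 1', 'Action': 'Mouse Left Click 2'},
]


def rows_to_csv(rows, fp):
    """Write a list of dicts to fp as CSV; the first row's keys become the header."""
    if not rows:
        return
    writer = csv.DictWriter(fp, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the sketch self-contained. For a real file,
# open('actions.csv', 'w', newline='') could be used instead
# (the file name is hypothetical).
buffer = io.StringIO()
rows_to_csv(all_data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing newline='' when opening a real file lets the csv module control line endings itself, which avoids blank rows on Windows.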

The HTML I used:

txt = '''<!DOCTYPE html>
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="main.css" rel="stylesheet" type="text/css"/>
  <script type="text/javascript">
   function zoomToggle(num)
{
  var img = document.getElementById("ss-" + num);

  if (img.className == "screenshot") {
    img.className = "screenshot-thumb";
  }
  else {
    img.className = "screenshot";
  }

  return false;
}
  </script>
  <title>
   Recorded Steps
  </title>
 </head>
 <body>
  <!-- This is the recorded XML data that was used in generating this page. -->
  <xml id="recordeddata">
   <script id="myXML" type="text/xml">
    <?xml version="1.0" encoding="UTF-8"?>
<Report>
  <System MajorVersion="10" MinorVersion="0" ServicePackMajor="0" ServicePackMinor="0" BuildNumber="18362" Sku="101" Platform="2" />
  <UserActionData>
    <RecordSession SessionCount="1" StartTime="11:51:58 AM" StopTime="11:52:39 AM" ActionCount="11" MissedActionCount="0">
      <EachAction ActionNumber="1" Time="11:52:01 AM" Pid="8316" ProgramId="0000f519feec486de87ed73cb92d3cac802400000000" FileId="0000c07130e269ebfeefcb7d893f01498bc96aa5ab24" FileVersion="10.0.18362.1 (WinBuild.160101.0800)" FileDescription="Windows Explorer" FileCompany="Microsoft Corporation" FileName="EXPLORER.EXE" CommandLine="EXPLORER.EXE">
        <Description>User left click on "Word - 1 running window (button)"</Description>
        <Action>Mouse Left Click</Action>
        <CursorCoordsXY>1383,1061</CursorCoordsXY>
        <ScreenCoordsXYWH>0,0,1920,1080</ScreenCoordsXYWH>
        <UIAStack>
          <Level AutomationId="Microsoft.Office.WINWORD.EXE.15" BoundingRectangle="1358,1030,62,50" ControlType="50000" Name="Word - 1 running window" LocalizedControlType="button" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskListWClass" ControlType="50021" FrameworkId="Win32" Name="Running applications" LocalizedControlType="tool bar" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskSwWClass" ControlType="50033" FrameworkId="Win32" Name="Running applications" LocalizedControlType="pane" />
          <Level AutomationId="40965" BoundingRectangle="608,1030,952,50" ClassName="ReBarWindow32" ControlType="50033" FrameworkId="Win32" LocalizedControlType="pane" />
          <Level BoundingRectangle="0,1030,1920,50" ClassName="Shell_TrayWnd" ControlType="50033" FrameworkId="Win32" Name="Taskbar" LocalizedControlType="pane" />
        </UIAStack>
        <ScreenshotFileName>screenshot0001.JPEG</ScreenshotFileName>
      </EachAction>
      <EachAction ActionNumber="2" Time="12:52:01 AM" Pid="8316" ProgramId="0000f519feec486de87ed73cb92d3cac802400000000" FileId="0000c07130e269ebfeefcb7d893f01498bc96aa5ab24" FileVersion="10.0.18362.1 (WinBuild.160101.0800)" FileDescription="Windows Explorer 1" FileCompany="Microsoft Corporation" FileName="EXPLORER.EXE" CommandLine="EXPLORER.EXE">
        <Description>User left click on "Word - 1 running window (button)"</Description>
        <Action>Mouse Left Click 2</Action>
        <CursorCoordsXY>1383,1061</CursorCoordsXY>
        <ScreenCoordsXYWH>0,0,1920,1080</ScreenCoordsXYWH>
        <UIAStack>
          <Level AutomationId="Microsoft.Office.WINWORD.EXE.15" BoundingRectangle="1358,1030,62,50" ControlType="50000" Name="Word - 1 running window" LocalizedControlType="button" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskListWClass" ControlType="50021" FrameworkId="Win32" Name="Running applications" LocalizedControlType="tool bar" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskSwWClass" ControlType="50033" FrameworkId="Win32" Name="Running applications" LocalizedControlType="pane" />
          <Level AutomationId="40965" BoundingRectangle="608,1030,952,50" ClassName="ReBarWindow32" ControlType="50033" FrameworkId="Win32" LocalizedControlType="pane" />
          <Level BoundingRectangle="0,1030,1920,50" ClassName="Shell_TrayWnd" ControlType="50033" FrameworkId="Win32" Name="Taskbar" LocalizedControlType="pane" />
        </UIAStack>
        <ScreenshotFileName>screenshot0001.JPEG</ScreenshotFileName>
      </EachAction>
      <EachAction ActionNumber="3" Time="10:52:01 AM" Pid="8316" ProgramId="0000f519feec486de87ed73cb92d3cac802400000000" FileId="0000c07130e269ebfeefcb7d893f01498bc96aa5ab24" FileVersion="10.0.18362.1 (WinBuild.160101.0800)" FileDescription="Windows Explorer 2" FileCompany="Microsoft Corporation" FileName="EXPLORER.EXE" CommandLine="EXPLORER.EXE">
        <Description>User left click on "Word - 1 running window (button)"</Description>
        <Action>Mouse Left Click 3</Action>
        <CursorCoordsXY>1383,1061</CursorCoordsXY>
        <ScreenCoordsXYWH>0,0,1920,1080</ScreenCoordsXYWH>
        <UIAStack>
          <Level AutomationId="Microsoft.Office.WINWORD.EXE.15" BoundingRectangle="1358,1030,62,50" ControlType="50000" Name="Word - 1 running window" LocalizedControlType="button" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskListWClass" ControlType="50021" FrameworkId="Win32" Name="Running applications" LocalizedControlType="tool bar" />
          <Level BoundingRectangle="610,1030,950,50" ClassName="MSTaskSwWClass" ControlType="50033" FrameworkId="Win32" Name="Running applications" LocalizedControlType="pane" />
          <Level AutomationId="40965" BoundingRectangle="608,1030,952,50" ClassName="ReBarWindow32" ControlType="50033" FrameworkId="Win32" LocalizedControlType="pane" />
          <Level BoundingRectangle="0,1030,1920,50" ClassName="Shell_TrayWnd" ControlType="50033" FrameworkId="Win32" Name="Taskbar" LocalizedControlType="pane" />
        </UIAStack>
        <ScreenshotFileName>screenshot0001.JPEG</ScreenshotFileName>
      </EachAction>
    </RecordSession>
  </UserActionData>
</Report>
</script>
</xml>
</body>
</html>'''
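
If the HTML part inside the MHTML file is stored with a transfer encoding (base64 or quoted-printable), get_payload(decode=False) returns the still-encoded text. The following is a rough, standard-library-only sketch of an alternative route: decode the part with get_payload(decode=True), pull the text out of script#myXML with html.parser, and read the XML with xml.etree.ElementTree, which keeps the original attribute case. The sample page, the in-memory message, and the helper names (ScriptExtractor, html_from_mhtml, extract_actions) are all assumptions for illustration, and the approach assumes the embedded XML is well-formed.

```python
import email
import xml.etree.ElementTree as ET
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html.parser import HTMLParser

# Tiny stand-in for the recorded page (hypothetical data, well-formed XML).
SAMPLE_HTML = '''<html><body>
<script id="myXML" type="text/xml">
<?xml version="1.0" encoding="UTF-8"?>
<Report><UserActionData><RecordSession>
  <EachAction ActionNumber="1" Time="11:52:01 AM" FileDescription="Windows Explorer">
    <Action>Mouse Left Click</Action>
  </EachAction>
  <EachAction ActionNumber="2" Time="11:52:03 AM" FileDescription="Microsoft Word">
    <Action>Keyboard Input</Action>
  </EachAction>
</RecordSession></UserActionData></Report>
</script>
</body></html>'''


class ScriptExtractor(HTMLParser):
    """Collect the raw text found inside <script id=...>."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.inside = False
        self.chunks = []  # script text may arrive in several pieces

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and dict(attrs).get('id') == self.script_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def html_from_mhtml(mhtml_text):
    """Return the decoded text/html part of an MHTML message, or None."""
    message = email.message_from_string(mhtml_text)
    for part in message.walk():
        if part.get_content_type() == 'text/html':
            raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
            return raw.decode(part.get_content_charset() or 'utf-8')
    return None


def extract_actions(html):
    """Return one dict per <EachAction> found in script#myXML."""
    parser = ScriptExtractor('myXML')
    parser.feed(html)
    parser.close()
    # The XML declaration must be the very first thing in the document,
    # so surrounding whitespace is stripped before parsing.
    xml_text = ''.join(parser.chunks).strip()
    root = ET.fromstring(xml_text.encode('utf-8'))
    rows = []
    for action in root.iter('EachAction'):
        rows.append({
            'ActionNumber': action.get('ActionNumber'),
            'Time': action.get('Time'),
            'FileDescription': action.get('FileDescription'),
            'Action': action.findtext('Action', default='').strip(),
        })
    return rows


# Build an in-memory message whose HTML part is base64-encoded,
# standing in for a real file read with email.message_from_file.
sample = MIMEMultipart('related')
sample.attach(MIMEText(SAMPLE_HTML, 'html', 'utf-8'))
rows = extract_actions(html_from_mhtml(sample.as_string()))
for row in rows:
    print(row)
```

Unlike html.parser, ElementTree is a strict XML parser, so it will raise an error on truncated or unbalanced XML instead of silently repairing it.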
Andrej Kesely