
I am doing a project on web-page classification, for which I want to extract the visible text, images and SVGs of each page (including elements outside the viewport). I am having trouble accurately determining which elements are actually visible.

I have searched all corners of the internet for potential causes but unfortunately have not been successful.

My current code looks like this:

```javascript
// Returns true if any of the heuristics below flags the element as hidden.
var isHidden = el => {
    return Object.values(potentialHiddenCauses(el)).some(x => x);
};

var potentialHiddenCauses = el => {
    var style = window.getComputedStyle(el);
    // Note: getBoundingClientRect() coordinates are relative to the viewport.
    var rect = el.getBoundingClientRect();
    // Does the element have a direct text-node child with non-whitespace text?
    var hasNodeWithVisibleText = Array.from(el.childNodes).some(
        x => x.nodeType === Node.TEXT_NODE && x.nodeValue.replace(/\s/g, "").length > 0
    );
    var data = {
        // a: !el.offsetParent, // can be a false positive?
        b: style.display === "none",
        c: Number(style.opacity) === 0,
        d: style.visibility === "hidden",
        e: el.offsetWidth === 0,
        f: el.offsetHeight === 0,
        i: rect.width === 0,
        j: rect.height === 0,
        // rect.x / rect.y are viewport-relative, so k and m are true for
        // elements scrolled to the left of / above the current viewport.
        k: rect.x < 0,
        l: rect.x > document.documentElement.scrollWidth,
        m: rect.y < 0,
        n: rect.y > document.documentElement.scrollHeight,
        o: hasNodeWithVisibleText && style.fontSize === "0px",
        p: el.tagName.toLowerCase() === "img" && !el.src
    };
    return data;
};
```
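Checks `k`–`n` compare `getBoundingClientRect()` coordinates, which are relative to the current viewport, with `scrollWidth`/`scrollHeight`, which describe the whole document. Since the goal is to keep elements that are merely scrolled out of view, one option is to convert the rectangle to document coordinates first. This is a sketch under the assumption that those checks are meant to catch elements positioned entirely outside the scrollable page (e.g. `left: -9999px`); `rectToPage` and `isOutsidePage` are hypothetical helper names.

```javascript
// Hypothetical helpers (not part of the original code).

// Convert a viewport-relative DOMRect-like object into document coordinates.
// scrollX/scrollY are the current scroll offsets (window.scrollX / window.scrollY).
function rectToPage(rect, scrollX, scrollY) {
    return {
        left: rect.left + scrollX,
        top: rect.top + scrollY,
        right: rect.right + scrollX,
        bottom: rect.bottom + scrollY,
    };
}

// True if the rectangle does not overlap [0, pageWidth] x [0, pageHeight] at all.
function isOutsidePage(pageRect, pageWidth, pageHeight) {
    return (
        pageRect.right <= 0 ||
        pageRect.bottom <= 0 ||
        pageRect.left >= pageWidth ||
        pageRect.top >= pageHeight
    );
}

// Browser usage (not executed here):
// const r = rectToPage(el.getBoundingClientRect(), window.scrollX, window.scrollY);
// const offPage = isOutsidePage(r,
//     document.documentElement.scrollWidth, document.documentElement.scrollHeight);
```

With this, an element that is above the viewport only because the page has been scrolled down is no longer flagged, while an element pushed far off the left edge still is.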

I previously also checked the `element.style` properties, but as I understand it they only reflect inline styles, whereas `getComputedStyle` returns the values that are actually applied to the element.
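One caveat: `getComputedStyle` resolves values per element, and `opacity` is not an inherited property, so a child of an `opacity: 0` container still reports `"1"` and check `c` does not flag it. A minimal sketch that multiplies opacities up the ancestor chain is below; `effectiveOpacity` is a hypothetical helper, and the style lookup is passed in as a parameter (assumed to be `window.getComputedStyle` in the page) so it can be tried on plain objects.

```javascript
// Hypothetical helper: the element's own computed opacity can be "1" while
// an ancestor is fully transparent. Multiplying the values along the
// parentElement chain gives the opacity the element is effectively drawn with.
// `getStyle` is expected to behave like window.getComputedStyle.
function effectiveOpacity(el, getStyle) {
    let opacity = 1;
    for (let node = el; node; node = node.parentElement) {
        opacity *= parseFloat(getStyle(node).opacity);
    }
    return opacity;
}

// Browser usage (not executed here):
// const transparent = effectiveOpacity(el, n => window.getComputedStyle(n)) === 0;
```

`visibility` does not need the same treatment, because it is inherited and the computed value already reflects a hidden ancestor (unless the child explicitly sets `visibility: visible`, in which case it really is shown).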

Every time I think my check is solid, I run into another edge case and have to restart the entire data collection process.

My latest problem is on https://www.jbhifi.com.au/bose/bose-quietcomfort-35-ii-wireless-over-ear-headphones-black/505852/, where text and images in the dropdown menu are reported as visible even though they are not displayed.
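A common way to hide a collapsed dropdown is to give a container `overflow: hidden` together with a zero height (or `max-height: 0`), or to move the panel outside a clipping container. In that case none of the element's own computed styles change and its bounding box keeps a non-zero size, so none of the checks above flags it. I have not verified that this is how the menu on that page is hidden; the sketch only illustrates the idea. `visibleArea` and `clipRectsFor` are hypothetical helpers, and using each clipping ancestor's border box is an approximation (it ignores padding, positioned elements escaping a non-containing ancestor, `clip-path`, and so on). Only `hidden`/`clip` overflow is treated as clipping, on the assumption that content inside `auto`/`scroll` containers can still be scrolled to.

```javascript
// Hypothetical helpers: intersect an element's rectangle with the rectangles
// of ancestors that clip their overflow. If nothing is left, the element is
// drawn entirely outside the visible part of those ancestors.

// Rectangles are plain { left, top, right, bottom } objects (a DOMRect also works).
function visibleArea(rect, clipRects) {
    let left = rect.left, top = rect.top, right = rect.right, bottom = rect.bottom;
    for (const clip of clipRects) {
        left = Math.max(left, clip.left);
        top = Math.max(top, clip.top);
        right = Math.min(right, clip.right);
        bottom = Math.min(bottom, clip.bottom);
    }
    return Math.max(0, right - left) * Math.max(0, bottom - top);
}

// Browser-only part (not executed here): collect the bounding rectangles of
// ancestors whose computed overflow-x or overflow-y is "hidden" or "clip".
function clipRectsFor(el) {
    const clips = value => value === "hidden" || value === "clip";
    const rects = [];
    // Stop at <body>: overflow on body/html usually governs page scrolling
    // rather than clipping a single component.
    for (let node = el.parentElement; node && node !== document.body; node = node.parentElement) {
        const style = window.getComputedStyle(node);
        if (clips(style.overflowX) || clips(style.overflowY)) {
            rects.push(node.getBoundingClientRect());
        }
    }
    return rects;
}

// Browser usage (not executed here):
// const clippedAway = visibleArea(el.getBoundingClientRect(), clipRectsFor(el)) === 0;
```

Carousels that use `overflow: hidden` for off-screen slides would also be flagged by this, which may or may not be what you want for classification.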

It would be great if someone could tell me about missing checks or flaws in my approach.

  • Have you considered something like this? https://stackoverflow.com/questions/20791374/jquery-check-if-element-is-visible-in-viewport There is a jQuery library that can detect this for you. I don't know whether jQuery is suitable for your project (and I don't know how to do it the hard-core way), but it might be a way forward for you. – Rack Sinchez Jul 18 '19 at 15:29
  • Check [this post](https://stackoverflow.com/questions/123999/how-to-tell-if-a-dom-element-is-visible-in-the-current-viewport/7557433#7557433). – Peter Jul 18 '19 at 15:37
  • @Peter thanks for the suggestion to use event listeners, but that does not seem to be the problem in this case: when I print the state of the invisible element from a simple document.onclick handler or window.setInterval callback, my checks do report it as hidden. – masterpusher Jul 18 '19 at 15:59
  • @RackSinchez I am using Selenium to visit the pages and then execute some JavaScript. I guess I could try to use any jQuery already loaded on the page or inject my own copy, but I feel that would cause its own range of problems. It could be an option if I cannot find another solution, but I feel like I am probably missing something simple. – masterpusher Jul 18 '19 at 16:02
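For reference, jQuery documents its `:visible` selector as "elements that consume space in the document"; in jQuery 3.x this is, as far as I know, a check for a non-zero `offsetWidth` or `offsetHeight` or at least one client rect. Elements with `visibility: hidden` or `opacity: 0` therefore still count as visible, and overflow clipping is not considered. If that notion is enough, it can be reproduced without injecting jQuery through Selenium; `isVisibleLikeJQuery` is a hypothetical name for this sketch.

```javascript
// Hypothetical helper mirroring the check jQuery 3.x uses for ":visible":
// an element counts as visible if it occupies any layout space.
// SVG elements have no offsetWidth/offsetHeight, so they fall through
// to getClientRects().
function isVisibleLikeJQuery(elem) {
    return Boolean(elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length);
}

// Browser usage (not executed here):
// const visibleImages = Array.from(document.images).filter(isVisibleLikeJQuery);
```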
