
I have tried to use this:

import requests

c = requests.get('https://www.uniberg.com/referenzen.html').text
c.count('Programmierung')

But the output shows 2 occurrences, while there are actually none on the page.

I also tried this:

a = requests.get('https://www.uniberg.com/index.html').text.count('Mitarbeiter')

but it also counts words like Mitarbeiterphilosophie, which I don't want. Can someone suggest a way to improve this, or another method?

  • `Design und Architektur einer OpenStack-Umgebung zur Integration einer virtualisierten IMS Open Source Lösung. Aufbau, Integration und Installation. Programmierung und Automatisierung der funktionalen Erweiterungen zur Integration in die Rechenzentrums-Infrastruktur insbesondere hinsichtlich Deployment und Skalierung.` – abarnert Jun 26 '18 at 06:35
  • What made you think there are no occurrences? – abarnert Jun 26 '18 at 06:35
  • Use NLTK to find the count https://www.reddit.com/r/pythontips/comments/4mu9qq/word_count_using_text_mining_module_nltk_natural/ – Surya Tej Jun 26 '18 at 06:39
  • Possible duplicate of [item frequency count in python](https://stackoverflow.com/questions/893417/item-frequency-count-in-python) – bobrobbob Jun 26 '18 at 07:57

2 Answers


As of today, https://www.uniberg.com/referenzen.html contains 2 occurrences of Programmierung.

I think you need to check the HTML source code, not the page as rendered by a browser.

The occurrences of Programmierung are inside an HTML section that is hidden by this CSS:

section .detail {
    display: none;
}
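If the goal is to count only what a browser actually shows, one option is to drop the hidden elements before counting. Below is a minimal sketch using only the standard library's `html.parser`, assuming the hidden content is marked with the class `detail` as in the CSS rule above. It is a simplification: it hides every `.detail` element (not only those inside a `section`), also skips `<script>`/`<style>`, and ignores other hiding mechanisms such as inline styles or JavaScript. `VisibleTextParser`, `visible_text` and the sample markup are hypothetical names and data for illustration.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they can never "contain" hidden text.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping <script>, <style> and any element whose
    class list contains `hidden_class` (assumed here to be "detail")."""

    def __init__(self, hidden_class="detail"):
        super().__init__()
        self.hidden_class = hidden_class
        self.hidden_tag = None  # tag name of the hidden element we are inside
        self.depth = 0          # nesting depth of that same tag name
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.hidden_tag is not None:
            # Only track nesting of the same tag name, so an unclosed <p> or
            # <li> inside the hidden element cannot throw the counter off.
            if tag == self.hidden_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag not in VOID_TAGS and (
            tag in ("script", "style") or self.hidden_class in classes
        ):
            self.hidden_tag, self.depth = tag, 1

    def handle_endtag(self, tag):
        if self.hidden_tag is not None and tag == self.hidden_tag:
            self.depth -= 1
            if self.depth == 0:
                self.hidden_tag = None

    def handle_data(self, data):
        if self.hidden_tag is None:
            self.parts.append(data)


def visible_text(html, hidden_class="detail"):
    """Return the page text with hidden elements removed."""
    parser = VisibleTextParser(hidden_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical sample markup modelled on the situation described above.
sample = (
    "<section><p>Programmierung ist sichtbar.</p>"
    '<div class="detail">Programmierung und Automatisierung</div></section>'
    '<script>var s = "Programmierung";</script>'
)
print(sample.count("Programmierung"))                # 3 (raw source)
print(visible_text(sample).count("Programmierung"))  # 1 (visible text only)
```

On the real page you would pass `requests.get('https://www.uniberg.com/referenzen.html').text` to `visible_text`; the result depends on the live markup, so it is not shown here.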

For the second point, try this (using a regex):

import re
import requests

len(re.findall(r'\WMitarbeiter\W', requests.get('https://www.uniberg.com/index.html').text))

In this regex:

  • \w stands for "word character", usually [A-Za-z0-9_].
  • \W is short for [^\w], the negated version of \w.
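The `\WMitarbeiter\W` pattern above has two edge cases: it cannot match at the very start or end of the text (there is no character there for `\W` to consume), and because each match consumes the neighbouring non-word characters, two occurrences separated by a single space are not both counted. A common alternative is the zero-width word boundary `\b`. The sketch below compares the approaches on a made-up sample string; `count_word` is a hypothetical helper name.

```python
import re


def count_word(text, word):
    r"""Count whole-word occurrences of `word` in `text`.

    \b is a zero-width word boundary: it matches at the start/end of the
    text and does not consume neighbouring characters, so back-to-back
    occurrences are all counted. re.escape() protects any regex
    metacharacters in `word`.
    """
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


sample = "Mitarbeiter, Mitarbeiterphilosophie und Mitarbeiter Mitarbeiter"
print(sample.count("Mitarbeiter"))                  # 4: also counts the compound word
print(len(re.findall(r"\WMitarbeiter\W", sample)))  # 1: misses the first and last
print(count_word(sample, "Mitarbeiter"))            # 3
```

In Python 3, `\w` is Unicode-aware for `str` patterns, so a compound such as "Mitarbeiterübersicht" is also excluded. Applied to the raw HTML (for example `count_word(requests.get('https://www.uniberg.com/index.html').text, 'Mitarbeiter')`), it still counts hidden text and attribute values, not only what the browser displays.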
Indent

requests.get(URL) returns the entire web page source (view it with Ctrl+U in Google Chrome, or download it with wget), not just what the web browser renders. That's why the count shows up as 2.
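Instead of reading the whole source by hand, you can print a short window of text around each hit to see where the matches actually sit. This is a minimal sketch; `show_context` is a hypothetical helper and the HTML fragment is made-up sample data rather than the real page.

```python
import re


def show_context(source, word, width=30):
    """Return a snippet of `source` around each occurrence of `word`,
    `width` characters on each side, to see where the hits really are."""
    snippets = []
    for m in re.finditer(re.escape(word), source):
        start = max(m.start() - width, 0)
        snippets.append(source[start:m.end() + width])
    return snippets


# Hypothetical fragment of page source; with requests you would pass
# requests.get(URL).text instead.
source = '<div class="detail">Programmierung und Automatisierung</div>'
for snippet in show_context(source, "Programmierung", width=5):
    print(repr(snippet))  # 'ail">Programmierung und '
```

Seeing `class="detail"` right before the match makes it clear the word is present in the source but hidden when rendered.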

Tom Riddle