I have used material from here and a previous forum page to write some code for a program that will automatically calculate the semantic similarity between consecutive sentences across a whole text. Here it is;
The code for the first part is copy pasted from the first link, then I have this stuff below which I put in after the 245 line. I removed all excess after line 245.
with open ("File_Name", "r") as sentence_file:
while x and y:
x = sentence_file.readline()
y = sentence_file.readline()
similarity(x, y, true)
#boolean set to false or true
x = y
y = sentence_file.readline()
My text file is formatted like this;
Red alcoholic drink. Fresh orange juice. An English dictionary. The Yellow Wallpaper.
In the end I want to display all the pairs of consecutive sentences with the similarity next to it, like this;
["Red alcoholic drink.", "Fresh orange juice.", 0.611],
["Fresh orange juice.", "An English dictionary.", 0.0]
["An English dictionary.", "The Yellow Wallpaper.", 0.5]
if norm(vec_1) > 0 and if norm(vec_2) > 0:
return np.dot(vec_1, vec_2.T) / (np.linalg.norm(vec_1)* np.linalg.norm(vec_2))
elif norm(vec_1) < 0 and if norm(vec_2) < 0:
???Move On???