Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly). This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
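As a rough illustration of the two sub-problems, here is a minimal Python sketch. It assumes Levenshtein (edit) distance as the similarity measure, which the tag description does not prescribe; the function names `edit_distance`, `dictionary_matches` and `substring_match_ends` are hypothetical and chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def dictionary_matches(pattern: str, words: list[str], max_dist: int) -> list[str]:
    """Dictionary sub-problem: keep the words within max_dist edits of pattern."""
    return [w for w in words if edit_distance(pattern, w) <= max_dist]


def substring_match_ends(pattern: str, text: str, max_dist: int) -> list[int]:
    """Substring sub-problem: 1-based end positions in text where some
    substring ending there is within max_dist edits of pattern.

    Same recurrence as edit_distance, except that the cost of matching the
    empty pattern prefix is 0 at every text position, so a match may start
    anywhere in the text.
    """
    prev = list(range(len(pattern) + 1))  # column for the empty text prefix
    ends = []
    for j, ct in enumerate(text, start=1):
        curr = [0]  # empty pattern prefix matches for free at any position
        for i, cp in enumerate(pattern, start=1):
            cost = 0 if cp == ct else 1
            curr.append(min(prev[i] + 1,
                            curr[i - 1] + 1,
                            prev[i - 1] + cost))
        if curr[-1] <= max_dist:
            ends.append(j)
        prev = curr
    return ends


if __name__ == "__main__":
    print(dictionary_matches("chcago", ["chicago", "boston"], max_dist=1))  # ['chicago']
    print(substring_match_ends("abc", "xxabcxx", max_dist=0))  # [5]
```

Both functions run in time proportional to the product of the input lengths, so for large dictionaries or long texts an index or a dedicated library is usually a better fit than this direct approach.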

307 questions

4 answers

SQL and fuzzy comparison

Let's assume we have a table of People (name, surname, address, SSN, etc). We want to find all rows that are "very similar" to a specified person A. I would like to implement some kind of fuzzy logic comparison of A and all rows from table People.…

mysql sql select fuzzy-logic fuzzy-comparison

asked Apr 03 '13 at 23:12

running.t

2 answers

fuzzy join with stringdist_join() in R, Error: NAs are not allowed in subscripted assignments

First of all, I am sorry if my formatting is bad; this is my first time posting (I am also new to programming and R). I am trying to merge two data frames together on string variables. I am merging university names, which might not match up perfectly, so I…

r dplyr merge fuzzy-comparison fuzzyjoin

asked Nov 01 '18 at 21:07

Brian

3 answers

How to group / compare similar news articles

In an app that I'm creating, I want to add functionality that groups news stories together. I want to group news stories about the same topic from different sources into the same group. For example, an article on XYZ from CNN and MSNBC would be in…

fuzzy-comparison

asked Jul 23 '10 at 17:25

Randy

4 answers

Canonical URL compare in Python?

Are there any tools to do a URL compare in Python? For example, if I have http://google.com and google.com/ I'd like to know that they are likely to be the same site. If I were to construct a rule manually, I might uppercase it, then strip off the…

python fuzzy-comparison

asked Jul 19 '10 at 21:36

Colin Davis

0 answers

Fuzzy merging in R - seeking help to improve my code

Inspired by the experimental fuzzy_join function from the statar package, I wrote a function myself which combines exact and fuzzy (by string distances) matching. The merging job I have to do is quite big (resulting in multiple string distance…

r parallel-processing data.table fuzzy-comparison stringdist

asked Apr 04 '15 at 17:38

chameau13

2 answers

Comparing (similar) images with Python/PIL

I'm trying to calculate the similarity (read: Levenshtein distance) of two images, using Python 2.6 and PIL. I plan to use the python-levenshtein library for fast comparison. Main question: What is a good strategy for comparing images? My idea is…

python python-imaging-library fuzzy-logic fuzzy-comparison

asked Apr 08 '10 at 21:53

Attila O.

3 answers

How can I find the best fit subsequences of a large string?

Say I have one large string and an array of substrings that when joined equal the large string (with small differences). For example (note the subtle differences between the strings): large_str = "hello, this is a long string, that may be made up of…

python algorithm levenshtein-distance fuzzy-comparison lcs

asked Aug 31 '17 at 21:15

Josh Voigts

1 answer

How to perform a fuzzy join with fuzzyjoin::difference_* in R

I'm working with two different datasets that I want to merge based on a threshold. Let's say the two dataframes look like this: library(dplyr) library(fuzzyjoin) library(lubridate) df1 = data_frame(Item=1:5, DateTime=c("2015-01-01…

r fuzzy-comparison fuzzyjoin

asked Sep 22 '16 at 16:55

brittenb

3 answers

How to merge two pandas DataFrames based on a similarity function?

Given dataset 1 name,x,y st. peter,1,2 big university portland,3,4 and dataset 2 name,x,y saint peter3,4 uni portland,5,6 The goal is to merge on d1.merge(d2, on="name", how="left") There are no exact matches on name though. So I'm looking to do…

python pandas merge fuzzy-comparison

asked Feb 13 '16 at 14:10

PascalVKooten

3 answers

Fast way to match strings with typo

I have a huge list of strings (city names) and I want to find the name of a city even if the user makes a typo. Example: the user types "chcago" and the system finds "Chicago". Of course I could calculate the Levenshtein distance of the query for all…

string algorithm performance match fuzzy-comparison

asked Oct 31 '15 at 13:16

user2033412

1 answer

Generate "fuzzy" difference of two files in Python, with approximate comparison of floats

I have an issue comparing two files. Basically, what I want to do is a UNIX-like diff between two files, for example: $ diff -u left-file right-file However, my two files contain floats, and because these files were generated on distinct…

python floating-point fuzzy-comparison inexact-arithmetic

asked Jun 24 '10 at 08:23

piwi

1 answer

Merge dataframes on multiple columns with fuzzy match in Python

I have two example dataframes as follows: df1 = pd.DataFrame({'Name': {0: 'John', 1: 'Bob', 2: 'Shiela'}, 'Degree': {0: 'Masters', 1: 'Graduate', 2: 'Graduate'}, 'Age': {0: 27, 1: 23, 2: 21}}) df2 =…

python pandas dataframe fuzzy-comparison

asked Jan 05 '19 at 08:22

ah bon

2 answers

Fuzzy record matching with multiple columns of information

I have a question that is somewhat high level, so I'll try to be as specific as possible. I'm doing a lot of research that involves combining disparate data sets with header information that refers to the same entity, usually a company or a…

algorithm theory string-matching fuzzy-comparison record-linkage

asked Mar 08 '11 at 19:55

WildGunman

4 answers

SQL Fuzzy Join - MSSQL

I have two sets of data: existing customers and potential customers. My main objective is to figure out if any of the potential customers are already existing customers. However, the naming conventions of customers across data sets are…

sql tsql fuzzy-search fuzzy-logic fuzzy-comparison

asked Aug 31 '16 at 13:35

hansolo

1 answer

The best way to search millions of fuzzy hashes

I have the spamsum composite hashes for about ten million files in a database table and I would like to find the files that are reasonably similar to each other. Spamsum hashes are composed of two CTPH hashes of maximum 64 bytes and they look like…

lucene levenshtein-distance fuzzy-search fuzzy-comparison

asked Jun 01 '15 at 00:29

retrography
