How can the Euclidean distance be calculated with NumPy?

Question

I have two points in 3D:

(xa, ya, za)
(xb, yb, zb)

And I want to calculate the distance:

dist = sqrt((xa-xb)^2 + (ya-yb)^2 + (za-zb)^2)

What's the best way to do this with NumPy, or with Python in general? I have:

import numpy
a = numpy.array((xa ,ya, za))
b = numpy.array((xb, yb, zb))

To be clear, your 3D coords of points are actually 1D arrays ;-) — smci, Mar 19 '21 at 21:12

score 1068 · Accepted Answer · edited Oct 09 '20 at 21:39

Use numpy.linalg.norm:

dist = numpy.linalg.norm(a-b)

You can find the theory behind this in Introduction to Data Mining.

This works because the Euclidean distance is the l2 norm, and when the ord parameter is left at its default (None), numpy.linalg.norm computes the 2-norm of a 1-D array.
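A minimal sketch of the accepted approach. It uses small numeric example points in place of the question's symbolic (xa, ya, za) and (xb, yb, zb), so those values are assumptions. It also shows that norm returns a NumPy scalar, which float() turns into a plain Python number if you need one.

```python
import math

import numpy as np

# Example points (assumed values standing in for (xa, ya, za) and (xb, yb, zb))
a = np.array((1.0, 2.0, 3.0))
b = np.array((4.0, 5.0, 6.0))

# With ord left at its default, a 1-D input gives the 2-norm (Euclidean length)
dist = np.linalg.norm(a - b)

# The same value written out from the definition: sqrt of the sum of squared differences
manual = np.sqrt(np.sum((a - b) ** 2))

print(float(dist))                   # sqrt(27), about 5.196
print(math.isclose(dist, manual))
```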

edited Oct 09 '20 at 21:39

yatu

answered Sep 09 '09 at 20:12

u0b34a0f6ae

The linalg.norm docs can be found here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html My only real comment was sort of pointing out the connection between a norm (in this case the Frobenius norm/2-norm which is the default for norm function) and a metric (in this case Euclidean distance). – Mark Lavin Sep 09 '09 at 20:27
If OP wanted to calculate the distance between an array of coordinates it is also possible to use [scipy.spatial.distance.cdist](https://docs.scipy.org/doc/scipy/reference/generated/generated/scipy.spatial.distance.cdist.html). – mnky9800n May 02 '17 at 09:47
My question is: why use this instead of this? https://stackoverflow.com/a/21986532/189411 `from scipy.spatial import distance; a = (1, 2, 3); b = (4, 5, 6); dst = distance.euclidean(a, b)` – Domenico Monaco Sep 22 '17 at 08:19
updated link to SciPy's cdist function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html – Steven C. Howell Mar 07 '19 at 13:37
there are even more faster methods than numpy.linalg.norm: https://semantive.com/blog/high-performance-computation-in-python-numpy-2/ – Muhammad Ashfaq Apr 26 '20 at 12:22
Sometimes it is giving NaN values in the column – Avinash Jul 04 '20 at 16:58
You should note it doesn't find the distance, it returns a numpy array containing the distance. How do you get the number that is the distance? – NoBugs Oct 17 '20 at 22:13

score 199 · Answer 2 · edited Jun 26 '18 at 20:49

There's a function for that in SciPy: euclidean, in the scipy.spatial.distance module.

Example:

from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)

edited Jun 26 '18 at 20:49

Peter Mortensen

answered Feb 24 '14 at 11:32

Avision

If you look for efficiency it is better to use the numpy function. The scipy distance is twice as slow as numpy.linalg.norm(a-b) (and numpy.sqrt(numpy.sum((a-b)**2))). On my machine I get 19.7 µs with scipy (v0.15.1) and 8.9 µs with numpy (v1.9.2). Not a relevant difference in many cases but if in loop may become more significant. From a quick look at the scipy code it seems to be slower because it validates the array before computing the distance. – Algold Jul 22 '15 at 10:29
@MikePalmice yes, scipy functions are fully compatible with numpy. But take a look at what Algold suggested here (which also works on numpy arrays, of course) – Avision Jan 12 '18 at 08:48
@Avision not sure if it will work for me since my matrices have different numbers of rows; trying to subtract them to get one matrix doesn't work – 3pitt Jan 15 '18 at 01:29
@MikePalmice what exactly are you trying to compute with these two matrices? what is the expected input/output? – Avision Jan 16 '18 at 14:07
ty for following up. There's a description here: https://stats.stackexchange.com/questions/322620/analysis-or-comparison-of-euclidean-distance-matrix/322653 . I have 2 tables of 'operations'; each has a 'code' label, but the two sets of labels are totally different. my goal is to find the best or closest code from the second table corresponding to a fixed code in the first (I know what the answer should be from manual inspection, but want to scale up to hundreds of tables later). So the first subset is fixed; I calculate avg euclid dist bw this and all code subsets of the 2nd, then sort – 3pitt Jan 16 '18 at 17:30
Only on 1-dimensional array tho – Daniel Braun Aug 19 '18 at 15:03

Nico Schlömer · Answer 3 · 2021-02-17T11:01:22.973

For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).

The first piece of advice is to organize your data so that the arrays have shape (3, n) (and are C-contiguous, of course). If the summation happens along the contiguous first dimension, things are faster, and it doesn't matter much whether you use sqrt-sum with axis=0, linalg.norm with axis=0, or

a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))

which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)

The variants where you sum up over the second axis, axis=1, are all substantially slower.
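As a quick check that the (3, n) variants agree with each other, here is a small sketch. The random data and n = 1000 are assumptions for illustration; it only compares results and does not time anything.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Coordinates stored as shape (3, n): one row per axis, one column per point
a = rng.random((3, n))
b = rng.random((3, n))

a_min_b = a - b
# 'ij,ij->j' multiplies element-wise and sums over the first axis (x, y, z),
# leaving one squared distance per column (point)
dist_einsum = np.sqrt(np.einsum("ij,ij->j", a_min_b, a_min_b))

# The other two axis=0 variants, for comparison
dist_norm = np.linalg.norm(a_min_b, axis=0)
dist_sqrt_sum = np.sqrt(np.sum(a_min_b ** 2, axis=0))

print(dist_einsum.shape)  # (1000,)
```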

Code to reproduce the plot:

import numpy
import perfplot
from scipy.spatial import distance


def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)


def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)


def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))


def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))


def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))


def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))


def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))


def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1


perfplot.save(
    "norm.png",
    setup=setup,
    n_range=[2 ** k for k in range(22)],
    kernels=[
        linalg_norm,
        linalg_norm_T,
        scipy_distance,
        sqrt_sum,
        sqrt_sum_T,
        sqrt_einsum,
        sqrt_einsum_T,
    ],
    xlabel="len(x), len(y)",
)

Thank you. I learnt something new today! For single dimension array, the string will be `i,i->` — Tirtha R, Dec 17 '18 at 19:26
it'd be even cooler if there was a comparison of memory consumption — dragonLOLz, Feb 17 '19 at 15:18
I would like to use your code but I am struggling with understanding how the data is supposed to be organized. Can you give an example? How does `data` have to look like? — Johannes Wiesner, Sep 18 '19 at 10:23
Really neat project and findings. I've been doing some half-a***ed plots of the same nature, so I think I'll switch to your project and contribute the differences, if you like them. — Mad Physicist, Mar 17 '20 at 14:19

kfsone · Answer 4 · 2018-12-10T04:55:16.470

I want to expound on the simple answer with various performance notes. np.linalg.norm will do perhaps more than you need:

dist = numpy.linalg.norm(a-b)

First, this function is designed to work over a list of points and return all of the values, e.g. to compute the distance from pA to every point in sP:

sP = np.array(points)  # all points, shape (n, 3)
pA = np.array(point)   # a single point, shape (3,)
distances = np.linalg.norm(sP - pA, ord=2, axis=1)  # 'distances' is an array of n values

Remember several things:

Python function calls are expensive.
[Regular] Python doesn't cache name lookups.

So

import numpy as np

def distance(pointA, pointB):
    dist = np.linalg.norm(pointA - pointB)
    return dist

isn't as innocent as it looks.

>>> dis.dis(distance)
  2           0 LOAD_GLOBAL              0 (np)
              2 LOAD_ATTR                1 (linalg)
              4 LOAD_ATTR                2 (norm)
              6 LOAD_FAST                0 (pointA)
              8 LOAD_FAST                1 (pointB)
             10 BINARY_SUBTRACT
             12 CALL_FUNCTION            1
             14 STORE_FAST               2 (dist)

  3          16 LOAD_FAST                2 (dist)
             18 RETURN_VALUE

First, every time we call it, we have to do a global lookup for "np", an attribute lookup for "linalg" and another for "norm", and the overhead of merely calling the function can equate to dozens of Python instructions.

Lastly, we waste two operations storing the result and reloading it for the return...

First pass at improvement: make the lookup faster, skip the store

def distance(pointA, pointB, _norm=np.linalg.norm):
    return _norm(pointA - pointB)

We get the far more streamlined:

>>> dis.dis(distance)
  2           0 LOAD_FAST                2 (_norm)
              2 LOAD_FAST                0 (pointA)
              4 LOAD_FAST                1 (pointB)
              6 BINARY_SUBTRACT
              8 CALL_FUNCTION            1
             10 RETURN_VALUE

The function call overhead still amounts to some work, though. And you'll want to do benchmarks to determine whether you might be better doing the math yourself:

def distance(pointA, pointB):
    return (
        ((pointA.x - pointB.x) ** 2) +
        ((pointA.y - pointB.y) ** 2) +
        ((pointA.z - pointB.z) ** 2)
    ) ** 0.5  # fast sqrt

On some platforms, **0.5 is faster than math.sqrt. Your mileage may vary.
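A small timeit harness for running that benchmark yourself. It is a sketch that assumes plain 3-tuples (and a NumPy array for the norm variant) rather than the .x/.y/.z point objects used above; the function names are hypothetical, and absolute timings will differ by machine and Python version.

```python
import math
import timeit

import numpy as np

# Hypothetical example points: tuples for the pure-Python variants, arrays for NumPy
p, q = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
pa, qa = np.array(p), np.array(q)


def with_norm(a=pa, b=qa, _norm=np.linalg.norm):
    return _norm(a - b)


def with_pow(a=p, b=q):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5


def with_sqrt(a=p, b=q, _sqrt=math.sqrt):
    return _sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)


for fn in (with_norm, with_pow, with_sqrt):
    # Total seconds for 10,000 calls; compare the relative numbers, not the absolute ones
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__:10s} {seconds:.4f}s  result={fn():.6f}")
```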

Advanced performance notes:

Why are you calculating distance? If the sole purpose is to display it,

 print("The target is %.2fm away" % (distance(a, b)))

move along. But if you're comparing distances, doing range checks, etc., I'd like to add some useful performance observations.

Let’s take two cases: sorting by distance or culling a list to items that meet a range constraint.

# Ultra naive implementations. Hold onto your hat.

def sort_things_by_distance(origin, things):
    return sorted(things, key=lambda thing: distance(origin, thing))

def in_range(origin, range, things):
    things_in_range = []
    for thing in things:
        if distance(origin, thing) <= range:
            things_in_range.append(thing)
    return things_in_range

The first thing we need to remember is that we are using Pythagoras to calculate the distance (dist = sqrt(x^2 + y^2 + z^2)), so we're making a lot of sqrt calls. Math 101:

dist = sqrt( x^2 + y^2 + z^2 )
therefore
dist^2 = x^2 + y^2 + z^2
and, because distances are never negative (N, M >= 0):
sq(N) < sq(M) iff N < M
sq(N) > sq(M) iff N > M
sq(N) == sq(M) iff N == M

In short: until we actually require the distance in a unit of X rather than X^2, we can eliminate the hardest part of the calculations.
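The same idea carries over to NumPy arrays. A minimal sketch, assuming the points are stored as an (n, 3) array rather than as objects with .x/.y/.z attributes (the random point cloud and max_range are illustrative assumptions): sorting by squared distance already gives the distance order, and the range check compares against max_range squared.

```python
import numpy as np

rng = np.random.default_rng(1)
origin = np.zeros(3)
points = rng.uniform(-10.0, 10.0, size=(500, 3))  # hypothetical point cloud
max_range = 5.0

diff = points - origin
dist_sq = (diff ** 2).sum(axis=1)  # squared distances, no square roots taken
dist = np.sqrt(dist_sq)            # true distances, computed only for comparison

# Ordering by squared distance yields non-decreasing true distances
order = np.argsort(dist_sq)
print(np.all(np.diff(dist[order]) >= 0))

# Range check: compare squared distances against max_range squared
near_sq = points[dist_sq <= max_range ** 2]
near = points[dist <= max_range]
print(len(near_sq), len(near))
```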

# Still naive, but much faster.

def distance_sq(left, right):
    """ Returns the square of the distance between left and right. """
    return (
        ((left.x - right.x) ** 2) +
        ((left.y - right.y) ** 2) +
        ((left.z - right.z) ** 2)
    )

def sort_things_by_distance(origin, things):
    return sorted(things, key=lambda thing: distance_sq(origin, thing))

def in_range(origin, range, things):
    things_in_range = []

    # Remember that sqrt(N)**2 == N, so if we square
    # range, we don't need to root the distances.
    range_sq = range**2

    for thing in things:
        if distance_sq(origin, thing) <= range_sq:
            things_in_range.append(thing)
    return things_in_range

Great, neither function takes any expensive square roots any more, so both should be noticeably faster. We can also improve in_range by converting it to a generator:

def in_range(origin, range, things):
    range_sq = range**2
    yield from (thing for thing in things
                if distance_sq(origin, thing) <= range_sq)

This especially has benefits if you are doing something like:

if any(in_range(origin, max_dist, things)):
    ...

But if the very next thing you are going to do requires a distance,

for nearby in in_range(origin, walking_distance, hotdog_stands):
    print("%s %.2fm" % (nearby.name, distance(origin, nearby)))

consider yielding tuples:

def in_range_with_dist_sq(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = distance_sq(origin, thing)
        if dist_sq <= range_sq: yield (thing, dist_sq)

This can be especially useful if you might chain range checks ('find things that are near X and within Nm of Y'), since you don't have to calculate the distance again.

But what about if we're searching a really large list of things and we anticipate a lot of them not being worth consideration?

There is actually a very simple optimization:

def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = (origin.x - thing.x) ** 2
        if dist_sq <= range_sq:
            dist_sq += (origin.y - thing.y) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing

Whether this is useful will depend on the size of 'things'.

def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    if len(things) >= 4096:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
    elif len(things) > 32:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
    else:
        # Small lists: just calculate the squared distance and range-check it
        for thing in things:
            if distance_sq(origin, thing) <= range_sq:
                yield thing

And again, consider yielding the dist_sq. Our hotdog example then becomes:

# Chaining generators
info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
info = ((stand, dist_sq ** 0.5) for stand, dist_sq in info)
for stand, dist in info:
    print("%s %.2fm" % (stand, dist))

Why not add such an optimized function to numpy? An extension for pandas would also be great for a question like this https://stackoverflow.com/questions/47643952/calculate-distance-based-on-a-lookup-dataframe — Keith, Dec 05 '17 at 04:52
I edited your first mathematical approach to distance. You were using a `pointZ` that didn't exist. I think what you meant was two points in three dimensional space and I edited accordingly. If I was wrong, please let me know. — Bram Vanroy, Nov 14 '18 at 09:42

score 37 · Answer 5 · edited Jun 26 '18 at 20:43

Another way to solve the problem:

def dist(x,y):   
    return numpy.sqrt(numpy.sum((x-y)**2))

a = numpy.array((xa,ya,za))
b = numpy.array((xb,yb,zb))
dist_a_b = dist(a,b)

edited Jun 26 '18 at 20:43

Peter Mortensen

answered Sep 09 '09 at 19:56

Nathan Fellman

can you use numpy's sqrt and/or sum implementations? That should make it faster (?). – u0b34a0f6ae Sep 09 '09 at 20:03
I found this on the other side of the interwebs `norm = lambda x: N.sqrt(N.square(x).sum())` ; `norm(x-y)` – u0b34a0f6ae Sep 09 '09 at 20:09
scratch that. it had to be somewhere. here it is: `numpy.linalg.norm(x-y)` – u0b34a0f6ae Sep 09 '09 at 20:11

Xavier Guihot · Answer 6 · score 19 · 2019-07-28T05:11:16.673

Starting with Python 3.8, the math module provides the dist function directly, which returns the Euclidean distance between two points (given as tuples or lists of coordinates):

from math import dist

dist((1, 2, 6), (-2, 3, 2)) # 5.0990195135927845

And if you're working with lists:

dist([1, 2, 6], [-2, 3, 2]) # 5.0990195135927845
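A short sketch comparing math.dist with math.hypot, which also accepts any number of coordinates from Python 3.8 on, on the same example points. It also shows that math.dist rejects points with different numbers of coordinates.

```python
import math

p = (1, 2, 6)
q = (-2, 3, 2)

d_dist = math.dist(p, q)
# math.hypot takes the per-axis differences as separate arguments (Python 3.8+)
d_hypot = math.hypot(*(pi - qi for pi, qi in zip(p, q)))
print(d_dist, d_hypot)  # both are sqrt(26)

# Points must have the same number of coordinates
try:
    math.dist((0, 0), (1, 1, 1))
except ValueError as exc:
    print("ValueError:", exc)
```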

edited Jul 28 '19 at 05:11

answered Jan 15 '19 at 10:13

Xavier Guihot

score 13 · Answer 7 · edited Jun 26 '18 at 20:47

It can be done like the following. I don't know how fast it is, but it's not using NumPy.

from math import sqrt
a = (1, 2, 3)  # Data point 1
b = (4, 5, 6)  # Data point 2
print(sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))

edited Jun 26 '18 at 20:47

Peter Mortensen

answered Oct 31 '12 at 10:33

The Demz

Doing maths directly in python is not a good idea as python is very slow, specifically `for a, b in zip(a, b)`. But useful none the less. – Sigex May 05 '19 at 13:30
You don't even need to zip a and b. `sqrt(sum( (a - b)**2))` would do the trick. Nice answer by the way – Josmy Faure Jul 15 '20 at 06:35

score 12 · Answer 8 · edited Apr 19 '20 at 06:31

A nice one-liner:

dist = numpy.linalg.norm(a-b)

However, if speed is a concern, I would recommend experimenting on your machine. I've found that using the math library's sqrt with the ** operator for the square is much faster on my machine than the one-liner NumPy solution.

I ran my tests using this simple program:

#!/usr/bin/python
import math
import numpy
from random import uniform

def fastest_calc_dist(p1,p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 +
                     (p2[1] - p1[1]) ** 2 +
                     (p2[2] - p1[2]) ** 2)

def math_calc_dist(p1,p2):
    return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
                     math.pow((p2[1] - p1[1]), 2) +
                     math.pow((p2[2] - p1[2]), 2))

def numpy_calc_dist(p1,p2):
    return numpy.linalg.norm(numpy.array(p1)-numpy.array(p2))

TOTAL_LOCATIONS = 1000

p1 = dict()
p2 = dict()
for i in range(0, TOTAL_LOCATIONS):
    p1[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
    p2[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))

total_dist = 0
for i in range(0, TOTAL_LOCATIONS):
    for j in range(0, TOTAL_LOCATIONS):
        dist = fastest_calc_dist(p1[i], p2[j]) #change this line for testing
        total_dist += dist

print(total_dist)

On my machine, math_calc_dist runs much faster than numpy_calc_dist: 1.5 seconds versus 23.5 seconds.

To get a measurable difference between fastest_calc_dist and math_calc_dist I had to up TOTAL_LOCATIONS to 6000. Then fastest_calc_dist takes ~50 seconds while math_calc_dist takes ~60 seconds.

You can also experiment with numpy.sqrt and numpy.square though both were slower than the math alternatives on my machine.

My tests were run with Python 2.6.6.

edited Apr 19 '20 at 06:31

Xavier Guihot

answered Nov 12 '10 at 21:40

user118662

You're badly misunderstanding how to use numpy... _Don't_ use loops or list comprehensions. If you're iterating through, and applying the function to _each_ item, then, yeah, the numpy functions will be slower. The whole point is to vectorize things. – Joe Kington Nov 13 '10 at 03:36
If I move the numpy.array call into the loop where I am creating the points I do get better results with numpy_calc_dist, but it is still 10x slower than fastest_calc_dist. If I have that many points and I need to find the distance between each pair I'm not sure what else I can do to advantage numpy. – user118662 Nov 13 '10 at 16:41
I realize this thread is old, but I just want to reinforce what Joe said. You are not using numpy correctly. What you are calculating is the sum of the distance from every point in p1 to every point in p2. The solution with numpy/scipy is over 70 times quicker on my machine. Make p1 and p2 into an array (even using a loop if you have them defined as dicts). Then you can get the total sum in one step, `scipy.spatial.distance.cdist(p1, p2).sum()`. That is it. – Scott B May 14 '11 at 00:14
Or use `numpy.linalg.norm(p1-p2).sum()` to get the sum between each point in p1 and the corresponding point in p2 (i.e. not every point in p1 to every point in p2). And if you do want every point in p1 to every point in p2 and don't want to use scipy as in my previous comment, then you can use np.apply_along_axis along with numpy.linalg.norm to still do it much, much quicker then your "fastest" solution. – Scott B May 14 '11 at 00:16
Previous versions of NumPy had very slow norm implementations. In current versions, there's no need for all this. – Fred Foo Oct 20 '13 at 10:04
besides, if your p is multidimensional like more than 100, numpy is even better. – 1a1a11a Oct 17 '16 at 19:58
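Following the comments' advice to vectorize, here is a NumPy-only sketch that computes the same "every point in p1 to every point in p2" total with broadcasting. It assumes the points live in (n, 3) arrays instead of dicts and uses a smaller TOTAL_LOCATIONS than the answer. It also builds an (n, n, 3) intermediate, so memory grows quadratically with n; scipy.spatial.distance.cdist(p1, p2).sum(), mentioned in the comments, computes the same total.

```python
import math

import numpy as np

rng = np.random.default_rng(2)
TOTAL_LOCATIONS = 200  # smaller than in the answer so the loop below stays quick

p1 = rng.uniform(0, 1000, size=(TOTAL_LOCATIONS, 3))
p2 = rng.uniform(0, 1000, size=(TOTAL_LOCATIONS, 3))

# (n, 1, 3) - (1, n, 3) broadcasts to (n, n, 3): every p1 point minus every p2 point
diff = p1[:, None, :] - p2[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) matrix of distances
total_vectorized = pairwise.sum()

# The same total with a pure-Python double loop, for comparison
p1_list, p2_list = p1.tolist(), p2.tolist()
total_loop = 0.0
for a in p1_list:
    for b in p2_list:
        total_loop += math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2)

print(total_vectorized, total_loop)
```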

score 10 · Answer 9 · edited Jun 26 '18 at 20:48

I found a 'dist' function in matplotlib.mlab, but I don't think it's handy enough.

I'm posting it here just for reference.

import numpy as np
import matplotlib.mlab as mlab  # mlab.dist only exists in older Matplotlib versions

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# Distance between a and b
dis = mlab.dist(a, b)

edited Jun 26 '18 at 20:48

Peter Mortensen

answered Jan 06 '14 at 04:46

Alan Wang

This is no longer applicable. (mpl 3.0) – Nico Schlömer Jul 31 '19 at 08:18

score 9 · Answer 10 · edited Oct 21 '20 at 05:40

You can just subtract the vectors and then take the inner product.

Following your example,

a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))

tmp = a - b
sum_squared = numpy.dot(tmp.T, tmp)
result = numpy.sqrt(sum_squared)

edited Oct 21 '20 at 05:40

amin saffar

answered Sep 10 '11 at 19:08

PuercoPop

this will give me the square of the distance. you're missing a sqrt here. – Nathan Fellman Sep 10 '11 at 20:37

score 8 · Answer 11 · edited Mar 01 '19 at 16:18

I like np.dot (dot product):

import numpy as np

a = np.array((xa, ya, za))
b = np.array((xb, yb, zb))

distance = (np.dot(a - b, a - b)) ** .5
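A concrete version with numbers (the example values are assumptions standing in for the question's coordinates). As a comment on an earlier answer points out, the inner product by itself is the squared distance; the square root is what turns it into the distance. The @ operator computes the same inner product.

```python
import math

import numpy as np

# Example values standing in for (xa, ya, za) and (xb, yb, zb)
a = np.array((1.0, 2.0, 3.0))
b = np.array((4.0, 5.0, 6.0))

diff = a - b
sq = np.dot(diff, diff)               # inner product = squared distance (27.0 here)
dist_dot = sq ** 0.5
dist_matmul = math.sqrt(diff @ diff)  # @ computes the same inner product
print(sq, dist_dot, dist_matmul)
```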

edited Mar 01 '19 at 16:18

Xavier Guihot

answered Sep 02 '16 at 22:14

travelingbones

score 6 · Answer 12 · edited Nov 03 '17 at 11:18

Having a and b as you defined them, you can also use:

distance = np.sqrt(np.sum((a-b)**2))

edited Nov 03 '17 at 11:18

Mona Jalal

answered Dec 28 '16 at 15:48

Alejandro Sazo

score 6 · Answer 13 · answered Dec 05 '19 at 12:27

With Python 3.8, it's very easy.

https://docs.python.org/3/library/math.html#math.dist

math.dist(p, q)

Return the Euclidean distance between two points p and q, each given as a sequence (or iterable) of coordinates. The two points must have the same dimension.

Roughly equivalent to:

sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))

score 5 · Answer 14 · answered May 17 '16 at 18:22

Here's some concise code for the Euclidean distance between two points represented as Python lists:

def distance(v1,v2): 
    return sum([(x-y)**2 for (x,y) in zip(v1,v2)])**(0.5)

answered May 17 '16 at 18:22

Andy Lee

Numpy also accepts lists as inputs (no need to explicitly pass a numpy array) – Alejandro Sazo Apr 02 '17 at 19:07

ePi272314 · Answer 15 · 2019-10-15T23:03:12.287

Since Python 3.8

Since Python 3.8 the math module includes the function math.dist().
See here https://docs.python.org/3.8/library/math.html#math.dist.

math.dist(p1, p2)
Return the Euclidean distance between two points p1 and p2, each given as a sequence (or iterable) of coordinates.

import math
print( math.dist( (0,0),   (1,1)   )) # sqrt(2) -> 1.4142
print( math.dist( (0,0,0), (1,1,1) )) # sqrt(3) -> 1.7321

score 3 · Answer 16 · edited Jun 26 '18 at 20:50

Calculate the Euclidean distance for multidimensional space:

import math

x = [1, 2, 6]
y = [-2, 3, 2]

dist = math.sqrt(sum([(xi-yi)**2 for xi,yi in zip(x, y)]))
print(dist)  # 5.0990195135927845

edited Jun 26 '18 at 20:50

Peter Mortensen

answered Jun 14 '17 at 11:58

Gennady Nikitin

score 3 · Answer 17 · edited Feb 22 '18 at 16:43

import math

dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)

edited Feb 22 '18 at 16:43

K.Dᴀᴠɪs

answered Feb 22 '18 at 16:41

Jonas De Schouwer

Python 3.8+ math.hypot() isn't limited to 2 dimensions. `dist = math.hypot( xa-xb, ya-yb, za-zb )` – Doyousketch2 Jan 17 '21 at 03:00

score 2 · Answer 18 · answered Feb 10 '18 at 06:09

import numpy as np
from scipy.spatial import distance
input_arr = np.array([[0,3,0],[2,0,0],[0,1,3],[0,1,2],[-1,0,1],[1,1,1]]) 
test_case = np.array([0,0,0])
dst=[]
for i in range(0,6):
    temp = distance.euclidean(test_case,input_arr[i])
    dst.append(temp)
print(dst)
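The loop above can also be written without Python-level iteration. A sketch using the same input_arr and test_case: broadcasting subtracts test_case from every row, and axis=1 asks norm for one result per row (the same axis=1 pattern discussed in an earlier answer).

```python
import numpy as np

input_arr = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3], [0, 1, 2], [-1, 0, 1], [1, 1, 1]])
test_case = np.array([0, 0, 0])

# Broadcasting subtracts test_case from every row; axis=1 gives one norm per row
dst = np.linalg.norm(input_arr - test_case, axis=1)
print(dst)
```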

answered Feb 10 '18 at 06:09

Ankur Nadda

What's the difference from [this answer](https://stackoverflow.com/a/21986532/5376789)? – xskxzr Feb 10 '18 at 06:36

Jonas De Schouwer · Answer 19 · score 2 · 2018-04-19T18:51:50.880

You can easily use the formula

distance = np.sqrt(np.sum(np.square(a-b)))

which does nothing more than apply Pythagoras' theorem: it adds the squares of Δx, Δy and Δz and takes the square root of the result.

edited Apr 19 '18 at 18:51

answered Apr 19 '18 at 17:50

Jonas De Schouwer

score 1 · Answer 20 · answered Jul 26 '18 at 15:11

Find the difference of the two matrices first. Then apply element-wise multiplication with NumPy's multiply function. After that, sum the entries of the element-wise product. Finally, take the square root of that sum.

import numpy as np

def findEuclideanDistance(a, b):
    euclidean_distance = a - b
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance

score 1 · Answer 21 · answered Mar 12 '20 at 02:49

import numpy as np
# any two Python lists as two points
a = [0, 0]
b = [3, 4]

You can first convert the lists to NumPy arrays and subtract them:

print(np.linalg.norm(np.array(a) - np.array(b)))

Or subtract the Python lists directly with np.subtract:

print(np.linalg.norm(np.subtract(a, b)))

RecursivelyIronic · Answer 22 · 2020-11-16T21:10:35.560

The other answers work for floating-point numbers, but they do not correctly compute the distance for integer dtypes, which are subject to overflow and underflow. Note that even scipy.spatial.distance.euclidean has this issue:

>>> a1 = np.array([1], dtype='uint8')
>>> a2 = np.array([2], dtype='uint8')
>>> a1 - a2
array([255], dtype=uint8)
>>> np.linalg.norm(a1 - a2)
255.0
>>> from scipy.spatial import distance
>>> distance.euclidean(a1, a2)
255.0

This comes up often, since many image libraries represent an image as an ndarray with dtype="uint8". If you have a greyscale image consisting of very dark grey pixels (say all the pixels have color #000001) and you diff it against a black image (#000000), you can end up with x-y consisting of 255 in all cells, which registers as the two images being very far apart from each other. For unsigned integer types (e.g. uint8), you can safely compute the distance in NumPy as:

np.linalg.norm(np.maximum(x, y) - np.minimum(x, y))

For signed integer types, you can cast to a float first:

np.linalg.norm(x.astype("float") - y.astype("float"))

For image data specifically, you can use opencv's norm method:

import cv2
cv2.norm(x, y, cv2.NORM_L2)
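A small reproduction of the uint8 pitfall and the two NumPy fixes above, using a hypothetical three-pixel "black" image x and "very dark grey" image y. It sticks to NumPy only, so OpenCV is not required.

```python
import numpy as np

x = np.array([0, 0, 0], dtype=np.uint8)  # e.g. black pixels
y = np.array([1, 1, 1], dtype=np.uint8)  # e.g. very dark grey pixels

wrapped = x - y                  # uint8 arithmetic wraps around: 0 - 1 -> 255
naive = np.linalg.norm(wrapped)  # about 441.7, far too large
unsigned_safe = np.linalg.norm(np.maximum(x, y) - np.minimum(x, y))
float_safe = np.linalg.norm(x.astype("float") - y.astype("float"))
print(wrapped, naive, unsigned_safe, float_safe)  # both safe versions give sqrt(3)
```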
