I have been curious about this for some time. I can live with it, but it bites me whenever I am not careful enough, so I decided to post it here. Consider the following example (NumPy version 1.8.2):
a = array([[0, 1], [2, 3]])
print shape(a[0:0, :]) # (0, 2)
print shape(a[0:1, :]) # (1, 2)
print shape(a[0:2, :]) # (2, 2)
print shape(a[0:100, :]) # (2, 2)
print shape(a[0]) # (2, )
print shape(a[0, :]) # (2, )
print shape(a[:, 0]) # (2, )
I don't know how other people feel, but the result feels inconsistent to me. The last line is a column vector while the second-to-last line is a row vector, so they should have different dimensions; in linear algebra they do! (The fifth line, a[0:100, :], is another surprise, but I will set it aside for now.) Consider a second example:
solution = scipy.sparse.linalg.dsolve.linsolve.spsolve(A, b) # solution of dimension (n, )
analytic = reshape(f(x, y), (n, 1)) # analytic of dimension (n, 1)
error = solution - analytic
Now error has dimension (n, n). Yes, in the second line I should use (n, ) instead of (n, 1), but why? I used to use MATLAB a lot, where a one-dimensional vector always has two dimensions, (n, 1) or (1, n), linspace returns an array of dimension (1, n), and (n, ) never exists. But in NumPy (n, 1) and (n, ) coexist, and there are many tools devoted to dimension handling alone: atleast_1d and atleast_2d, newaxis, and different uses of reshape. To me those tools are more confusing than helpful. If an array prints like [1,2,3], then intuitively its dimension should be (1, 3) instead of (3, ), right? If NumPy did not have (n, ), I can only see a gain in clarity, not a loss in functionality.
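For concreteness, here is a minimal sketch of the second example with small made-up values standing in for solution and analytic (the numbers and the names error_bad, error_flat, and error_col are only for illustration, not my real data). As far as I can tell, the (n, n) result comes from broadcasting, and either ravel or newaxis makes the two shapes agree:

```python
import numpy as np

n = 3
solution = np.array([1.0, 2.0, 3.0])             # shape (n,), like the spsolve result
analytic = np.reshape([1.0, 2.0, 3.5], (n, 1))   # shape (n, 1), as in my example

# Broadcasting stretches (n,) against (n, 1) into an (n, n) table of differences
error_bad = solution - analytic
print(error_bad.shape)

# Flattening the (n, 1) array first gives the elementwise difference, shape (n,)
error_flat = solution - analytic.ravel()
print(error_flat.shape)

# Going the other way: np.newaxis turns (n,) into a column of shape (n, 1)
error_col = solution[:, np.newaxis] - analytic
print(error_col.shape)

# In the first example, slicing with a range instead of an integer keeps the axis
a = np.array([[0, 1], [2, 3]])
row = a[0:1, :]   # shape (1, 2), a "row vector"
col = a[:, 0:1]   # shape (2, 1), a "column vector"
print(row.shape)
print(col.shape)
```

So the row/column distinction is available when I ask for it explicitly, but indexing with a single integer drops that axis.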
So there must be some design reason behind this. I have searched from time to time without finding a clear answer or reference. Could someone help clarify this confusion or point me to some useful references? Your help is much appreciated.