I'm a beginner to Python and I've recently learned about NumPy and its famous ndarrays. At first, after reading many people praising them (some references here, here, here), I thought:

"Well, if NumPy's arrays are so much better, and assuming I don't really care about having heterogeneous data types on the same list/array, why should I ever use Python's list?"

However, after some deeper research, I've found that using ndarrays also has downsides (some references here and here). I understand the basic pros and cons of each of these data structures, but it all still seems very confusing to me. So, my question is: as a beginner in Python, when should I use NumPy's arrays and when should I use Python's lists? How can I, given a situation, evaluate which option is best?

Some may be inclined to consider this post a duplicate, and there are indeed many "ndarrays vs lists" topics already. However, I've searched for a while and haven't found a satisfying answer to my question. Many people talk about the benefits of ndarrays and of lists, but it's still not clear, especially for beginners like me, how to choose between them. Should I use NumPy arrays in my day-to-day coding and save lists for special situations? Or should I do the opposite? Thank you!

Note: since it might be relevant for the answers, I intend to use Python mostly for Machine Learning.

Alec
Talendar
  • Lists are basic Python structures that get used in many roles. `numpy` is primarily a numeric tool: matrices and higher-dimensional arrays. It's been extended to time series with `pandas`, and to machine learning with `sklearn` and `tensorflow/keras`. – hpaulj May 25 '19 at 05:02

1 Answer

Python lists are bulkier. A list is essentially an array of pointers to separately allocated Python objects, so it takes up far more memory than a NumPy ndarray, which stores raw values of a single type in one contiguous block. That layout also lets NumPy run element-wise operations in compiled loops rather than in the Python interpreter, so for matrix arithmetic and other heavy numerical work, ndarrays are the better option. NumPy also provides many more mathematically useful functions for ndarrays than plain Python offers for lists.
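As a rough illustration of the memory difference, here is a minimal sketch that assumes CPython and a 64-bit integer dtype. The element count `n` and the variable names are arbitrary choices for the example, and `sys.getsizeof` only gives an approximate figure for the list.

```python
import sys
import numpy as np

n = 100_000  # arbitrary size chosen for the example

py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list holds one pointer per element, and every int is its own Python object,
# so we add the size of the list itself to the size of each element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray keeps the raw 8-byte values in a single contiguous buffer.
array_bytes = np_array.nbytes

print(f"list:    ~{list_bytes} bytes")
print(f"ndarray:  {array_bytes} bytes")

# Element-wise arithmetic: an explicit Python loop vs. one vectorized expression.
doubled_list = [x * 2 for x in py_list]
doubled_array = np_array * 2
```

The exact list figure varies between Python versions, but the ndarray size is simply `n` times the item size of the dtype.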

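Python lists are much more flexible, though. They can hold heterogeneous, arbitrary data, and appending to or removing from the end is very efficient (amortized constant time). An ndarray has a fixed size, so growing one means allocating a new array and copying every element over. If you need to add and remove many different objects, Python lists are the way to go.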
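To make the flexibility point concrete, here is a small sketch with made-up values. It contrasts in-place list operations with `np.append`, which returns a new array, and shows one common pattern of collecting values in a list and converting once at the end.

```python
import numpy as np

# Lists can mix types and grow or shrink in place.
mixed = [1, "two", 3.0]
mixed.append({"four": 4})

items = []
for i in range(5):
    items.append(i)  # amortized constant time
items.pop()          # removing from the end is also cheap

# np.append does not modify the array; it builds and returns a copy.
arr = np.array([0, 1, 2, 3])
bigger = np.append(arr, 4)

# One common pattern: accumulate in a list, then convert to an ndarray once.
collected = np.array(items)
```

Calling `np.append` inside a loop copies the whole array on every iteration, which is why building a list first is usually the cheaper route.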

For machine learning, ndarrays are definitely your best bet. TensorFlow and Keras, two of the most popular machine learning libraries, are better suited to NumPy's memory-efficient arrays because they deal with large amounts of homogeneous data.
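As a sketch of what homogeneous data looks like in practice, the snippet below turns a nested list into a 2-D `float32` array and centers each column. The values and the names `rows` and `features` are invented for the example and are not tied to any particular library's input format.

```python
import numpy as np

# Made-up measurements: 3 samples with 2 features each.
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]]

# Every element shares one dtype, so the data sits in one compact block.
features = np.array(rows, dtype=np.float32)
print(features.shape)  # (3, 2)
print(features.dtype)  # float32

# Column-wise mean, then subtract it from every row via broadcasting.
means = features.mean(axis=0)
centered = features - means
```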

Alec