In Python, we can use the numpy.where() function to select elements from a numpy array, based on a condition.
Not only that, but we can perform some operations on those elements if the condition is met.
Let’s see how we can use this function, using some illustrative examples!
Syntax of Python numpy.where()
This function accepts a numpy-like array (for example, a NumPy array of integers or booleans). It returns a new numpy array after filtering based on a condition, which is a numpy-like array of boolean values.
For example, condition can take the value array([[True, True, True]]), which is a numpy-like boolean array. (The condition does not need to have a boolean dtype: numeric values are interpreted as booleans, with nonzero treated as True.)
For example, if condition is array([True, True, False]) and our array is a = np.array([[1, 2, 3]]), then applying the condition to the array (a[:, condition]) gives array([[1, 2]]).
import numpy as np
a = np.arange(10)
# Captures only the elements <= 2 and ignores the others
print(a[a <= 2])
Output
[0 1 2]
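To make the link between an explicit boolean mask and a comparison expression concrete, here is a minimal sketch. The array values are chosen only for illustration and are not from the example above.

```python
import numpy as np

a = np.array([[1, 2, 3]])

# An explicit 1-D boolean mask over the columns of `a`.
condition = np.array([True, True, False])
selected_columns = a[:, condition]
print(selected_columns)  # keeps the first two columns, result stays 2-D

# In practice the mask is usually produced by a comparison.
mask = a <= 2
print(mask)

# Indexing with a mask of the same shape as `a` returns a flat 1-D array.
selected_flat = a[mask]
print(selected_flat)
```

Note that both forms drop the elements that fail the condition, so the result is smaller than the original array.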
NOTE: The same condition can also be written as a <= 2. This is the recommended form for the condition array, as it is very tedious to write it out as a boolean array.
But what if we want to preserve the dimensions of the result and not lose elements from our original array? We can use numpy.where() for this.
numpy.where(condition [, x, y])
We have two more parameters x and y. What are those?
Basically, this says that wherever condition holds for an element, the new array takes the corresponding element of x. Otherwise, where it is false, the element of y is taken.
With that, our final output array will have elements of x wherever condition = True and elements of y wherever condition = False.
Note that although x and y are optional, if you specify x, you MUST also specify y. This is because, in this case, the output must have the same shape as the input array, so every position needs a value from either x or y.
NOTE: The same logic applies to both one-dimensional and multi-dimensional arrays. In both cases, we filter based on the condition. Also remember that the shapes of x, y and condition are broadcast together.
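The selection rule described above can be sketched with three small arrays. The values of condition, x and y here are made up for illustration.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Takes x where condition is True and y where it is False.
result = np.where(condition, x, y)
print(result.tolist())  # [1, 20, 3, 40]

# Passing x without y is rejected by NumPy.
raised = False
try:
    np.where(condition, x)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The output has the same shape as the inputs, because every position is filled from either x or y.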
Now, let’s look at some examples, to understand this function correctly.
Using Python numpy.where()
Suppose we want to keep only the positive elements of a numpy array and set all negative elements to 0. Let’s write the code for this using numpy.where().
1. Replace Elements with numpy.where()
We’ll use a random 2-dimensional array here, and only show the positive elements.
import numpy as np
# Random initialization of a (2D array)
a = np.random.randn(2, 3)
print(a)
# b will be all elements of a where the condition is true (i.e. only positive elements)
# Otherwise, set it to 0
b = np.where(a > 0, a, 0)
print(b)
Possible output
[[-1.06455975  0.94589166 -1.94987123]
 [-1.72083344 -0.69813711  1.05448464]]
[[0.         0.94589166 0.        ]
 [0.         0.         1.05448464]]
As you can see, now only the positive elements are preserved!
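Because np.random.randn gives different numbers on every run, here is a deterministic variant of the same idea. The input values are chosen only for illustration.

```python
import numpy as np

# Fixed input so the result is reproducible.
a = np.array([[-1.5, 0.5, -2.0],
              [3.0, -0.25, 1.0]])

# Keep positive elements; the scalar 0 is broadcast to every other position.
b = np.where(a > 0, a, 0)
print(b)
print(b.shape == a.shape)  # True
```

The scalar 0 works as the y argument because it is broadcast against a, so there is no need to build a full array of zeros first.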
2. Using numpy.where() with only one condition
There may be some confusion regarding the above code, as some of you may think that the more intuitive way would be to simply write the condition like this:
import numpy as np
a = np.random.randn(2, 3)
b = np.where(a > 0)
print(b)
If you run this version of the code, you will get output like this:
(array([0, 1]), array([2, 1]))
If you look closely, b is now a tuple of numpy arrays, one per dimension of a. Together they give the locations of the positive elements: the first array holds row indices and the second holds column indices, so here the positive elements are at positions (0, 2) and (1, 1).
When we provide only the condition, this function is actually equivalent to np.asarray(condition).nonzero().
In our example, np.asarray(a > 0) returns a boolean array after applying the condition, and np.nonzero(arr_like) returns the indices of the nonzero elements of arr_like.
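A small sketch of the one-argument form, using a fixed array (chosen here for illustration) whose positive elements sit at the same positions as in the output above:

```python
import numpy as np

a = np.array([[-1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])

# One-argument form: a tuple with one index array per dimension.
rows, cols = np.where(a > 0)
print(rows, cols)

# Same result as calling nonzero() on the boolean array.
same = np.asarray(a > 0).nonzero()
matches = all(np.array_equal(p, q) for p, q in zip((rows, cols), same))
print(matches)  # True

# The index arrays can be used to pull the matching values back out.
values = a[rows, cols]
coords = list(zip(rows.tolist(), cols.tolist()))
print(values.tolist())  # [3.0, 5.0]
print(coords)           # [(0, 2), (1, 1)]
```

Pairing the two index arrays, as coords does, gives the (row, column) position of each element that satisfies the condition.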
So, now we’ll look at a simpler example, which shows us how flexible we can be with numpy!
import numpy as np
a = np.arange(10)
b = np.where(a < 5, a, a * 10)
print(a)
print(b)
Output
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4 50 60 70 80 90]
Here, the condition is a < 5, which is the numpy-like array [True True True True True False False False False False], x is the array a, and y is the array a * 10. So, we choose from a only where a < 5, and from a * 10 where a >= 5.
So, this transforms all elements >= 5 by multiplying them by 10. This is indeed what we get!
Broadcasting with numpy.where()
If we provide all of condition, x and y, numpy will broadcast them together.
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.arange(4).reshape(1, 4)
print(a)
print(b)
# Broadcasts (a < 5, a, and b * 10)
# of shape (3, 4), (3, 4) and (1, 4)
c = np.where(a < 5, a, b * 10)
print(c)
Output
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[0 1 2 3]]
[[ 0  1  2  3]
 [ 4 10 20 30]
 [ 0 10 20 30]]
Again, the output is selected element by element based on the condition, but here b is broadcast to the shape of a. (One of its dimensions has only one element, so there are no errors during broadcasting.)
So, b effectively becomes [[0 1 2 3] [0 1 2 3] [0 1 2 3]], and b * 10 becomes [[0 10 20 30] [0 10 20 30] [0 10 20 30]], so we can select elements from this broadcast array as well. The shape of the output is therefore the same as the shape of a.
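To see what the broadcasting step does, the following sketch expands b explicitly with np.broadcast_to and checks that the result matches the implicit version. The names b_full, c_implicit and c_explicit are introduced here only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(4).reshape(1, 4)

# Explicitly expand b from shape (1, 4) to the shape of a, (3, 4).
b_full = np.broadcast_to(b, a.shape)
print(b_full)

# Implicit broadcasting inside np.where gives the same result
# as passing the explicitly expanded array.
c_implicit = np.where(a < 5, a, b * 10)
c_explicit = np.where(a < 5, a, b_full * 10)
print(np.array_equal(c_implicit, c_explicit))  # True
print(c_implicit.shape)                        # (3, 4)
```

In everyday code the explicit expansion is unnecessary, since np.where broadcasts its arguments on its own; it is shown here only to make the intermediate shape visible.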
Conclusion
In this article, we learned how we can use Python’s numpy.where() function to select elements from arrays based on a condition array.
References
- SciPy documentation on the Python numpy.where() function