The numpy array object¶
What are Numpy and numpy arrays¶
Python has built-in:
- containers: lists (costless insertion and append), dictionnaries (fast lookup)
- high-level number objects: integers, floating point
- extension package to Python for multidimensional arrays
- closer to hardware (efficiency)
- designed for scientific computation (convenience)
An array containing:
- discretized time of an experiment/simulation
- signal recorded by a measurement device
- pixels of an image
Why it is useful: Memory-efficient and fast container for numerical operations.
On the web: http://docs.scipy.org/
Interactive help:>>> help(np.array)array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)Create an array.Parameters----------object : array_like...Examples-------->>> np.array([1,2,3])array([1, 2, 3])...
Looking for something:>>> np.lookfor('create array')Search results for 'create array'---------------------------------numpy.array Create an array.numpy.memmap Create a memory-map to an array stored in a *binary* file on disk....>>> help(np.lookfor)...
2-D, 3-D, ...:
In practice, we rarely enter items one by one...
or by number of points:>>> importnumpyasnp>>> a=np.arange(10)# 0 .. n-1 (!)>>> aarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>>> b=np.arange(1,9,2)# start, end (exlusive), step>>> barray([1, 3, 5, 7])>>> c=np.linspace(0,1,6)# start, end, num-points>>> carray([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])>>> d=np.linspace(0,1,5,endpoint=False)>>> darray([ 0. , 0.2, 0.4, 0.6, 0.8])
Common arrays:>>> a=np.ones((3,3))# reminder: (3, 3) is a tuple>>> aarray([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])>>> b=np.zeros((2,2))>>> barray([[ 0., 0.], [ 0., 0.]])>>> c=np.eye(3)>>> carray([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])>>> d=np.diag(np.array([1,2,3,4]))>>> darray([[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4])
: random numbers (Mersenne Twister PRNG):>>> a=np.random.rand(4)# uniform in [0, 1]>>> aarray([ 0.58597729, 0.86110455, 0.9401114 , 0.54264348])>>> b=np.random.randn(4)# Gaussian>>> barray([-2.56844807, 0.06798064, -0.36823781, 0.86966886])>>> np.random.seed(1234)# Setting the random seed
Exercise: Array creation
Create the following arrays (with correct data types):
Par on course: 3 statements for each
Exercise: Tiling for array creation
Skim through the documentation for , and use this function to construct the array:
Basic data types¶
You probably noted the and above. These are different data types:
Much of the time you don’t necessarily need to care, but remember they are there.
You can also choose which one you want:
The default data type is floating point:
There are also other types:
>>> d=np.array([1+2j,3+4j,5+6*1j])>>> d.dtypedtype('complex128')
>>> e=np.array([True,False,False,True])>>> e.dtypedtype('bool')
>>> f=np.array(['Bonjour','Hello','Hallo',])>>> f.dtypedtype('S7') # <--- strings containing max. 7 letters
|Much more: int32/int64...|
Now that we have our first data arrays, we are going to visualize them.
Matplotlib is a 2D plotting package. We can import its functions as below:
The recommended way of working is use , started in mode:
[source code, hires.png, pdf]
2D arrays (such as images)
[source code, hires.png, pdf]
For 3D visualization, we can use another package: Mayavi. A quick example: start with relaunching iPython with these options: ipython -pylab -wthread (or ipython –pylab=wx in IPython >= 0.10).
The mayavi/mlab window that opens is interactive : by clicking on the left mouse button you can rotate the image, zoom with the mouse wheel, etc.
For more information on Mayavi : http://github.enthought.com/mayavi/mayavi
Indexing and slicing¶
The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists)
Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.
For multidimensional arrays, indexes are tuples of integers:
- In 2D, the first dimension corresponds to rows, the second to columns.
- for multidimensional , is interpreted by taking all elements in the unspecified dimensions.
Slicing Arrays, like other Python sequences can also be sliced:
Note that the last index is not included!:
All three slice components are not required: by default, is 0, is the last and is 1:
A small illustrated summary of Numpy indexing and slicing...
Copies and views¶
A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory.
When modifying the view, the original array is modified as well:
This behavior can be surprising at first sight... but it allows to save both memory and time.
The transpose is a view
As a result, a matrix cannot be made symmetric in-place:
Worked example: Prime number sieve
Compute prime numbers in 0–99, with a sieve
Construct a shape (100,) boolean array , filled with True in the beginning:>>> is_prime=np.ones((100,),dtype=bool)
Cross out 0 and 1 which are not primes:>>> is_prime[:2]=0
For each integer starting from 2, cross out its higher multiples:>>> N_max=int(np.sqrt(len(is_prime)))>>> forjinrange(2,N_max):... is_prime[2*j::j]=False
Skim through , and print the prime numbers
- Move the above code into a script file named
- Run it to check it works
- Convert the simple sieve to the sieve of Eratosthenes:
- Skip which are already known to not be primes
- The first number to cross out is
Numpy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This method is called fancy indexing. It creates copies not view.
Using boolean masks¶
Indexing with a mask can be very useful to assign a new value to a sub-array:
Indexing with an array of integers¶
Indexing can be done with an array of integers, where the same index is repeated several time:
New values can be assigned with this kind of indexing:
When a new array is created by indexing with an array of integers, the new array has the same shape than the array of integers:
We can even use fancy indexing and broadcasting at the same time:
Numerical operations on arrays¶
All arithmetic operates elementwise:
Array multiplication is not matrix multiplication:
For arrays: “” and “” for logical operations, not “” and “”.
‘Broadcast’? We’ll return to that later.
The sub-module implements basic linear algebra, such as eigenvalue decomposition, solving linear systems... However, it is not garantied to be compiled using efficient routines, and thus we recommend to use , as detailed in section Linear algebra operations: scipy.linalg
Generate arrays and
Sum by rows and by columns:
Same idea in higher dimensions:
Other reductions — works the same way (and take )
Statistics:>>> x=np.array([1,2,3,1])>>> y=np.array([[1,2,3],[5,6,1]])>>> x.mean()1.75>>> np.median(x)1.5>>> np.median(y,axis=-1)# last axisarray([ 2., 5.])>>> x.std()# full population standard dev.0.82915619758884995
Extrema:>>> x=np.array([1,3,2])>>> x.min()1>>> x.max()3>>> x.argmin()# index of minimum0>>> x.argmax()# index of maximum1
Can be used for array comparisons:>>> a=np.zeros((100,100))>>> np.any(a!=0)False>>> np.all(a==a)True>>> a=np.array([1,2,3,2])>>> b=np.array([2,2,3,2])>>> c=np.array([6,4,4,5])>>> ((a<=b)&(b<=c)).all()True>>> np.all([True,True,False])False>>> np.any([True,True,False])True
... and many more (best to learn as you go).
Example: data statistics
Data in describes the populations of hares and lynxes (and carrots) in northern Canada during 20 years.
We can first plot the data:
[source code, hires.png, pdf]
The mean populations over time:
The sample standard deviations:
Which species has the highest population each year?
Example: diffusion simulation using a random walk algorithm
What is the typical distance from the origin of a random walker after left or right jumps?
We randomly choose all the steps 1 or -1 of the walk
We build the walks by summing steps along the time
We get the mean in the axis of the stories
Plot the results:
[source code, hires.png, pdf]
The RMS distance grows as the square root of the time!
Basic operations on arrays (addition, etc.) are elementwise
This works on arrays of the same size.
Nevertheless, It’s also possible to do operations on arrays of different
sizes if Numpy can transform these arrays so that they all have
the same size: this conversion is called broadcasting.
The image below gives an example of broadcasting:
An useful trick:
We have already used broadcasting without knowing it!:
Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve a problem whose output data is an array with more dimensions than input data.
Let’s construct an array of distances (in miles) between cities of Route 66: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff and Los Angeles.
Explicit variable names (no need of a comment to explain what is in the variable)
Style: spaces after commas, around , etc.
A certain number of rules for writing “beautiful” code (and, more importantly, using the same conventions as everybody else!) are given in the Style Guide for Python Code and the Docstring Conventions page (to manage help strings).
Except some rare cases, variable names and comments in English.
A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 10x10 grid, we can do:
Or in color:>>> x,y=np.arange(5),np.arange(5)>>> distance=np.sqrt(x**2+y[:,np.newaxis]**2)>>> distancearray([[ 0. , 1. , 2. , 3. , 4. ], [ 1. , 1.41421356, 2.23606798, 3.16227766, 4.12310563], [ 2. , 2.23606798, 2.82842712, 3.60555128, 4.47213595], [ 3. , 3.16227766, 3.60555128, 4.24264069, 5. ], [ 4. , 4.12310563, 4.47213595, 5. , 5.65685425]])
Array indexing refers to any use of the square brackets () to index array values. There are many options to indexing, which give numpy indexing great power, but with power comes some complexity and the potential for confusion. This section is just an overview of the various options and issues related to indexing. Aside from single element indexing, the details on most of these options are to be found in related sections.
Assignment vs referencing¶
Most of the following examples show the use of indexing when referencing data in an array. The examples work just as well when assigning to an array. See the section at the end for specific examples and explanations on how assignments work.
Single element indexing¶
Single element indexing for a 1-D array is what one expects. It work exactly like that for other standard Python sequences. It is 0-based, and accepts negative indices for indexing from the end of the array.
Unlike lists and tuples, numpy arrays support multidimensional indexing for multidimensional arrays. That means that it is not necessary to separate each dimension’s index into its own set of square brackets.
Note that if one indexes a multidimensional array with fewer indices than dimensions, one gets a subdimensional array. For example:
That is, each index specified selects the array corresponding to the rest of the dimensions selected. In the above example, choosing 0 means that the remaining dimension of length 5 is being left unspecified, and that what is returned is an array of that dimensionality and size. It must be noted that the returned array is not a copy of the original, but points to the same values in memory as does the original array. In this case, the 1-D array at the first position (0) is returned. So using a single index on the returned array, results in a single element being returned. That is:
So note that though the second case is more inefficient as a new temporary array is created after the first index that is subsequently indexed by 2.
Note to those used to IDL or Fortran memory order as it relates to indexing. NumPy uses C-order indexing. That means that the last index usually represents the most rapidly changing memory location, unlike Fortran or IDL, where the first index represents the most rapidly changing location in memory. This difference represents a great potential for confusion.
Other indexing options¶
It is possible to slice and stride arrays to extract arrays of the same number of dimensions, but of different sizes than the original. The slicing and striding works exactly the same way it does for lists and tuples except that they can be applied to multiple dimensions as well. A few examples illustrates best:
Note that slices of arrays do not copy the internal array data but also produce new views of the original data.
It is possible to index arrays with other arrays for the purposes of selecting lists of values out of arrays into new arrays. There are two different ways of accomplishing this. One uses one or more arrays of index values. The other involves giving a boolean array of the proper shape to indicate the values to be selected. Index arrays are a very powerful tool that allow one to avoid looping over individual elements in arrays and thus greatly improve performance.
It is possible to use special features to effectively increase the number of dimensions in an array through indexing so the resulting array aquires the shape needed for use in an expression or with a specific function.
NumPy arrays may be indexed with other arrays (or any other sequence- like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
Index arrays must be of integer type. Each value in the array indicates which value in the array to use in place of the index. To illustrate:
The index array consisting of the values 3, 3, 1 and 8 correspondingly create an array of length 4 (same as the index array) where each index is replaced by the value the index array has in the array being indexed.
Negative values are permitted and work as they do with single indices or slices:
It is an error to have index values out of bounds:
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed. As an example, we can use a multidimensional index array instead:
Indexing Multi-dimensional arrays¶
Things become more complex when multidimensional arrays are indexed, particularly with multidimensional index arrays. These tend to be more unusual uses, but they are permitted, and they are useful for some problems. We’ll start with the simplest multidimensional case (using the array y from the previous examples):
In this case, if the index arrays have a matching shape, and there is an index array for each dimension of the array being indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each position in the index arrays. In this example, the first index value is 0 for both index arrays, and thus the first value of the resultant array is y[0,0]. The next value is y[2,1], and the last is y[4,2].
If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised:
The broadcasting mechanism permits index arrays to be combined with scalars for other indices. The effect is that the scalar value is used for all the corresponding values of the index arrays:
Jumping to the next level of complexity, it is possible to only partially index an array with index arrays. It takes a bit of thought to understand what happens in such cases. For example if we just use one index array with y:
What results is the construction of a new array where each value of the index array selects one row from the array being indexed and the resultant array has the resulting shape (number of index elements, size of row).
An example of where this may be useful is for a color lookup table where we want to map the values of an image into RGB triples for display. The lookup table could have a shape (nlookup, 3). Indexing such an array with an image with shape (ny, nx) with dtype=np.uint8 (or any integer type so long as values are with the bounds of the lookup table) will result in an array of shape (ny, nx, 3) where a triple of RGB values is associated with each pixel location.
In general, the shape of the resultant array will be the concatenation of the shape of the index array (or the shape that all the index arrays were broadcast to) with the shape of any unused dimensions (those not indexed) in the array being indexed.
Boolean or “mask” index arrays¶
Boolean arrays used as indices are treated in a different manner entirely than index arrays. Boolean arrays must be of the same shape as the initial dimensions of the array being indexed. In the most straightforward case, the boolean array has the same shape:
Unlike in the case of integer index arrays, in the boolean case, the result is a 1-D array containing all the elements in the indexed array corresponding to all the true elements in the boolean array. The elements in the indexed array are always iterated and returned in row-major (C-style) order. The result is also identical to . As with index arrays, what is returned is a copy of the data, not a view as one gets with slices.
The result will be multidimensional if y has more dimensions than b. For example:
Here the 4th and 5th rows are selected from the indexed array and combined to make a 2-D array.
In general, when the boolean array has fewer dimensions than the array being indexed, this is equivalent to y[b, ...], which means y is indexed by b followed by as many : as are needed to fill out the rank of y. Thus the shape of the result is one dimension containing the number of True elements of the boolean array, followed by the remaining dimensions of the array being indexed.
For example, using a 2-D boolean array of shape (2,3) with four True elements to select rows from a 3-D array of shape (2,3,5) results in a 2-D result of shape (4,5):
For further details, consult the numpy reference documentation on array indexing.
Combining index arrays with slices¶
Index arrays may be combined with slices. For example:
In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2).
Likewise, slicing can be combined with broadcasted boolean indices:
Assigning values to indexed arrays¶
As mentioned, one can select a subset of an array to assign to using a single index, slices, and index and mask arrays. The value being assigned to the indexed array must be shape consistent (the same shape or broadcastable to the shape the index produces). For example, it is permitted to assign a constant to a slice:
or an array of the right size:
Note that assignments may result in changes if assigning higher types to lower types (like floats to ints) or even exceptions (assigning complex to floats or ints):
Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!). Note though, that some actions may not work as one may naively expect. This particular example is often surprising to people:
Where people expect that the 1st location will be incremented by 3. In fact, it will only be incremented by 1. The reason is because a new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1 is added to the temporary, and then the temporary is assigned back to the original array. Thus the value of the array at x+1 is assigned to x three times, rather than being incremented 3 times.
Dealing with variable numbers of indices within programs¶
The index syntax is very powerful but limiting when dealing with a variable number of indices. For example, if you want to write a function that can handle arguments with various numbers of dimensions without having to write special case code for each number of possible dimensions, how can that be done? If one supplies to the index a tuple, the tuple will be interpreted as a list of indices. For example (using the previous definition for the array z):
So one can use code to construct tuples of any number of indices and then use these within an index.
Slices can be specified within programs by using the slice() function in Python. For example:
Likewise, ellipsis can be specified by code by using the Ellipsis object:
For this reason it is possible to use the output from the np.nonzero() function directly as an index since it always returns a tuple of index arrays.
Because the special treatment of tuples, they are not automatically converted to an array as a list would be. As an example: