NumPy is the foundation of the Python machine learning stack. It provides efficient operations on the data structures most often used in machine learning: vectors, matrices, and tensors.

**≡ Creating a Vector in NumPy**

**≡ Analysis**

NumPy’s main data structure is the multidimensional array. To create a vector, we simply create a one-dimensional array. Just like vectors, these arrays can be represented horizontally (i.e., as rows) or vertically (i.e., as columns).
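
As a minimal sketch of the idea above, the following creates one vector as a row and one as a column. The variable names `vector_row` and `vector_column` are chosen here for illustration only.

```python
import numpy as np

# A vector stored as a row: a one-dimensional array
vector_row = np.array([1, 2, 3])

# A vector stored as a column: three rows, one column
vector_column = np.array([[1],
                          [2],
                          [3]])

print(vector_row.shape)     # (3,)
print(vector_column.shape)  # (3, 1)
```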

**≡ Creating a Matrix in NumPy**

**≡ Analysis**

To create a matrix we can use a NumPy two-dimensional array. In our example, the matrix contains three rows and two columns (a column of 1s and a column of 2s). NumPy actually has a dedicated matrix data structure:

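    # np.asmatrix works across NumPy versions (the np.mat alias was removed in NumPy 2.0)
    matrix_object = np.asmatrix([[1, 2], [1, 2], [1, 2]])
    matrix_object
    matrix([[1, 2],
            [1, 2],
            [1, 2]])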

However, the matrix data structure is not recommended, for two reasons. First, arrays are the de facto standard data structure of NumPy. Second, the vast majority of NumPy operations return arrays, not matrix objects.
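
The two-dimensional array approach can be sketched as follows. The values mirror the three-row, two-column matrix described above, and the name `doubled` is only illustrative.

```python
import numpy as np

# Three rows and two columns: a column of 1s and a column of 2s
matrix = np.array([[1, 2],
                   [1, 2],
                   [1, 2]])

print(matrix.shape)  # (3, 2)

# Operations on arrays return arrays
doubled = matrix * 2
print(type(doubled).__name__)  # ndarray
```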

**≡ Creating a Sparse Matrix in NumPy**

**≡ Analysis**

A frequent situation in machine learning is having a huge amount of data in which most of the elements are zeros. For example, imagine a matrix where the columns are every movie on Netflix, the rows are every Netflix user, and the values are how many times a user has watched that particular movie.

This matrix would have tens of thousands of columns and millions of rows! Yet, since most users do not watch most movies, the vast majority of elements would be zero.

Sparse matrices store only the nonzero elements and assume all other values are zero, leading to significant computational savings. In our example, we created a NumPy array with two nonzero values, then converted it into a sparse matrix. If we view the sparse matrix we can see that only the nonzero values are stored:

    # View sparse matrix
    print(matrix_sparse)
      (1, 1)    1
      (2, 0)    3

There are several types of sparse matrices. In compressed sparse row (CSR) matrices, (1, 1) and (2, 0) are the (zero-indexed) indices of the nonzero values 1 and 3, respectively.

For example, the element 1 is in the second row and second column. We can see the advantage of sparse matrices if we create a much larger matrix with many more zero elements and then compare this larger matrix with our original sparse matrix:

    # Create larger matrix
    matrix_large = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                             [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                             [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

    # Create compressed sparse row (CSR) matrix
    matrix_large_sparse = sparse.csr_matrix(matrix_large)

    # View original sparse matrix
    print(matrix_sparse)
      (1, 1)    1
      (2, 0)    3

    # View larger sparse matrix
    print(matrix_large_sparse)
      (1, 1)    1
      (2, 0)    3

As we can see, although we added many more zero elements in the larger matrix, its sparse representation is exactly the same as that of our original sparse matrix. That is, adding zero elements did not change the size of the sparse matrix.

As mentioned, there are many different types of sparse matrices, such as compressed sparse column, list of lists, and dictionary of keys.
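
A self-contained sketch of the comparison above, assuming SciPy is installed (sparse matrices come from `scipy.sparse`, not NumPy itself). The small starting matrix is an assumption chosen to match the printed output, and `nnz` reports how many values are actually stored.

```python
import numpy as np
from scipy import sparse

# Small matrix with two nonzero values (assumed to match the output shown above)
matrix = np.array([[0, 0],
                   [0, 1],
                   [3, 0]])
matrix_sparse = sparse.csr_matrix(matrix)

# Larger matrix with the same two nonzero values and many more zeros
matrix_large = np.zeros((3, 10), dtype=int)
matrix_large[1, 1] = 1
matrix_large[2, 0] = 3
matrix_large_sparse = sparse.csr_matrix(matrix_large)

# Both sparse matrices store exactly two values
print(matrix_sparse.nnz)        # 2
print(matrix_large_sparse.nnz)  # 2

# Other sparse formats are built the same way
matrix_csc = sparse.csc_matrix(matrix)  # compressed sparse column
matrix_lil = sparse.lil_matrix(matrix)  # list of lists
matrix_dok = sparse.dok_matrix(matrix)  # dictionary of keys
```

Which format is preferable depends on the operations needed, so this is only a starting point rather than a recommendation.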

**≡ Selecting Elements in NumPy**

**≡ Analysis**

Like most things in Python, NumPy arrays are zero-indexed, meaning that the index of the first element is 0, not 1. With that caveat, NumPy offers a wide variety of methods for selecting (i.e., indexing and slicing) elements or groups of elements in arrays.

    # Select all elements of a vector
    vector[:]
    array([1, 2, 3, 4, 5, 6])

    # Select everything up to and including the third element
    vector[:3]
    array([1, 2, 3])

    # Select everything after the third element
    vector[3:]
    array([4, 5, 6])

    # Select the last element
    vector[-1]
    6

    # Select the first two rows and all columns of a matrix
    matrix[:2,:]
    array([[1, 2, 3],
           [4, 5, 6]])

    # Select all rows and the second column
    matrix[:,1:2]
    array([[2],
           [5],
           [8]])

**≡ Describing a Matrix in NumPy**

**≡ Analysis**

This might seem basic (and it is). However, time and again it will be valuable to check the shape and size of an array, both for further calculations and simply as a gut check after some operation.
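
A short sketch of such a check, using the `shape`, `size`, and `ndim` attributes on an example matrix made up for illustration:

```python
import numpy as np

# Example matrix with three rows and four columns
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print(matrix.shape)  # (3, 4): number of rows and columns
print(matrix.size)   # 12: number of elements (rows * columns)
print(matrix.ndim)   # 2: number of dimensions
```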

**≡ Applying Operations to Elements in NumPy**

**≡ Analysis**

NumPy’s vectorize class converts a function into a function that can be applied to all elements in an array or slice of an array. It is worth noting that vectorize is essentially a for loop over the elements and does not improve performance. Furthermore, NumPy arrays allow us to perform operations between arrays even if their dimensions are not the same (a process called broadcasting). For example, we can write a much simpler version of our solution using broadcasting:

    # Add 100 to all elements
    matrix + 100
    array([[101, 102, 103],
           [104, 105, 106],
           [107, 108, 109]])
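
To make the comparison concrete, here is a minimal sketch that applies the same operation both ways. The helper `add_100` and the result names are hypothetical, introduced only for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Hypothetical element-wise function used for illustration
def add_100(i):
    return i + 100

# Wrap the function so it can be applied to every element
vectorized_add_100 = np.vectorize(add_100)
result_vectorized = vectorized_add_100(matrix)

# Broadcasting gives the same result without the wrapper
result_broadcast = matrix + 100

print(np.array_equal(result_vectorized, result_broadcast))  # True
```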

**≡ Finding the Maximum and Minimum Values in NumPy**

**≡ Analysis**

Often we want to know the maximum and minimum value in an array or subset of an array. This can be accomplished with the max and min methods. Using the axis parameter we can also apply the operation along a certain axis:

    # Find maximum element in each column
    np.max(matrix, axis=0)
    array([7, 8, 9])

    # Find maximum element in each row
    np.max(matrix, axis=1)
    array([3, 6, 9])
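
For completeness, a small runnable sketch covering the whole-array case and the min counterpart, assuming the 3x3 matrix implied by the outputs above:

```python
import numpy as np

# Assumed example matrix, consistent with the outputs shown above
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Maximum and minimum over the whole array
overall_max = np.max(matrix)  # 9
overall_min = np.min(matrix)  # 1

# Minimum element in each column and in each row
column_min = np.min(matrix, axis=0)  # [1 2 3]
row_min = np.min(matrix, axis=1)     # [1 4 7]

print(overall_max, overall_min, column_min, row_min)
```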

**≡ Calculating the Average, Variance, and Standard Deviation in NumPy**

**≡ Analysis**

Just as with max and min, we can get descriptive statistics about the whole matrix or do calculations along a single axis:

    # Find the mean value in each column
    np.mean(matrix, axis=0)
    array([ 4., 5., 6.])
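
The whole-matrix statistics can be sketched like this, again assuming the 3x3 matrix with values 1 through 9. Note that `np.var` and `np.std` compute the population variance and standard deviation by default (`ddof=0`).

```python
import numpy as np

# Assumed example matrix, consistent with the column means shown above
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

mean = np.mean(matrix)               # 5.0
variance = np.var(matrix)            # about 6.67 (population variance)
standard_deviation = np.std(matrix)  # about 2.58

print(mean, variance, standard_deviation)
```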

**≡ Reshaping Arrays in NumPy**

**≡ Analysis**

reshape allows us to restructure an array so that we keep the same data but organize it as a different number of rows and columns. The only requirement is that the original and new matrices contain the same number of elements (i.e., they have the same size). We can see the size of a matrix using size:

    matrix.size
    12

One useful argument in reshape is -1, which effectively means “as many as needed,” so reshape(1, -1) means one row and as many columns as needed:

    matrix.reshape(1, -1)
    array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]])

Finally, if we provide one integer, reshape will return a one-dimensional array of that length:

    matrix.reshape(12)
    array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
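
Putting the variants together, a minimal sketch that assumes a 4x3 starting matrix holding the values 1 through 12 (the original layout is not shown, so this shape is a guess):

```python
import numpy as np

# Assumed starting matrix: four rows, three columns, 12 elements
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12]])

# Same 12 elements arranged as two rows and six columns
reshaped = matrix.reshape(2, 6)

# -1 lets NumPy infer the missing dimension
one_row = matrix.reshape(1, -1)     # shape (1, 12)
one_column = matrix.reshape(-1, 1)  # shape (12, 1)

print(reshaped.shape, one_row.shape, one_column.shape)
```

Asking for a shape whose element count differs from 12, such as `matrix.reshape(5, 2)`, raises a `ValueError`.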

**≡ Transposing a Vector or Matrix in NumPy**

**≡ Analysis**

Transposing is a common operation in linear algebra where the column and row indices of each element are swapped. One nuanced point that is typically overlooked outside of a linear algebra class is that, technically, a vector cannot be transposed because it is just a collection of values:

    # Transpose vector
    np.array([1, 2, 3, 4, 5, 6]).T
    array([1, 2, 3, 4, 5, 6])

However, it is common to refer to transposing a vector as converting a row vector to a column vector (notice the second pair of brackets) or vice versa:

    # Transpose row vector
    np.array([[1, 2, 3, 4, 5, 6]]).T
    array([[1],
           [2],
           [3],
           [4],
           [5],
           [6]])
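
For the matrix case, a short sketch using the `T` attribute on an example 3x3 matrix chosen for illustration:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Rows become columns and columns become rows
transposed = matrix.T
print(transposed)
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]
```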

**≡ Flattening a Matrix in NumPy**

**≡ Analysis**

flatten is a simple method to transform a matrix into a one-dimensional array. Alternatively, we can use reshape to create a row vector:

    matrix.reshape(1, -1)
    array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
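
The difference between the two approaches shows up in the resulting shapes. A minimal sketch, assuming the 3x3 matrix implied by the output above:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# flatten returns a one-dimensional copy
flat = matrix.flatten()
print(flat.shape)  # (9,)

# reshape(1, -1) returns a two-dimensional row vector
row = matrix.reshape(1, -1)
print(row.shape)  # (1, 9)
```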

**≡ Finding the Rank of a Matrix in NumPy**

**≡ Analysis**

The rank of a matrix is the dimension of the vector space spanned by its columns or rows. Finding the rank of a matrix is easy in NumPy thanks to matrix_rank.
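
A minimal sketch using `np.linalg.matrix_rank`. The example matrix is made up for illustration: its first two columns are identical, so only two columns are linearly independent.

```python
import numpy as np

# First and second columns are identical, third is independent
matrix = np.array([[1, 1, 1],
                   [1, 1, 10],
                   [1, 1, 15]])

rank = np.linalg.matrix_rank(matrix)
print(rank)  # 2
```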

**≡ Calculating the Determinant in NumPy**

**≡ Analysis**

It can sometimes be useful to calculate the determinant of a matrix. NumPy makes this easy with det.
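
As a sketch, `np.linalg.det` can be applied to a small example matrix (chosen here so the result is easy to check by hand: 1*4 - 2*3 = -2). The result is a float and may carry a tiny rounding error, so comparing with `np.isclose` is safer than testing exact equality.

```python
import numpy as np

matrix = np.array([[1, 2],
                   [3, 4]])

determinant = np.linalg.det(matrix)
print(np.isclose(determinant, -2.0))  # True
```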

**≡ Getting the Diagonal of a Matrix in NumPy**

**≡ Analysis**

NumPy makes getting the diagonal elements of a matrix easy with diagonal. It is also possible to get a diagonal offset from the main diagonal by using the offset parameter:

    # Return diagonal one above the main diagonal
    matrix.diagonal(offset=1)
    array([2, 6])

    # Return diagonal one below the main diagonal
    matrix.diagonal(offset=-1)
    array([2, 8])

**≡ Calculating the Trace of a Matrix in NumPy**

**≡ Analysis**

The trace of a matrix is the sum of the diagonal elements and is often used under the hood in machine learning methods. Given a NumPy multidimensional array, we can calculate the trace using trace. We can also return the diagonal of a matrix and calculate its sum:

    # Return diagonal and sum elements
    sum(matrix.diagonal())
    14
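
Both routes can be checked side by side. The matrix below is an assumption: it was chosen because it reproduces the diagonal sum of 14 shown above, not because the original text defines it.

```python
import numpy as np

# Assumed matrix, consistent with the outputs shown above
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# Trace computed directly
trace = matrix.trace()

# Trace computed by summing the main diagonal
diagonal_sum = sum(matrix.diagonal())

print(trace, diagonal_sum)  # 14 14
```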

**≡ Finding Eigenvalues and Eigenvectors in NumPy**

**≡ Analysis**

Eigenvectors are widely used in machine learning libraries. Intuitively, given a linear transformation represented by a matrix A, eigenvectors are vectors that, when that transformation is applied, change only in scale. More formally:

Av = λv

where A is a square matrix, λ contains the eigenvalues, and v contains the eigenvectors. In NumPy’s linear algebra toolset, eig lets us calculate the eigenvalues and eigenvectors of any square matrix.
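
A minimal sketch with `np.linalg.eig`, using a small symmetric matrix made up for illustration. The last lines check the defining relation Av = λv for every eigenpair at once; eigenvectors are returned as the columns of the second result.

```python
import numpy as np

# Example square matrix (its eigenvalues are 1 and 3)
matrix = np.array([[2, 1],
                   [1, 2]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Check A v = lambda v column by column (up to floating-point error)
left_side = matrix @ eigenvectors
right_side = eigenvectors * eigenvalues
print(np.allclose(left_side, right_side))  # True
```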

**≡ Adding and Subtracting Matrices in NumPy**

**≡ Analysis**

Alternatively, we can simply use the + and - operators:

    # Add two matrices
    matrix_a + matrix_b
    array([[ 2, 4, 2],
           [ 2, 4, 2],
           [ 2, 4, 10]])
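
The function forms `np.add` and `np.subtract` give the same results as the operators. The input matrices below are assumptions, picked so that their sum matches the output shown above.

```python
import numpy as np

# Assumed inputs, consistent with the sum shown above
matrix_a = np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 2]])
matrix_b = np.array([[1, 3, 1],
                     [1, 3, 1],
                     [1, 3, 8]])

# Element-wise addition and subtraction
added = np.add(matrix_a, matrix_b)
subtracted = np.subtract(matrix_a, matrix_b)

print(np.array_equal(added, matrix_a + matrix_b))       # True
print(np.array_equal(subtracted, matrix_a - matrix_b))  # True
```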

**≡ Inverting a Matrix in NumPy**

**≡ Analysis**

The inverse of a square matrix, A, is a second matrix A^{−1}, such that:

AA^{−1} = I

where I is the identity matrix. In NumPy, we can use linalg.inv to calculate A^{−1} if it exists. To see this in action, we can multiply a matrix by its inverse, and the result is the identity matrix:

    # Multiply matrix and its inverse
    matrix @ np.linalg.inv(matrix)
    array([[ 1., 0.],
           [ 0., 1.]])
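
Because the inverse is computed in floating point, the product may differ from the identity matrix by a tiny rounding error. A hedged sketch of a more robust check, using an example 2x2 matrix chosen for illustration:

```python
import numpy as np

# Example invertible matrix (determinant is 1*5 - 4*2 = -3)
matrix = np.array([[1, 4],
                   [2, 5]])

inverse = np.linalg.inv(matrix)

# Compare with the identity matrix up to floating-point tolerance
product = matrix @ inverse
print(np.allclose(product, np.eye(2)))  # True
```

If the matrix has no inverse, `np.linalg.inv` raises `numpy.linalg.LinAlgError`.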

**≡ Generating Random Values in NumPy**

**≡ Analysis**

NumPy offers a wide variety of ways to generate random numbers, many more than can be covered here.

    # Generate three random integers between 0 and 10 (the upper bound, 11, is exclusive)
    np.random.randint(0, 11, 3)
    array([3, 7, 9])

Alternatively, we can generate numbers by drawing them from a distribution:

    # Draw three numbers from a normal distribution with mean 0.0
    # and standard deviation of 1.0
    np.random.normal(0.0, 1.0, 3)
    array([-1.42232584, 1.52006949, -0.29139398])

    # Draw three numbers from a logistic distribution with mean 0.0 and scale of 1.0
    np.random.logistic(0.0, 1.0, 3)
    array([-0.98118713, -0.08939902, 1.46416405])

    # Draw three numbers greater than or equal to 1.0 and less than 2.0
    np.random.uniform(1.0, 2.0, 3)
    array([ 1.47997717, 1.3927848 , 1.83607876])

Finally, it can sometimes be useful to return the same random numbers multiple times to get predictable, repeatable results. We can do this by setting the “seed” (an integer) of the pseudorandom generator. Random processes with the same seed will always produce the same output.
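
A minimal sketch of seeding, using `np.random.seed` to match the `np.random` functions used above. The seed value 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Set the seed, then draw three numbers
np.random.seed(0)
first_draw = np.random.uniform(0.0, 1.0, 3)

# Reset the same seed and draw again
np.random.seed(0)
second_draw = np.random.uniform(0.0, 1.0, 3)

# Both draws are identical because the generator restarted from the same state
print(np.array_equal(first_draw, second_draw))  # True
```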
