Introduction to NumPy | Machine Learning Tutorial

NumPy is the foundation of machine learning in Python. It provides useful operations on the data structures most commonly used in machine learning: vectors, matrices, and tensors.


≡ Creating a Vector Using NumPy



≡ Analysis


NumPy’s main data structure is the multidimensional array. To create a vector, we simply create a one-dimensional array. Just like vectors, these arrays can be represented horizontally (i.e., as rows) or vertically (i.e., as columns).
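For reference, a minimal sketch of both forms (the variable names and values here are illustrative, not from the original example):

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([1, 2, 3])

# Create a vector as a column
vector_column = np.array([[1],
                          [2],
                          [3]])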


≡ Creating a Matrix Using NumPy



≡ Analysis


To build a matrix we can use a NumPy two-dimensional array. In this example, the matrix has three rows and two columns (a column of 1s and a column of 2s). NumPy does have a dedicated matrix data structure:

matrix_object = np.mat([[1, 2], [1, 2],  [1, 2]])
matrix([[1, 2],
       [1, 2],
       [1, 2]])

However, the matrix data structure is not recommended for two reasons. First, arrays are the de facto standard data structure of NumPy. Second, the vast majority of NumPy operations return arrays, not matrix objects.
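For reference, a minimal sketch of building the same three-row, two-column matrix as a standard two-dimensional array, which is the recommended approach:

# Load library
import numpy as np

# Create a matrix with three rows and two columns
matrix = np.array([[1, 2],
                   [1, 2],
                   [1, 2]])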


≡ Creating a Sparse Matrix Using NumPy



≡ Analysis


A common situation in machine learning is having a huge amount of data where most of the elements are zeros. For example, imagine a matrix where the columns are every movie on Netflix, the rows are every Netflix user, and the values
are how many times a user has viewed that particular movie.

This matrix would have tens of thousands of columns and millions of rows! Yet, since most users do not watch most movies, the vast majority of elements would be zero.

Sparse matrices only store nonzero elements and assume all other values are zero, leading to significant computational savings. Here we create a NumPy array with two nonzero values and then convert it into a sparse matrix.
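A minimal sketch of that setup is shown below. The values are reconstructed from the output that follows, and csr_matrix comes from SciPy’s sparse module rather than NumPy itself:

# Load libraries
import numpy as np
from scipy import sparse

# Create a matrix with two nonzero values
matrix = np.array([[0, 0],
                   [0, 1],
                   [3, 0]])

# Create a compressed sparse row (CSR) matrix
matrix_sparse = sparse.csr_matrix(matrix)

If we view the sparse matrix, we can see that only the nonzero values are stored: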

# View sparse matrix
print(matrix_sparse)
 (1, 1) 1
 (2, 0) 3

There are several types of sparse matrices. In compressed sparse row (CSR) matrices, (1, 1) and (2, 0) represent the (zero-indexed) indices of the nonzero values 1 and 3, respectively.

For example, the element 1 is in the second row and second column. We can see the advantage of sparse matrices if we create a much larger matrix with many more zero elements and then compare it with our original sparse matrix:

# Create larger matrix
matrix_large = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                         [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                         [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Create compressed sparse row (CSR) matrix
matrix_large_sparse = sparse.csr_matrix(matrix_large)

# View original sparse matrix
print(matrix_sparse)
 (1, 1) 1
 (2, 0) 3

# View larger sparse matrix
print(matrix_large_sparse)
 (1, 1) 1
 (2, 0) 3

As we can see, despite adding many more zero elements to the larger matrix, its sparse representation is exactly the same as our original sparse matrix. That is, the additional zero elements did not change the size of the sparse matrix.

As mentioned, there are many different types of sparse matrices, such as compressed sparse column, list of lists, and dictionary of keys.


≡ Selecting Elements Using NumPy



≡ Analysis


Similar to most things in Python, NumPy arrays are zero-indexed, meaning that the index of the first element is 0, not 1. With that caveat in mind, NumPy offers a wide variety of methods for selecting (i.e., indexing and slicing) elements or groups of elements in arrays.
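The examples below assume a vector and a matrix along the following lines, reconstructed from the outputs shown (the exact values are illustrative):

# Load library
import numpy as np

# Create row vector
vector = np.array([1, 2, 3, 4, 5, 6])

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])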

# Select all elements of a vector
vector[:]
array([1, 2, 3, 4, 5, 6])

# Select everything up to and including the third element
vector[:3]
array([1, 2, 3])

# Select everything after the third element
vector[3:]
array([4, 5, 6])

# Select the last element
vector[-1]
6

# Select the first two rows and all columns of a matrix
matrix[:2,:]
array([[1, 2, 3],
       [4, 5, 6]])

# Select all rows and the second column
matrix[:,1:2]
array([[2],
       [5],
       [8]])

≡ Describing a Matrix in NumPy



≡ Analysis


This might seem basic (and it is), but time and again it will be valuable to check the shape and size of an array, both for further calculations and simply as a gut check after some operation.
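A minimal sketch, assuming a hypothetical matrix with three rows and four columns (shape, size, and ndim are standard NumPy attributes):

# Load library
import numpy as np

# Create a matrix with three rows and four columns
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# View number of rows and columns
matrix.shape
(3, 4)

# View number of elements (rows * columns)
matrix.size
12

# View number of dimensions
matrix.ndim
2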


≡ Applying Operations to Elements in NumPy



≡ Analysis


NumPy’s vectorize class converts a function into a function that can be applied to every element in an array or a slice of an array. It’s worth noting that vectorize is essentially a for loop over the elements and does not improve performance. In addition, NumPy
arrays allow us to perform operations between arrays even if their dimensions are not the same (a feature known as broadcasting). For example, we can create a much simpler version of the same operation using broadcasting:

# Add 100 to all elements
matrix + 100

array([[101, 102, 103],
       [104, 105, 106],
       [107, 108, 109]])
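For comparison, a minimal vectorize sketch of the same idea; add_100 is a hypothetical helper defined here for illustration, and matrix is assumed to be the 3x3 array of 1 through 9 used above:

# Create a function that adds 100 to its argument
add_100 = lambda i: i + 100

# Create a vectorized version of the function
vectorized_add_100 = np.vectorize(add_100)

# Apply the vectorized function to all elements in the matrix
vectorized_add_100(matrix)
array([[101, 102, 103],
       [104, 105, 106],
       [107, 108, 109]])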

≡ Finding the Maximum and Minimum Values in NumPy



≡ Analysis


Often we want to know the maximum and minimum value in an array or a subset of an array. This can be accomplished with the max and min functions. Using the axis parameter, we can also apply the operation along a particular axis:

# Find maximum element in each column
np.max(matrix, axis=0)
array([7, 8, 9])

# Find maximum element in each row
np.max(matrix, axis=1)
array([3, 6, 9])
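The same functions also work on the whole matrix (a sketch assuming the 3x3 matrix of 1 through 9 implied by the outputs above):

# Return maximum element
np.max(matrix)
9

# Return minimum element
np.min(matrix)
1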

≡ Calculating the Average, Variance, and Standard Deviation Using NumPy



≡ Analysis


Just as with max and min, we can get descriptive statistics about the whole matrix or perform calculations along a single axis:

# Find the mean value in each column
np.mean(matrix, axis=0)
array([ 4., 5., 6.])
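Applied to the whole matrix, the corresponding calls would look like this (a sketch assuming the same 3x3 matrix of 1 through 9):

# Return mean
np.mean(matrix)
5.0

# Return variance
np.var(matrix)
6.666666666666667

# Return standard deviation
np.std(matrix)
2.581988897471611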

≡ Reshaping Arrays in NumPy



≡ Analysis


reshape allows us to restructure an array so that we keep the same data but organize it as a different number of rows and columns. The only requirement is that the shapes of the original and new matrix contain the same number of elements (i.e., the same size). We can see the size of a matrix using size:

matrix.size
12
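As a sketch, assuming matrix is a 4x3 array containing the values 1 through 12 (consistent with the size of 12 above and the outputs below), reshape can rearrange it into two rows of six:

# Reshape matrix into a 2x6 array
matrix.reshape(2, 6)
array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])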

One helpful argument in reshape is -1, which effectively means “as many as needed,” so reshape(1, -1) means one row and as many columns as needed:

matrix.reshape(1, -1)
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]])

Finally, if we provide a single integer, reshape will return a one-dimensional array of that length:

matrix.reshape(12)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

≡ Transposing a Vector or Matrix in NumPy



≡ Analysis


Transposing is a common operation in linear algebra in which the column and row indices of each element are swapped. One nuanced point that is typically overlooked outside of a linear algebra class is that, technically, a vector cannot be transposed
because it is just a collection of values:

# Transpose vector
np.array([1, 2, 3, 4, 5, 6]).T
array([1, 2, 3, 4, 5, 6])

But, it is common to refer to transposing a vector as converting a row vector to a column vector (notice the second pair of brackets) or vice versa:

# Transpose row vector
np.array([[1, 2, 3, 4, 5, 6]]).T
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
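Transposing a matrix works the same way (a sketch assuming the 3x3 matrix of 1 through 9 used in earlier sections):

# Transpose matrix
matrix.T
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])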

≡ Flattening a Matrix in NumPy



≡ Analysis


flatten is a simple method for transforming a matrix into a one-dimensional array. Alternatively, we can use reshape to create a row vector:

matrix.reshape(1, -1)
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
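For comparison, calling flatten directly returns a true one-dimensional array rather than a two-dimensional row vector (a sketch assuming the same 3x3 matrix):

# Flatten matrix
matrix.flatten()
array([1, 2, 3, 4, 5, 6, 7, 8, 9])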

≡ Finding the Rank of a Matrix in NumPy



≡ Analysis


The rank of a matrix is the dimension of the vector space spanned by its columns or rows. Finding the rank of a matrix is easy in NumPy thanks to np.linalg.matrix_rank.
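A minimal sketch, using a hypothetical matrix whose first two columns are identical, so its rank is 2 rather than 3:

# Load library
import numpy as np

# Create a matrix whose first two columns are identical
matrix = np.array([[1, 1, 1],
                   [1, 1, 10],
                   [1, 1, 15]])

# Return matrix rank
np.linalg.matrix_rank(matrix)
2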


≡ Calculating the Determinant in NumPy



≡ Analysis


It can sometimes be useful to calculate the determinant of a matrix. NumPy makes this easy with np.linalg.det.
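A minimal sketch, using a hypothetical matrix whose second row is twice its first, so the determinant is zero (the printed value may show a tiny floating-point remainder instead of exactly 0.0):

# Load library
import numpy as np

# Create a singular matrix (second row is twice the first)
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# Return determinant of matrix
np.linalg.det(matrix)
0.0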


≡ Getting the Diagonal of a Matrix in NumPy



≡ Analysis


NumPy makes getting the diagonal elements of a matrix easy with diagonal. It is also possible to get a diagonal off the main diagonal by using the offset parameter:

# Return diagonal one above the main diagonal
matrix.diagonal(offset=1)
array([2, 6])

# Return diagonal one below the main diagonal
matrix.diagonal(offset=-1)
array([2, 8])
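For reference, the outputs above are consistent with a matrix like the following (hypothetical, reconstructed from the off-diagonal values); calling diagonal with no offset returns the main diagonal:

# Create matrix
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# Return the main diagonal
matrix.diagonal()
array([1, 4, 9])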

≡ Calculating the Trace of a Matrix in NumPy



≡ Analysis


The trace of a matrix is the sum of its diagonal elements and is often used under the hood in machine learning methods. Given a NumPy multidimensional array, we can calculate the trace using trace. Alternatively, we can return the diagonal of a matrix and sum its elements:

# Return diagonal and sum elements
sum(matrix.diagonal())
14
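The trace method gives the same result (a sketch assuming the matrix from the previous section, whose diagonal is 1, 4, 9):

# Return trace
matrix.trace()
14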

≡ Finding Eigenvalues and Eigenvectors in NumPy



≡ Analysis


Eigenvectors are widely used in machine learning libraries. Given a linear transformation represented by a matrix A, eigenvectors are vectors that, when that transformation is applied, change only in scale (not direction). More formally:

Av = λv

where A is a square matrix, λ contains the eigenvalues, and v contains the eigenvectors. In NumPy’s linear algebra toolset, linalg.eig lets us calculate the eigenvalues and eigenvectors of any square matrix.
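A minimal sketch, using a hypothetical diagonal matrix so the expected eigenvalues (2 and 3) are easy to verify by hand:

# Load library
import numpy as np

# Create a diagonal matrix with eigenvalues 2 and 3
matrix = np.array([[2, 0],
                   [0, 3]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# View eigenvalues
eigenvalues
array([2., 3.])

# View eigenvectors (one per column)
eigenvectors
array([[1., 0.],
       [0., 1.]])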


≡ Adding and Subtracting Matrices in NumPy



≡ Analysis


We can add and subtract matrices with np.add and np.subtract; alternatively, we can simply use the + and - operators:

# Add two matrices
matrix_a + matrix_b
array([[ 2,  4,  2],
       [ 2,  4,  2],
       [ 2,  4, 10]])
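One possible pair of matrices consistent with the output above (these particular values are hypothetical):

# Create two matrices
matrix_a = np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 2]])

matrix_b = np.array([[1, 3, 1],
                     [1, 3, 1],
                     [1, 3, 8]])

# Add two matrices
np.add(matrix_a, matrix_b)
array([[ 2,  4,  2],
       [ 2,  4,  2],
       [ 2,  4, 10]])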

≡ Inverting a Matrix in NumPy



≡ Analysis


The inverse of a square matrix, A, is a second matrix, A⁻¹, such that:

AA⁻¹ = I

where I is the identity matrix. In NumPy we can use np.linalg.inv to calculate A⁻¹ if it exists. To see this in action, we can multiply a matrix by its inverse, and the result is the identity matrix:

# Multiply matrix and its inverse
matrix @ np.linalg.inv(matrix)
array([[ 1., 0.],
       [ 0., 1.]])
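As a sketch, one 2x2 matrix consistent with the identity output above (the specific values are hypothetical), along with its computed inverse:

# Load library
import numpy as np

# Create a 2x2 matrix
matrix = np.array([[1, 4],
                   [2, 5]])

# Calculate its inverse
np.linalg.inv(matrix)
array([[-1.66666667,  1.33333333],
       [ 0.66666667, -0.33333333]])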

≡ Generating Random Values in NumPy



≡ Analysis


NumPy offers a wide variety of ways to generate random numbers, many more than can be covered here.

# Generate three random integers between 0 and 10
np.random.randint(0, 11, 3)
array([3, 7, 9])

Alternatively, we can generate numbers by drawing them from a distribution:

# Draw three numbers from a normal distribution with mean 0.0
# and standard deviation of 1.0
np.random.normal(0.0, 1.0, 3)
array([-1.42232584, 1.52006949, -0.29139398])
# Draw three numbers from a logistic distribution with mean 0.0 and scale of 1.0
np.random.logistic(0.0, 1.0, 3)
array([-0.98118713, -0.08939902, 1.46416405])
# Draw three numbers greater than or equal to 1.0 and less than 2.0
np.random.uniform(1.0, 2.0, 3)
array([ 1.47997717, 1.3927848 , 1.83607876])

Finally, it can sometimes be useful to return the same random numbers multiple times to get predictable, repeatable results. We can do this by setting the “seed” (an integer) of the pseudorandom generator. Random processes with the same seed will always produce the same output.
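A minimal sketch of seeding NumPy’s legacy global generator (the values shown are what seed 0 produces):

# Set seed
np.random.seed(0)

# Generate three random floats between 0.0 and 1.0
np.random.random(3)
array([0.5488135 , 0.71518937, 0.60276338])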
