Log5- Learning FastBook along with Study Group

fastbook
myself
ML
Deep learning
Published

July 14, 2021

Just a quick recap: in the first chapter, we learned that Arthur Samuel proposed the following:

Samuel said we need an automatic means of testing the effectiveness of any current weight assignment in terms of actual performance. In the case of his checkers program, the “actual performance” of a model would be how well it plays. And you could automatically test the performance of two models by setting them to play against each other, and seeing which one usually wins. Finally, he says we need a mechanism for altering the weight assignment so as to maximize the performance. For instance, we could look at the difference in weights between the winning model and the losing model, and adjust the weights a little further in the winning direction.

And the book mentions:

- Neural networks - a function that is flexible enough to be used to solve any given problem
- SGD - a mechanism for automatically updating the weights for any problem

The whole training loop of any ML task looks the same: use the current weights to make predictions on the inputs, measure performance with a loss, and update the weights so the loss improves.

This week in the group we looked at the second half of Chapter 4, which covers gradient descent, specifically stochastic gradient descent. Week 5 was pretty code intensive as a whole, so let's go through everything covered this week section by section, starting from the middle of Chapter 4.

Last week we covered how to classify 3s and 7s with pixel similarity. I tried to repeat all the steps on the full MNIST dataset, which can be found in this Kaggle notebook.

I am sharing my rough notes here so I can remember what we covered in each section of Chapter 4:

Stochastic Gradient Descent

It consists of seven steps:

- Initialize the weights
- Predict
- Calculate the loss
- Calculate the gradient (how changing each weight assignment would change the loss)
- Step (update the weights)
- Repeat (go back to the predict step until the epochs are done)
- Stop

The sub-sections covered here:

- Calculating Gradients
- Stepping with a learning rate
- End-to-End SGD
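
To tie "Calculating Gradients" and "Stepping with a learning rate" together, here is a minimal end-to-end SGD sketch on a toy quadratic loss. This is my own illustration rather than the book's example, and the names (loss_fn, w, lr) are made up:

```python
import torch

# Toy loss with its minimum at w = 3; SGD should walk w there.
def loss_fn(w):
    return ((w - 3.0) ** 2).sum()

w = torch.tensor([0.0], requires_grad=True)   # 1. initialize the weight
lr = 0.1                                      # learning rate for the step

for _ in range(50):
    loss = loss_fn(w)          # 2-3. predict and measure the loss
    loss.backward()            # 4. calculate the gradient
    with torch.no_grad():
        w -= lr * w.grad       # 5. step the weight in the downhill direction
        w.grad.zero_()         # reset the gradient before the next iteration

print(w)   # ends up very close to tensor([3.])
```

The same seven steps drive the MNIST training below; only the model and the loss get more interesting.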

The MNIST Loss Function

```python
def linear1(xb):
    # xb is a batch of flattened 28x28 images; weights is 784x1 and bias is a single number
    return xb @ weights + bias
```
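
For linear1 to run, weights and bias have to exist first. As far as I remember, the book initializes them roughly like this (treat it as a sketch):

```python
import torch

# Random parameter initialization, roughly as in Chapter 4 of the book
def init_params(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()

weights = init_params((28*28, 1))   # one weight per pixel of a 28x28 image
bias = init_params(1)               # a single bias value
```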

Then we used the mnist_loss() function, which measures how far the predictions are from the targets.
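
If I recall the book's version correctly, it looks roughly like this; it assumes the predictions are already squashed between 0 and 1:

```python
import torch

def mnist_loss(predictions, targets):
    # targets==1 wants a prediction near 1, targets==0 wants one near 0,
    # so we measure how far each prediction is from its "ideal" value
    return torch.where(targets == 1, 1 - predictions, predictions).mean()
```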

Sigmoid function
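
The sigmoid squashes any activation into the range (0, 1), which is exactly what mnist_loss expects; roughly as in the book:

```python
import torch

def sigmoid(x):
    # maps any real number into (0, 1)
    return 1 / (1 + torch.exp(-x))

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()   # make sure predictions land in (0, 1)
    return torch.where(targets == 1, 1 - predictions, predictions).mean()
```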

SGD and mini batches

A DataLoader can take a collection and split it into small batches so they fit in GPU memory; the whole dataset does not always need to fit in memory at once.
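
A small example of fastai's DataLoader batching a plain collection, written from memory, so treat the exact output as approximate:

```python
from fastai.vision.all import *   # brings in fastai's DataLoader

coll = range(15)
dl = DataLoader(coll, batch_size=5, shuffle=True)
list(dl)
# -> three shuffled batches of five items each,
#    e.g. [tensor([ 3, 12,  8, 10,  2]), tensor([ 9,  4,  7, 14,  1]), tensor([ 0,  5,  6, 11, 13])]
```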

Putting it together

- We implement the gradient descent algorithm (the seven steps above): call init_params, define train_dl and valid_dl, and calculate the gradients with a calc_grad helper
- Then train_epoch updates the parameters over one pass of the training data
- batch_accuracy measures how many predictions in a batch are correct
- validate_epoch averages batch_accuracy over the validation set (sketched below)
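
From memory, these helpers look roughly like this; train_dl and valid_dl are the training and validation DataLoaders mentioned above, and mnist_loss comes from the earlier section, so this is a sketch rather than the exact notebook code:

```python
from fastai.vision.all import *   # torch etc.

def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)   # loss from the earlier section
    loss.backward()                # gradients end up on the parameters

def train_epoch(model, lr, params):
    for xb, yb in train_dl:        # one pass over the training DataLoader
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad * lr  # step the weights ...
            p.grad.zero_()         # ... and reset the gradients

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds > 0.5) == yb  # threshold at 0.5 to decide 3 vs 7
    return correct.float().mean()

def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb, yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
```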

Finally I got 96% accuracy.

Creating an optimizer
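
An optimizer just packages the "step the weights, then zero the gradients" logic from train_epoch. Here is a rough sketch of the book's BasicOptim, followed by the fastai equivalent (SGD plus a Learner); dls here stands for the DataLoaders built from train_dl and valid_dl:

```python
from fastai.vision.all import *

class BasicOptim:
    "Minimal optimizer: step the parameters, then zero their gradients."
    def __init__(self, params, lr): self.params, self.lr = list(params), lr
    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr
    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None

linear_model = nn.Linear(28*28, 1)
opt = BasicOptim(linear_model.parameters(), lr=1.)

# fastai provides the same behaviour out of the box: SGD as opt_func in a Learner
learn = Learner(dls, nn.Linear(28*28, 1), opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(10, lr=1.)
```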

Got 97% accuracy

Using a non-linearity
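
Putting a ReLU between two linear layers is what turns the linear model into a small neural network; this is roughly the book's simple_net, again assuming dls, mnist_loss and batch_accuracy from above:

```python
from fastai.vision.all import *

# Two linear layers with a non-linearity (ReLU) between them
simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),
    nn.Linear(30, 1)
)

learn = Learner(dls, simple_net, opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(40, lr=0.1)
```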

In the book, the accuracy after this step is 98.2%.

Going Deeper with Resnet18

```python
dls = ImageDataLoaders.from_folder(path)                # DataLoaders straight from the image folders
learn = cnn_learner(dls, resnet18, pretrained=False,    # an 18-layer ResNet trained from scratch
                    loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)                             # one epoch with the 1cycle schedule
```

Finally got 99.99% accuracy.

After the lesson, I tried answering the questionnaire at the end of the chapter and scored 20/37. I just wanted to turn up this week as well by writing this blog post.

There are 2 types of people. Players and commentators. Players don’t stop playing to listen to the commentators. They just play their game. - Prashant Choubey