Going Deep
I’ve recently begun exploring deep learning. A pretty good resource I came across was the PyTorch quickstart.
The documentation is pretty decent and relatively beginner-friendly, albeit a bit terse if you’re not used to ML. (If you aren’t, I strongly recommend understanding what “learning” and “loss” truly mean in the context of ML.)
You should be able to comprehend the following statement before proceeding with deep learning:
We determine the loss by comparing our predicted output against the target output. Learning is the process of using an “optimizer” (such as gradient descent) to minimize the loss. In good scenarios, we can set the problem up so that minimizing the loss corresponds to finding a good estimate of the model’s parameters (e.g. a maximum likelihood estimate), and the loss can actually be driven down.
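To make that concrete, here’s a minimal sketch, assuming PyTorch is installed and using a made-up linear model and random data, of computing a loss from predictions versus targets and taking one gradient-descent step:

```python
import torch
import torch.nn as nn

# Made-up data: 8 samples with 3 features each, and their target values.
x = torch.randn(8, 3)
y_true = torch.randn(8, 1)

model = nn.Linear(3, 1)                                    # a tiny stand-in "model"
loss_fn = nn.MSELoss()                                     # how far predictions are from targets
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # plain gradient descent

y_pred = model(x)                 # predicted output
loss = loss_fn(y_pred, y_true)    # compare predicted output against target output
loss.backward()                   # compute gradients of the loss w.r.t. the parameters
optimizer.step()                  # "learning": adjust parameters to reduce the loss
```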
Some background info: I only know that a generic neural network layer takes the outputs of all the nodes in the previous layer, performs a computation on them (a weighted sum passed through the activation/link function), and produces outputs for the next layer to consume as input. As you can probably tell, I just know a little bit about traditional ML.
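In code, that per-layer computation looks roughly like this (a sketch with made-up layer sizes and a ReLU activation):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)            # weighted sum of the 4 previous-layer outputs, for 2 nodes
activation = nn.ReLU()             # the activation / link function

prev_outputs = torch.randn(4)      # outputs of all nodes in the previous layer
next_inputs = activation(layer(prev_outputs))  # what the next layer consumes as input
print(next_inputs.shape)           # torch.Size([2])
```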
Here’s the link to the PyTorch Quickstart!
A few questions I had that needed some investigation:
- **Why are the dataloader batches in the shape of N, C, H, W?**
  This is because we’re generally going to be working with image data, and the terminology for splitting the data in that context is **N** (the number of samples in the batch), **C**hannels (3 for RGB, 1 for grayscale), **H**eight, and **W**idth. Each batch is therefore a 4-dimensional or 4d tensor.
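  A quick shape check, assuming the quickstart’s FashionMNIST data (the `root="data"` download path is just a placeholder):

  ```python
  from torch.utils.data import DataLoader
  from torchvision import datasets
  from torchvision.transforms import ToTensor

  # FashionMNIST as used in the quickstart: grayscale 28x28 images, batches of 64.
  test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
  test_dataloader = DataLoader(test_data, batch_size=64)

  X, y = next(iter(test_dataloader))
  print(X.shape)  # torch.Size([64, 1, 28, 28]) -> N, C, H, W
  print(y.shape)  # torch.Size([64])            -> one label per sample
  ```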
- **Why do we have to use the `flatten` operation?**
  Because the data needs to be in a certain shape/configuration to be consumable by the next layer. Flattening lets us convert the 4d image tensor into a 2d one (2d because that’s what the first `Linear` layer in the PyTorch Sequential NN expects: one row of values per sample).
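  A sketch of what `nn.Flatten` does to a batch (shapes assume the FashionMNIST setup above):

  ```python
  import torch
  import torch.nn as nn

  batch = torch.randn(64, 1, 28, 28)   # N, C, H, W
  flat = nn.Flatten()(batch)           # keeps the batch dimension, flattens C*H*W together
  print(flat.shape)                    # torch.Size([64, 784]) -> 2d: one row of pixels per image
  ```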
- **What is a Sequential NN in PyTorch?**
  `nn.Sequential` is essentially a container in which you define the different layers of your neural network model. The sequential part means that the input runs through one layer and the result is fed to the next, in the order the layers are listed.
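  Roughly the layer stack the quickstart builds (the tutorial wraps it in an `nn.Module` and applies `Flatten` in `forward`, but the idea is the same):

  ```python
  import torch
  import torch.nn as nn

  model = nn.Sequential(
      nn.Flatten(),             # [N, 1, 28, 28] -> [N, 784]
      nn.Linear(28 * 28, 512),  # each layer feeds the next...
      nn.ReLU(),
      nn.Linear(512, 512),
      nn.ReLU(),
      nn.Linear(512, 10),       # ...until we end up with 10 scores, one per class
  )

  logits = model(torch.randn(64, 1, 28, 28))
  print(logits.shape)           # torch.Size([64, 10])
  ```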
- **Why do we use `CrossEntropyLoss`?**
  I’m not entirely certain as to why, but from what I’ve found:
  - It’s better suited to probabilities (classification) than to continuous values (regression).
  - It’s designed for multi-class classification tasks.
  - It works well for neural networks because it’s differentiable, so we can compute the gradients needed for gradient descent.
  - It penalizes based on the probability assigned to the true class, and does so logarithmically: for the true label [1, 0], the prediction [0.8, 0.2] gets a noticeably smaller loss than [0.6, 0.4], and putting almost no probability on the true class is punished extremely hard, whereas L1/L2 only scale the penalty with the plain distance between the vectors.
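  A small sketch of that last point (note that `nn.CrossEntropyLoss` expects raw logits and applies softmax itself, so the probabilities are passed through `log()` here, which makes the internal softmax recover them exactly):

  ```python
  import torch
  import torch.nn as nn

  loss_fn = nn.CrossEntropyLoss()
  target = torch.tensor([0])   # the true class is class 0, i.e. the label [1, 0]

  confident = torch.log(torch.tensor([[0.8, 0.2]]))  # log-probs passed in as logits
  hesitant = torch.log(torch.tensor([[0.6, 0.4]]))

  print(loss_fn(confident, target).item())  # ~0.223 (= -log 0.8)
  print(loss_fn(hesitant, target).item())   # ~0.511 (= -log 0.6)
  ```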
- **Why do we do a zero grad?**
  Once we take the `optimizer.step()`, we’ve made a decision, based on the accumulated gradients, about how to update the parameters of the model. We no longer need those gradients, and we’ll compute new ones on the next batch anyway (PyTorch accumulates gradients rather than overwriting them), so we reset them all to 0.
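  The usual ordering inside the training loop, as a self-contained sketch with made-up data:

  ```python
  import torch
  import torch.nn as nn

  model = nn.Linear(3, 1)
  loss_fn = nn.MSELoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  batches = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(5)]  # made-up batches

  for X, y in batches:
      optimizer.zero_grad()          # clear gradients left over from the previous batch
      loss = loss_fn(model(X), y)
      loss.backward()                # compute fresh gradients for this batch
      optimizer.step()               # update the parameters using those gradients
  ```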
- **What’s the `model.eval()` for?**
  The model, by default, is in training mode. `model.eval()` essentially flips the flag to `eval`/test mode so that layers which behave differently during training (such as dropout or batch norm) switch to their inference behaviour, the state of the model isn’t changed by the test data, and we’re only using it for testing. (Don’t train on the test data!) Gradient computation is switched off separately with `torch.no_grad()`, which the quickstart’s test loop also uses.
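  Pairing the two, roughly (same hypothetical tiny model and made-up data as above):

  ```python
  import torch
  import torch.nn as nn

  model = nn.Linear(3, 1)
  test_X, test_y = torch.randn(8, 3), torch.randn(8, 1)   # made-up "test" data

  model.eval()                  # switch layers like dropout/batch norm to inference behaviour
  with torch.no_grad():         # don't track gradients while testing
      test_loss = nn.MSELoss()(model(test_X), test_y)
  print(test_loss.item())

  model.train()                 # flip back before the next training epoch
  ```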
I’ve realised that PyTorch seems to be the underlying framework used by higher-level frameworks such as FastAI. So building up on PyTorch should ideally give me a chance to learn the foundational structures of modelling, and enough of an understanding to use more abstract frameworks while knowing the flow of operations underneath.
Over the course of going through PyTorch and implementing projects, I’m pretty sure I’ll hit roadblocks where I need to learn and relearn ML concepts too. But, that’s alright and serves the purpose all the more!