Contents

## How can the generalization gap be reduced?

Adapting the number of weight updates eliminates generalization gap. Hoffer et al. stated that the initial training phase with a high-learning rate enables the model to reach farther locations in the parameter space, which may be necessary to find wider local minima and better generalization.

**Would initial large learning rate improve generalization for shallow learning?**

Although a small initial learning rate allows for faster training and better test performance initially, the large learning rate achieves better generalization soon after the learning rate is annealed.

**What is Generalisation gap?**

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.

### What is Ghost batch normalization?

Neofytos Dimitriou, Ognjen Arandjelovic. Batch normalization (BatchNorm) is an effective yet poorly understood technique for neural network optimization. It is often assumed that the degradation in BatchNorm performance to smaller batch sizes stems from it having to estimate layer statistics using smaller sample sizes.

**What is the effect of batch size?**

Batch size controls the accuracy of the estimate of the error gradient when training neural networks. Batch, Stochastic, and Minibatch gradient descent are the three main flavors of the learning algorithm. There is a tension between batch size and the speed and stability of the learning process.

**What is a good learning rate?**

A traditional default value for the learning rate is 0.1 or 0.01, and this may represent a good starting point on your problem.

## What is gap in machine learning?

It’s getting machine learning from the researcher’s laptop to production. That’s the real gap. It’s one thing to build a model; it’s another thing altogether to embody that model in an application and deploy it successfully in production.

**What is the difference between regularization and generalization?**

Generalization is low if there is large gap between training and validation loss. Regularization is a method to avoid high variance and overfitting as well as to increase generalization. Regularization can help avoid high variance and overfitting.

**What happens when your learning rate is too high?**

A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, whereas a learning rate that is too small can cause the process to get stuck. The learning rate is perhaps the most important hyperparameter. If you have time to tune only one hyperparameter, tune the learning rate.

### What is the gap between validation and training loss?

This gap is referred to as the “generalization gap.” The plot of training loss decreases to a point of stability. The plot of validation loss decreases to a point of stability and has a small gap with the training loss.

**What does a large generalisation gap in deep learning mean?**

This corresponds to the difference in performance (hence, gap) of the model on testing versus training data — with testing performance usually worse than training performance. A model with a large generalisation gap is said to overfit training data.

**How does the learning rate affect the training process?**

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging as a value too small may result in a long training process that could get

## How are learning curves calculated for train validation?

In this case, two plots are created, one for the learning curves of each metric, and each plot can show two learning curves, one for each of the train and validation datasets. Optimization Learning Curves: Learning curves calculated on the metric by which the parameters of the model are being optimized, e.g. loss.