27th

We discussed cross-validation and k-fold cross-validation.

In k-fold cross-validation, the dataset is divided into ‘k’ subsets. For each iteration, the model is trained on ‘k-1’ subsets and tested on the remaining one. This is repeated ‘k’ times, each time with a different subset as the test set. For example, in 5-fold cross-validation, the data is partitioned into 5 parts, training on 4 and testing on the fifth; this cycle is done 5 times. The model’s performance is then averaged across all rounds, giving a more comprehensive assessment of its generalization ability.
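To make this concrete, here is a minimal sketch of the splitting step in plain Python (the function name `k_fold_indices` and the 10-point / 5-fold example are my own; libraries like scikit-learn provide a production version of this):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Each round: one fold is the test set, the other k-1 form the training set.
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

# 5-fold split of 10 data points: each round trains on 8 and tests on 2
splits = k_fold_indices(10, 5)
for train, test in splits:
    print(len(train), len(test))
```

Averaging the model's score over the 5 (train, test) pairs gives the cross-validated estimate described above.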

25th

My opinion on today's class.

Resampling methods involve creating new samples from the original data, which are invaluable for inference, model evaluation, and estimating statistics. One such method is bootstrapping, where we generate multiple samples by randomly selecting data points with replacement – often used to estimate population metrics or to assess the uncertainty of a statistic, especially with limited data. Cross-validation, another resampling strategy, is common in machine learning. It divides the dataset into subsets, using them iteratively for training and testing, helping gauge model generalization and identify issues like overfitting. Estimating prediction error is crucial to understanding how our model may perform on new, unseen data, and several methods can be employed depending on our data and goals.
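Here is a small sketch of the bootstrap idea from class, in plain Python (the function name `bootstrap_means`, the sample data, and the choice of 1000 resamples are my own illustration):

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Approximate the sampling distribution of the mean by
    drawing many same-size samples with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]  # with replacement
        means.append(statistics.mean(sample))
    return means

data = [2.1, 2.5, 2.8, 3.0, 3.3, 3.9, 4.2, 4.8]
means = bootstrap_means(data)
# The spread of the bootstrap means estimates the standard error of the mean
se = statistics.stdev(means)
```

The same resample-and-recompute loop works for any statistic (median, correlation, etc.), which is what makes bootstrapping so useful when the data is limited.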

22nd sept

P-value:

The p-value, short for probability value, is a statistical measure that helps evaluate the significance of a particular finding in a statistical analysis. It quantifies the level of evidence against a null hypothesis, which often assumes that there is no effect or connection in the data being examined. A low p-value, typically below 0.05, indicates statistical significance and provides strong evidence against the null hypothesis. Conversely, a high p-value indicates weak evidence against the null hypothesis, meaning the result is not statistically significant.
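One intuitive way to compute a p-value is a permutation test: if the null hypothesis of "no difference between two groups" is true, group labels are interchangeable, so we shuffle them many times and ask how often a difference as large as the observed one appears by chance. A minimal sketch in plain Python (the function name and the example groups are my own):

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        # Count shuffles with a difference at least as extreme as observed
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            count += 1
    return count / n_perm

# Clearly separated groups: a small p-value, strong evidence against the null
p = permutation_p_value([1, 2, 3, 4, 5], [10, 11, 12, 13, 14])
```

With well-separated groups like these, very few shuffles reproduce the observed gap, so the p-value falls well below the usual 0.05 threshold.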

R-squared:

In the context of regression analysis, the R-squared statistic is employed to assess how well a model fits the given data. It quantifies the proportion of variance in the dependent variable, the variable being predicted, that can be attributed to the independent variables or predictor variables within the model. Higher R-squared values indicate a better fit, and they range from 0 to 1. An R-squared value of 1 signifies that the model perfectly explains all the variance in the data, while a value of 0 indicates that the model cannot account for any variation in the data. R-squared is used to measure the goodness of fit of a model to observed data, although it may not always accurately indicate the model’s predictive ability for future data.
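The definition above can be written as R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A small sketch in plain Python (the function name `r_squared` is my own):

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)        # total variance
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained
    return 1 - ss_res / ss_tot

y = [1, 2, 3, 4]
print(r_squared(y, y))            # perfect fit -> 1.0
print(r_squared(y, [2.5] * 4))    # predicting the mean -> 0.0
```

The two extreme cases match the description: R² = 1 when the predictions explain all the variance, and R² = 0 when the model does no better than predicting the mean.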