How much data do you need?

Fig. 1. Language model accuracy for a pre-trained vs. a non-pre-trained model.
Fig. 2. Accuracy vs. dataset decimation for the pre-trained wikitext-103 language model.
Fig. 3. Multi-class classification confusion matrix for pre-trained wikitext-103 language model on the dbpedia dataset — the baseline.
Fig. 4. Average F1 score for all classes vs number of samples for classification results for the dbpedia dataset
F1 Score = 2 * ((Precision * Recall) / (Precision + Recall))
Recall = True Positives / (True Positives + False Negatives)
Precision = True Positives / (True Positives + False Positives)
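The formulas above can be applied per class to a multi-class confusion matrix and then averaged, which is one plausible way to arrive at an "average F1 score for all classes". The sketch below is illustrative only: it assumes rows are true classes and columns are predicted classes, and it assumes an unweighted (macro) mean, neither of which is stated in the article. The function names and the small example matrix are hypothetical.

```python
def per_class_scores(confusion):
    """Return (precision, recall, f1) for each class.

    Assumption: confusion[i][j] is the number of samples whose true
    class is i and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted as another class
        fp = sum(confusion[i][k] for i in range(n)) - tp   # another class, predicted as k
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    scores = per_class_scores(confusion)
    return sum(f1 for _, _, f1 in scores) / len(scores)


# Hypothetical two-class confusion matrix, not taken from the article's results.
example = [
    [8, 2],
    [1, 9],
]

if __name__ == "__main__":
    for label, (p, r, f1) in enumerate(per_class_scores(example)):
        print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print(f"macro F1: {macro_f1(example):.3f}")
```

A macro average treats every class equally regardless of how many samples it has, which matters when a dataset is cut down to a small fraction and class counts become uneven.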
Fig. 5. Multi-class classification confusion matrix for pre-trained wikitext-103 language model on the dbpedia dataset trained on 1/1000th of the dataset.
Fig. 6. Multi-class classification confusion matrix for pre-trained wikitext-103 language model on the ag_news dataset — the baseline.
Fig. 7. Average F1 score for all classes vs number of samples for classification results for the ag_news dataset
Fig. 8. Multi-class classification confusion matrix for pre-trained wikitext-103 language model on the ag_news dataset trained on 1/1000th of the dataset.

Adrian G

Geophysicist and Deep Learning Practitioner