How to use usvsnsp/code-vs-nl with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="usvsnsp/code-vs-nl")
```

```python
# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("usvsnsp/code-vs-nl")
model = AutoModelForSequenceClassification.from_pretrained("usvsnsp/code-vs-nl")
```

This model is a fine-tuned version of distilbert-base-uncased, trained on bookcorpus for text and codeparrot/github-code for code. It achieves the evaluation results shown in the table below.
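When loading the model directly, the raw logits must be converted into a label yourself. A minimal sketch in plain Python follows; the logit values and the id-to-label mapping here are illustrative assumptions, so check `model.config.id2label` for the real mapping:

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; real values come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.1, -1.3]
probs = softmax(logits)

# This label order is an assumption; verify with model.config.id2label
id2label = {0: "text", 1: "code"}
pred = id2label[probs.index(max(probs))]
```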
As a fine-tuned model, its architecture is the same as distilbert-base-uncased with a sequence-classification head.
It can be used to classify documents as either natural-language text or code.
The training data is an equal, randomly sampled mix of the two datasets above.
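The card does not describe the exact sampling procedure, but an equal random mix of two labeled sources can be sketched as follows (the document lists and sample sizes are illustrative):

```python
import random

def mix_equally(text_docs, code_docs, n_per_source, seed=0):
    """Build a balanced dataset: sample the same number of documents
    from each source, then shuffle the combined list."""
    rng = random.Random(seed)
    sampled = (rng.sample(text_docs, n_per_source)
               + rng.sample(code_docs, n_per_source))
    rng.shuffle(sampled)
    return sampled

# Toy stand-ins for the bookcorpus and github-code documents
text_docs = [("text", f"sentence {i}") for i in range(100)]
code_docs = [("code", f"def f{i}(): pass") for i in range(100)]
mixed = mix_equally(text_docs, code_docs, 50)
```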
The following results were recorded during training:
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|---|
| 0.5732 | 0.07 | 500 | 0.5658 | 0.9934 | 0.9934 |
| 0.5254 | 0.14 | 1000 | 0.5180 | 0.9951 | 0.9950 |
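For reference, accuracy and binary F1 as reported in the table can be computed from predictions as follows; this is a self-contained sketch, not the evaluation script actually used:

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 for the given positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = correct / len(y_true)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return acc, f1

# Toy labels: 1 = code, 0 = text
acc, f1 = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
```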
Base model: distilbert/distilbert-base-uncased