programming-language-identification
This model is a fine-tuned version of huggingface/CodeBERTa-small-v1 on the cakiki/rosetta-code dataset for the 26 programming languages listed below.

Training details: the model was trained for 25 epochs on Azure on roughly 26,000 datapoints covering the 26 languages below, extracted from a dataset spanning 1,006 programming languages in total.

Programming languages this model can detect (and whose examples were used for training):
'ARM Assembly', 'AppleScript', 'C', 'C#', 'C++', 'COBOL', 'Erlang', 'Fortran', 'Go', 'Java', 'JavaScript', 'Kotlin', 'Lua', 'Mathematica/Wolfram Language', 'PHP', 'Pascal', 'Perl', 'PowerShell', 'Python', 'R', 'Ruby', 'Rust', 'Scala', 'Swift', 'Visual Basic .NET', 'jq'
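The label set above can be sketched as a Python mapping. Note the index order here is an assumption for illustration only; the authoritative id-to-label mapping lives in the model's config.json.

```python
# The 26 languages this model can detect, as listed in the card above.
LANGUAGES = [
    "ARM Assembly", "AppleScript", "C", "C#", "C++", "COBOL", "Erlang",
    "Fortran", "Go", "Java", "JavaScript", "Kotlin", "Lua",
    "Mathematica/Wolfram Language", "PHP", "Pascal", "Perl", "PowerShell",
    "Python", "R", "Ruby", "Rust", "Scala", "Swift", "Visual Basic .NET",
    "jq",
]

# Hypothetical id2label mapping -- the real one is defined by the model's
# config.json and may use a different index order.
id2label = {i: lang for i, lang in enumerate(LANGUAGES)}
```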
Training machine configuration: GPU: 1x NVIDIA Tesla T4 (16 GB VRAM), RAM: 112 GB, CPU: 6 cores.
Training time: 7 hours for 25 epochs.

Training hyper-parameters:
Loading the model requires the 🤗 Optimum library to be installed.
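A minimal sketch of how loading and running the classifier might look with 🤗 Optimum's ONNX Runtime backend. The model id below is a placeholder assumption (use the actual repository name on the Hub), and the `top_language` helper is illustrative, not part of the model card.

```python
def top_language(logits, id2label):
    """Map raw classifier logits to the most likely language name.

    Hypothetical helper: picks the index with the highest logit and
    looks up its label in the model's id2label mapping.
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


if __name__ == "__main__":
    # Assumed loading path via 🤗 Optimum's ONNX Runtime classes, as the
    # card says Optimum is required. The model id is a placeholder.
    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model_id = "your-namespace/programming-language-identification"  # assumption
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ORTModelForSequenceClassification.from_pretrained(model_id)

    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(classifier('print("hello, world")'))
```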