OCR Tamil - Easy, Accurate and Simple to use Tamil OCR - (ஒளி எழுத்துணரி)
OCR Tamil extracts text from signboards, nameplates, storefronts, and similar objects in natural scenes with high accuracy. This version is considerably more robust to tilted text than Tesseract, PaddleOCR, and EasyOCR, which are built primarily for document text rather than natural scenes. The model is a work in progress; feel free to contribute!
| Input Image | OCR TAMIL 🏆 | Tesseract | EasyOCR |
|:-----------:|:------------:|:---------:|:-------:|
| | வாழ்கவளமுடன் ✅ | க் க்கஸாரகளள௮ஊகஎளமுடன் ❌ | வாழக வளமுடன் ❌ |
| | தமிழ்வாழ்க ✅ | NO OUTPUT ❌ | தமிழ்வாழ்க ✅ |
| | கோபி ✅ | NO OUTPUT ❌ | ப99 ❌ |
| | தாம்பரம் ✅ | NO OUTPUT ❌ | தாம்பரம ❌ |
| | நெடுஞ்சாலைத் ✅ | NO OUTPUT ❌ | நெடுஞ்சாலைத் ✅ |
| | அண்ணாசாலை ✅ | NO OUTPUT ❌ | ல@I9 ❌ |
| | ரெடிமேடஸ் ❌ | NO OUTPUT ❌ | ரெடிமேடஸ் ❌ |
The Tesseract and EasyOCR results were obtained using the Colab notebook, with Tamil and English set as the languages.
Quick links🌐 📔 Detailed explanation on Medium article.
Pip install instructions🐍 In your command line, run the following command
Tested with Python 3.10 on Windows and Linux (Ubuntu 22.04) machines.
Applications⚡
1. ADAS system navigation based on the signboards + maps (hybrid approach) 🚁
2. License plate recognition 🚘
1. Cannot read text that appears in rotated form.
2. Currently supports only English and Tamil.
3. Document text reading capability is limited. Automatic identification of paragraphs and lines is not supported, and the text detection model's inability to detect and crop Tamil text reduces accuracy. (Workaround: use your own text detection model together with the OCR Tamil text recognition model.)
Character இ missing due to text detection model error
?யற்கை மூலிகைகளில் இருந்து ஈர்த்தெடுக்கக்கப்பட்ட வீரிய உட்பொருட்களை உள்ளடக்கி எந்த இரசாயன சேர்க்கைகளும் இல்லாமல் உருவாக்கப்பட்ட இந்தியாவின் முதல் சித்த தயாரிப்பு
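
The workaround in limitation 3 (pairing your own text detection model with the OCR Tamil recognition model) can be sketched as a small detect, crop, recognize pipeline. This is a minimal illustration and not the library's actual API: `read_text`, `crop`, `detect`, and `recognize` are hypothetical names, the image is assumed to be a plain list of pixel rows, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples with exclusive maxima. The top-to-bottom, left-to-right ordering is a naive stand-in for real line grouping.

```python
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]          # assumed: grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # assumed: (x_min, y_min, x_max, y_max), max exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the boxed region out of the image, clamping the box to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def read_text(
    image: Image,
    detect: Callable[[Image], Iterable[Box]],  # hypothetical: your own detector
    recognize: Callable[[Image], str],         # hypothetical: wrapper around a recognizer
) -> List[str]:
    """Run a custom detector, crop each box, and recognize each crop."""
    # Naive reading order: sort boxes by top edge, then by left edge.
    boxes = sorted(detect(image), key=lambda b: (b[1], b[0]))
    words = []
    for box in boxes:
        region = crop(image, box)
        # Skip boxes that fall entirely outside the image.
        if region and region[0]:
            words.append(recognize(region))
    return words
```

In practice, `detect` would wrap whichever detection model handles your documents well, and `recognize` would wrap the OCR Tamil recognition model with detection turned off; how to call that model depends on the package's API, which this section does not show.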