phayathaibert-thai-pos-tagger

Name: phayathaibert-thai-pos-tagger
Author: sandpapat

License: MIT


Quick Summary

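Thai part-of-speech (POS) tagger, loaded as a Hugging Face token-classification model. It assigns one POS tag (for example PRON, VERB, NOUN, ADP) to each word of a Thai sentence.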

Code Examples

Usage (Python, transformers)

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("sandpapat/phayathaibert-thai-pos-tagger")
model = AutoModelForTokenClassification.from_pretrained("sandpapat/phayathaibert-thai-pos-tagger")

model.eval()

Option 1: Raw Thai text (automatic word segmentation)

from pythainlp.tokenize import word_tokenize

def predict_pos(text: str):
    """POS tag Thai text (raw string)."""

    # 1. Word segmentation
    words = word_tokenize(text)

    # 2. Tokenize with alignment
    encoded = tokenizer(
        words,
        is_split_into_words=True,
        return_tensors="pt"
    )
    word_ids = encoded.word_ids()

    with torch.no_grad():
        outputs = model(**encoded)
        preds = outputs.logits.argmax(dim=-1)[0]

    # 3. Align subwords → words (each word takes the label of its first subword)
    results = []
    prev = None
    for idx, w_id in enumerate(word_ids):
        if w_id is None:
            continue
        if w_id != prev:
            label = model.config.id2label[preds[idx].item()]
            results.append((words[w_id], label))
        prev = w_id

    return results

# Example
text = "ฉันกินข้าวที่ร้านอาหาร"
for w, p in predict_pos(text):
    print(f"{w:15s} {p}")

Option 2: Pre-segmented words

def predict_pos_from_words(words):
    """POS tag a list of pre-segmented Thai words."""

    encoded = tokenizer(
        words,
        is_split_into_words=True,
        return_tensors="pt"
    )
    word_ids = encoded.word_ids()

    with torch.no_grad():
        outputs = model(**encoded)
        preds = outputs.logits.argmax(dim=-1)[0]

    results = []
    prev = None
    for idx, w_id in enumerate(word_ids):
        if w_id is None:
            continue
        if w_id != prev:
            label = model.config.id2label[preds[idx].item()]
            results.append((words[w_id], label))
        prev = w_id

    return results

# Example
words = ["ฉัน", "กิน", "ข้าว", "ที่", "ร้านอาหาร"]
for w, p in predict_pos_from_words(words):
    print(f"{w:15s} {p}")
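
Both functions above end with the same alignment loop: special tokens (where the word id is None) are skipped, and each word keeps the label predicted for its first subword. As a minimal sketch, that step can be isolated into a pure-Python helper and checked without loading the model. The name `align_first_subword` and the toy inputs, including the label mapping, are assumptions made for illustration; they are not part of the model or its real `id2label` table.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def align_first_subword(
    words: Sequence[str],
    word_ids: Sequence[Optional[int]],
    pred_ids: Sequence[int],
    id2label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Return one (word, label) pair per word, using its first subword."""
    results: List[Tuple[str, str]] = []
    prev = None
    for idx, w_id in enumerate(word_ids):
        if w_id is None:   # special token: no word attached
            continue
        if w_id != prev:   # first subword of a new word
            results.append((words[w_id], id2label[pred_ids[idx]]))
        prev = w_id
    return results


# Hypothetical toy inputs (NOT real tokenizer or model output).
toy_words = ["ฉัน", "กิน", "ร้านอาหาร"]
toy_word_ids = [None, 0, 1, 2, 2, None]   # the last word spans two subwords
toy_pred_ids = [0, 2, 3, 1, 0, 0]
toy_id2label = {0: "X", 1: "NOUN", 2: "PRON", 3: "VERB"}

aligned = align_first_subword(toy_words, toy_word_ids, toy_pred_ids, toy_id2label)
for word, label in aligned:
    print(word, label)
```

In the toy input the second subword of "ร้านอาหาร" carries a different prediction, which the loop ignores because only the first subword of each word is read.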

Example output

Input: "ฉันกินข้าวที่ร้านอาหาร"

ฉัน: PRON
กิน: VERB
ข้าว: NOUN
ที่: ADP
ร้านอาหาร: NOUN
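
The functions above print padded columns, while the sample output uses a "word: TAG" layout. A small sketch of a formatter that produces that layout from a list of (word, tag) pairs follows; `format_tags` is a hypothetical helper name, and the pairs are copied from the sample output rather than produced by running the model here.

```python
from typing import Iterable, List, Tuple


def format_tags(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Render (word, tag) pairs as 'word: TAG' lines."""
    return [f"{word}: {tag}" for word, tag in pairs]


# Pairs copied from the sample output above (not a fresh model run).
sample = [
    ("ฉัน", "PRON"),
    ("กิน", "VERB"),
    ("ข้าว", "NOUN"),
    ("ที่", "ADP"),
    ("ร้านอาหาร", "NOUN"),
]

lines = format_tags(sample)
print("\n".join(lines))
```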
