Yoonyoul

Fine Tuned E5 Small Drugproduct

🧬 Fine-tuned E5-small for Korean Drug Product Semantic Embedding

📘 Model Overview

This is a SentenceTransformer model based on `intfloat/multilingual-e5-small`, fine-tuned in three stages for the Korean pharmaceutical domain. Training used drug summary and detail data (`drugsummary`, `drugdetails`), product type definitions (`drugtypedefinition`), and DUR (Drug Utilization Review) regulation definitions (`drugdurtypedefinition`).

- GitHub Repository: https://github.com/ryukato/fine-tuned-e5-drugmodel

Among the E5 (multilingual-E5) family, this project chose `intfloat/multilingual-e5-small` so that the complex semantic relationships among drug names, efficacy, and DUR regulations are embedded accurately even in a multilingual setting. The reasons were:

1. Multilingual sentence representation
   - It shows balanced semantic representation across many languages, including Korean, Japanese, Chinese, and German as well as English.
   - Drug data often mixes loanwords with academic terminology, so a multilingual encoder is an advantage.
2. Efficient parameter size relative to performance (Small Variant)
   - The `small` model has about 33M parameters, so fine-tuning ran stably even in local environments such as an M1/M2 MacBook.
   - FP16 and bfloat16 support allows efficient computation on GPU and MPS.
3. Optimized for sentence-level semantic retrieval
   - E5 models are trained for sentence embedding, so they perform well at matching simple queries (`"기침약"`, "cough medicine"; `"열 내리는 약"`, "fever-reducing medicine") to product names (`"판콜에이"`, `"타이레놀"`).
4. Full compatibility with Sentence-Transformers
   - The model is 100% compatible with the `SentenceTransformer` interface, which made integration into a PyTorch-based pipeline straightforward.
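The query-to-product matching described above could be run roughly as follows. This is a minimal sketch, not the author's code: the `query: ` and `passage: ` prefixes follow the upstream E5 convention and are an assumption here, since the card does not say whether the fine-tuned model expects them, and the helper names `cosine`, `rank`, and `search` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, named_vecs):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in named_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def search(query, products, model_id="Yoonyoul/fine-tuned-e5-small-drugproduct"):
    """Embed a query and candidate product names, then rank the products.

    Requires the sentence-transformers package; the model is downloaded on
    first use. The import is local so the helpers above work without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    # Assumption: upstream E5 prefixes; drop them if the model was tuned without.
    query_vec = model.encode("query: " + query, normalize_embeddings=True)
    product_vecs = model.encode(
        ["passage: " + name for name in products], normalize_embeddings=True
    )
    named = {name: vec.tolist() for name, vec in zip(products, product_vecs)}
    return rank(query_vec.tolist(), named)
```

Calling `search("열 내리는 약", ["타이레놀", "판콜에이"])` would return the product names with their similarity scores, highest first.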
🔹 Step 1: Drug Type Semantic Alignment
- Dataset: `drugtypedeflist.csv`
- Goal: learn concept mappings such as `"해열제"` (antipyretic) → `"체온을 낮추는 약"` (medicine that lowers body temperature)
- Output model: `/model/finetunede5smalldrugtype`

🔹 Step 2: DUR Type Semantic Alignment
- Dataset: `drugdurtypesimilaritytrain.csv`
- Goal: learn semantic mappings between DUR types such as `"임부금기"` (contraindicated in pregnancy), `"노인주의"` (caution in the elderly), and `"병용금기"` (contraindicated combination) and their technical descriptions
- Output model: `/model/finetunede5smalldrugdurtype`

🔹 Step 3: Drug Product Semantic Alignment
- Dataset: `drugproductsimilaritytrain.csv` (about 3,000 samples)
- Goal: strengthen semantic matching between actual products such as `"판콜에이내복액"` and queries such as `"열을 내리는 약"` (medicine that brings down a fever)
- Output model: `/model/finetunede5smalldrugproductaccum`

🔹 Experimental: Drug Ingredient + Product Type Fine-tuning

Starting from the `finetunede5smalldrugdurtype` model, an additional embedding training run (`finetunede5smalldrugptypeingredients`) combined drug ingredients (`ingredientname`) with product types (`producttype`).

⚙️ Setup and Results

| Item | Value |
|------|----|
| Training data | `"성분명은(는) 제품유형 제제에서 사용되는 의약 성분이다."` ("[ingredient name] is a pharmaceutical ingredient used in [product type] preparations.") |
| Sample size | 1,289 |
| Average loss | 0.0012 |
| Similarity evaluation | Semantic separation was insufficient |
| Observed example | `이부프로펜` (ibuprofen), from the anti-inflammatory/analgesic ("소염·진통제") class, and the unrelated ingredients `염화나트륨` (sodium chloride) and `세티리진` (cetirizine) all show similarity of about 0.91 to 0.94 |

📋 Observations
- The model converged stably, but because of repetitive sentence patterns and positive-only training data, semantic boundaries between efficacy groups did not form properly.
- The overall similarity distribution converged to excessively high values, which suggests the model learned stylistic patterns rather than meaning.
- In conclusion, combined ingredient and product-type training was found to provide no practical benefit for semantic search quality, so it is not applied in the current pipeline.

| Item | Version |
|------|------|
| Python | 3.12.4 |
| torch | 2.4.1 |
| transformers | 4.44.2 |
| sentence-transformers | 3.0.1 |
| accelerate | 0.27.0 |
| pandas | 2.2.3 |

📅 Release Info
- Author: @Yoonyoul
- Base Model: `intfloat/multilingual-e5-small`
- Fine-tuned Model: `Yoonyoul/fine-tuned-e5-small-drugproduct`
- Repository: https://github.com/ryukato/fine-tuned-e5-drugmodel
- Last Updated: 2025-10-27
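The similarity collapse reported for the experimental ingredient run (unrelated ingredients all scoring about 0.91 to 0.94) can be quantified by comparing mean cosine similarity within an efficacy group against the mean across groups. The following is a rough sketch under stated assumptions: embeddings are already available as plain lists of floats, group labels are supplied by hand, and the names `separation_report`, `toy_vectors`, and `toy_groups` are hypothetical. The toy vectors are made-up numbers for illustration, not model output.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _mean(values):
    """Arithmetic mean, or NaN when there are no values."""
    return sum(values) / len(values) if values else float("nan")


def separation_report(vectors, groups):
    """Compare within-group and cross-group similarity.

    vectors: mapping of item name -> embedding (list of floats)
    groups:  mapping of item name -> efficacy group label
    A margin close to zero means related and unrelated items are
    about equally similar, i.e. the embedding space has collapsed.
    """
    same, cross = [], []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        (same if groups[a] == groups[b] else cross).append(score)
    return {
        "same_group": _mean(same),
        "cross_group": _mean(cross),
        "margin": _mean(same) - _mean(cross),
    }


# Toy data (invented for illustration): two items in group "A", one in "B".
toy_vectors = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "b1": [0.0, 1.0]}
toy_groups = {"a1": "A", "a2": "A", "b1": "B"}
report = separation_report(toy_vectors, toy_groups)
print(report)
```

With real embeddings from `model.encode(...)`, a margin near zero would be consistent with the positive-only failure mode described in the observations, while a clearly positive margin would indicate usable separation between efficacy groups.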

license:mit