FinLang
finance-embeddings-investopedia
This is the Investopedia embedding model for finance applications from the FinLang team. The model is fine-tuned on top of BAAI/bge-base-en-v1.5 using our open-sourced finance dataset, https://huggingface.co/datasets/FinLang/investopedia-embedding-dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search in RAG applications.

This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses.

Plans

The research paper will be published soon. We are working on a v2 of the model, increasing the training corpus of financial data and using improved techniques for training embeddings.

Usage

Using this model is easy when you have sentence-transformers installed (see https://huggingface.co/sentence-transformers). Simply specify the FinLang embedding during the indexing procedure for your financial RAG applications.

Evaluation

We evaluate our model on unseen pairs of sentences for similarity and on unseen shuffled pairs of sentences for dissimilarity. Our evaluation suite contains sentence pairs from Investopedia (to test for proficiency on finance), and from GooAQ, MS MARCO, StackExchange duplicate questions (title-title), and Yahoo Answers (title-answer) (to evaluate the model's ability to avoid forgetting after fine-tuning).

License

Since non-commercial datasets are used for fine-tuning, we release this model under cc-by-nc-4.0.