zai-org
GLM-OCR
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforce...
GLM-5-FP8
📄 Read the paper: GLM-5: from Vibe Coding to Agentic Engineering.
GLM-4.6-FP8
---
language:
- en
- zh
library_name: transformers
license: mit
pipeline_tag: text-generation
---
chatglm2-6b
---
language:
- zh
- en
tags:
- glm
- chatglm
- thudm
---
GLM-4.7-Flash
GLM-4.5-Air
---
language:
- en
- zh
library_name: transformers
license: mit
pipeline_tag: text-generation
---
GLM-4.1V-9B-Thinking
---
license: mit
language:
- en
- zh
base_model:
- zai-org/GLM-4-9B-0414
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- reasoning
---
GLM-4.5V-FP8
---
base_model:
- zai-org/GLM-4.5-Air-Base
language:
- zh
- en
library_name: transformers
license: mit
pipeline_tag: image-text-to-text
---
GLM-5
GLM-ASR-Nano-2512
GLM-4.5-Air-FP8
📖 Check out the GLM-4.5 technical blog, technical report, and Zhipu AI technical documentation. 📍 Use GLM-4.5 API services on Z.ai API Platform (Global) or Zhipu AI Open Platform (Mainland China).
AutoGLM-Phone-9B
glm-4v-9b
2024/08/12: the code in this repository has been updated and now requires `transformers>=4.44.0`; please update your dependencies accordingly. GLM-4V-9B is the open-source multimodal version in GLM-4, the latest generation of pretrained model series released by Zhipu AI. GLM-4V-9B supports Chinese-English bilingual multi-turn dialogue at a high resolution of 1120 × 1120. In multimodal evaluations covering comprehensive Chinese and English ability, perceptual reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

| | MMBench-EN-Test | MMBench-CN-Test | SEEDBenchIMG | MMStar | MMMU | MME | HallusionBench | AI2D | OCRBench |
|-------------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| | English overall | Chinese overall | General ability | General ability | Academic subjects | Perceptual reasoning | Hallucination | Chart understanding | Text recognition |
| GPT-4o, 20240513 | 83.4 | 82.1 | 77.1 | 63.9 | 69.2 | 2310.3 | 55 | 84.6 | 736 |
| GPT-4v, 20240409 | 81 | 80.2 | 73 | 56 | 61.7 | 2070.2 | 43.9 | 78.6 | 656 |
| GPT-4v, 20231106 | 77 | 74.4 | 72.3 | 49.7 | 53.8 | 1771.5 | 46.5 | 75.9 | 516 |
| InternVL-Chat-V1.5 | 82.3 | 80.7 | 75.2 | 57.1 | 46.8 | 2189.6 | 47.4 | 80.6 | 720 |
| LlaVA-Next-Yi-34B | 81.1 | 79 | 75.7 | 51.6 | 48.8 | 2050.2 | 34.8 | 78.9 | 574 |
| Step-1V | 80.7 | 79.9 | 70.3 | 50 | 49.9 | 2206.4 | 48.4 | 79.2 | 625 |
| MiniCPM-Llama3-V2.5 | 77.6 | 73.8 | 72.3 | 51.8 | 45.8 | 2024.6 | 42.4 | 78.4 | 725 |
| Qwen-VL-Max | 77.6 | 75.7 | 72.7 | 49.5 | 52 | 2281.7 | 41.2 | 75.7 | 684 |
| GeminiProVision | 73.6 | 74.3 | 70.7 | 38.6 | 49 | 2148.9 | 45.7 | 72.9 | 680 |
| Claude-3V Opus | 63.3 | 59.2 | 64 | 45.7 | 54.9 | 1586.8 | 37.8 | 70.6 | 694 |
| GLM-4v-9B | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
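A minimal inference sketch with the `transformers` library, assuming the model is hosted as `zai-org/glm-4v-9b` and follows the `trust_remote_code` chat-template flow used by the GLM-4 series (the `image` message key and generation arguments are assumptions and may differ across versions):

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
# Repo id assumed; the model ships custom code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("zai-org/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "zai-org/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device).eval()

# The GLM-4V chat template accepts an image alongside the user text.
image = Image.open("example.jpg").convert("RGB")
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": "Describe this image."}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```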
CogVideoX-2b
chatglm3-6b
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 📍 Experience the larger-scale ChatGLM model at chatglm.cn. We have released the latest GLM-4 model, which achieves new breakthroughs on multiple metrics; you can experience our latest...
GLM-4.7
GLM-5.1-FP8
GLM-4.6
GLM-4-32B-0414
The GLM family welcomes new members: the GLM-4-32B-0414 series models, featuring 32 billion parameters. Their performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and they support very user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data, laying the foundation for subsequent reinforcement-learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Additionally, using techniques such as rejection sampling and reinforcement learning, we enhanced the model's performance in instruction following, engineering code, and function calling, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, artifact generation, function calling, search-based Q&A, and report generation. In particular, on several benchmarks such as code generation and specific Q&A tasks, GLM-4-32B-Base-0414 achieves performance comparable to larger models like GPT-4o and DeepSeek-V3-0324 (671B).

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities, developed from GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, which enhances the model's general capabilities.

GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep-thinking models, the rumination model thinks longer and more deeply to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Z1-Rumination is trained by scaling end-to-end reinforcement learning with responses graded against ground-truth answers or rubrics, and it can use search tools during its deep thinking process to handle complex tasks. The model shows significant improvements in research-style writing and complex tasks.

Finally, GLM-Z1-9B-0414 is a surprise. We employed all the aforementioned techniques to train a small 9B model. GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks, with overall performance top-ranked among open-source models of the same size. Especially in resource-constrained scenarios, it achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.

Example showcase prompts:

- Write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically.
- Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese)
- Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese)
- Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese)
- Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese)
- Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese)

For search-based writing tasks, we use a dedicated system prompt that instructs the model to respond based on search results. At inference time, you can obtain search results through methods such as `RAG` or `WebSearch` and wrap them in `observation` blocks for the model. Using an internal or external search model to supply results in this format, the model can generate an analysis report such as the following:

Analysis Report on Common Characteristics of Children's Literature

Children's literature, as a literary genre specifically created for children, possesses unique artistic features and expressive techniques. This report comprehensively analyzes the common characteristics of children's literature from three dimensions: narrative methods, thematic tendencies, and other universal features, integrating academic research, classic examples of children's literature, and expert viewpoints.

Narrative Methods and Their Impact on Child Readers

The narrative methods of children's literature exhibit diverse characteristics, including first-person, third-person, narration, and interactive storytelling. These different narrative perspectives and methods profoundly influence children's reading experiences and cognitive development. First-person narration is common in children's literature, unfolding the story from the perspective of a child protagonist. This narrative style bridges the gap between the reader and the story, making it easier for child readers to identify with the characters and develop emotional resonance. For example, many coming-of-age novels use first-person narration, allowing the protagonist to directly recount their experiences and feelings, making readers feel as though they are experiencing the joys and sorrows of growth alongside the protagonist. This narrative perspective lends the work a strong sense of authenticity and intimacy, helping to cultivate children's empathy【1†source】. Third-person narration offers a broader perspective, allowing the author to flexibly switch between different characters' viewpoints and present richer layers of the story. In children's literature, third-person omniscient narration enables the author to control the narrative pace, revealing or concealing information as needed to guide children's attention. At the same time, third-person narration facilitates direct dialogue between the author and the reader, conveying values or explaining complex concepts through narration. This narrative method positively influences children's macro-thinking and comprehensive understanding【1†source】. Narration (authorial intrusion) is a unique narrative technique in children's literature, where the author directly appears as the "storyteller," explaining the background, commenting on characters, or posing questions to the reader. This technique is particularly common in classic fairy tales, such as the opening lines of Andersen's Fairy Tales: "Once, there was a child..."
Narration helps children understand the story's context, fills cognitive gaps, and conveys the author's educational intent. Research shows that appropriate authorial intrusion aids children in grasping the story's structure and improving reading comprehension【5†source】. Interactive storytelling is a new trend in contemporary children's literature, especially prominent in the digital media era. Interactive storytelling breaks the traditional unidirectional author-reader relationship, encouraging child readers to participate in the story's creation, such as by choosing plot directions, character dialogues, or endings. This participatory reading enhances children's sense of agency and fosters decision-making skills and creative thinking. For example, some children's reading apps incorporate interactive elements, allowing children to influence the story's development through clicks, drag-and-drop actions, and other operations, thereby gaining a stronger sense of immersion and achievement【6†source】. Interactive storytelling transforms children from passive information recipients into active meaning-makers, uniquely contributing to the development of their subjectivity.

Table: Common Narrative Methods in Children's Literature and Their Effects

| Narrative Method | Characteristics | Impact on Child Readers | Classic Examples |
|----------------------|--------------------|----------------------------|---------------------|
| First-Person | Told from the child protagonist's perspective | Enhances immersion, fosters empathy | Charlotte's Web, The Straw House |
| Third-Person | Omniscient or limited perspective | Expands horizons, develops comprehensive understanding | Harry Potter series |
| Narration | Direct authorial intrusion into the narrative | Aids comprehension, conveys values | Andersen's Fairy Tales |
| Interactive | Encourages reader participation in creation | Cultivates agency and creative thinking | Children's interactive reading apps |

Notably, the narrative methods of children's literature are often closely intertwined with the childhood perspective. The childhood perspective does not necessarily mean the narrator must be a child but refers to the work's ability to describe the world to the greatest extent from a child's heart, expressing their inner psychology and external circumstances【2†source】. Through the childhood perspective, readers can embark on a spiritual journey with a child's mindset, a narrative strategy that creates a strong sense of realism, allowing child readers to achieve emotional identification and cognitive resonance during the reading process【1†source】. The use of the childhood perspective gives the work's language a perceptual and naive quality, often with a prose-like and spatial structure, artistic features that align with children's cognitive characteristics and aid their acceptance and understanding【2†source】.

Thematic Tendencies and Their Impact on Children's Cognitive and Emotional Development

The thematic choices in children's literature exhibit distinct tendencies, with common themes including growth, adventure, friendship, and family. These themes not only form the core content of children's literature but also subtly influence children's cognitive development and emotional shaping. The theme of growth is one of the central motifs in children's literature.
Growth narratives are regarded as the artistic lifeblood of children's literature, focusing on depicting the pivotal moments of rapid psychological development in children, particularly the awakening and establishment of self-awareness【3†source】. Growth literature typically includes three elements: an artistic portrayal of the self-awareness construction process in growing adolescents, a developmental story with logical propulsion, and the presentation of the protagonist's spiritual trials and quest for direction【3†source】. By reading growth-themed works, child readers can indirectly experience the confusion and breakthroughs of growing up and understand the formation of self-identity. Classics such as Astrid Lindgren's Pippi Longstocking and Cao Wenxuan's The Straw House vividly depict children's psychological growth trajectories in specific environments. Research indicates that growth-themed literary works help children build a positive self-concept and develop the courage and resilience to face challenges, positively contributing to their psychological development【9†source】.

The theme of adventure holds an important place in children's literature, satisfying children's curiosity about exploring the unknown. Adventure stories often feature unusual settings and unknown challenges, with the protagonist growing through overcoming difficulties. Classics like Robinson Crusoe and The Adventures of Tom Sawyer attract child readers with thrilling plots while conveying the importance of qualities such as courage, wisdom, and perseverance. The impact of adventure themes on children's cognitive development mainly lies in expanding their imaginative space and fostering problem-solving skills. In adventure stories, children must analyze situations, make plans, and respond to unexpected events alongside the protagonist, a process that exercises their logical thinking and adaptability【14†source】. At the same time, the unfamiliar environments and novel experiences in adventure stories stimulate children's curiosity and desire to learn, laying the foundation for cultivating an exploratory spirit. As experts point out, excellent children's literature should be grounded in reality, rich in depth, and capable of generating significant inspiration and emotional appeal, guiding children to comprehensively understand the world【14†source】.

The theme of friendship is equally prevalent in children's literature, reflecting children's emphasis on peer relationships. Friendship and love are regarded as humanity's most precious qualities, often depicted in children's literature as beacons in the night, guiding children toward the future【9†source】. Friendship stories typically revolve around interactions between children, portraying positive behaviors such as sharing, cooperation, and understanding. Examples include the genuine friendships among the children at Tomoe Gakuen in Totto-Chan: The Little Girl at the Window and the promise and mutual aid between Wilbur and Charlotte in Charlotte's Web. These stories help child readers recognize the value of friendship and learn how to build and maintain interpersonal relationships. Research shows that children need peer support during their growth, as friends provide crucial emotional anchors, offering the greatest emotional support and comfort in unfamiliar environments【16†source】. By reading friendship-themed works, children can learn social skills, develop empathy, and cultivate a spirit of cooperation, qualities essential for their social development【17†source】.
The theme of family is an indispensable subject in children's literature, depicting the emotional bonds and interaction patterns among family members. As the primary setting for children's earliest socialization, the family atmosphere and parenting styles profoundly impact children's mental health【10†source】. Family stories in children's literature often focus on parent-child relationships, sibling bonds, and other dynamics, such as Alice's relationship with her sister in Alice's Adventures in Wonderland and the Little Prince's interactions with the rose in The Little Prince. These stories help children understand the responsibilities and expectations of family roles and learn to handle conflicts within the family. Research indicates that a positive family atmosphere and parental support promote the development of children's positive psychological traits, while adverse family environments and parenting behaviors negatively affect their mental health【10†source】【11†source】. By reading family-themed works, children can gain emotional support, learn skills for managing family relationships, and establish healthy family values.

Table: Common Themes in Children's Literature and Their Impact on Child Development

| Theme Type | Content Representation | Impact on Cognitive Development | Impact on Emotional Development | Classic Examples |
|---------------|---------------------------|-------------------------------------|-------------------------------------|---------------------|
| Growth | Awakening of self-awareness, psychological trials and breakthroughs | Establishes self-concept, fosters problem-solving skills | Shapes positive self-identity, enhances psychological resilience | The Straw House, Pippi Longstocking |
| Adventure | Exploring the unknown, overcoming challenges | Expands imaginative space, exercises logical thinking | Cultivates courage and perseverance | Robinson Crusoe, The Adventures of Tom Sawyer |
| Friendship | Peer interactions, mutual aid and cooperation | Learns social skills, understands interpersonal dynamics | Develops empathy, builds a sense of belonging | Charlotte's Web, Totto-Chan: The Little Girl at the Window |
| Family | Parent-child relationships, sibling bonds | Understands social roles, learns communication skills | Gains emotional support, establishes secure attachments | Alice's Adventures in Wonderland, The Little Prince |

Regarding thematic choices, children's literature researcher Zhu Ziqiang proposed the famous "Three Major Motifs" theory, categorizing children's literary works into "the motif of love," "the motif of the mischievous child," and "the motif of nature"【8†source】. The motif of love focuses on emotional connections between children and adults or peers; the motif of the mischievous child portrays children's free-spirited nature; and the motif of nature emphasizes the harmonious relationship between children and the natural environment. These three motifs reflect the richness of the children's world from different angles, providing diverse emotional experiences and cognitive frameworks for children. Notably, these themes do not exist in isolation; outstanding works often organically integrate multiple themes. For example, the Harry Potter series incorporates growth, friendship, adventure, and family elements, presenting child readers with a multidimensional spiritual world.
Other Universal Features and Their Artistic Expression

In addition to narrative methods and thematic tendencies, children's literature exhibits a series of universal artistic features, including anthropomorphism, repetitive language, symbolism and metaphor, and educational significance. These features collectively constitute the unique aesthetic style of children's literature, subtly influencing children's cognitive development and aesthetic cultivation.

Anthropomorphism is one of the most distinctive artistic features of children's literature. In children's literary works, animals, plants, and even inanimate objects are often endowed with human thoughts, emotions, and behaviors, greatly enhancing the story's fun and imagination. Research shows that anthropomorphism is a technique frequently used by children's literature creators to attribute human characteristics to animals, enabling them to possess perception and communication abilities【19†source】. Through anthropomorphism, children can more easily understand abstract concepts and moral principles, as anthropomorphic characters translate complex ideas into familiar emotional and behavioral patterns. For example, in scientific fairy tales, anthropomorphic characters can help explain scientific principles, making abstract concepts tangible【18†source】. Anthropomorphism not only enriches the narrative techniques of children's literature but also provides children with a unique perspective for understanding the relationship between humans and nature. It is worth noting that excessive anthropomorphism may affect children's accurate understanding of the animal world, so modern children's literature pays more attention to balancing the natural attributes of characters with human characteristics when employing anthropomorphic techniques【19†source】.

Repetitive language is extremely common in children's literature, a linguistic feature rooted in oral traditions originally intended to aid memory and dissemination【20†source】. In children's literature, the repetitive use of words, phrases, or sentences serves multiple functions: constructing the story's framework, emphasizing key information, creating rhythm and musicality, and training children's vocabulary skills. For example, in The Very Hungry Caterpillar, the author repeatedly uses phrases like "On Monday, he ate one apple. On Tuesday, he ate two pears..." This not only builds the story's structure but also helps children learn numbers and days of the week. Repetitive structures also aid children in developing an awareness of language patterns during the early stages of language acquisition, fostering a sense of language and memory skills【21†source】. Research indicates that repetitive language in children's literature promotes children's language acquisition, helping them master vocabulary and syntactic rules. At the same time, this linguistic feature enhances the story's participatory nature, as children can often join in reciting the repetitive parts, gaining a sense of achievement.

Symbolism and metaphor are common expressive techniques in children's literature, conveying abstract meanings through concrete imagery. Symbolism uses specific objects to represent abstract concepts or emotions, while metaphor connects two different things through comparison, creating new meanings. In children's literature, symbolism and metaphor are usually presented in a simple and clear manner, avoiding overly complex interpretations.
For example, the character configurations and metaphorical connotations in The Wizard of Oz are thought-provoking, as these characters not only breathe life into the story but also convey profound life philosophies through their symbolic meanings【24†source】. Symbolism and metaphor in children's literature are often related to themes such as growth, friendship, and courage, helping children understand abstract concepts through concrete and figurative expressions. Research shows that appropriate metaphors can promote children's cognitive development, stimulating their imagination and creativity【23†source】. As children grow older, their ability to understand symbolism and metaphor gradually improves, providing children's literature with multi-layered meaning spaces.

Educational significance is an indispensable component of children's literature, which inherently carries the gene of children's education【22†source】. Excellent children's literary works simultaneously possess entertainment and educational functions, not only helping children understand the objective world, enrich their inner emotions, and acquire life wisdom but also cultivating their perception, aesthetic sensibility, thinking skills, and creativity【15†source】. Educational significance in children's literature is often not directly presented through preaching but naturally revealed through the storyline and characters' fates. For example, many classic fairy tales convey the importance of qualities such as bravery and honesty through the protagonist's adventurous experiences, while popular science books introduce scientific knowledge through interesting plots and characters. Experts point out that children's literature writers should shoulder the responsibility of education, incorporating care for children's mental growth into their works【22†source】. It is worth noting that the educational significance of children's literature should respect children's receptive abilities, avoiding excessive preaching or moral indoctrination, and instead naturally influencing children's values and behaviors through artistic appeal.

Storytelling is the most basic and essential feature of children's literature. Children's perceptual, imagery-driven, and novelty-seeking cognitive characteristics and receptive psychology further determine that "storytelling" is an indispensable ontological feature of children's literature【25†source】. Engaging plots are the most crucial aspect of children's literary works because, compared to adults, children's understanding of things relies mainly on intuition, and plots play a key role in guiding children's comprehension of stories【26†source】. The storytelling quality of children's literature is reflected in multiple aspects: clear cause-and-effect relationships, compact narrative rhythm, and satisfying endings. These elements work together to immerse children in the story world, providing emotional satisfaction and cognitive inspiration. As researchers have noted, plots must be performed by specific characters in specific situations to convey individual experiences in unique space-time environments【7†source】. In children's literature, storytelling is not merely an artistic technique but a bridge connecting children to the world. Through stories, children can safely experience various life scenarios and learn ways to meet challenges.

In terms of language features, children's literature typically adopts a concise, clear, and vivid language style, avoiding complex sentence structures and abstract vocabulary.
This linguistic characteristic aligns with children's cognitive development levels, facilitating their understanding and acceptance. At the same time, the language of children's literature is often rich in rhythm and musicality, enhancing readability and memorability through techniques such as rhyming and repetition. For example, Michael Rosen's children's literary works extensively employ repetitive structures and rhymes, a language usage that helps children develop an awareness of language patterns during the early stages of language acquisition【21†source】. The language of children's literature also often includes rich sensory descriptions and emotional expressions, stimulating children's imagination through concrete and tangible imagery. Scholar Jay Davis's research shows that the interactive use of language in children's literature can influence children's language habits and promote their language development【21†source】.

In summary, these universal features of children's literature collectively constitute its unique artistic charm and educational value. Anthropomorphism and symbolism expand children's imaginative spaces, repetitive language and storytelling promote language acquisition and cognitive development, and the natural integration of educational significance achieves the artistic effect of "teaching through entertainment." These features do not exist in isolation but are interwoven and organically unified, collectively serving the comprehensive development of child readers.

Through a systematic analysis of the narrative methods, thematic tendencies, and other universal features of children's literature, we can draw the following conclusions: As a special literary genre, the creation and reception of children's literature follow unique rules. In terms of narrative methods, children's literature flexibly employs various techniques such as first-person, third-person, narration, and interactive storytelling to adapt to children's cognitive characteristics and receptive psychology. Among these, the use of the childhood perspective is particularly important, as it enhances the work's sense of realism and intimacy, enabling child readers to develop emotional resonance【1†source】【2†source】. In terms of thematic choices, growth, adventure, friendship, and family constitute the main content of children's literature. These themes not only satisfy children's curiosity and desire to explore but also subtly influence their cognitive development and emotional shaping【3†source】【9†source】. Other universal features such as anthropomorphism, repetitive language, symbolism, and educational significance collectively form the unique artistic style and educational value of children's literature【18†source】【20†source】【24†source】.

These characteristics of children's literature do not exist in isolation but are interconnected and organically unified. For example, adventure themes are often combined with third-person omniscient narration to attract child readers through compact plots and vivid descriptions; friendship themes frequently employ first-person narration to enhance emotional resonance; and anthropomorphism is commonly found in nature-themed works, helping children understand the relationship between humans and nature. These features collectively serve the comprehensive development of child readers, meeting their entertainment needs while promoting their cognitive growth and emotional maturity.
From an academic research perspective, children's literature studies should emphasize the application of narrative theory, as narrative theory focuses more on the "how" of storytelling (narrative form), which aligns closely with the research focus of children's literature【0†source】. At the same time, cognitive research methods provide new perspectives for children's literature studies. By combining cognitive science with literary theory, we can gain a deeper understanding of how children's literature influences children's thinking and cognitive development【4†source】. Future research should continue to explore the application of these theoretical methods in children's literature studies while paying attention to the intersection and integration of children's literature with emerging fields such as digital media and interdisciplinary education.

From a creative practice perspective, children's literature writers should fully grasp children's cognitive characteristics and emotional needs, incorporating care for children's growth and educational wisdom into their works. As experts have pointed out, excellent children's literary works should be grounded in reality, rich in depth, and strong in emotional appeal, guiding children to comprehensively understand the world and correctly recognize themselves and society【14†source】. At the same time, children's literature creation should keep pace with the times, addressing new problems and challenges faced by contemporary children, such as media literacy in the digital age and identity formation in multicultural contexts, to provide targeted spiritual nourishment for children.

From an educational application perspective, children's literature should fully leverage its unique role in children's mental growth. Through carefully designed reading activities, teachers and parents can help children deeply understand the themes and meanings in works, guiding them to connect reading experiences with real life. Research shows that children's literature plays an increasingly important role in language education, the construction of a reading society, and children's mental growth【22†source】. Therefore, children's literature should be incorporated as an important component of school and family education, promoting children's cognitive development and emotional maturity through activities such as reading sharing, role-playing, and creative writing.

In summary, as a unique art form and educational medium, the common characteristics of children's literature constitute an organic whole, collectively serving the comprehensive development of child readers. By deeply understanding these features and their mechanisms of influence, we can better create, research, and apply children's literature, providing high-quality spiritual nourishment for children's healthy growth. Future children's literature research should continue to deepen theoretical exploration, expand research methods, and strengthen interdisciplinary collaboration to address the ever-changing needs of children and the challenges of the times, promoting the continuous development of children's literature.

GLM-4-32B-0414 supports calling external tools in JSON format. This can be done via HuggingFace Transformers, vLLM, or SGLang. Tool definitions are passed with the request, the model replies with a JSON tool call, and the tool's execution result is appended as a follow-up message before the model generates its final response; a sketch of this round trip with HuggingFace Transformers follows below.
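A minimal sketch of that round trip, assuming an OpenAI-style tool schema, the `zai-org/GLM-4-32B-0414` repo id, and that the model's chat template accepts `tools=` and a `tool` role for execution results (the tool definition and role/field names here are illustrative assumptions, not the card's exact format):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical tool definition in OpenAI-style format (assumed, not from the card).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4-32B-0414")
model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-4-32B-0414", device_map="auto")

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# The model is expected to emit a JSON tool call here.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))

# After executing the tool, append the call and its result, then generate the
# final answer (the "tool" role is an assumption about the chat template).
messages.append({"role": "assistant", "content": '{"name": "get_weather", "arguments": {"city": "Beijing"}}'})
messages.append({"role": "tool", "content": '{"temperature_c": 24, "condition": "sunny"}'})
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```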
| Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
| ---------------- | ------ | ----------------- | ------------------- | ------------------ | ------------------- | -------- | -------- |
| Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
| GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
| DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
| DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
| GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |

> For `SimpleQA` and `HotpotQA`, we sampled nearly 500 test cases from each test set, provided all models with basic `search` and `click` tools, ensured other settings remained consistent, and averaged the results over 3 runs.

| Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
|---|---|---|---|
| GLM-4-32B-0414 | Moatless [1] | 33.8 | 38.0 |
| GLM-4-32B-0414 | Agentless [2] | 30.7 | 34.0 |
| GLM-4-32B-0414 | OpenHands [3] | 27.2 | 28.0 |

[1] Moatless v0.0.3 used the following parameters: `response_format="react", thoughts_in_action=False, max_iterations=30`. No retries on failed trajectories; other settings are default.
[2] Agentless v1.5.0 used BGE as the embedding model and FAISS for similarity search. To speed up patch verification while maintaining performance, the timeout for running a single instance was changed from the default 300s to 180s.
[3] OpenHands v0.29.1 did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
CogVideoX-5b
📄 Read in Chinese | 🤗 Huggingface Space | 🌐 Github | 📜 arxiv 📍 Visit QingYing and the API Platform to experience commercial video generation models.

Example gallery prompts:

- A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.
- A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.
- A suited astronaut, with the red dust of Mars clinging to their boots, reaches out to shake hands with an alien being, their skin a shimmering blue, under the pink-tinged sky of the fourth planet. In the background, a sleek silver rocket, a beacon of human ingenuity, stands tall, its engines powered down, as the two representatives of different worlds exchange a historic greeting amidst the desolate beauty of the Martian landscape.
- An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.
- In a dimly lit bar, purplish light bathes the face of a mature man, his eyes blinking thoughtfully as he ponders in close-up, the background artfully blurred to focus on his introspective expression, the ambiance of the bar a mere suggestion of shadows and soft lighting.
- A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.
- On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.
- A Chinese mother, draped in a soft, pastel-colored robe, gently rocks back and forth in a cozy rocking chair positioned in the tranquil setting of a nursery. The dimly lit bedroom is adorned with whimsical mobiles dangling from the ceiling, casting shadows that dance on the walls. Her baby, swaddled in a delicate, patterned blanket, rests against her chest, the child's earlier cries now replaced by contented coos as the mother's soothing voice lulls the little one to sleep. The scent of lavender fills the air, adding to the serene atmosphere, while a warm, orange glow from a nearby nightlight illuminates the scene with a gentle hue, capturing a moment of tender love and comfort.

CogVideoX is an open-source version of the video generation model originating from QingYing. The table below lists the video generation models we currently offer, along with their foundational information.

| | CogVideoX-2B | CogVideoX-5B (This Repository) |
|---|---|---|
| Model Description | Entry-level model, balancing compatibility. Low cost for running and secondary development. | Larger model with higher video generation quality and better visual effects. |
| Inference Precision | FP16 (recommended), BF16, FP32, FP8, INT8; no support for INT4 | BF16 (recommended), FP16, FP32, FP8, INT8; no support for INT4 |
| Single-GPU VRAM Consumption | SAT FP16: 18GB; diffusers FP16: starting from 4GB; diffusers INT8 (torchao): starting from 3.6GB | SAT BF16: 26GB; diffusers BF16: starting from 5GB; diffusers INT8 (torchao): starting from 4.4GB |
| Multi-GPU Inference VRAM Consumption | FP16: 10GB using diffusers | BF16: 15GB using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~90 seconds; single H100: ~45 seconds | Single A100: ~180 seconds; single H100: ~90 seconds |
| Fine-tuning VRAM Consumption (per GPU) | 47 GB (bs=1, LoRA); 61 GB (bs=2, LoRA); 62 GB (bs=1, SFT) | 63 GB (bs=1, LoRA); 80 GB (bs=2, LoRA); 75 GB (bs=1, SFT) |
| Video Resolution | 720 x 480; no support for other resolutions (including fine-tuning) | 720 x 480; no support for other resolutions (including fine-tuning) |
| Positional Encoding | 3d_sincos_pos_embed | 3d_rope_pos_embed |

+ When testing with the `diffusers` library, all optimizations provided by the library were enabled. This solution has not been tested for actual VRAM/memory usage on devices other than NVIDIA A100 / H100. Generally, it can be adapted to all devices with NVIDIA Ampere architecture and above. If the optimizations are disabled, VRAM usage increases significantly, with peak VRAM usage about 3 times the values in the table, while speed increases by 3-4 times. You can selectively disable some optimizations, noting:
+ When performing multi-GPU inference, the `enable_model_cpu_offload()` optimization needs to be disabled.
+ Using INT8 models will reduce inference speed. This ensures that GPUs with lower VRAM can run inference normally with minimal video quality loss, though inference speed decreases significantly.
+ The 2B model is trained with `FP16` precision, and the 5B model is trained with `BF16` precision. We recommend using the precision the model was trained with for inference.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules to reduce CogVideoX's memory requirements. This makes it possible to run the model on a free T4 Colab or GPUs with smaller VRAM!
It is also worth noting that TorchAO quantization is fully compatible with `torch.compile`, which can significantly improve inference speed. `FP8` precision must be used on devices with `NVIDIA H100` or above, which requires installing the `torch`, `torchao`, `diffusers`, and `accelerate` Python packages from source; `CUDA 12.4` is recommended.
+ The inference speed tests also used the above VRAM optimization scheme. Without VRAM optimization, inference speed increases by about 10%. Only the `diffusers` version of the model supports quantization.
+ The model only supports English input; prompts in other languages can be translated into English during refinement by a large language model.
+ Use SAT for inference and fine-tuning of SAT-version models. Feel free to visit our GitHub for more information.

This model supports deployment using the Hugging Face `diffusers` library; you can deploy it by following the steps below (a minimal sketch follows after this list). We recommend visiting our GitHub for prompt optimization and conversion guidance to get a better experience. PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules to lower CogVideoX's memory requirements, making it possible to run the model on a free-tier T4 Colab or on GPUs with smaller VRAM. Notably, TorchAO quantization is fully compatible with `torch.compile`, which allows much faster inference, and the models can be serialized and stored in a quantized datatype to save disk space when using PytorchAO. Find examples and benchmarks at these links:
1. More detailed technical explanations and code.
2. Prompt optimization and conversion.
3. Inference and fine-tuning of SAT-version models, including pre-release versions.
4. Project update logs and more opportunities for interaction.
5. The CogVideoX toolchain to help you better use the model.
6. INT8 model inference code.

This model is released under the CogVideoX LICENSE.
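As referenced above, a minimal text-to-video sketch with `diffusers`, assuming the checkpoint is available under `zai-org/CogVideoX-5b` (originally published as `THUDM/CogVideoX-5b`); the prompt and generation parameters are illustrative:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 5B checkpoint in BF16, the precision it was trained in.
pipe = CogVideoXPipeline.from_pretrained("zai-org/CogVideoX-5b", torch_dtype=torch.bfloat16)

# VRAM-saving optimizations referenced in the notes above.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A golden retriever, sporting sleek black sunglasses, sprints across a rooftop terrace."
video = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```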
GLM-4.5V
This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. - Paper: https...
CogView4-6B
CogVideoX-5b-I2V
📄 Read in English | 🤗 Huggingface Space | 🌐 Github | 📜 arxiv 📍 Visit Qingying and the API Platform for the commercial version of the video generation model.

CogVideoX is an open-source video generation model originating from Qingying. The table below presents information on the video generation models we offer in this version.

| | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V (This Repository) |
|---|---|---|---|
| Model Description | Entry-level model, balancing compatibility. Low cost for running and secondary development. | Larger model with higher video generation quality and better visual effects. | CogVideoX-5B image-to-video version. |
| Inference Precision | FP16 (recommended), BF16, FP32, FP8, INT8; not supported: INT4 | BF16 (recommended), FP16, FP32, FP8, INT8; not supported: INT4 | BF16 (recommended), FP16, FP32, FP8, INT8; not supported: INT4 |
| Single-GPU Memory Usage | SAT FP16: 18GB; diffusers FP16: from 4GB; diffusers INT8 (torchao): from 3.6GB | SAT BF16: 26GB; diffusers BF16: from 5GB; diffusers INT8 (torchao): from 4.4GB | SAT BF16: 26GB; diffusers BF16: from 5GB; diffusers INT8 (torchao): from 4.4GB |
| Multi-GPU Inference Memory Usage | FP16: 10GB using diffusers | BF16: 15GB using diffusers | BF16: 15GB using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~90 seconds; single H100: ~45 seconds | Single A100: ~180 seconds; single H100: ~90 seconds | Single A100: ~180 seconds; single H100: ~90 seconds |
| Fine-tuning Memory Usage | 47 GB (bs=1, LoRA); 61 GB (bs=2, LoRA); 62 GB (bs=1, SFT) | 63 GB (bs=1, LoRA); 80 GB (bs=2, LoRA); 75 GB (bs=1, SFT) | 78 GB (bs=1, LoRA); 75 GB (bs=1, SFT, 16 GPUs) |
| Video Resolution | 720 x 480; no support for other resolutions (including fine-tuning) | 720 x 480; no support for other resolutions (including fine-tuning) | 720 x 480; no support for other resolutions (including fine-tuning) |
| Position Embedding | 3d_sincos_pos_embed | 3d_rope_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |

+ While testing with the diffusers library, all optimizations included in the library were enabled. This scheme has not been tested for actual memory usage on devices other than NVIDIA A100 / H100. Generally, it can be adapted to all NVIDIA Ampere architecture and above devices. If optimizations are disabled, memory consumption multiplies, with peak memory usage about 3 times the value in the table, while speed increases by about 3-4 times. You can selectively disable some optimizations, noting:
+ For multi-GPU inference, the `enable_sequential_cpu_offload()` optimization needs to be disabled.
+ Using INT8 models will slow down inference; this is done to accommodate lower-memory GPUs while maintaining minimal video quality loss, though inference speed will significantly decrease.
+ The CogVideoX-2B model was trained in `FP16` precision, and all CogVideoX-5B models were trained in `BF16` precision. We recommend using the precision in which the model was trained for inference.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce the memory requirements of CogVideoX. This allows the model to run on free T4 Colabs or GPUs with smaller memory! Also, note that TorchAO quantization is fully compatible with `torch.compile`, which can significantly improve inference speed. FP8 precision must be used on devices with NVIDIA H100 and above, requiring source installation of the `torch`, `torchao`, `diffusers`, and `accelerate` Python packages; CUDA 12.4 is recommended.
+ The inference speed tests also used the above memory optimization scheme. Without memory optimization, inference speed increases by about 10%. Only the `diffusers` version of the model supports quantization.
+ The model only supports English input; prompts in other languages can be translated into English via large-model refinement.
+ Memory usage for model fine-tuning was tested in an 8× H100 environment, and the program automatically uses `ZeRO-2` optimization. If a specific number of GPUs is given in the table, that number or more GPUs must be used for fine-tuning.
+ Use SAT for inference and fine-tuning of SAT-version models. Feel free to visit our GitHub for more details.

This model supports deployment using the Hugging Face `diffusers` library; you can follow the steps below to get started (a minimal sketch follows after this list). We recommend visiting our GitHub for prompt optimization and conversion guidance to get a better experience. PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce CogVideoX's memory requirements, allowing the model to run on a free T4 Colab or on GPUs with lower VRAM. Also, note that TorchAO quantization is fully compatible with `torch.compile`, which can significantly accelerate inference, and these models can be serialized and stored using PytorchAO in quantized data types to save disk space. You can find examples and benchmarks at the following links:
1. More detailed technical explanations and code.
2. Optimized prompt examples and conversions.
3. Detailed code for model inference and fine-tuning.
4. Project update logs and more interactive opportunities.
5. The CogVideoX toolchain to help you better use the model.
6. INT8 model inference code.

This model is released under the CogVideoX LICENSE.
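As referenced above, a minimal image-to-video sketch with `diffusers`, assuming the checkpoint id `zai-org/CogVideoX-5b-I2V` (originally published under THUDM); the prompt and input image are placeholders:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the I2V checkpoint in BF16, the precision it was trained in.
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "zai-org/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)

# Offload aggressively for low-VRAM GPUs (disable this for multi-GPU
# inference, per the notes above).
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()

image = load_image("first_frame.jpg")  # conditioning frame, 720x480
video = pipe(
    prompt="A garden comes to life as butterflies flutter amidst the blossoms.",
    image=image,
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```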
GLM-5.1
GLM-4.5
AutoGLM-Phone-9B-Multilingual
glm-4-9b-chat
2024/11/25: starting from `transformers>=4.46.0`, we recommend using glm-4-9b-chat-hf to reduce compatibility issues caused by future transformers upgrades. 2024/08/12: the code in this repository has been updated and now requires `transformers>=4.44.0`; please update your dependencies accordingly. 2024/07/24: we published our latest technical deep dive on long text; see here for the technical report on the long-context techniques used in training the GLM-4-9B open-source models.

Model Introduction

GLM-4-9B is the open-source version of GLM-4, the latest generation of pretrained model series released by Zhipu AI. On benchmark evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat both show strong performance. Beyond multi-turn dialogue, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support covering 26 languages, including Japanese, Korean, and German. We have also released a model supporting 1M context length (about 2 million Chinese characters).

| Model | AlignBench-v2 | MT-Bench | IFEval | MMLU | C-Eval | GSM8K | MATH | HumanEval | NCB |
|:--------------------|:-------------:|:--------:|:------:|:----:|:------:|:-----:|:----:|:---------:|:----:|
| Llama-3-8B-Instruct | 5.12 | 8.00 | 68.58 | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 24.7 |
| ChatGLM3-6B | 3.97 | 5.50 | 28.1 | 66.4 | 69.0 | 72.3 | 25.7 | 58.5 | 11.3 |
| GLM-4-9B-Chat | 6.61 | 8.35 | 69.0 | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 32.2 |

GLM-4-9B-Chat and Llama-3-8B-Instruct were evaluated on six multilingual datasets; the results and the languages selected for each dataset are shown below:

| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---|
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |

We also evaluated on the Berkeley Function Calling Leaderboard and obtained the following results:

| Model | Overall Acc. | AST Summary | Exec Summary | Relevance |
|:-----------------------|:------------:|:-----------:|:------------:|:---------:|
| Llama-3-8B-Instruct | 58.88 | 59.25 | 70.01 | 45.83 |
| gpt-4-turbo-2024-04-09 | 81.24 | 82.14 | 78.61 | 88.75 |
| ChatGLM3-6B | 57.88 | 62.18 | 69.78 | 5.42 |
| GLM-4-9B-Chat | 81.00 | 80.26 | 84.40 | 87.92 |
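A minimal chat-inference sketch with `transformers`, assuming the `zai-org/glm-4-9b-chat` repo id and the GLM-4 `trust_remote_code` chat-template flow (with `transformers>=4.46.0`, the glm-4-9b-chat-hf variant recommended above needs no remote code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("zai-org/glm-4-9b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "zai-org/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device).eval()

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What can GLM-4-9B-Chat do?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```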
glm-4-voice-tokenizer
GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. This repository provides the speech tokenizer of GLM-4-Voice, which is trained by adding vector quantization to the encoder part of Whisper and converts continuous speech input into discrete tokens; each second of audio is converted into 12.5 discrete tokens. For more information, please refer to our repo GLM-4-Voice.
GLM-4-9B-0414
GLM-4.5-FP8
📍 Use GLM-4.5 API services on Z.ai API Platform (Global) or Zhipu AI Open Platform (Mainland China).

We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With far fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available in our GitHub repository.

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of 63.2, placing 3rd among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at 59.8 while maintaining superior efficiency. For more eval results, showcases, and technical details, please visit our technical blog or refer to the technical report (paper). The model code, tool parser, and reasoning parser can be found in the implementations of transformers, vLLM, and SGLang. You can directly experience the model on Hugging Face or ModelScope, or download it via the links below.

| Model | Download Links | Model Size | Precision |
|------------------|-------------------------------|------------|-----------|
| GLM-4.5 | 🤗 Hugging Face 🤖 ModelScope | 355B-A32B | BF16 |
| GLM-4.5-Air | 🤗 Hugging Face 🤖 ModelScope | 106B-A12B | BF16 |
| GLM-4.5-FP8 | 🤗 Hugging Face 🤖 ModelScope | 355B-A32B | FP8 |
| GLM-4.5-Air-FP8 | 🤗 Hugging Face 🤖 ModelScope | 106B-A12B | FP8 |
| GLM-4.5-Base | 🤗 Hugging Face 🤖 ModelScope | 355B-A32B | BF16 |
| GLM-4.5-Air-Base | 🤗 Hugging Face 🤖 ModelScope | 106B-A12B | BF16 |

We provide minimum and recommended configurations for "full-featured" model inference. The data in the table below is based on the following conditions: 1. All models use MTP layers and specify `--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4` to ensure competitive inference speed. 2. The `cpu-offload` parameter is not used. 3.
3. Inference batch size does not exceed `8`.
4. All are executed on devices that natively support FP8 inference, ensuring both weights and cache are in FP8 format.
5. Server memory must exceed `1T` to ensure normal model loading and operation.

The models can run under the configurations in the table below:

| Model | Precision | GPU Type and Count | Test Framework |
|-------------|-----------|----------------------|----------------|
| GLM-4.5 | BF16 | H100 x 16 / H200 x 8 | sglang |
| GLM-4.5 | FP8 | H100 x 8 / H200 x 4 | sglang |
| GLM-4.5-Air | BF16 | H100 x 4 / H200 x 2 | sglang |
| GLM-4.5-Air | FP8 | H100 x 2 / H200 x 1 | sglang |

Under the configurations in the table below, the models can utilize their full 128K context length:

| Model | Precision | GPU Type and Count | Test Framework |
|-------------|-----------|-----------------------|----------------|
| GLM-4.5 | BF16 | H100 x 32 / H200 x 16 | sglang |
| GLM-4.5 | FP8 | H100 x 16 / H200 x 8 | sglang |
| GLM-4.5-Air | BF16 | H100 x 8 / H200 x 4 | sglang |
| GLM-4.5-Air | FP8 | H100 x 4 / H200 x 2 | sglang |

The code can run under the configurations in the table below using LLaMA-Factory:

| Model | GPU Type and Count | Strategy | Batch Size (per GPU) |
|-------------|--------------------|----------|----------------------|
| GLM-4.5 | H100 x 16 | LoRA | 1 |
| GLM-4.5-Air | H100 x 4 | LoRA | 1 |

The code can run under the configurations in the table below using Swift:

| Model | GPU Type and Count | Strategy | Batch Size (per GPU) |
|-------------|--------------------|----------|----------------------|
| GLM-4.5 | H20 (96GiB) x 16 | LoRA | 1 |
| GLM-4.5-Air | H20 (96GiB) x 4 | LoRA | 1 |
| GLM-4.5 | H20 (96GiB) x 128 | SFT | 1 |
| GLM-4.5-Air | H20 (96GiB) x 32 | SFT | 1 |
| GLM-4.5 | H20 (96GiB) x 128 | RL | 1 |
| GLM-4.5-Air | H20 (96GiB) x 32 | RL | 1 |

For more comprehensive details and setup instructions, please refer to our GitHub page. Inference with the `transformers` library supports both thinking and non-thinking modes (see the sketch after this card):

+ Both the BF16 and FP8 weights can be started with the same launch code; see our GitHub page for the exact commands.
+ If you're using 8x H100 GPUs and encounter insufficient memory when running the GLM-4.5 model, you'll need `--cpu-offload-gb 16` (only applicable to vLLM). If you encounter `flashinfer` issues, set `VLLM_ATTENTION_BACKEND=XFORMERS` as a temporary replacement. You can also specify `TORCH_CUDA_ARCH_LIST='9.0+PTX'` to use `flashinfer` (different GPUs need different `TORCH_CUDA_ARCH_LIST` values; please check accordingly).
+ When using `vLLM` and `SGLang`, thinking mode is enabled by default for incoming requests. To disable the thinking switch, add the `extra_body={"chat_template_kwargs": {"enable_thinking": False}}` parameter.
+ Both support tool calling. Please use the OpenAI-style tool description format for calls.
+ For specific code, please refer to `api_request.py` in the `inference` folder.

Citation

If you find our work useful or helpful for your R&D work, please feel free to cite our paper as below.
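As a hedged illustration of the thinking-mode switch described above, here is a minimal sketch using the OpenAI-compatible client against a locally served endpoint; the `base_url`, port, and served model name are assumptions that depend on how you launch vLLM or SGLang:

```python
# Minimal sketch: toggling GLM-4.5 thinking mode via an OpenAI-compatible
# server (vLLM or SGLang). base_url and model name below are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Thinking mode is on by default; pass chat_template_kwargs to disable it.
response = client.chat.completions.create(
    model="zai-org/GLM-4.5-FP8",  # assumed served model name
    messages=[{"role": "user", "content": "Briefly explain MoE expert routing."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```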
GLM-4.1V-9B-Base
📍 Use the GLM-4.1V-9B-Thinking API at the Zhipu Foundation Model Open Platform. Vision-Language Models (VLMs) have become foundational components of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs must evolve beyond basic multimodal perception to enhance their reasoning capabilities on complex tasks, improving accuracy, comprehensiveness, and intelligence and enabling applications such as complex problem solving, long-context understanding, and multimodal agents. Based on the GLM-4-9B-0414 foundation model, we present the new open-source VLM GLM-4.1V-9B-Thinking, designed to explore the upper limits of reasoning in vision-language models. By introducing a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities. It achieves state-of-the-art performance among 10B-parameter VLMs, matching or even surpassing the 72B-parameter Qwen-2.5-VL-72B on 18 benchmark tasks. We are also open-sourcing the base model GLM-4.1V-9B-Base to support further research into the boundaries of VLM capabilities. Compared to the previous-generation CogVLM2 and GLM-4V series models, GLM-4.1V-Thinking offers the following improvements: 1. The first reasoning-focused model in the series, achieving world-leading performance not only in mathematics but also across various sub-domains. 2. Support for 64k context length. 3. Handling of arbitrary aspect ratios and up to 4K image resolution. 4. An open-source version supporting bilingual (Chinese and English) usage. By incorporating the Chain-of-Thought reasoning paradigm, GLM-4.1V-9B-Thinking significantly improves answer accuracy, richness, and interpretability, comprehensively surpassing traditional non-reasoning visual models. Out of 28 benchmark tasks, it achieved the best performance among 10B-level models on 23 tasks, and even outperformed the 72B-parameter Qwen-2.5-VL-72B on 18 tasks. For video reasoning, web demo deployment, and more code, please check our GitHub.
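As a hedged illustration of basic single-image inference, the sketch below uses the generic transformers image-text-to-text interface; the repo id, auto classes, and image URL are assumptions, and the official code lives in the GitHub repo mentioned above:

```python
# Hedged sketch: single-image reasoning via the generic transformers
# image-text-to-text API; repo id and settings are assumptions.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

path = "zai-org/GLM-4.1V-9B-Thinking"  # assumed repo id
processor = AutoProcessor.from_pretrained(path)
model = AutoModelForImageTextToText.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/figure.png"},  # placeholder image
        {"type": "text", "text": "Describe what this figure shows, step by step."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```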
glm-4-9b-chat-hf
If you are using the weights from this repository, please update to `transformers>=4.46.0`; these weights are not compatible with older versions of the transformers library.

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat have shown superior performance beyond Llama-3-8B. In addition to multi-round conversations, GLM-4-9B-Chat also has advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-context reasoning (supporting up to 128K context). This generation of models adds multilingual support, covering 26 languages including Japanese, Korean, and German. We have also launched the GLM-4-9B-Chat-1M model that supports 1M context length (about 2 million Chinese characters) and the multimodal model GLM-4V-9B based on GLM-4-9B. GLM-4V-9B possesses dialogue capabilities in both Chinese and English at a high resolution of 1120 × 1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

We evaluated the GLM-4-9B-Chat model on some classic tasks and obtained the following results:

| Model | AlignBench-v2 | MT-Bench | IFEval | MMLU | C-Eval | GSM8K | MATH | HumanEval | NCB |
|:--------------------|:-------------:|:--------:|:------:|:----:|:------:|:-----:|:----:|:---------:|:----:|
| Llama-3-8B-Instruct | 5.12 | 8.00 | 68.58 | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 24.7 |
| ChatGLM3-6B | 3.97 | 5.50 | 28.1 | 66.4 | 69.0 | 72.3 | 25.7 | 58.5 | 11.3 |
| GLM-4-9B-Chat | 6.61 | 8.35 | 69.0 | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 32.2 |

The needle-in-a-haystack (`eval_needle`) experiment was conducted with a context length of 1M, and the results are as follows: The long-text capability was further evaluated on LongBench, and the results are as follows:

The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct were conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:

| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:----------------------------------------------------------------------------------------------:|
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |

Function-calling capability was evaluated as follows (AST and execution summaries follow the Berkeley Function Calling Leaderboard conventions):

| Model | Overall Acc. | AST Summary | Exec Summary | Relevance |
|:-----------------------|:------------:|:-----------:|:------------:|:---------:|
| Llama-3-8B-Instruct | 58.88 | 59.25 | 70.01 | 45.83 |
| gpt-4-turbo-2024-04-09 | 81.24 | 82.14 | 78.61 | 88.75 |
| ChatGLM3-6B | 57.88 | 62.18 | 69.78 | 5.42 |
| GLM-4-9B-Chat | 81.00 | 80.26 | 84.40 | 87.92 |

This repository is the model repository of GLM-4-9B-Chat, supporting `128K` context length. For more inference code and requirements, please visit our github page.
Please install the dependencies strictly as specified; otherwise, the model will not run properly. Use the Transformers library (version 4.46.0 or later) for inference, as sketched below. The weights of the GLM-4 model are available under the terms of the LICENSE. If you find our work useful, please consider citing the following paper.
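A minimal chat-inference sketch with `transformers>=4.46`; the repo id and generation settings are illustrative assumptions rather than the card's official defaults:

```python
# Hedged sketch: chat inference with the -hf weights (no trust_remote_code
# needed for transformers>=4.46). Settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "THUDM/glm-4-9b-chat-hf"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello! What can you do?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```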
CogVideoX1.5-5B
📄 Read in Chinese | 🤗 Huggingface Space | 🌐 Github | 📜 arxiv 📍 Visit QingYing and the API Platform to experience larger-scale commercial video generation models.

CogVideoX is an open-source video generation model similar to QingYing. The table below lists the video generation models we currently offer, along with their foundational information.

| | CogVideoX1.5-5B (Latest) | CogVideoX1.5-5B-I2V (Latest) | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
|---|---|---|---|---|---|
| Release Date | November 8, 2024 | November 8, 2024 | August 6, 2024 | August 27, 2024 | September 19, 2024 |
| Video Resolution | 1360 × 768 | Min(W, H) = 768; 768 ≤ Max(W, H) ≤ 1360; Max(W, H) % 16 = 0 | 720 × 480 | 720 × 480 | 720 × 480 |
| Number of Frames | Should be 16N + 1 | Should be 16N + 1 | Should be 8N + 1 | Should be 8N + 1 | Should be 8N + 1 |
| Inference Precision | BF16 (recommended), FP16, FP32, FP8, INT8; INT4 not supported | BF16 (recommended), FP16, FP32, FP8, INT8; INT4 not supported | FP16 (recommended), BF16, FP32, FP8, INT8; INT4 not supported | BF16 (recommended), FP16, FP32, FP8, INT8; INT4 not supported | BF16 (recommended), FP16, FP32, FP8, INT8; INT4 not supported |
| Single GPU Memory Usage | SAT BF16: 76GB; diffusers BF16: from 10GB; diffusers INT8 (torchao): from 7GB | SAT BF16: 76GB; diffusers BF16: from 10GB; diffusers INT8 (torchao): from 7GB | SAT FP16: 18GB; diffusers FP16: from 4GB; diffusers INT8 (torchao): from 3.6GB | SAT BF16: 26GB; diffusers BF16: from 5GB; diffusers INT8 (torchao): from 4.4GB | SAT BF16: 26GB; diffusers BF16: from 5GB; diffusers INT8 (torchao): from 4.4GB |
| Multi-GPU Memory Usage | BF16: 24GB using diffusers | BF16: 24GB using diffusers | FP16: 10GB using diffusers | BF16: 15GB using diffusers | BF16: 15GB using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~90 seconds; Single H100: ~45 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds |
| Position Encoding | 3d_rope_pos_embed | 3d_rope_pos_embed | 3d_sincos_pos_embed | 3d_rope_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |
| Download Link (Diffusers) | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel |
| Download Link (SAT) | 🤗 HuggingFace 🤖 ModelScope 🟣 WiseModel (SAT) | | | | |

+ Testing with the `diffusers` library enabled all optimizations included in the library. This scheme has not been tested on non-NVIDIA A100/H100 devices; it should generally work with all devices of NVIDIA Ampere architecture or higher. Disabling optimizations can triple VRAM usage but increases speed by 3-4 times. You can selectively disable certain optimizations, including `pipe.enable_sequential_cpu_offload()`, `pipe.vae.enable_slicing()`, and `pipe.vae.enable_tiling()`.
+ In multi-GPU inference, the `enable_sequential_cpu_offload()` optimization needs to be disabled.
+ Using an INT8 model reduces inference speed, meeting the requirements of lower-VRAM GPUs while retaining minimal video quality degradation, at the cost of a significant speed reduction.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules, reducing CogVideoX's memory requirements and making it feasible to run the model on smaller-VRAM GPUs. TorchAO quantization is fully compatible with `torch.compile`, significantly improving inference speed. `FP8` precision is required for NVIDIA H100 and above, which requires source installation of `torch`, `torchao`, `diffusers`, and `accelerate`. Using `CUDA 12.4` is recommended.
+ Inference speed testing also used the above VRAM optimizations; without optimizations, speed increases by about 10%. Only `diffusers` versions of the models support quantization.
+ Models support English input only; other languages should be translated into English during prompt crafting with a larger model.
+ Use SAT for inference and fine-tuning of SAT-version models. Check our GitHub for more details.

This model supports deployment using the Hugging Face diffusers library.
You can follow the steps below to get started (see the sketch after this card). We recommend that you visit our GitHub to check out prompt optimization and conversion to get a better experience. PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce CogVideoX's memory requirements. This allows the model to run on a free T4 Colab or on GPUs with lower VRAM! Also, note that TorchAO quantization is fully compatible with `torch.compile`, which can significantly accelerate inference. Additionally, these models can be serialized and stored using PytorchAO in quantized data types to save disk space. You can find examples and benchmarks at the following links: 1. More detailed technical explanations and code. 2. Optimized prompt examples and conversions. 3. Detailed code for model inference and fine-tuning. 4. Project update logs and more interactive opportunities. 5. CogVideoX toolchain to help you better use the model. 6. INT8 model inference code. This model is released under the CogVideoX LICENSE.
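As a hedged companion to the quick-start steps referenced above, here is a minimal text-to-video sketch with diffusers; the repo id and generation parameters are assumptions based on this card, not verified defaults:

```python
# Hedged sketch: text-to-video with the diffusers CogVideoXPipeline.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "zai-org/CogVideoX1.5-5B", torch_dtype=torch.bfloat16  # assumed repo id
)
pipe.enable_sequential_cpu_offload()  # trade speed for lower VRAM

video = pipe(
    prompt="A golden retriever running through a sunlit meadow",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```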
GLM-4.6V-Flash
glm-4-9b-chat-1m
2024/08/12: The code in this repository has been updated to use `transformers>=4.44.0`; please update your dependencies accordingly. 2024/07/24: We released the latest technical insights on long-context processing; see here for our technical report on the long-text techniques used in training the GLM-4-9B open-source models. GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat both show strong performance. In addition to multi-turn dialogue, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-context reasoning (supporting up to 128K context). This generation adds multilingual support, covering 26 languages including Japanese, Korean, and German. We have also released a model supporting 1M context length (about 2 million Chinese characters).
glm-4-9b
2024/08/12: The code in this repository has been updated to use `transformers>=4.44.0`; please update your dependencies accordingly. GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat both deliver performance surpassing Llama-3-8B. In addition to multi-turn dialogue, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-context reasoning (supporting up to 128K context). This generation adds multilingual support, covering 26 languages including Japanese, Korean, and German. We have also released the GLM-4-9B-Chat-1M model supporting 1M context length (about 2 million Chinese characters) and GLM-4V-9B, a multimodal model based on GLM-4-9B. GLM-4V-9B supports bilingual (Chinese and English) multi-turn dialogue at a high resolution of 1120 × 1120. In multimodal evaluations covering comprehensive Chinese and English ability, perceptual reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

| Model | MMLU | C-Eval | GPQA | GSM8K | MATH | HumanEval |
|:--------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
| Llama-3-8B | 66.6 | 51.2 | - | 45.8 | - | - |
| Llama-3-8B-Instruct | 68.4 | 51.3 | 34.2 | 79.6 | 30.0 | 62.2 |
| ChatGLM3-6B-Base | 61.4 | 69.0 | - | 72.3 | 25.7 | - |
| GLM-4-9B | 74.7 | 77.1 | 34.3 | 84.0 | 30.4 | 70.1 |
GLM-4.5-Air-Base
📖 Check out the GLM-4.5 technical blog, technical report, and Zhipu AI technical documentation. 📍 Use GLM-4.5 API services on Zhipu AI Open Platform.

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of 63.2, ranking 3rd among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at 59.8 while maintaining superior efficiency. For more evaluation results, showcases, and technical details, please visit our technical blog. The technical report will be released soon. The model code, tool parser, and reasoning parser can be found in the implementations of transformers, vLLM, and SGLang.
CogVideoX1.5-5B-I2V
📄 Read in Chinese | 🤗 Huggingface Space | 🌐 Github | 📜 arxiv 📍 Visit QingYing and the API Platform to experience the commercial video generation model.

CogVideoX is an open-source video generation model similar to QingYing. Below is a table listing information on the video generation models available in this generation (the listed values apply to both models unless noted):

| Attribute | CogVideoX1.5-5B / CogVideoX1.5-5B-I2V (Current Repository) |
|---|---|
| Video Resolution | CogVideoX1.5-5B: 1360 × 768; CogVideoX1.5-5B-I2V: Min(W, H) = 768; 768 ≤ Max(W, H) ≤ 1360; Max(W, H) % 16 = 0 |
| Inference Precision | BF16 (recommended), FP16, FP32, FP8, INT8; INT4 not supported |
| Single GPU Inference Memory Consumption | BF16: from 9GB |
| Multi-GPU Inference Memory Consumption | BF16: 24GB using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) |

+ Testing with the `diffusers` library enabled all optimizations included in the library. This scheme has not been tested on non-NVIDIA A100/H100 devices; it should generally work with all devices of NVIDIA Ampere architecture or higher. Disabling optimizations can triple VRAM usage but increases speed by 3-4 times. You can selectively disable certain optimizations, including `pipe.enable_sequential_cpu_offload()`, `pipe.vae.enable_slicing()`, and `pipe.vae.enable_tiling()`.
+ In multi-GPU inference, the `enable_sequential_cpu_offload()` optimization needs to be disabled.
+ Using an INT8 model reduces inference speed, meeting the requirements of lower-VRAM GPUs while retaining minimal video quality degradation, at the cost of a significant speed reduction.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules, reducing CogVideoX's memory requirements and making it feasible to run the model on smaller-VRAM GPUs. TorchAO quantization is fully compatible with `torch.compile`, significantly improving inference speed. `FP8` precision is required for NVIDIA H100 and above, which requires source installation of `torch`, `torchao`, `diffusers`, and `accelerate`. Using `CUDA 12.4` is recommended.
+ Inference speed testing also used the above VRAM optimizations; without optimizations, speed increases by about 10%. Only `diffusers` versions of the models support quantization.
+ Models support English input only; other languages should be translated into English during prompt crafting with a larger model.
+ Use SAT for inference and fine-tuning of SAT-version models. Check our GitHub for more details.

This model supports deployment using the Hugging Face diffusers library. You can follow the steps below to get started (see the sketch after this card). We recommend that you visit our GitHub to check out prompt optimization and conversion to get a better experience. PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce CogVideoX's memory requirements. This allows the model to run on a free T4 Colab or on GPUs with lower VRAM! Also, note that TorchAO quantization is fully compatible with `torch.compile`, which can significantly accelerate inference. Additionally, these models can be serialized and stored using PytorchAO in quantized data types to save disk space. You can find examples and benchmarks at the following links: 1. More detailed technical explanations and code. 2. Optimized prompt examples and conversions. 3. Detailed code for model inference and fine-tuning. 4. Project update logs and more interactive opportunities. 5. CogVideoX toolchain to help you better use the model. 6. INT8 model inference code. This model is released under the CogVideoX LICENSE.
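As a hedged companion to the quick-start steps referenced above, here is a minimal image-to-video sketch with diffusers; the repo id, input image, and generation parameters are assumptions based on this card:

```python
# Hedged sketch: image-to-video with the diffusers CogVideoXImageToVideoPipeline.
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "zai-org/CogVideoX1.5-5B-I2V", torch_dtype=torch.bfloat16  # assumed repo id
)
pipe.enable_sequential_cpu_offload()  # trade speed for lower VRAM

image = load_image("input.jpg")  # first frame to animate (placeholder path)
video = pipe(
    prompt="The boat drifts slowly across the misty lake",
    image=image,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```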
chatglm3-6b-base
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 📍 Experience the larger-scale ChatGLM model at chatglm.cn

Introduction

ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features of the previous two generations, such as smooth dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features:

1. A More Powerful Base Model: The base model of ChatGLM3-6B, ChatGLM3-6B-Base, employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base has the strongest performance among pre-trained models under 10B parameters.
2. More Complete Function Support: ChatGLM3-6B adopts a newly designed prompt format that, beyond normal multi-turn dialogue, natively supports tool calling (Function Call), code execution (Code Interpreter), and complex scenarios such as agent tasks.
3. A More Comprehensive Open-Source Series: In addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base and the long-text dialogue model ChatGLM3-6B-32K are also open-sourced. All of these weights are fully open for academic research and, after completing a questionnaire registration, are also free for commercial use.

This repo is ChatGLM3-6B-Base, the base model of ChatGLM3-6B. As a model that has not been aligned with human intent, ChatGLM3-6B-Base cannot be used for multi-turn conversations; however, text completion is possible (see the sketch below). For more instructions, including how to run CLI and web demos and how to use model quantization to save GPU memory, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM3-6B model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers.
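A minimal text-completion sketch for the base model; the repo id is an assumption, and the custom modeling code shipped with the weights is loaded via `trust_remote_code`:

```python
# Hedged sketch: plain text completion with the unaligned base model.
from transformers import AutoModel, AutoTokenizer

path = "THUDM/chatglm3-6b-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True).half().cuda().eval()

inputs = tokenizer("The General Language Model (GLM) architecture", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```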
cogvlm-chat-hf
CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. You can try the online demo to chat with CogVLM about images.

About 40GB of GPU VRAM is needed for model inference. If no single GPU has more than 40GB of VRAM, you will need the `accelerate` library to dispatch the model across multiple GPUs with smaller VRAM. Below is an example for a machine with two 24GB GPUs and 16GB of CPU memory (see the sketch after this card); you can change the arguments of `infer_auto_device_map` to match your own setup. Note that the GPU memory limits are written slightly below actual capacity to reserve VRAM for intermediate states during inference.

The CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT-style), and a visual expert module. See the Paper for more details.

The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers.
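A hedged sketch of the multi-GPU dispatch described above; the memory limits, local checkpoint path, and `no_split_module_classes` value are illustrative assumptions:

```python
# Hedged sketch: dispatch CogVLM across two 24GB GPUs plus CPU memory with
# accelerate. Limits are set below actual capacity to leave headroom for
# inference-time activations.
import torch
from transformers import AutoModelForCausalLM
from accelerate import infer_auto_device_map, init_empty_weights, load_checkpoint_and_dispatch

with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/cogvlm-chat-hf", torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True, trust_remote_code=True,
    )

device_map = infer_auto_device_map(
    model,
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "16GiB"},
    no_split_module_classes=["CogVLMDecoderLayer"],  # assumed block class name
)
model = load_checkpoint_and_dispatch(
    model, "local/path/to/cogvlm-chat-hf", device_map=device_map  # placeholder path
)
```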
GLM-Image
chatglm-6b
🌐 Blog • 💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 📍 Experience the larger-scale ChatGLM model at chatglm.cn

We have released ChatGLM2-6B, an upgraded version of ChatGLM-6B that retains the many excellent features of the first-generation model, such as smooth dialogue and a low deployment threshold, while delivering stronger performance, longer context, and more efficient inference.

Introduction

ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. With quantization, users can deploy it locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level; see the sketch below). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dialogue. The model is trained on about 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning with human feedback. With only about 6.2 billion parameters, the model is able to generate answers that align with human preference. ChatGLM-6B weights are completely open for academic research, and free commercial use is also allowed after completing a questionnaire.

For more instructions, including how to run CLI and web demos and how to use model quantization to save GPU memory, please refer to our Github Repo.

Change Log: v1.1.0 (942945d): updated the v1.1 checkpoint; v0.1.0 (f831824).

The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM-6B model weights must comply with the Model License. If you find our work helpful, please consider citing the following paper.
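A hedged sketch of the low-VRAM INT4 deployment mentioned above; the `.quantize()` and `.chat()` helpers come from the custom modeling code shipped with the weights, and the repo id is an assumption:

```python
# Hedged sketch: INT4 quantized local deployment (~6GB VRAM) via the
# trust_remote_code API bundled with the ChatGLM-6B weights.
from transformers import AutoModel, AutoTokenizer

path = "THUDM/chatglm-6b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True)
model = model.quantize(4).half().cuda().eval()  # INT4 weights, FP16 activations

response, history = model.chat(tokenizer, "Hello, introduce yourself briefly.", history=[])
print(response)
```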
cogvlm2-llama3-chat-19B
👋 Wechat · 💡 Online Demo · 🎈 Github Page · 📑 Paper 📍 Experience the larger-scale CogVLM model on the ZhipuAI Open Platform.

We launch a new generation of the CogVLM2 series of models and open-source two models built with Meta-Llama-3-8B-Instruct. Compared with the previous generation of CogVLM open-source models, the CogVLM2 series offers the following improvements:

1. Significant improvements on many benchmarks such as `TextVQA` and `DocVQA`.
2. Support for 8K context length.
3. Support for image resolutions up to 1344 × 1344.
4. An open-source model version that supports both Chinese and English.

You can see the details of the CogVLM2 family of open-source models in the table below:

| Model name | cogvlm2-llama3-chat-19B | cogvlm2-llama3-chinese-chat-19B |
|------------------|-------------------------------------|-------------------------------------|
| Base Model | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct |
| Language | English | Chinese, English |
| Model size | 19B | 19B |
| Task | Image understanding, dialogue model | Image understanding, dialogue model |
| Text length | 8K | 8K |
| Image resolution | 1344 × 1344 | 1344 × 1344 |

Our open-source models have achieved good results on many leaderboards compared to the previous generation of CogVLM open-source models. Their excellent performance can compete with some non-open-source models, as shown in the table below:

| Model | Open Source | LLM Size | TextVQA | DocVQA | ChartQA | OCRbench | VCR_EASY | VCR_HARD | MMMU | MMVet | MMBench |
|----------------------------|-------------|----------|----------|----------|----------|----------|-------------|-------------|----------|----------|----------|
| CogVLM1.1 | ✅ | 7B | 69.7 | - | 68.3 | 590 | 73.9 | 34.6 | 37.3 | 52.0 | 65.8 |
| LLaVA-1.5 | ✅ | 13B | 61.3 | - | - | 337 | - | - | 37.0 | 35.4 | 67.7 |
| Mini-Gemini | ✅ | 34B | 74.1 | - | - | - | - | - | 48.0 | 59.3 | 80.6 |
| LLaVA-NeXT-LLaMA3 | ✅ | 8B | - | 78.2 | 69.5 | - | - | - | 41.7 | - | 72.1 |
| LLaVA-NeXT-110B | ✅ | 110B | - | 85.7 | 79.7 | - | - | - | 49.1 | - | 80.5 |
| InternVL-1.5 | ✅ | 20B | 80.6 | 90.9 | 83.8 | 720 | 14.7 | 2.0 | 46.8 | 55.4 | 82.3 |
| QwenVL-Plus | ❌ | - | 78.9 | 91.4 | 78.1 | 726 | - | - | 51.4 | 55.7 | 67.0 |
| Claude3-Opus | ❌ | - | - | 89.3 | 80.8 | 694 | 63.85 | 37.8 | 59.4 | 51.7 | 63.3 |
| Gemini Pro 1.5 | ❌ | - | 73.5 | 86.5 | 81.3 | - | 62.73 | 28.1 | 58.5 | - | - |
| GPT-4V | ❌ | - | 78.0 | 88.4 | 78.5 | 656 | 52.04 | 25.8 | 56.8 | 67.7 | 75.0 |
| CogVLM2-LLaMA3 | ✅ | 8B | 84.2 | 92.3 | 81.0 | 756 | 83.3 | 38.0 | 44.3 | 60.4 | 80.5 |
| CogVLM2-LLaMA3-Chinese | ✅ | 8B | 85.0 | 88.4 | 74.7 | 780 | 79.9 | 25.1 | 42.8 | 60.5 | 78.9 |

All results were obtained without using any external OCR tools ("pixel only").

Quick Start

Below is a simple example of how to chat with the CogVLM2 model (see the sketch after this card); for more use cases, find them in our github. This model is released under the CogVLM2 LICENSE. For models built with Meta Llama 3, please also adhere to the LLAMA3_LICENSE. If you find our work helpful, please consider citing the following papers.
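A hedged sketch of image chat via `trust_remote_code`; the `build_conversation_input_ids` helper and its argument layout follow the pattern in the upstream GitHub examples and may differ across versions, so treat this as illustrative:

```python
# Hedged sketch: single-image chat with CogVLM2 (API may vary by version).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "THUDM/cogvlm2-llama3-chat-19B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder image
inputs = model.build_conversation_input_ids(
    tokenizer, query="Describe this image.", images=[image]
)
inputs = {
    "input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": inputs["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[inputs["images"][0].to("cuda").to(torch.bfloat16)]],
}
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```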
codegeex4-all-9b
We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, significantly enhancing its code generation capabilities. A single CodeGeeX4-ALL-9B model supports comprehensive functions such as code completion and generation, code interpreter, web search, function call, and repository-level code Q&A, covering various scenarios of software development. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks such as BigCodeBench and NaturalCodeBench. It is currently the most powerful code generation model with fewer than 10B parameters, even surpassing much larger general-purpose models, and achieves the best balance between inference speed and model performance.

The chat system prompt (translated from Chinese): You are an intelligent programming assistant named CodeGeeX. You will answer any questions users have about programming, coding, and computers, provide well-formatted, executable, accurate, and safe code, and give detailed explanations when necessary.

For code infilling, the prompt places path, language, and mode markers before the suffix and prefix of the code around the gap (see the sketch after this card):

###PATH:src/example.py ###LANGUAGE:Python ###MODE:BLOCK {suffix} {prefix}

```
@inproceedings{zheng2023codegeex,
  title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X},
  author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},
  booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={5673--5684},
  year={2023}
}
```
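A hedged sketch of assembling the infilling prompt shown above; the marker layout follows this card, but the exact whitespace and any special tokens used by CodeGeeX4 may differ, so treat this as illustrative string templating only:

```python
# Hedged sketch: building the fill-in-the-middle prompt from this card.
PROMPT_TEMPLATE = "###PATH:{path}\n###LANGUAGE:{language}\n###MODE:BLOCK\n{suffix} {prefix}"

def build_infill_prompt(path: str, language: str, prefix: str, suffix: str) -> str:
    """Fill the template with the code before (prefix) and after (suffix) the gap."""
    return PROMPT_TEMPLATE.format(path=path, language=language, suffix=suffix, prefix=prefix)

print(build_infill_prompt(
    path="src/example.py",
    language="Python",
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))",
))
```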
GLM-4-32B-Base-0414
The GLM family welcomes new members: the GLM-4-32B-0414 series models, featuring 32 billion parameters. Their performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and they also support very user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforcement learning, we enhanced the model's performance in instruction following, engineering code, and function calling, thus strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, Artifact generation, function calling, search-based Q&A, and report generation. In particular, on several benchmarks, such as code generation or specific Q&A tasks, GLM-4-32B-Base-0414 achieves performance comparable to larger models such as GPT-4o and DeepSeek-V3-0324 (671B). GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities, developed based on GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, which enhances the model's general capabilities. GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model is capable of deeper and longer thinking to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Z1-Rumination is trained by scaling end-to-end reinforcement learning with responses graded against ground-truth answers or rubrics, and it can make use of search tools during its deep thinking process to handle complex tasks. The model shows significant improvements in research-style writing and complex tasks. Finally, GLM-Z1-9B-0414 is a surprise. We employed all the aforementioned techniques to train a small model (9B). GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is top-ranked among all open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment. write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese) Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions.
(Prompt translated from Chinese) Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese) Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese) Use SVG to illustrate the training process of an LLM.

For search-based writing tasks, we use a dedicated system prompt that has the model respond based on search results. In use, you can obtain search results through methods such as `RAG` or `WebSearch` and wrap them in an `observation` block. For the prompt above, we used an internal or external search model to obtain the search results; with results provided in this format, the model generated the following analysis report:

Analysis Report on Common Characteristics of Children's Literature

Children's literature, as a literary genre specifically created for children, possesses unique artistic features and expressive techniques. This report will comprehensively analyze the common characteristics of children's literature from three dimensions: narrative methods, thematic tendencies, and other universal features, integrating academic research, classic examples of children's literature, and expert viewpoints.

Narrative Methods and Their Impact on Child Readers

The narrative methods of children's literature exhibit diverse characteristics, including first-person, third-person, narration, and interactive storytelling. These different narrative perspectives and methods profoundly influence children's reading experiences and cognitive development.

First-person narration is common in children's literature, unfolding the story from the perspective of a child protagonist. This narrative style bridges the gap between the reader and the story, making it easier for child readers to identify with the characters and develop emotional resonance. For example, many coming-of-age novels use first-person narration, allowing the protagonist to directly recount their experiences and feelings, making readers feel as though they are experiencing the joys and sorrows of growth alongside the protagonist. This narrative perspective lends the work a strong sense of authenticity and intimacy, helping to cultivate children's empathy【1†source】.

Third-person narration offers a broader perspective, allowing the author to flexibly switch between different characters' viewpoints and present richer layers of the story. In children's literature, third-person omniscient narration enables the author to control the narrative pace, revealing or concealing information as needed to guide children's attention. At the same time, third-person narration facilitates direct dialogue between the author and the reader, conveying values or explaining complex concepts through narration. This narrative method positively influences children's macro-thinking and comprehensive understanding【1†source】.

Narration (authorial intrusion) is a unique narrative technique in children's literature, where the author directly appears as the "storyteller," explaining the background, commenting on characters, or posing questions to the reader. This technique is particularly common in classic fairy tales, such as the opening lines of Andersen's Fairy Tales: "Once, there was a child..."
Narration helps children understand the story's context, fills cognitive gaps, and conveys the author's educational intent. Research shows that appropriate authorial intrusion aids children in grasping the story's structure and improving reading comprehension【5†source】.

Interactive storytelling is a new trend in contemporary children's literature, especially prominent in the digital media era. Interactive storytelling breaks the traditional unidirectional author-reader relationship, encouraging child readers to participate in the story's creation, such as by choosing plot directions, character dialogues, or endings. This participatory reading enhances children's sense of agency and fosters decision-making skills and creative thinking. For example, some children's reading apps incorporate interactive elements, allowing children to influence the story's development through clicks, drag-and-drop actions, and other operations, thereby gaining a stronger sense of immersion and achievement【6†source】. Interactive storytelling transforms children from passive information recipients into active meaning-makers, uniquely contributing to the development of their subjectivity.

Table: Common Narrative Methods in Children's Literature and Their Effects

| Narrative Method | Characteristics | Impact on Child Readers | Classic Examples |
|----------------------|--------------------|----------------------------|---------------------|
| First-Person | Told from the child protagonist's perspective | Enhances immersion, fosters empathy | Charlotte's Web, The Straw House |
| Third-Person | Omniscient or limited perspective | Expands horizons, develops comprehensive understanding | Harry Potter series |
| Narration | Direct authorial intrusion into the narrative | Aids comprehension, conveys values | Andersen's Fairy Tales |
| Interactive | Encourages reader participation in creation | Cultivates agency and creative thinking | Children's interactive reading apps |

Notably, the narrative methods of children's literature are often closely intertwined with the childhood perspective. The childhood perspective does not necessarily mean the narrator must be a child but refers to the work's ability to describe the world to the greatest extent from a child's heart, expressing their inner psychology and external circumstances【2†source】. Through the childhood perspective, readers can embark on a spiritual journey with a child's mindset, a narrative strategy that creates a strong sense of realism, allowing child readers to achieve emotional identification and cognitive resonance during the reading process【1†source】. The use of the childhood perspective gives the work's language a perceptual and naive quality, often with a prose-like and spatial structure, artistic features that align with children's cognitive characteristics and aid their acceptance and understanding【2†source】.

Thematic Tendencies and Their Impact on Children's Cognitive and Emotional Development

The thematic choices in children's literature exhibit distinct tendencies, with common themes including growth, adventure, friendship, and family. These themes not only form the core content of children's literature but also subtly influence children's cognitive development and emotional shaping. The theme of growth is one of the central motifs in children's literature.
Growth narratives are regarded as the artistic lifeblood of children's literature, focusing on depicting the pivotal moments of rapid psychological development in children, particularly the awakening and establishment of self-awareness【3†source】. Growth literature typically includes three elements: an artistic portrayal of the self-awareness construction process in growing adolescents, a developmental story with logical propulsion, and the presentation of the protagonist's spiritual trials and quest for direction【3†source】. By reading growth-themed works, child readers can indirectly experience the confusion and breakthroughs of growing up and understand the formation of self-identity. Classics such as Astrid Lindgren's Pippi Longstocking and Cao Wenxuan's The Straw House vividly depict children's psychological growth trajectories in specific environments. Research indicates that growth-themed literary works help children build a positive self-concept and develop the courage and resilience to face challenges, positively contributing to their psychological development【9†source】.

The theme of adventure holds an important place in children's literature, satisfying children's curiosity about exploring the unknown. Adventure stories often feature unusual settings and unknown challenges, with the protagonist growing through overcoming difficulties. Classics like Robinson Crusoe and The Adventures of Tom Sawyer attract child readers with thrilling plots while conveying the importance of qualities such as courage, wisdom, and perseverance. The impact of adventure themes on children's cognitive development mainly lies in expanding their imaginative space and fostering problem-solving skills. In adventure stories, children must analyze situations, make plans, and respond to unexpected events alongside the protagonist, a process that exercises their logical thinking and adaptability【14†source】. At the same time, the unfamiliar environments and novel experiences in adventure stories stimulate children's curiosity and desire to learn, laying the foundation for cultivating an exploratory spirit. As experts point out, excellent children's literature should be grounded in reality, rich in depth, and generate significant inspiration and emotional appeal, guiding children to comprehensively understand the world【14†source】.

The theme of friendship is equally prevalent in children's literature, reflecting children's emphasis on peer relationships. Friendship and love are regarded as humanity's most precious qualities, often depicted in children's literature as beacons in the night, guiding children toward the future【9†source】. Friendship stories typically revolve around interactions between children, portraying positive behaviors such as sharing, cooperation, and understanding. Examples include the genuine friendships among the children at Tomoe Gakuen in Totto-Chan: The Little Girl at the Window and the promise and mutual aid between Wilbur and Charlotte in Charlotte's Web. These stories help child readers recognize the value of friendship and learn how to build and maintain interpersonal relationships. Research shows that children need peer support during their growth, as friends provide crucial emotional anchors, offering the greatest emotional support and comfort in unfamiliar environments【16†source】. By reading friendship-themed works, children can learn social skills, develop empathy, and cultivate a spirit of cooperation, qualities essential for their social development【17†source】.
The theme of family is an indispensable subject in children's literature, depicting the emotional bonds and interaction patterns among family members. As the primary setting for children's earliest socialization, the family atmosphere and parenting styles profoundly impact children's mental health【10†source】. Family stories in children's literature often focus on parent-child relationships, sibling bonds, and other dynamics, such as Alice's relationship with her sister in Alice's Adventures in Wonderland and the Little Prince's interactions with the rose in The Little Prince. These stories help children understand the responsibilities and expectations of family roles and learn to handle conflicts within the family. Research indicates that a positive family atmosphere and parental support promote the development of children's positive psychological traits, while adverse family environments and parenting behaviors negatively affect their mental health【10†source】【11†source】. By reading family-themed works, children can gain emotional support, learn skills for managing family relationships, and establish healthy family values.

Table: Common Themes in Children's Literature and Their Impact on Child Development

| Theme Type | Content Representation | Impact on Cognitive Development | Impact on Emotional Development | Classic Examples |
|---------------|---------------------------|-------------------------------------|-------------------------------------|---------------------|
| Growth | Awakening of self-awareness, psychological trials and breakthroughs | Establishes self-concept, fosters problem-solving skills | Shapes positive self-identity, enhances psychological resilience | The Straw House, Pippi Longstocking |
| Adventure | Exploring the unknown, overcoming challenges | Expands imaginative space, exercises logical thinking | Cultivates courage and perseverance | Robinson Crusoe, The Adventures of Tom Sawyer |
| Friendship | Peer interactions, mutual aid and cooperation | Learns social skills, understands interpersonal dynamics | Develops empathy, builds a sense of belonging | Charlotte's Web, Totto-Chan: The Little Girl at the Window |
| Family | Parent-child relationships, sibling bonds | Understands social roles, learns communication skills | Gains emotional support, establishes secure attachments | Alice's Adventures in Wonderland, The Little Prince |

Regarding thematic choices, children's literature researcher Zhu Ziqiang proposed the famous "Three Major Motifs" theory, categorizing children's literary works into "the motif of love," "the motif of the mischievous child," and "the motif of nature"【8†source】. The motif of love focuses on emotional connections between children and adults or peers; the motif of the mischievous child portrays children's free-spirited nature; and the motif of nature emphasizes the harmonious relationship between children and the natural environment. These three motifs reflect the richness of the children's world from different angles, providing diverse emotional experiences and cognitive frameworks for children. Notably, these themes do not exist in isolation; outstanding works often organically integrate multiple themes. For example, the Harry Potter series incorporates growth, friendship, adventure, and family elements, presenting child readers with a multidimensional spiritual world.
Other Universal Features and Their Artistic Expression

In addition to narrative methods and thematic tendencies, children's literature exhibits a series of universal artistic features, including anthropomorphism, repetitive language, symbolism and metaphor, and educational significance. These features collectively constitute the unique aesthetic style of children's literature, subtly influencing children's cognitive development and aesthetic cultivation.

Anthropomorphism is one of the most distinctive artistic features of children's literature. In children's literary works, animals, plants, and even inanimate objects are often endowed with human thoughts, emotions, and behaviors, greatly enhancing the story's fun and imagination. Research shows that anthropomorphism is a frequently used technique by children's literature creators to attribute human characteristics to animals, enabling them to possess perception and communication abilities【19†source】. Through anthropomorphism, children can more easily understand abstract concepts and moral principles, as anthropomorphic characters translate complex ideas into familiar emotional and behavioral patterns. For example, in scientific fairy tales, anthropomorphic characters can help explain scientific principles, making abstract concepts tangible【18†source】. Anthropomorphism not only enriches the narrative techniques of children's literature but also provides children with a unique perspective for understanding the relationship between humans and nature. It is worth noting that excessive anthropomorphism may affect children's accurate understanding of the animal world, so modern children's literature pays more attention to balancing the natural attributes of characters with human characteristics when employing anthropomorphic techniques【19†source】.

Repetitive language is extremely common in children's literature, a linguistic feature rooted in oral traditions originally intended to aid memory and dissemination【20†source】. In children's literature, the repetitive use of words, phrases, or sentences serves multiple functions: constructing the story's framework, emphasizing key information, creating rhythm and musicality, and training children's vocabulary skills. For example, in The Very Hungry Caterpillar, the author repeatedly uses phrases like "On Monday, he ate one apple. On Tuesday, he ate two pears..." This not only builds the story's structure but also helps children learn numbers and days of the week. Repetitive structures also aid children in developing an awareness of language patterns during the early stages of language acquisition, fostering a sense of language and memory skills【21†source】. Research indicates that repetitive language in children's literature promotes children's language acquisition, helping them master vocabulary and syntactic rules. At the same time, this linguistic feature enhances the story's participatory nature, as children can often join in reciting the repetitive parts, gaining a sense of achievement.

Symbolism and metaphor are common expressive techniques in children's literature, conveying abstract meanings through concrete imagery. Symbolism uses specific objects to represent abstract concepts or emotions, while metaphor connects two different things through comparison, creating new meanings. In children's literature, symbolism and metaphor are usually presented in a simple and clear manner, avoiding overly complex interpretations.
For example, the character configurations and metaphorical connotations in The Wizard of Oz are thought-provoking, as these characters not only breathe life into the story but also convey profound life philosophies through their symbolic meanings【24†source】. Symbolism and metaphor in children's literature are often related to themes such as growth, friendship, and courage, helping children understand abstract concepts through concrete and figurative expressions. Research shows that appropriate metaphors can promote children's cognitive development, stimulating their imagination and creativity【23†source】. As children grow older, their ability to understand symbolism and metaphor gradually improves, providing children's literature with multi-layered meaning spaces.

Educational significance is an indispensable component of children's literature, which inherently carries the gene of children's education【22†source】. Excellent children's literary works simultaneously possess entertainment and educational functions, not only helping children understand the objective world, enrich their inner emotions, and acquire life wisdom but also cultivating their perception, aesthetic sensibility, thinking skills, and creativity【15†source】. Educational significance in children's literature is often not directly presented through preaching but naturally revealed through the storyline and characters' fates. For example, many classic fairy tales convey the importance of qualities such as bravery and honesty through the protagonist's adventurous experiences, while popular science books introduce scientific knowledge through interesting plots and characters. Experts point out that children's literature writers should shoulder the important responsibility of education, incorporating care for children's mental growth into their works【22†source】. It is worth noting that the educational significance of children's literature should respect children's receptive abilities, avoiding excessive preaching or moral indoctrination, and instead naturally influencing children's values and behaviors through artistic appeal.

Storytelling is the most basic and essential feature of children's literature. Children's perceptual, imagery-driven, and novelty-seeking cognitive characteristics and receptive psychology further determine that "storytelling" is an indispensable ontological feature of children's literature【25†source】. Engaging plots are the most crucial aspect of children's literary works because, compared to adults, children's understanding of things relies mainly on intuition, and plots play a key role in guiding children's comprehension of stories【26†source】. The storytelling quality of children's literature is reflected in multiple aspects: clear cause-and-effect relationships, compact narrative rhythm, and satisfying endings. These elements work together to immerse children in the story world, providing emotional satisfaction and cognitive inspiration. As researchers have noted, plots must be performed by specific characters in specific situations to convey individual experiences in unique space-time environments【7†source】. In children's literature, storytelling is not merely an artistic technique but a bridge connecting children to the world. Through stories, children can safely experience various life scenarios and learn methods for coping with challenges.

In terms of language features, children's literature typically adopts a concise, clear, and vivid language style, avoiding complex sentence structures and abstract vocabulary.
This linguistic characteristic aligns with children's cognitive development levels, facilitating their understanding and acceptance. At the same time, the language of children's literature is often rich in rhythm and musicality, enhancing readability and memorability through techniques such as rhyming and repetition. For example, Michael Rosen's children's literary works extensively employ repetitive structures and rhymes, a language usage that helps children develop an awareness of language patterns during the early stages of language acquisition【21†source】. The language of children's literature also often includes rich sensory descriptions and emotional expressions, stimulating children's imagination through concrete and tangible imagery. Scholar Jay Davis's research shows that the interactive use of language in children's literature can influence children's language habits and promote their language development【21†source】.

In summary, these universal features of children's literature collectively constitute its unique artistic charm and educational value. Anthropomorphism and symbolism expand children's imaginative spaces, repetitive language and storytelling promote language acquisition and cognitive development, and the natural integration of educational significance achieves the artistic effect of "teaching through entertainment." These features do not exist in isolation but are interwoven and organically unified, collectively serving the comprehensive development of child readers.

Through a systematic analysis of the narrative methods, thematic tendencies, and other universal features of children's literature, we can draw the following conclusions: As a special literary genre, the creation and reception of children's literature follow unique rules. In terms of narrative methods, children's literature flexibly employs various techniques such as first-person, third-person, narration, and interactive storytelling to adapt to children's cognitive characteristics and receptive psychology. Among these, the use of the childhood perspective is particularly important, as it enhances the work's sense of realism and intimacy, enabling child readers to develop emotional resonance【1†source】【2†source】. In terms of thematic choices, growth, adventure, friendship, and family constitute the main content of children's literature. These themes not only satisfy children's curiosity and desire to explore but also subtly influence their cognitive development and emotional shaping【3†source】【9†source】. Other universal features such as anthropomorphism, repetitive language, symbolism, and educational significance collectively form the unique artistic style and educational value of children's literature【18†source】【20†source】【24†source】.

These characteristics of children's literature do not exist in isolation but are interconnected and organically unified. For example, adventure themes are often combined with third-person omniscient narration to attract child readers through compact plots and vivid descriptions; friendship themes frequently employ first-person narration to enhance emotional resonance; and anthropomorphism is commonly found in nature-themed works, helping children understand the relationship between humans and nature. These features collectively serve the comprehensive development of child readers, meeting their entertainment needs while promoting their cognitive growth and emotional maturity.
From an academic research perspective, children's literature studies should emphasize the application of narrative theory, as narrative theory focuses on the "how" of storytelling (narrative form), which aligns closely with the research focus of children's literature【0†source】. At the same time, cognitive research methods provide new perspectives for children's literature studies. By combining cognitive science with literary theory, we can gain a deeper understanding of how children's literature influences children's thinking and cognitive development【4†source】. Future research should continue to explore the application of these theoretical methods in children's literature studies while paying attention to the intersection and integration of children's literature with emerging fields such as digital media and interdisciplinary education. From a creative practice perspective, children's literature writers should fully grasp children's cognitive characteristics and emotional needs, incorporating care for children's growth and educational wisdom into their works. As experts have pointed out, excellent children's literary works should be grounded in reality, rich in depth, and possess strong emotional appeal, guiding children to comprehensively understand the world and correctly recognize themselves and society【14†source】. At the same time, children's literature creation should keep pace with the times, addressing new problems and challenges faced by contemporary children, such as media literacy in the digital age and identity formation in multicultural contexts, to provide targeted spiritual nourishment for children. From an educational application perspective, children's literature should fully leverage its unique role in children's mental growth. Through carefully designed reading activities, teachers and parents can help children deeply understand the themes and meanings in works, guiding them to connect reading experiences with real life. Research shows that children's literature plays an increasingly important role in language education, the construction of a reading society, and children's mental growth【22†source】. Therefore, children's literature should be incorporated as an important component of school and family education, promoting children's cognitive development and emotional maturity through activities such as reading sharing, role-playing, and creative writing. In summary, as a unique art form and educational medium, the common characteristics of children's literature constitute an organic whole, collectively serving the comprehensive development of child readers. By deeply understanding these features and their mechanisms of influence, we can better create, research, and apply children's literature, providing high-quality spiritual nourishment for children's healthy growth. Future children's literature research should continue to deepen theoretical exploration, expand research methods, and strengthen interdisciplinary collaboration to address the ever-changing needs of children and the challenges of the times, promoting the continuous development of children's literature. GLM-4-32B-0414 supports calling external tools in JSON format. This can be done via HuggingFace Transformers, vLLM, or SGLang. The message format for tool calling is as follows: The message format for tool execution results is as follows: The following example demonstrates the process of GLM-4-32B-0414 calling a tool and generating a final response using HuggingFace Transformers.
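Since the concrete snippets referenced above are omitted in this excerpt, the sketch below shows what tool calling through HuggingFace Transformers typically looks like for a model of this kind. It is a minimal, hedged example: the `get_weather` tool, the prompts, and the follow-up step are illustrative assumptions, not the card's official code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4-32B-0414"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A JSON-schema tool definition (hypothetical example tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]

# apply_chat_template renders the tool schema into the model's prompt format.
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

# To finish the loop, parse the emitted tool call, run the tool, append its
# result as a {"role": "tool", ...} message, and generate once more for the
# final natural-language answer.
```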
| Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
| ---------------- | ------ | ----------------- | ------------------- | ------------------ | ------------------- | -------- | -------- |
| Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
| GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
| DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
| DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
| GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |

> For `SimpleQA` and `HotpotQA`, we sampled nearly 500 test cases from each test set, provided all models with basic `search` and `click` tools, ensured other settings remained consistent, and averaged the results over 3 runs.

| Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
|---|---|---|---|
| GLM-4-32B-0414 | Moatless [1] | 33.8 | 38.0 |
| GLM-4-32B-0414 | Agentless [2] | 30.7 | 34.0 |
| GLM-4-32B-0414 | OpenHands [3] | 27.2 | 28.0 |

[1] Moatless v0.0.3 used the following parameters: `response_format="react", thoughts_in_action=False, max_iterations=30`. No retries on failed trajectories; other settings are default.
[2] Agentless v1.5.0 used BGE as the embedding model and FAISS for similarity search. To speed up patch verification while maintaining performance, the timeout for running a single instance was changed from the default 300s to 180s.
[3] OpenHands v0.29.1 did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
cogvlm2-llama3-caption
Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video data into textual descriptions to provide the essential training data for text-to-video models. CogVLM2-Caption is a video captioning model used to generate training data for the CogVideoX model. This model is released under the CogVLM2 LICENSE. For models built with Meta Llama 3, please also adhere to the LLAMA3_LICENSE. 🌟 If you find our work helpful, please leave us a star and cite our paper.
```
@article{yang2024cogvideox,
  title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
  author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
  journal={arXiv preprint arXiv:2408.06072},
  year={2024}
}
```
GLM-Z1-32B-0414
The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation. On some benchmarks, it even rivals larger models like GPT-4o and DeepSeek-V3-0324 (671B).

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. This was developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training of the model on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During the training process, we also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities.

GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). The rumination model integrates search tools during its deep thinking process to handle complex tasks and is trained by utilizing multiple rule-based rewards to guide and extend end-to-end reinforcement learning. Z1-Rumination shows significant improvements in research-style writing and complex retrieval tasks.

Finally, GLM-Z1-9B-0414 is a surprise. We employed the aforementioned series of techniques to train a 9B small-sized model that maintains the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.

| Parameter | Recommended Value | Description |
| -------------- | ----------------- | -------------------------------------------- |
| temperature | 0.6 | Balances creativity and stability |
| top_p | 0.95 | Cumulative probability threshold for sampling |
| top_k | 40 | Filters out rare tokens while maintaining diversity |
| max_new_tokens | 30000 | Leaves enough tokens for thinking |

- Add `<think>\n` to the first line: Ensures the model thinks before responding
- When using `chat_template.jinja`, the prompt is automatically injected to enforce this behavior
- Retain only the final user-visible reply.
Hidden thinking content should not be saved to history to reduce interference; this is already implemented in `chat_template.jinja`
- When input length exceeds 8,192 tokens, consider enabling YaRN (Rope Scaling)
- In supported frameworks, add the following snippet to `config.json`:
- Static YaRN applies uniformly to all text. It may slightly degrade performance on short texts, so enable as needed.

If you find our work useful, please consider citing the following paper.
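To make the recommended settings concrete, here is a minimal hedged generation sketch. The repo id and prompt are placeholders, and the generation flow assumes the standard `transformers` API rather than the card's official snippet.

```python
# A minimal sketch of the recommended sampling setup, assuming the standard
# transformers generation API; the model id and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-Z1-32B-0414"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,       # balances creativity and stability
    top_p=0.95,            # cumulative probability threshold for sampling
    top_k=40,              # filters out rare tokens while maintaining diversity
    max_new_tokens=30000,  # leaves enough tokens for the thinking phase
)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

For inputs beyond 8,192 tokens, the YaRN extension referenced above is normally expressed as a `rope_scaling` entry inside `config.json`; the concrete scaling factor depends on the target context length, so take it from the official card rather than this sketch.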
GLM-Z1-9B-0414
The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation. On some benchmarks, it even rivals larger models like GPT-4o and DeepSeek-V3-0324 (671B).

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. This was developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training of the model on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During the training process, we also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities.

GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). The rumination model integrates search tools during its deep thinking process to handle complex tasks and is trained by utilizing multiple rule-based rewards to guide and extend end-to-end reinforcement learning. Z1-Rumination shows significant improvements in research-style writing and complex retrieval tasks.

Finally, GLM-Z1-9B-0414 is a surprise. We employed the aforementioned series of techniques to train a 9B small-sized model that maintains the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.

| Parameter | Recommended Value | Description |
| -------------- | ----------------- | -------------------------------------------- |
| temperature | 0.6 | Balances creativity and stability |
| top_p | 0.95 | Cumulative probability threshold for sampling |
| top_k | 40 | Filters out rare tokens while maintaining diversity |
| max_new_tokens | 30000 | Leaves enough tokens for thinking |

- Add `<think>\n` to the first line: Ensures the model thinks before responding
- When using `chat_template.jinja`, the prompt is automatically injected to enforce this behavior
- Retain only the final user-visible reply.
Hidden thinking content should not be saved to history to reduce interference; this is already implemented in `chat_template.jinja`
- When input length exceeds 8,192 tokens, consider enabling YaRN (Rope Scaling)
- In supported frameworks, add the following snippet to `config.json`:
- Static YaRN applies uniformly to all text. It may slightly degrade performance on short texts, so enable as needed.

If you find our work useful, please consider citing the following paper.
VisionReward-Video
glm-edge-v-2b
glm-edge-1.5b-chat
Install the transformers library from source: The usage of this model’s weights is subject to the terms outlined in the LICENSE.
GLM-4.7-FP8
Codegeex4 All 9b GGUF
!!! This is the GGUF version of CodeGeeX4, the original version can be found here. !!! We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, significantly enhancing its code generation capabilities. A single CodeGeeX4-ALL-9B model supports comprehensive functions such as code completion and generation, code interpreter, web search, function call, and repository-level code Q&A, covering various scenarios of software development. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks, such as BigCodeBench and NaturalCodeBench. It is currently the most powerful code generation model with fewer than 10B parameters, even surpassing much larger general-purpose models, achieving the best balance between inference speed and model performance. Use the latest llama.cpp to launch codegeex4-all-9b-GGUF. Please make sure the prompt is under the following format:

| Model | Seq Length | HumanEval | MBPP | NCB | LCB | HumanEval-FIM | CRUXEval-O |
|-----------------------------|----------------|---------------|----------|---------|---------|------------------|----------------|
| Llama3-70B-instruct | 8K | 77.4 | 82.3 | 37.0 | 27.4 | - | - |
| DeepSeek Coder 33B Instruct | 16K | 81.1 | 80.4 | 39.3 | 29.3 | 78.2 | 49.9 |
| Codestral-22B | 32K | 81.1 | 78.2 | 46.0 | 35.3 | 91.6 | 51.3 |
| CodeGeeX4-All-9B | 128K | 82.3 | 75.7 | 40.4 | 28.5 | 85.0 | 47.1 |

The model weights are licensed under the following License. If you find our work helpful, please feel free to cite the following paper:
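The launch command and prompt template referenced above are omitted in this excerpt. As a hedged alternative illustration, GGUF weights of this kind can typically also be loaded through the `llama-cpp-python` bindings (an assumption on our part; the card itself points to the `llama.cpp` CLI). The file name and prompt below are placeholders.

```python
# Hedged sketch using llama-cpp-python; the GGUF filename and the prompt are
# placeholders -- check the card for the exact template CodeGeeX4 expects.
from llama_cpp import Llama

llm = Llama(
    model_path="codegeex4-all-9b-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,  # context window; the model itself supports up to 128K
)

output = llm(
    "write a quick sort function in Python",  # placeholder prompt
    max_tokens=512,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```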
LongWriter Glm4 9b
🤗 [LongWriter Dataset] • 💻 [Github Repo] • 📃 [LongWriter Paper] LongWriter-glm4-9b is trained based on glm-4-9b, and is capable of generating 10,000+ words at once. Environment: Same environment requirement as glm-4-9b-chat (`transformers>=4.43.0`).
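A minimal hedged quick-start, patterned on the glm-4-9b-chat flow the card points to; the `chat()` helper comes from the remote code, so its exact signature is an assumption to check against the Github Repo.

```python
# Hedged sketch: generating a very long text with LongWriter-glm4-9b.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/LongWriter-glm4-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
model = model.eval()

query = "Write a 10000-word China travel guide"
# chat() is provided by the remote code; treat this signature as an assumption.
response, history = model.chat(
    tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5
)
print(response)
```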
glm-4-voice-9b
GLM-4-Voice 是智谱 AI 推出的端到端语音模型。GLM-4-Voice 能够直接理解和生成中英文语音,进行实时语音对话,并且能够根据用户的指令改变语音的情感、语调、语速、方言等属性。 GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. 本仓库是 GLM-4-Voice 的 LLM 部分。GLM-4-Voice-9B 在 GLM-4-9B 的基础上进行语音模态的预训练和对齐,从而能够理解和生成离散化的语音。 The repo provides the LLM part of GLM-4-Voice, pre-trained and aligned on speech modality based on GLM-4-9B, enabling understanding and generation of discretized speech. For more information please refer to our repo GLM-4-Voice.
chatglm2-6b-int4
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 介绍 ChatGLM2-6B 是开源中英双语对话模型 ChatGLM-6B 的第二代版本,在保留了初代模型对话流畅、部署门槛较低等众多优秀特性的基础之上,ChatGLM2-6B 引入了如下新特性: 1. 更强大的性能:基于 ChatGLM 初代模型的开发经验,我们全面升级了 ChatGLM2-6B 的基座模型。ChatGLM2-6B 使用了 GLM 的混合目标函数,经过了 1.4T 中英标识符的预训练与人类偏好对齐训练,评测结果显示,相比于初代模型,ChatGLM2-6B 在 MMLU(+23%)、CEval(+33%)、GSM8K(+571%) 、BBH(+60%)等数据集上的性能取得了大幅度的提升,在同尺寸开源模型中具有较强的竞争力。 2. 更长的上下文:基于 FlashAttention 技术,我们将基座模型的上下文长度(Context Length)由 ChatGLM-6B 的 2K 扩展到了 32K,并在对话阶段使用 8K 的上下文长度训练,允许更多轮次的对话。但当前版本的 ChatGLM2-6B 对单轮超长文档的理解能力有限,我们会在后续迭代升级中着重进行优化。 3. 更高效的推理:基于 Multi-Query Attention 技术,ChatGLM2-6B 有更高效的推理速度和更低的显存占用:在官方的模型实现下,推理速度相比初代提升了 42%,INT4 量化下,6G 显存支持的对话长度由 1K 提升到了 8K。 ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features: 1. Stronger Performance: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM, and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training. The evaluation results show that, compared to the first-generation model, ChatGLM2-6B has achieved substantial improvements in performance on datasets like MMLU (+23%), CEval (+33%), GSM8K (+571%), BBH (+60%), showing strong competitiveness among models of the same size. 2. Longer Context: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during the dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round ultra-long documents, which we will focus on optimizing in future iterations. 3. More Efficient Inference: Based on Multi-Query Attention technique, ChatGLM2-6B has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K. 关于更多的使用说明,包括如何运行命令行和网页版本的 DEMO,以及使用模型量化以节省显存,请参考我们的 Github Repo。 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. 本仓库的代码依照 Apache-2.0 协议开源,ChatGLM2-6B 模型的权重的使用则需要遵循 Model License。 If you find our work helpful, please consider citing the following paper.
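As a quick illustration of the low deployment threshold mentioned above, the usual ChatGLM quick-start pattern looks like the following; this is a light sketch based on the standard `trust_remote_code` flow, and the `chat()` helper's behavior should be verified against the Github Repo.

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code pulls in the custom ChatGLM2 modeling code and chat() helper.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
```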
glm-edge-v-5b-gguf
The code for adapting this model is actively being integrated into the official `llama.cpp`. You can test it using the following adapted version: After installation, you can start the GLM-Edge Chat model using the following command: In the command-line interface, you can interact with the model by entering your requests, and the model will provide the corresponding responses. The usage of this model’s weights is subject to the terms outlined in the LICENSE.
Glyph
Glyph: Scaling Context Windows via Visual-Text Compression - Repository: https://github.com/thu-coai/Glyph - Paper: https://arxiv.org/abs/2510.17800 Glyph is a framework for scaling the context length through visual-text compression. Instead of extending token-based context windows, Glyph renders long textual sequences into images and processes them using vision–language models (VLMs). This design transforms the challenge of long-context modeling into a multimodal problem, substantially reducing computational and memory costs while preserving semantic information. This is a simple example of running single-image inference using the `transformers` library. First, install the `transformers` library: Known Limitations - Sensitivity to rendering parameters: Glyph’s performance can vary with rendering settings such as resolution, font, and spacing. Since our search procedure adopts a fixed rendering configuration during post-training, the model may not generalize well to unseen or substantially different rendering styles. - OCR-related challenges: Recognizing fine-grained or rare alphanumeric strings (e.g., UUIDs) remains difficult for visual-language models, especially with ultra-long inputs, sometimes leading to minor character misclassification. - Limited generalization: The training of Glyph mainly targets long-context understanding, and its capability on broader tasks is yet to be studied. Citation If you find our model useful in your work, please cite it with:
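The installation and single-image inference snippets referenced in the Glyph card above are omitted in this excerpt. As a hedged illustration only, recent `transformers` versions expose a generic `image-text-to-text` pipeline that matches the described usage; the model id, image path, and prompt below are placeholders, not the card's official example.

```python
from transformers import pipeline

# Placeholder model id; point this at the released Glyph checkpoint.
pipe = pipeline("image-text-to-text", model="zai-org/Glyph", device_map="auto")

messages = [{
    "role": "user",
    "content": [
        # A long document pre-rendered to an image, per Glyph's design.
        {"type": "image", "url": "rendered_page.png"},
        {"type": "text", "text": "Summarize the document shown in this image."},
    ],
}]
out = pipe(text=messages, max_new_tokens=256)
# Depending on the transformers version, generated_text may include the chat
# history; print the raw structure to inspect it.
print(out[0]["generated_text"])
```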
GLM-4.5-Base
📖 Check out the GLM-4.5 technical blog , technical report , and Zhipu AI technical documentation . 📍 Use GLM-4.5 API services on Z.ai API Platform (Global) or Zhipu AI Open Platform (Mainland China) . The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development. As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of 63.2, ranking 3rd among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at 59.8 while maintaining superior efficiency. For more evaluation results, showcases, and technical details, please visit our technical blog. The technical report will be released soon. The model code, tool parser and reasoning parser can be found in the implementation of transformers, vLLM and SGLang.
chatglm-6b-int4
Introduction: ChatGLM-6B is an open-source dialogue language model that supports bilingual (Chinese-English) question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization technology, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of GPU memory is required at the INT4 quantization level). ChatGLM-6B uses the same technology as ChatGLM and is optimized for Chinese Q&A and dialogue. After bilingual pre-training on about 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can already generate answers that align well with human preference. ChatGLM-6B-INT4 is the quantized model weight of ChatGLM-6B. Specifically, ChatGLM-6B-INT4 applies INT4 quantization to the 28 GLM Blocks in ChatGLM-6B, leaving the Embedding and LM Head unquantized. In theory, the quantized model can run inference with 6GB of memory (GPU memory, or RAM when using the CPU), making it possible to run on embedded devices such as a Raspberry Pi. When running on the CPU, a CPU kernel is compiled automatically for the hardware; please make sure GCC and OpenMP are installed (generally pre-installed on Linux, but they must be installed manually on Windows) to obtain the best parallel computing performance. For more instructions, including how to run the CLI and web demos, and how to use model quantization to save GPU memory, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM-6B model weights needs to comply with the Model License. If you find our work helpful, please consider citing the following paper.
glm-edge-4b-chat
glm-edge-v-2b-gguf
The code for adapting this model is actively being integrated into the official `llama.cpp`. You can test it using the following adapted version: After installation, you can start the GLM-Edge Chat model using the following command: In the command-line interface, you can interact with the model by entering your requests, and the model will provide the corresponding responses. The usage of this model’s weights is subject to the terms outlined in the LICENSE.
cogagent-9b-20241220
🌐 Github | 🤗 Huggingface Space | 📄 Technical Report | 📜 arxiv paper The `CogAgent-9B-20241220` model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, `CogAgent-9B-20241220` achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. The model supports bilingual (Chinese and English) interaction with both screenshots and language input. This version of the CogAgent model has already been applied in ZhipuAI's GLM-PC product. We hope this release will assist researchers and developers in advancing the research and applications of GUI agents based on vision-language models. Please refer to our GitHub for specific examples of running the model. `cogagent-9b-20241220` is an agent execution model rather than a conversational model. It does not support continuous conversations but does support continuous execution history. Below are guidelines on how users should format their input for the model and interpret the formatted output. Please visit our GitHub for specific running examples, as well as the section on prompt concatenation (this directly affects whether the model runs correctly). In particular, pay attention to the prompt concatenation process. You can refer to app/client.py#L115 for concatenating user input prompts. A minimal user input concatenation code is as follows: For a detailed explanation of the meaning and representation of each field, please refer to the GitHub. In November 2023, we released the first generation of CogAgent. You can find related code and weights in the CogVLM & CogAgent Official Repository. CogVLM 📖 Paper: CogVLM: Visual Expert for Pretrained Language Models CogVLM is a powerful open-source vision-language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters, supporting 490x490 resolution image understanding and multi-turn conversations. CogVLM-17B achieved state-of-the-art performance on 10 classic cross-modal benchmarks including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC benchmarks. CogAgent 📖 Paper: CogAgent: A Visual Language Model for GUI Agents CogAgent is an improved open-source vision-language model based on CogVLM. CogAgent-18B has 11 billion vision parameters and 7 billion language parameters, supporting image understanding at 1120x1120 resolution. Beyond CogVLM's capabilities, it also incorporates GUI agent capabilities. CogAgent-18B achieved state-of-the-art performance on 9 classic cross-modal benchmarks, including VQAv2, OK-VQA, TextVQA, ST-VQA, ChartQA, InfoVQA, DocVQA, MM-Vet, and POPE benchmarks. It significantly outperformed existing models on GUI operation datasets like AITW and Mind2Web. Please follow the Model License for using the model weights.
Agentlm 7b
🤗 [Dataset] • 💻 [Github Repo] • 📌 [Project Page] • 📃 [Paper] AgentTuning represents the very first attempt to instruction-tune LLMs using interaction trajectories across multiple agent tasks. Evaluation results indicate that AgentTuning enables the agent capabilities of LLMs with robust generalization on unseen agent tasks while remaining good on general language abilities. We have open-sourced the AgentInstruct dataset and AgentLM. AgentLM models are produced by mixed training on the AgentInstruct dataset and the ShareGPT dataset, starting from Llama-2-chat models. The models follow the conversation format of Llama-2-chat, with a fixed system prompt. 7B, 13B, and 70B models are available on the Huggingface model hub.

|Model|Huggingface Repo|
|---|---|
|AgentLM-7B| 🤗Huggingface Repo |
|AgentLM-13B| 🤗Huggingface Repo |
|AgentLM-70B| 🤗Huggingface Repo |

If you find our work useful, please consider citing AgentTuning:
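Because the fixed system prompt string is omitted in this excerpt, the sketch below only shows the Llama-2-chat conversation layout that AgentLM inherits; `SYSTEM_PROMPT` is a placeholder to be replaced with the card's actual fixed prompt.

```python
# Hedged sketch: assembling a Llama-2-chat formatted prompt for AgentLM.
SYSTEM_PROMPT = "<fixed system prompt from the AgentLM card>"  # placeholder

def build_prompt(history: list[tuple[str, str]], query: str) -> str:
    """history is a list of (user, assistant) turns; query is the new user turn."""
    prompt = f"<s>[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
    for i, (user, assistant) in enumerate(history):
        if i == 0:
            # The first user turn shares the [INST] block with the system prompt.
            prompt += f"{user} [/INST] {assistant} </s>"
        else:
            prompt += f"<s>[INST] {user} [/INST] {assistant} </s>"
    if history:
        prompt += f"<s>[INST] {query} [/INST]"
    else:
        prompt += f"{query} [/INST]"
    return prompt

print(build_prompt([], "List the files in the current directory."))
```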
cogvlm-grounding-generalist-hf
CogVLM 是一个强大的开源视觉语言模型(VLM)。CogVLM-17B 拥有 100 亿视觉参数和 70 亿语言参数,在 10 个经典跨模态基准测试上取得了 SOTA 性能,包括 NoCaps、Flicker30k captioning、RefCOCO、RefCOCO+、RefCOCOg、Visual7W、GQA、ScienceQA、VizWiz VQA 和 TDIUC,而在 VQAv2、OKVQA、TextVQA、COCO captioning 等方面则排名第二,超越或与 PaLI-X 55B 持平。您可以通过线上 demo 体验 CogVLM 多模态对话。 CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. CogVLM can also chat with you about images. CogVLM 模型包括四个基本组件:视觉变换器(ViT)编码器、MLP适配器、预训练的大型语言模型(GPT)和一个视觉专家模块。更多细节请参见Paper。 CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a visual expert module. See Paper for more details. 此存储库中的代码是根据 Apache-2.0 许可 开放源码,而使用 CogVLM 模型权重必须遵循 模型许可。 The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers
glm-edge-1.5b-chat-gguf
The code for adapting this model is actively being integrated into the official `llama.cpp`. You can test it using the following adapted version: After installation, you can start the GLM-Edge Chat model using the following command: In the command-line interface, you can interact with the model by entering your requests, and the model will provide the corresponding responses. The usage of this model’s weights is subject to the terms outlined in the LICENSE.
glm-edge-4b-chat-gguf
The code for adapting this model is actively being integrated into the official `llama.cpp`. You can test it using the following adapted version: After installation, you can start the GLM-Edge Chat model using the following command: In the command-line interface, you can interact with the model by entering your requests, and the model will provide the corresponding responses. The usage of this model’s weights is subject to the terms outlined in the LICENSE.
cogagent-chat-hf
🔥 News: The new version `CogAgent-9B-20241220` has been released! Please visit the CogAgent GitHub and Technical Report to explore and use our latest model. 📄 Technical Report: https://cogagent.aminer.cn/blog#/articles/cogagent-9b-20241220-technical-report-en 🤗 Model Page: https://huggingface.co/THUDM/cogagent-9b-20241220 CogAgent is an open-source visual language model improved based on CogVLM. For more information such as demo, fine-tuning of `THUDM/cogagent-chat-hf`, and query prompts, please refer to This GitHub 📍 This is the ``cogagent-chat`` version of the CogAgent checkpoint. We have open-sourced 2 versions of CogAgent checkpoints, and you can choose one based on your needs. 1. ``cogagent-chat``: This model has strong capabilities in GUI Agent, visual multi-turn dialogue, visual grounding, etc. If you need GUI Agent and Visual Grounding functions, or need to conduct multi-turn dialogues with a given image, we recommend using this version of the model. 2. ``cogagent-vqa``: This model has stronger capabilities in single-turn visual dialogue. If you need to work on VQA benchmarks (such as MMVET, VQAv2), we recommend using this model. CogAgent-18B has 11 billion visual and 7 billion language parameters. CogAgent demonstrates strong performance in image understanding and GUI agent: 1. CogAgent-18B achieves state-of-the-art generalist performance on 9 cross-modal benchmarks, including: VQAv2, MM-Vet, POPE, ST-VQA, OK-VQA, TextVQA, ChartQA, InfoVQA, DocVQA. 2. CogAgent-18B significantly surpasses existing models on GUI operation datasets, including AITW and Mind2Web. In addition to all the features already present in CogVLM (visual multi-round dialogue, visual grounding), CogAgent: 1. Supports higher resolution visual input and dialogue question-answering. It supports ultra-high-resolution image inputs of 1120x1120. 2. Possesses the capabilities of a visual Agent, being able to return a plan, next action, and specific operations with coordinates for any given task on any GUI screenshot. 3. Enhanced GUI-related question-answering capabilities, allowing it to handle questions about any GUI screenshot, such as web pages, PC apps, mobile applications, etc. 4. Enhanced capabilities in OCR-related tasks through improved pre-training and fine-tuning. Model weights in this repository are free for academic research. Users who wish to use the models for commercial purposes must register here. Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license. The license notice shall be included in all copies or substantial portions of the Software. Use this Python code in `cli_demo.py` to get started quickly: The code in this repository is open source under the Apache-2.0 license, while the use of CogAgent and CogVLM model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers In the instruction fine-tuning phase of the CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR and Shikra projects, as well as many classic cross-modal work datasets. We sincerely thank them for their contributions.
LongWriter Llama3.1 8b
🤗 [LongWriter Dataset] • 💻 [Github Repo] • 📃 [LongWriter Paper] LongWriter-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating 10,000+ words at once. Please adhere to the prompt template (system prompt is optional): `<<SYS>>\n{system prompt}\n<</SYS>>\n\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...`
glm-10b
GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks. Please refer to our paper for a detailed description of GLM: GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022) Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution) Model description: `glm-10b` is pretrained on the Pile dataset. It has 48 transformer layers, with hidden size 4096 and 64 attention heads in each layer. The model is pretrained with autoregressive blank filling objectives designed for natural language understanding, seq2seq, and language modeling. Find more details in our repo. How to use: Please refer to the instructions in our Github repo. We use three different mask tokens for different tasks: `[MASK]` for short blank filling, `[sMASK]` for sentence filling, and `[gMASK]` for left to right generation. You can find examples about different masks from here. The prediction always begins with a special `<|startofpiece|>` token and ends with a `<|endofpiece|>` token. Citation: Please cite our paper if you find this code useful for your research:
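As a hedged illustration of the mask mechanics described above, patterned on the usual GLM quick-start flow; the remote-code helpers used here (`build_inputs_for_generation`, `eop_token_id`) are assumptions to verify against the Github repo.

```python
# Hedged sketch: short blank filling with glm-10b via trust_remote_code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda()
model.eval()

# [MASK] marks a short blank to fill; [gMASK] would request left-to-right generation.
inputs = tokenizer(
    "Ng is an adjunct professor at [MASK] (formerly associate professor).",
    return_tensors="pt",
)
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
inputs = inputs.to("cuda")
outputs = model.generate(**inputs, max_length=64, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))
```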
visualglm-6b
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] Introduction: VisualGLM-6B is an open-source multimodal dialogue language model that supports images, Chinese, and English. The language model is based on ChatGLM-6B with 6.2 billion parameters; the image part builds a bridge between the visual model and the language model by training BLIP2-Qformer, giving the whole model 7.8 billion parameters in total. VisualGLM-6B is pre-trained on 30M high-quality Chinese image-text pairs from the CogView dataset and 300M filtered English image-text pairs, with Chinese and English weighted equally. This training approach aligns visual information well with the semantic space of ChatGLM; in the subsequent fine-tuning stage, the model is trained on long visual question answering data to generate answers that align with human preference. For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license, while the use of the VisualGLM-6B model weights needs to comply with the Model License. If you find our work helpful, please consider citing the following paper.
SWE-Dev-7B
cogvlm2-llama3-chat-19B-int4
📍Experience the larger-scale CogVLM model on the ZhipuAI Open Platform . We launch a new generation of CogVLM2 series of models and open source two models built with Meta-Llama-3-8B-Instruct. Compared with the previous generation of CogVLM open source models, the CogVLM2 series of open source models have the following improvements:
1. Significant improvements in many benchmarks such as `TextVQA`, `DocVQA`.
2. Support 8K content length.
3. Support image resolution up to 1344 × 1344.
4. Provide an open source model version that supports both Chinese and English.

The CogVLM2 INT4 model requires 16G of GPU memory and must be run on Linux with an Nvidia GPU.

| Model name | cogvlm2-llama3-chat-19B-int4 | cogvlm2-llama3-chat-19B |
|---------------------|------------------------------|-------------------------|
| GPU Memory Required | 16G | 42G |
| System Required | Linux (With Nvidia GPU) | Linux (With Nvidia GPU) |

Our open source models have achieved good results in many lists compared to the previous generation of CogVLM open source models. Their excellent performance can compete with some non-open-source models, as shown in the table below:

| Model | Open Source | LLM Size | TextVQA | DocVQA | ChartQA | OCRbench | MMMU | MMVet | MMBench |
|--------------------------------|-------------|----------|----------|----------|----------|----------|----------|----------|----------|
| CogVLM1.1 | ✅ | 7B | 69.7 | - | 68.3 | 590 | 37.3 | 52.0 | 65.8 |
| LLaVA-1.5 | ✅ | 13B | 61.3 | - | - | 337 | 37.0 | 35.4 | 67.7 |
| Mini-Gemini | ✅ | 34B | 74.1 | - | - | - | 48.0 | 59.3 | 80.6 |
| LLaVA-NeXT-LLaMA3 | ✅ | 8B | - | 78.2 | 69.5 | - | 41.7 | - | 72.1 |
| LLaVA-NeXT-110B | ✅ | 110B | - | 85.7 | 79.7 | - | 49.1 | - | 80.5 |
| InternVL-1.5 | ✅ | 20B | 80.6 | 90.9 | 83.8 | 720 | 46.8 | 55.4 | 82.3 |
| QwenVL-Plus | ❌ | - | 78.9 | 91.4 | 78.1 | 726 | 51.4 | 55.7 | 67.0 |
| Claude3-Opus | ❌ | - | - | 89.3 | 80.8 | 694 | 59.4 | 51.7 | 63.3 |
| Gemini Pro 1.5 | ❌ | - | 73.5 | 86.5 | 81.3 | - | 58.5 | - | - |
| GPT-4V | ❌ | - | 78.0 | 88.4 | 78.5 | 656 | 56.8 | 67.7 | 75.0 |
| CogVLM2-LLaMA3 (Ours) | ✅ | 8B | 84.2 | 92.3 | 81.0 | 756 | 44.3 | 60.4 | 80.5 |
| CogVLM2-LLaMA3-Chinese (Ours) | ✅ | 8B | 85.0 | 88.4 | 74.7 | 780 | 42.8 | 60.5 | 78.9 |

All evaluations were obtained without using any external OCR tools ("pixel only"). Here is a simple example of how to chat with the CogVLM2 model. For more use cases, see our GitHub. This model is released under the CogVLM2 LICENSE. For models built with Meta Llama 3, please also adhere to the LLAMA3_LICENSE. If you find our work helpful, please consider citing the following papers
codegeex2-6b
🏠 Homepage |💻 GitHub |🛠 Tools VS Code , Jetbrains |🤗 HF Repo |📄 Paper CodeGeeX2: 更强大的多语言代码生成模型 A More Powerful Multilingual Code Generation Model CodeGeeX2 是多语言代码生成模型 CodeGeeX (KDD’23) 的第二代模型。CodeGeeX2 基于 ChatGLM2 架构加入代码预训练实现,得益于 ChatGLM2 的更优性能,CodeGeeX2 在多项指标上取得性能提升(+107% > CodeGeeX;仅60亿参数即超过150亿参数的 StarCoder-15B 近10%),更多特性包括: 更强大的代码能力:基于 ChatGLM2-6B 基座语言模型,CodeGeeX2-6B 进一步经过了 600B 代码数据预训练,相比一代模型,在代码能力上全面提升,HumanEval-X 评测集的六种编程语言均大幅提升 (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%),在Python上达到 35.9% 的 Pass@1 一次通过率,超越规模更大的 StarCoder-15B。 更优秀的模型特性:继承 ChatGLM2-6B 模型特性,CodeGeeX2-6B 更好支持中英文输入,支持最大 8192 序列长度,推理速度较一代 CodeGeeX-13B 大幅提升,量化后仅需6GB显存即可运行,支持轻量级本地化部署。 更全面的AI编程助手:CodeGeeX插件(VS Code, Jetbrains)后端升级,支持超过100种编程语言,新增上下文补全、跨文件补全等实用功能。结合 Ask CodeGeeX 交互式AI编程助手,支持中英文对话解决各种编程问题,包括且不限于代码解释、代码翻译、代码纠错、文档生成等,帮助程序员更高效开发。 更开放的协议:CodeGeeX2-6B 权重对学术研究完全开放,填写登记表申请商业使用。 CodeGeeX2 is the second-generation model of the multilingual code generation model CodeGeeX (KDD’23), which is implemented based on the ChatGLM2 architecture trained on more code data. Due to the advantage of ChatGLM2, CodeGeeX2 has been comprehensively improved in coding capability (+107% > CodeGeeX; with only 6B parameters, surpassing the larger StarCoder-15B for some tasks). It has the following features: More Powerful Coding Capabilities: Based on the ChatGLM2-6B model, CodeGeeX2-6B has been further pre-trained on 600B code tokens and is comprehensively improved in coding capability compared to the first generation. On the HumanEval-X benchmark, all six languages have been significantly improved (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%), and in Python it reached a 35.9% Pass@1 one-time pass rate, surpassing the larger StarCoder-15B. More Useful Features: Inheriting the ChatGLM2-6B model features, CodeGeeX2-6B better supports both Chinese and English prompts and a maximum sequence length of 8192, and its inference speed is significantly improved compared to the first generation. After quantization, it only needs 6GB of GPU memory for inference, thus supporting lightweight local deployment. Comprehensive AI Coding Assistant: The backend of the CodeGeeX plugin (VS Code, Jetbrains) is upgraded, supporting 100+ programming languages, and adding practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can be used to solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, which helps increase the efficiency of developers. Open License: CodeGeeX2-6B weights are fully open to academic research; please apply for commercial use by filling in the registration form. For more information, please refer to CodeGeeX2's Github Repo. 本仓库的代码依照 Apache-2.0 协议开源,模型的权重的使用则需要遵循 Model License。 The code in this repository is open source under the Apache-2.0 license. The model weights are licensed under the Model License. If you find our work helpful, please feel free to cite the following paper:
cogvlm2-video-llama3-chat
CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. It can achieve video understanding within one minute. We provide two example videos to demonstrate CogVLM2-Video's video understanding and video temporal grounding capabilities. The following diagram shows the performance of CogVLM2-Video on the MVBench, VideoChatGPT-Bench and Zero-shot VideoQA datasets (MSVD-QA, MSRVTT-QA, ActivityNet-QA), where VCG- refers to VideoChatGPT-Bench, ZS- refers to the Zero-Shot VideoQA datasets, and MV- refers to the main categories in MVBench. Performance on VideoChatGPT-Bench and the Zero-shot VideoQA datasets:

| Models | VCG-AVG | VCG-CI | VCG-DO | VCG-CU | VCG-TU | VCG-CO | ZS-AVG |
|-----------------------|----------|----------|----------|----------|----------|----------|-----------|
| IG-VLM GPT4V | 3.17 | 3.40 | 2.80 | 3.61 | 2.89 | 3.13 | 65.70 |
| ST-LLM | 3.15 | 3.23 | 3.05 | 3.74 | 2.93 | 2.81 | 62.90 |
| ShareGPT4Video | N/A | N/A | N/A | N/A | N/A | N/A | 46.50 |
| VideoGPT+ | 3.28 | 3.27 | 3.18 | 3.74 | 2.83 | 3.39 | 61.20 |
| VideoChat2_HD_mistral | 3.10 | 3.40 | 2.91 | 3.72 | 2.65 | 2.84 | 57.70 |
| PLLaVA-34B | 3.32 | 3.60 | 3.20 | 3.90 | 2.67 | 3.25 | 68.10 |
| CogVLM2-Video | 3.41 | 3.49 | 3.46 | 3.87 | 2.98 | 3.23 | 66.60 |

| Models | AVG | AA | AC | AL | AP | AS | CO | CI | EN | ER | FA | FP | MA | MC | MD | OE | OI | OS | ST | SC | UA |
|-----------------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| IG-VLM GPT4V | 43.7 | 72.0 | 39.0 | 40.5 | 63.5 | 55.5 | 52.0 | 11.0 | 31.0 | 59.0 | 46.5 | 47.5 | 22.5 | 12.0 | 12.0 | 18.5 | 59.0 | 29.5 | 83.5 | 45.0 | 73.5 |
| ST-LLM | 54.9 | 84.0 | 36.5 | 31.0 | 53.5 | 66.0 | 46.5 | 58.5 | 34.5 | 41.5 | 44.0 | 44.5 | 78.5 | 56.5 | 42.5 | 80.5 | 73.5 | 38.5 | 86.5 | 43.0 | 58.5 |
| ShareGPT4Video | 51.2 | 79.5 | 35.5 | 41.5 | 39.5 | 49.5 | 46.5 | 51.5 | 28.5 | 39.0 | 40.0 | 25.5 | 75.0 | 62.5 | 50.5 | 82.5 | 54.5 | 32.5 | 84.5 | 51.0 | 54.5 |
| VideoGPT+ | 58.7 | 83.0 | 39.5 | 34.0 | 60.0 | 69.0 | 50.0 | 60.0 | 29.5 | 44.0 | 48.5 | 53.0 | 90.5 | 71.0 | 44.0 | 85.5 | 75.5 | 36.0 | 89.5 | 45.0 | 66.5 |
| VideoChat2_HD_mistral | 62.3 | 79.5 | 60.0 | 87.5 | 50.0 | 68.5 | 93.5 | 71.5 | 36.5 | 45.0 | 49.5 | 87.0 | 40.0 | 76.0 | 92.0 | 53.0 | 62.0 | 45.5 | 36.0 | 44.0 | 69.5 |
| PLLaVA-34B | 58.1 | 82.0 | 40.5 | 49.5 | 53.0 | 67.5 | 66.5 | 59.0 | 39.5 | 63.5 | 47.0 | 50.0 | 70.0 | 43.0 | 37.5 | 68.5 | 67.5 | 36.5 | 91.0 | 51.5 | 79.0 |
| CogVLM2-Video | 62.3 | 85.5 | 41.5 | 31.5 | 65.5 | 79.5 | 58.5 | 77.0 | 28.5 | 42.5 | 54.0 | 57.0 | 91.5 | 73.0 | 48.0 | 91.0 | 78.0 | 36.0 | 91.5 | 47.0 | 68.5 |

We follow previous works to evaluate the performance of our model. In different benchmarks, we craft task-specific prompts for each benchmark: For evaluation codes, please refer to the evaluation script in PLLaVA. This repository is a `chat` version model and it supports single-round chat. You can quickly install the Python package dependencies and run model inference in our github. This model is released under the CogVLM2 LICENSE. For models built with Meta Llama 3, please also adhere to the LLAMA3_LICENSE. Please refer to our technical report for the training recipe and hyperparameters.
glm-4-9b-hf
chatglm3-6b-32k
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 📍Experience the larger-scale ChatGLM model at chatglm.cn ChatGLM3-6B-32K在ChatGLM3-6B的基础上进一步强化了对于长文本的理解能力,能够更好的处理最多32K长度的上下文。具体地,我们对位置编码进行了更新,并设计了更有针对性的长文本训练方法,在对话阶段使用 32K 的上下文长度训练。在实际的使用中,如果您面临的上下文长度基本在 8K 以内,我们推荐使用ChatGLM3-6B;如果您需要处理超过 8K 的上下文长度,我们推荐使用ChatGLM3-6B-32K。 ChatGLM3-6B 是 ChatGLM 系列最新一代的开源模型,在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上,ChatGLM3-6B 引入了如下特性: 1. 更强大的基础模型: ChatGLM3-6B 的基础模型 ChatGLM3-6B-Base 采用了更多样的训练数据、更充分的训练步数和更合理的训练策略。在语义、数学、推理、代码、知识等不同角度的数据集上测评显示,ChatGLM3-6B-Base 具有在 10B 以下的预训练模型中最强的性能。 2. 更完整的功能支持: ChatGLM3-6B 采用了全新设计的 Prompt 格式,除正常的多轮对话外。同时原生支持工具调用(Function Call)、代码执行(Code Interpreter)和 Agent 任务等复杂场景。 3. 更全面的开源序列: 除了对话模型 ChatGLM3-6B 外,还开源了基础模型 ChatGLM-6B-Base、长文本对话模型 ChatGLM3-6B-32K。以上所有权重对学术研究完全开放,在填写问卷进行登记后亦允许免费商业使用。 Based on ChatGLM3-6B, ChatGLM3-6B-32K further strengthens the ability to understand long texts and can better handle contexts up to 32K in length. Specifically, we update the position encoding and design a more targeted long text training method, using a context length of 32K for training in the conversation stage. In actual use, if the context length you face is basically within 8K, we recommend using ChatGLM3-6B; if you need to handle context lengths exceeding 8K, we recommend using ChatGLM3-6B-32K. ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B introduces the following features: 1. More Powerful Base Model: The base model of ChatGLM3-6B, ChatGLM3-6B-Base, employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. Evaluations on datasets such as semantics, mathematics, reasoning, code, knowledge, etc., show that ChatGLM3-6B-Base has the strongest performance among pre-trained models under 10B. 2. More Comprehensive Function Support: ChatGLM3-6B adopts a newly designed Prompt format, in addition to the normal multi-turn dialogue. It also natively supports function call, code interpreter, and complex scenarios such as agent tasks. 3. More Comprehensive Open-source Series: In addition to the dialogue model ChatGLM3-6B, the base model ChatGLM-6B-Base and the long-text dialogue model ChatGLM3-6B-32K are also open-sourced. All the weights are fully open for academic research, and after completing the questionnaire registration, they are also allowed for free commercial use. 关于更多的使用说明,包括如何运行命令行和网页版本的 DEMO,以及使用模型量化以节省显存,请参考我们的 Github Repo。 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. 本仓库的代码依照 Apache-2.0 协议开源,ChatGLM3-6B 模型的权重的使用则需要遵循 Model License。 The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM3-6B model weights needs to comply with the Model License. If you find our work helpful, please consider citing the following paper.
webrl-orm-llama-3.1-8b
chatglm2-6b-32k
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] - 我们优化了KV Cache的存储方式,减少了显存碎片的产生。基于优化后的代码,模型可以在约20G显存的情况下处理32K长度的上下文(FP/BF16格式)。 - We have optimized the storage method of the KV Cache, reducing the generation of memory fragmentation. Based on the optimized code, the model can process a context length of 32K under approximately 20G of memory (FP/BF16 format). ChatGLM2-6B-32K在ChatGLM2-6B的基础上进一步强化了对于长文本的理解能力,能够更好的处理最多32K长度的上下文。具体地,我们基于位置插值(Positional Interpolation)的方法对位置编码进行了更新,并在对话阶段使用 32K 的上下文长度训练。在实际的使用中,如果您面临的上下文长度基本在 8K 以内,我们推荐使用ChatGLM2-6B;如果您需要处理超过 8K 的上下文长度,我们推荐使用ChatGLM2-6B-32K。 ChatGLM2-6B-32K是开源中英双语对话模型 ChatGLM2-6B 的加长版本,在保留了初代模型对话流畅、部署门槛较低等众多优秀特性的基础之上,ChatGLM2-6B-32k 引入了如下新特性: 1. 更强大的性能:基于 ChatGLM 初代模型的开发经验,我们全面升级了 ChatGLM2-6B-32K 的基座模型。ChatGLM2-6B-32K 使用了 GLM 的混合目标函数,经过了 1.4T 中英标识符的预训练与人类偏好对齐训练。 2. 更长的上下文:基于 FlashAttention 技术,我们将基座模型的上下文长度(Context Length)由 ChatGLM-6B 的 2K 扩展到了 32K,并在对话阶段使用 32K 的上下文长度训练,允许更多轮次的对话。 3. 更高效的推理:基于 Multi-Query Attention 技术,ChatGLM2-6B-32K 有更高效的推理速度和更低的显存占用:在官方的模型实现下,推理速度相比初代提升了 42%,INT4 量化下,6G 显存支持的对话长度由 1K 提升到了 8K。 4. 更开放的协议:ChatGLM2-6B-32K 权重对学术研究完全开放,在填写问卷进行登记后亦允许免费商业使用。 The ChatGLM2-6B-32K further strengthens the ability to understand long texts based on the ChatGLM2-6B, and can better handle up to 32K context length. Specifically, we have updated the position encoding based on the method of Positional Interpolation, and trained with a 32K context length during the dialogue alignment. In practical use, if the context length you are dealing with is generally within 8K, we recommend using ChatGLM2-6B; if you need to handle a context length exceeding 8K, we recommend using ChatGLM2-6B-32K. ChatGLM2-6B-32K is the extended version of the open-source bilingual (Chinese-English) chat model ChatGLM2-6B. While retaining the smooth conversation flow and low deployment threshold of the first-generation model, ChatGLM2-6B-32K introduces the following new features: 1. Stronger Performance: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B-32K. ChatGLM2-6B-32K uses the hybrid objective function of GLM, and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training. 2. Longer Context: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 32K during the dialogue alignment, allowing for more rounds of dialogue. 3. More Efficient Inference: Based on Multi-Query Attention technique, ChatGLM2-6B-32K has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K. 4. More Open License: ChatGLM2-6B-32K weights are completely open for academic research, and free commercial use is also allowed after completing the questionnaire. 关于更多的使用说明,包括如何运行命令行和网页版本的 DEMO,以及使用模型量化以节省显存,请参考我们的 Github Repo。 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. 本仓库的代码依照 Apache-2.0 协议开源,ChatGLM2-6B-32K 模型的权重的使用则需要遵循 Model License。 If you find our work helpful, please consider citing the following paper.
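As a generic illustration of the Positional Interpolation idea mentioned above (a sketch of the technique itself, not ChatGLM2-6B-32K's actual implementation): instead of extrapolating RoPE to unseen positions, the position indices are scaled down so a 32K-token input is mapped back into the 2K range the base model was trained on.

```python
# Minimal sketch of Positional Interpolation for rotary position embeddings.
import torch

def rope_angles(positions: torch.Tensor, dim: int = 64, base: float = 10000.0) -> torch.Tensor:
    """Rotary embedding angles for the given (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return torch.outer(positions, inv_freq)

trained_len, target_len = 2048, 32768
scale = trained_len / target_len  # 1/16

positions = torch.arange(target_len, dtype=torch.float32)
# Positional Interpolation: squeeze all positions back into the trained range,
# so position 32767 is embedded like fractional position ~2047.9.
angles = rope_angles(positions * scale)
print(angles.shape)  # torch.Size([32768, 32])
```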
glm-edge-v-5b
glm-4-9b-chat-1m-hf
webrl-glm-4-9b
cogvlm2-llama3-chinese-chat-19B
👋 Wechat · 💡 Online Demo · 🎈 Github Page · 📑 Paper 📍Experience the larger-scale CogVLM model on the ZhipuAI Open Platform . We launch a new generation of CogVLM2 series of models and open source two models built with Meta-Llama-3-8B-Instruct. Compared with the previous generation of CogVLM open source models, the CogVLM2 series of open source models have the following improvements:
1. Significant improvements in many benchmarks such as `TextVQA`, `DocVQA`.
2. Support 8K content length.
3. Support image resolution up to 1344 × 1344.
4. Provide an open source model version that supports both Chinese and English.

You can see the details of the CogVLM2 family of open source models in the table below:

| Model name | cogvlm2-llama3-chat-19B | cogvlm2-llama3-chinese-chat-19B |
|------------------|-------------------------------------|-------------------------------------|
| Base Model | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct |
| Language | English | Chinese, English |
| Model size | 19B | 19B |
| Task | Image understanding, dialogue model | Image understanding, dialogue model |
| Text length | 8K | 8K |
| Image resolution | 1344 × 1344 | 1344 × 1344 |

Our open source models have achieved good results in many lists compared to the previous generation of CogVLM open source models. Their excellent performance can compete with some non-open-source models, as shown in the table below:

| Model | Open Source | LLM Size | TextVQA | DocVQA | ChartQA | OCRbench | VCR_EASY | VCR_HARD | MMMU | MMVet | MMBench |
|----------------------------|-------------|----------|----------|----------|----------|----------|-------------|-------------|----------|----------|----------|
| CogVLM1.1 | ✅ | 7B | 69.7 | - | 68.3 | 590 | 73.9 | 34.6 | 37.3 | 52.0 | 65.8 |
| LLaVA-1.5 | ✅ | 13B | 61.3 | - | - | 337 | - | - | 37.0 | 35.4 | 67.7 |
| Mini-Gemini | ✅ | 34B | 74.1 | - | - | - | - | - | 48.0 | 59.3 | 80.6 |
| LLaVA-NeXT-LLaMA3 | ✅ | 8B | - | 78.2 | 69.5 | - | - | - | 41.7 | - | 72.1 |
| LLaVA-NeXT-110B | ✅ | 110B | - | 85.7 | 79.7 | - | - | - | 49.1 | - | 80.5 |
| InternVL-1.5 | ✅ | 20B | 80.6 | 90.9 | 83.8 | 720 | 14.7 | 2.0 | 46.8 | 55.4 | 82.3 |
| QwenVL-Plus | ❌ | - | 78.9 | 91.4 | 78.1 | 726 | - | - | 51.4 | 55.7 | 67.0 |
| Claude3-Opus | ❌ | - | - | 89.3 | 80.8 | 694 | 63.85 | 37.8 | 59.4 | 51.7 | 63.3 |
| Gemini Pro 1.5 | ❌ | - | 73.5 | 86.5 | 81.3 | - | 62.73 | 28.1 | 58.5 | - | - |
| GPT-4V | ❌ | - | 78.0 | 88.4 | 78.5 | 656 | 52.04 | 25.8 | 56.8 | 67.7 | 75.0 |
| CogVLM2-LLaMA3 | ✅ | 8B | 84.2 | 92.3 | 81.0 | 756 | 83.3 | 38.0 | 44.3 | 60.4 | 80.5 |
| CogVLM2-LLaMA3-Chinese | ✅ | 8B | 85.0 | 88.4 | 74.7 | 780 | 79.9 | 25.1 | 42.8 | 60.5 | 78.9 |

All evaluations were obtained without using any external OCR tools ("pixel only"). Quick Start: Here is a simple example of how to chat with the CogVLM2 model. For more use cases, see our GitHub. This model is released under the CogVLM2 LICENSE. For models built with Meta Llama 3, please also adhere to the LLAMA3_LICENSE. If you find our work helpful, please consider citing the following papers
LongCite-llama3.1-8b
🤗 [LongCite Dataset] • 💻 [Github Repo] • 📃 [LongCite Paper] LongCite-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating fine-grained citations in long-context question answering. The model supports a maximum context window of up to 128K tokens. You can also deploy the model with vllm. See the code example in vllm_inference.py. If you find our work useful, please consider citing LongCite:
webrl-llama-3.1-8b
glm-large-chinese
GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks. Please refer to our paper for a detailed description of GLM: GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022) Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution) Model description: `glm-large-chinese` is pretrained on the WuDaoCorpora dataset. It has 24 transformer layers, with hidden size 1024 and 16 attention heads in each layer. The model is pretrained with autoregressive blank filling objectives designed for natural language understanding, seq2seq, and language modeling. How to use: Please refer to the instructions in our Github repo. We use three different mask tokens for different tasks: `[MASK]` for short blank filling, `[sMASK]` for sentence filling, and `[gMASK]` for left to right generation. You can find examples about different masks from here. The prediction always begins with a special `<|startofpiece|>` token and ends with a `<|endofpiece|>` token. Citation: Please cite our paper if you find this code useful for your research:
SWE-Dev-32B
- 🤗 SWE-Dev-7B (Qwen-2.5-Coder-7B-Instruct) - 🤗 SWE-Dev-9B (GLM-4-9B-Chat) - 🤗 SWE-Dev-32B (Qwen-2.5-Coder-32B-Instruct) - 🤗 SWE-Dev-train (Training Data) 🚀 SWE-Dev, an open-source Agent for Software Engineering tasks! This repository contains the SWE-Dev-32B model as presented in the paper SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling. 💡 We develop a comprehensive pipeline for creating developer-oriented datasets from GitHub repositories, including issue tracking, code localization, test case generation, and evaluation. 🔧 Based on open-source frameworks (OpenHands) and models, SWE-Dev-7B and 32B achieved solve rates of 23.4% and 36.6% on SWE-bench-Verified, respectively, even approaching the performance of GPT-4o. 📚 We find that training data scaling and inference scaling can both effectively boost the performance of models on SWE-bench. Moreover, higher data quality further improves this trend when combined with reinforcement fine-tuning (RFT). For inference scaling specifically, the solve rate on SWE-Dev increased from 34.0% at 30 rounds to 36.6% at 75 rounds.
cogvlm-grounding-base-hf
cogvlm2-video-llama3-base
BPO
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
- Repository: https://github.com/thu-coai/BPO
- Paper: https://arxiv.org/abs/2311.04155
- Data: https://huggingface.co/datasets/THUDM/BPO

Black-box Prompt Optimization (BPO) is a black-box alignment technique that differs from training-based methods (like PPO or DPO). BPO only requires training a plug-and-play model and optimizes LLMs by optimizing user inputs, so it can be used with a variety of open-source or API-based LLMs. Data: the prompt optimization model is trained on prompt optimization pairs that contain human preference features; detailed information on the dataset can be found here. Backbone Model: the prompt preference optimizer is built on `Llama-2-7b-chat-hf`.

| Model A | Model B | A win | Tie | B win |
|-------------|-------------|----|----|----|
| gpt-3.5-turbo + BPO | gpt-3.5-turbo | 60.0 | 8.7 | 31.3 |
| claude-2 + BPO | claude-2 | 57.5 | 5.0 | 37.5 |
| llama-2-13b-chat + BPO | llama-2-70b-chat | 61.3 | 0.0 | 38.7 |
| vicuna-13b + BPO | vicuna-13b + PPO | 52.5 | 3.7 | 43.7 |
| vicuna-13b + BPO | vicuna-13b + DPO | 53.8 | 2.5 | 43.7 |
| vicuna-13b + DPO + BPO | vicuna-13b + DPO | 60.0 | 2.5 | 37.5 |

Inference code: an example appears below; see our Github Repo for more detailed usage (e.g., more aggressive optimization). Other Known Limitations: - Task coverage is not sufficient, as we only used open-source data to obtain about 14k optimized prompts. This clearly cannot cover the full range of user queries, so the current model may not perform well on every prompt. - Due to the small share of long-context tasks and mathematical problems, the prompt optimizer underperforms on these tasks. Citation: if you find our model useful in your work, please cite it with:
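The inference example referenced above, as a minimal sketch; the instruction wording in `prompt_template` is assumed from the Llama-2-chat format the optimizer is built on, so check the repo for the exact template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "THUDM/BPO"
# Assumed instruction template; the authoritative version is in the GitHub repo.
prompt_template = ("[INST] You are an expert prompt engineer. Please help me improve this "
                   "prompt to get a more helpful and harmless response:\n{} [/INST]")

device = "cuda:0"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).half().eval().to(device)

text = "Tell me about Harry Potter"  # the raw user prompt to optimize
inputs = tokenizer(prompt_template.format(text), return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, top_p=0.9, temperature=0.6)
optimized = tokenizer.decode(output[0], skip_special_tokens=True).split("[/INST]")[-1].strip()
print(optimized)  # feed the optimized prompt to any downstream LLM
```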
cogvlm-base-490-hf
cogvlm-base-224-hf
cogagent-vqa-hf
CogAgent is an open-source visual language model improved based on CogVLM. 🚀 GitHub: for more information such as demos, fine-tuning, and query prompts, please refer to our GitHub. 📍 This is the ``cogagent-vqa`` version of the CogAgent checkpoint. We have open-sourced two versions of CogAgent checkpoints; choose one based on your needs. 1. ``cogagent-chat``: this model has strong capabilities in GUI agent tasks, visual multi-turn dialogue, visual grounding, etc. If you need GUI agent and visual grounding functions, or need to conduct multi-turn dialogues about a given image, we recommend this version. 2. ``cogagent-vqa``: this model has stronger capabilities in single-turn visual dialogue. If you work on VQA benchmarks (such as MM-Vet, VQAv2), we recommend this model. CogAgent-18B has 11 billion visual and 7 billion language parameters. CogAgent demonstrates strong performance in image understanding and GUI agent tasks: 1. CogAgent-18B achieves state-of-the-art generalist performance on 9 cross-modal benchmarks, including VQAv2, MM-Vet, POPE, ST-VQA, OK-VQA, TextVQA, ChartQA, InfoVQA, and DocVQA. 2. CogAgent-18B significantly surpasses existing models on GUI operation datasets, including AITW and Mind2Web. In addition to all the features already present in CogVLM (visual multi-round dialogue, visual grounding), CogAgent: 1. Supports higher-resolution visual input and dialogue question answering, with ultra-high-resolution image inputs of 1120x1120. 2. Possesses the capabilities of a visual agent, being able to return a plan, the next action, and specific operations with coordinates for any given task on any GUI screenshot. 3. Has enhanced GUI-related question-answering capabilities, allowing it to handle questions about any GUI screenshot, such as web pages, PC apps, and mobile applications. 4. Has enhanced capabilities in OCR-related tasks through improved pre-training and fine-tuning. The model weights in this repository are free for academic research. Users who wish to use the models for commercial purposes must register here. Registered users may use the models for commercial activities free of charge but must comply with all terms and conditions of the license. The license notice shall be included in all copies or substantial portions of the Software. Use the Python code in `cli_demo.py` to get started quickly; a sketch is given below. The code in this repository is open source under the Apache-2.0 license, while the use of the CogAgent and CogVLM model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers. In the instruction fine-tuning phase of CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR, and Shikra projects, as well as many classic cross-modal datasets. We sincerely thank them for their contributions.
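A minimal single-turn VQA sketch, assuming the remote-code helper `build_conversation_input_ids` and the Vicuna tokenizer the official demo pairs with this checkpoint; `screenshot.png` is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogagent-vqa-hf",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda").eval()

image = Image.open("screenshot.png").convert("RGB")  # placeholder image
query = "What does this screenshot show?"

conv = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])
inputs = {
    "input_ids": conv["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": conv["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": conv["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[conv["images"][0].to("cuda").to(torch.bfloat16)]],
}
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    out = out[:, inputs["input_ids"].shape[1]:]  # strip prompt tokens
print(tokenizer.decode(out[0], skip_special_tokens=True))
```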
cogvlm2-llama3-chinese-chat-19B-int4
GLM-Z1-Rumination-32B-0414
The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation; on some benchmarks it even rivals larger models like GPT-4o and DeepSeek-V3-0324 (671B). GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities, developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities. GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). The rumination model integrates search tools during its deep thinking process to handle complex tasks and is trained with multiple rule-based rewards to guide and extend end-to-end reinforcement learning. Z1-Rumination shows significant improvements in research-style writing and complex retrieval tasks. Finally, GLM-Z1-9B-0414 is a surprise. We employed the aforementioned series of techniques to train a 9B small-sized model that maintains the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment. By default, this model currently supports the following `function` calls:
- `search`: search using a keyword and return search results
- `click`: click on a specific webpage in the search results to view details
- `open`: open a fixed URL to view detailed content
- `finish`: complete information gathering and begin writing

Below is a simple workflow sketch to help you quickly connect the pipeline.
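A hypothetical dispatch-loop sketch for wiring these four calls to real tools; the `model_step` callable and the three backend stubs are placeholders rather than the official pipeline, whose exact message format is defined in the official workflow:

```python
# Stub backends standing in for a real search/browse stack (hypothetical).
def search_backend(keyword): return f"results for {keyword!r}"
def click_backend(index):    return f"page body of search result {index}"
def open_backend(url):       return f"content of {url}"

def run_rumination(model_step, question, max_rounds=32):
    """Drive the model's search/click/open/finish calls until it finishes gathering."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_rounds):
        name, args = model_step(history)  # the model decides the next function call
        if name == "finish":              # information gathering done; writing begins
            break
        result = {
            "search": lambda: search_backend(args["keyword"]),
            "click":  lambda: click_backend(args["index"]),
            "open":   lambda: open_backend(args["url"]),
        }[name]()
        history.append({"role": "observation", "content": result})
    return history  # hand the gathered context back to the model for report writing
```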
agentlm-13b
🤗 [Dataset] • 💻 [Github Repo] • 📌 [Project Page] • 📃 [Paper] AgentTuning represents the very first attempt to instruction-tune LLMs using interaction trajectories across multiple agent tasks. Evaluation results indicate that AgentTuning enables the agent capabilities of LLMs with robust generalization to unseen agent tasks while preserving general language abilities. We have open-sourced the AgentInstruct dataset and AgentLM. The AgentLM models are produced by mixed training of Llama-2-chat models on the AgentInstruct dataset and the ShareGPT dataset. The models follow the conversation format of Llama-2-chat, with a fixed system prompt. 7B, 13B, and 70B models are available on the Huggingface model hub.

|Model|Huggingface Repo|
|---|---|
|AgentLM-7B| 🤗Huggingface Repo |
|AgentLM-13B| 🤗Huggingface Repo |
|AgentLM-70B| 🤗Huggingface Repo |

If you find our work useful, please consider citing AgentTuning:
LongAlign-13B-64k
LongAlign-7B-64k
LongReward-glm4-9b-DPO
apar-7b
LongAlign-6B-64k-base
LongCite-glm4-9b
🤗 [LongCite Dataset] • 💻 [Github Repo] • 📃 [LongCite Paper] LongCite-glm4-9b is trained based on glm-4-9b and is capable of generating fine-grained citations in long-context question answering. The model supports a maximum context window of up to 128K tokens. Environment: same environment requirements as glm-4-9b-chat (`transformers>=4.43.0`). You can also deploy the model with vLLM; see the code example in `vllm_inference.py`. If you find our work useful, please consider citing LongCite:
LongAlign-13B-64k-base
androidgen-llama-3-70b
agentlm-70b
🤗 [Dataset] • 💻 [Github Repo] • 📌 [Project Page] • 📃 [Paper] AgentTuning represents the very first attempt to instruction-tune LLMs using interaction trajectories across multiple agent tasks. Evaluation results indicate that AgentTuning enables the agent capabilities of LLMs with robust generalization to unseen agent tasks while preserving general language abilities. We have open-sourced the AgentInstruct dataset and AgentLM. The AgentLM models are produced by mixed training of Llama-2-chat models on the AgentInstruct dataset and the ShareGPT dataset. The models follow the conversation format of Llama-2-chat, with a fixed system prompt. 7B, 13B, and 70B models are available on the Huggingface model hub.

|Model|Huggingface Repo|
|---|---|
|AgentLM-7B| 🤗Huggingface Repo |
|AgentLM-13B| 🤗Huggingface Repo |
|AgentLM-70B| 🤗Huggingface Repo |

If you find our work useful, please consider citing AgentTuning:
CogView3-Plus-3B
📄 Read in Chinese | 🤗 Hugging Face Space | 🌐 Github | 📜 arxiv 📍 Visit Qingyan and the API Platform to experience larger-scale commercial video generation models. This model is the DiT version of CogView3, a text-to-image generation model supporting image generation from 512 to 2048 px.
+ Resolution: width and height must be in the range 512 px to 2048 px and divisible by 32.
+ Inference speed: 1 s/step (tested on A100)
+ Precision: BF16 / FP32 (FP16 is not supported, as it leads to overflow and black images)

We tested memory consumption at several common resolutions on A100 devices (`batch_size=1`, BF16), as shown in the table below:

| Resolution | enable_model_cpu_offload OFF | enable_model_cpu_offload ON |
|-------------|------------------------------|-----------------------------|
| 512 × 512 | 19GB | 11GB |
| 720 × 480 | 20GB | 11GB |
| 1024 × 1024 | 23GB | 11GB |
| 1280 × 720 | 24GB | 11GB |
| 2048 × 2048 | 25GB | 11GB |

First, ensure the `diffusers` library is installed from source; an inference sketch follows below. For more content and to download the original SAT weights, please visit our GitHub. 🌟 If you find our work helpful, feel free to cite our paper and leave a star. This model is released under the Apache 2.0 License.
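A minimal text-to-image sketch, assuming a `diffusers` build that ships `CogView3PlusPipeline`; the prompt is a placeholder:

```python
import torch
from diffusers import CogView3PlusPipeline

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # ~11GB peak per the table above; use pipe.to("cuda") instead if memory allows

prompt = "A serene lakeside cabin at sunrise, photorealistic"  # placeholder prompt
image = pipe(
    prompt=prompt,
    width=1024,
    height=1024,  # 512-2048 px and divisible by 32
    guidance_scale=7.0,
    num_inference_steps=50,
).images[0]
image.save("cogview3.png")
```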
LongReward-llama3.1-8b-DPO
LongAlign-7B-64k-base
WebGLM-2B
WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preference WebGLM-2B aspires to provide an efficient and cost-effective web-enhanced question-answering system using the 2-billion-parameter General Language Model (GLM). It aims to improve real-world application deployment by integrating web search and retrieval capabilities into the pre-trained language model. - LLM-augmented Retriever: Enhances the retrieval of relevant web content to better aid in answering questions accurately. - Bootstrapped Generator: Generates human-like responses to questions, leveraging the power of the GLM to provide refined answers. - Human Preference-aware Scorer: Estimates the quality of generated responses by prioritizing human preferences, ensuring the system produces useful and engaging content. This repo is the implementation of Bootstrap Generator.
glm-2b
GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks. Please refer to our paper for a detailed description of GLM: GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022), Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (equal contribution). Model description: `glm-2b` is pretrained on the Pile dataset. It has 36 transformer layers, with hidden size 2048 and 32 attention heads in each layer. The model is pretrained with autoregressive blank-filling objectives designed for natural language understanding, seq2seq, and language modeling. Find more details in our repo. How to use: please refer to the instructions in our Github repo. We use three different mask tokens for different tasks: `[MASK]` for short blank filling, `[sMASK]` for sentence filling, and `[gMASK]` for left-to-right generation. You can find examples of the different masks here. The prediction always begins with a special start-of-piece token and ends with an end-of-piece token. Citation: please cite our paper if you find this code useful for your research:
chatglm-6b-int4-qe
Introduction: ChatGLM-6B is an open-source dialogue language model supporting bilingual (Chinese-English) question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization techniques, it can be deployed locally on consumer-grade GPUs (as little as 6GB of VRAM at the INT4 quantization level). ChatGLM-6B uses the same technology as ChatGLM and is optimized for Chinese question answering and dialogue. After bilingual training on about 1T tokens, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can generate answers that align well with human preferences. ChatGLM-6B-INT4-QE contains the quantized weights of ChatGLM-6B. Specifically, ChatGLM-6B-INT4-QE applies INT4 quantization to the 28 GLM Blocks, the Embedding, and the LM Head of ChatGLM-6B. The quantized weight file is only 3GB; in theory, inference requires only 6GB of VRAM (or 6GB of RAM when running on CPU), making it possible to run on embedded devices such as a Raspberry Pi. When running on CPU, the CPU kernel is compiled automatically for the hardware; please make sure GCC and OpenMP are installed (usually preinstalled on Linux, but they must be installed manually on Windows) for the best parallel performance. A usage sketch is given below. For more instructions, including how to run the CLI and web demos and how to use model quantization to save VRAM, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license; use of the ChatGLM-6B model weights must comply with the Model License.
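A minimal chat sketch via the remote-code `chat()` helper the ChatGLM checkpoints expose; the queries are placeholders:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
model = model.eval()

# chat() manages the multi-turn history for us.
response, history = model.chat(tokenizer, "你好", history=[])  # "Hello"
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)  # "What should I do about insomnia?"
print(response)
```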
SWE-Dev-9B
webrl-llama-3.1-70b
apar-13b
chatglm-6b-int8
Introduction: ChatGLM-6B is an open-source dialogue language model supporting bilingual (Chinese-English) question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization techniques, it can be deployed locally on consumer-grade GPUs (as little as 6GB of VRAM at the INT4 quantization level). ChatGLM-6B uses the same technology as ChatGLM and is optimized for Chinese question answering and dialogue. After bilingual training on about 1T tokens, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can generate answers that align well with human preferences. ChatGLM-6B-INT8 contains the quantized weights of ChatGLM-6B. Specifically, ChatGLM-6B-INT8 applies INT8 quantization to the 28 GLM Blocks of ChatGLM-6B, without quantizing the Embedding or LM Head. In theory, the quantized model can run inference with 8GB of VRAM (or 8GB of RAM on CPU), making it possible to run on embedded devices such as a Raspberry Pi. When running on CPU, the CPU kernel is compiled automatically for the hardware; please make sure GCC and OpenMP are installed (usually preinstalled on Linux, but they must be installed manually on Windows) for the best parallel performance. For more instructions, including how to run the CLI and web demos and how to use model quantization to save VRAM, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license; use of the ChatGLM-6B model weights must comply with the Model License. If you find our work helpful, please consider citing the following paper.
glm-4-voice-decoder
GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. This repo provides the speech decoder of GLM-4-Voice. GLM-4-Voice-Decoder is a speech decoder supporting streaming inference, retrained based on CosyVoice, which converts discrete speech tokens into continuous speech output. Generation can start with as few as 10 audio tokens, reducing conversation latency. For more information, please refer to our repo GLM-4-Voice.
glm-10b-chinese
GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks. Please refer to our paper for a detailed description of GLM: GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022), Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (equal contribution). Model description: `glm-10b-chinese` is pretrained on the WuDaoCorpora dataset. It has 48 transformer layers, with hidden size 4096 and 64 attention heads in each layer. The model is pretrained with autoregressive blank-filling objectives designed for natural language understanding, seq2seq, and language modeling. We use three different mask tokens for different tasks: `[MASK]` for short blank filling, `[sMASK]` for sentence filling, and `[gMASK]` for left-to-right generation. You can find examples of the different masks here. Citation: please cite our paper if you find this code useful for your research:
cogvlm2-llama3-chat-19B-tgi
chatglm3-6b-128k
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub] 📍Experience the larger-scale ChatGLM model at chatglm.cn. Based on ChatGLM3-6B, ChatGLM3-6B-128K further strengthens the ability to understand long texts and can better handle contexts up to 128K in length. Specifically, we updated the position encoding and designed a more targeted long-text training method, using a 128K context length during the conversation stage. In actual use, if your context length is generally within 8K, we recommend ChatGLM3-6B; if you need to handle context lengths exceeding 8K, we recommend ChatGLM3-6B-128K. ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features of the previous two generations, such as smooth dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features: 1. More Powerful Base Model: the base model of ChatGLM3-6B, ChatGLM3-6B-Base, employs a more diverse training dataset, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, knowledge, and more show that ChatGLM3-6B-Base has the strongest performance among pre-trained models under 10B. 2. More Comprehensive Function Support: ChatGLM3-6B adopts a newly designed prompt format that, in addition to normal multi-turn dialogue, natively supports function call, code interpreter, and complex scenarios such as agent tasks. 3. More Comprehensive Open-source Series: in addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base and the long-text dialogue model ChatGLM3-6B-128K are also open-sourced. All the weights are fully open for academic research, and free commercial use is also allowed after completing the questionnaire registration. For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM3-6B model weights must comply with the Model License. If you find our work helpful, please consider citing the following paper.
cogvlm2-llama3-chinese-chat-19B-tgi
codegeex2-6b-int4
🏠 Homepage | 💻 GitHub | 🛠 Tools VS Code, Jetbrains | 🤗 HF Repo | 📄 Paper

CodeGeeX2: A More Powerful Multilingual Code Generation Model. CodeGeeX2 is the second-generation model of the multilingual code generation model CodeGeeX (KDD'23), implemented on the ChatGLM2 architecture and trained on more code data. Thanks to ChatGLM2, CodeGeeX2 improves comprehensively on coding capability (+107% over CodeGeeX; with only 6B parameters, it surpasses the larger StarCoder-15B on some tasks). It has the following features:

More Powerful Coding Capabilities: based on the ChatGLM2-6B model, CodeGeeX2-6B has been further pre-trained on 600B code tokens, comprehensively improving coding capability compared to the first generation. On the HumanEval-X benchmark, all six languages improve significantly (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%), and it reaches a Pass@1 rate of 35.9% in Python, surpassing the larger StarCoder-15B.

More Useful Features: inheriting the ChatGLM2-6B model features, CodeGeeX2-6B better supports both Chinese and English prompts and a maximum sequence length of 8192, and its inference speed is significantly improved compared to the first generation. After quantization, it needs only 6GB of GPU memory for inference, supporting lightweight local deployment; see the sketch below.

Comprehensive AI Coding Assistant: the backend of the CodeGeeX plugin (VS Code, Jetbrains) is upgraded, supporting 100+ programming languages and adding practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, helping to increase the efficiency of developers.

More Open License: CodeGeeX2-6B weights are fully open to academic research; please apply for commercial use by filling in the registration form.

For more information, please refer to CodeGeeX2's Github Repo. The code in this repository is open source under the Apache-2.0 license. The model weights are licensed under the Model License. If you find our work helpful, please feel free to cite the following paper:
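A minimal generation sketch, assuming the standard ChatGLM2-style remote-code interface; the prompt is a placeholder:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True).cuda()
model = model.eval()

# CodeGeeX2 steers the target language with a "# language:" tag in the prompt.
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
print(tokenizer.decode(outputs[0]))
```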
LongAlign-6B-64k
glm-roberta-large
WebGLM
WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preference WebGLM aspires to provide an efficient and cost-effective web-enhanced question-answering system using the 10-billion-parameter General Language Model (GLM). It aims to improve real-world application deployment by integrating web search and retrieval capabilities into the pre-trained language model. - LLM-augmented Retriever: Enhances the retrieval of relevant web content to better aid in answering questions accurately. - Bootstrapped Generator: Generates human-like responses to questions, leveraging the power of the GLM to provide refined answers. - Human Preference-aware Scorer: Estimates the quality of generated responses by prioritizing human preferences, ensuring the system produces useful and engaging content. This repo is the implementation of Bootstrap Generator.
androidgen-glm-4-9b
chatglm2-6b-32k-int4
💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]
- We have optimized the storage of the KV Cache, reducing memory fragmentation. With the optimized code, the model can process a 32K-length context with approximately 11G of GPU memory.

ChatGLM2-6B-32K further strengthens the ability to understand long texts based on ChatGLM2-6B and can better handle contexts of up to 32K in length. Specifically, we updated the position encoding based on Positional Interpolation and trained with a 32K context length during dialogue alignment. In practical use, if the context length you are dealing with is generally within 8K, we recommend ChatGLM2-6B; if you need to handle context lengths exceeding 8K, we recommend ChatGLM2-6B-32K. ChatGLM2-6B-32K is the long-context version of the open-source bilingual (Chinese-English) chat model ChatGLM2-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model while introducing the following new features: 1. Stronger Performance: based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B-32K. ChatGLM2-6B-32K uses the hybrid objective function of GLM and has undergone pre-training on 1.4T bilingual tokens plus human preference alignment training. 2. Longer Context: based on the FlashAttention technique, we extended the context length of the base model from 2K in ChatGLM-6B to 32K and trained with a 32K context length during dialogue alignment, allowing more rounds of dialogue. 3. More Efficient Inference: based on the Multi-Query Attention technique, ChatGLM2-6B-32K has faster inference and lower GPU memory usage: under the official implementation, inference speed increased by 42% compared to the first generation, and under INT4 quantization, the dialogue length supported by 6G of GPU memory increased from 1K to 8K. 4. More Open License: ChatGLM2-6B-32K weights are completely open for academic research, and free commercial use is also allowed after completing the questionnaire. For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo. The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM2-6B-32K model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers; the ChatGLM2-6B paper will be released soon, so stay tuned.
GLM-TTS
MathGLM
UI2Code_N
CogVideoX1.5-5B-SAT
📍 Visit QingYing and the API Platform to experience commercial video generation models. CogVideoX is an open-source video generation model originating from QingYing. CogVideoX1.5 is the upgraded version of the open-source CogVideoX model; the CogVideoX1.5-5B series supports 10-second videos and higher resolutions, and the `CogVideoX1.5-5B-I2V` variant supports video generation at any resolution. This repository contains the SAT-weight version of the CogVideoX1.5-5B model. It includes weights for both the I2V and T2V models; please select the corresponding weights when performing inference. The VAE part is consistent with the CogVideoX-5B series and does not require updating (you can also download it directly from there). The other modules are consistent with the diffusers version of CogVideoX-5B, so no updates are necessary (they can likewise be downloaded directly). This model is released under the CogVideoX LICENSE.
GLM-4.6V
CogVLM
ImageReward
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. ImageReward is the first general-purpose text-to-image human preference reward model (RM), trained on a total of 137k pairs of expert comparisons based on text prompts and corresponding model outputs from DiffusionDB. Through extensive analysis and experiments, we demonstrate that ImageReward outperforms existing text-image scoring methods, such as CLIP, Aesthetic, and BLIP, at understanding human preference in text-to-image synthesis. We have integrated the whole repository into a single Python package, `image-reward`. Follow the commands below to prepare the environment. We provide example images in the `assets/images` directory of this repo, together with an example prompt. Use the code sketched below to get human preference scores from ImageReward; the output should look like the following (exact numbers may differ slightly depending on the compute device):
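A minimal scoring sketch with the `image-reward` package; the prompt and image paths are placeholders standing in for the repo's examples:

```python
# pip install image-reward
import ImageReward as RM

model = RM.load("ImageReward-v1.0")  # downloads the checkpoint on first use

prompt = "a painting of an ocean with clouds and birds, day time, low depth field effect"
images = ["assets/images/1.webp", "assets/images/2.webp"]  # placeholder example paths

ranking, rewards = model.inference_rank(prompt, images)  # rank candidates by human preference
print("ranking:", ranking, "rewards:", rewards)

score = model.score(prompt, images[0])  # scalar preference score for a single image
print("score:", score)
```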
SCAIL-Preview
CogAgent
Reminder: this is the repository for the SAT (SwissArmyTransformer) version of CogAgent. Please refer to https://huggingface.co/THUDM/cogagent-chat-hf for the Huggingface version of CogAgent. CogAgent is an open-source visual language model improved based on CogVLM. CogAgent-18B has 11 billion visual and 7 billion language parameters. 🚀 GitHub: for more information, please refer to our GitHub. CogAgent demonstrates strong performance in image understanding and GUI agent tasks: 1. CogAgent-18B achieves state-of-the-art generalist performance on 9 cross-modal benchmarks, including VQAv2, MM-Vet, POPE, ST-VQA, OK-VQA, TextVQA, ChartQA, InfoVQA, and DocVQA. 2. CogAgent-18B significantly surpasses existing models on GUI operation datasets, including AITW and Mind2Web. In addition to all the features already present in CogVLM (visual multi-round dialogue, visual grounding), CogAgent: 1. Supports higher-resolution visual input and dialogue question answering, with ultra-high-resolution image inputs of 1120x1120. 2. Possesses the capabilities of a visual agent, being able to return a plan, the next action, and specific operations with coordinates for any given task on any GUI screenshot. 3. Has enhanced GUI-related question-answering capabilities, allowing it to handle questions about any GUI screenshot, such as web pages, PC apps, and mobile applications. 4. Has enhanced capabilities in OCR-related tasks through improved pre-training and fine-tuning. Please refer to the instructions located at our GitHub (section cli-SAT) for inference and fine-tuning of the SAT version of the model. The code in this repository is open source under the Apache-2.0 license, while the use of the CogAgent and CogVLM model weights must comply with the Model License. If you find our work helpful, please consider citing the following papers. In the instruction fine-tuning phase of CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR, and Shikra projects, as well as many classic cross-modal datasets. We sincerely thank them for their contributions.
GLM-4.6V-FP8
CogVideo
VisionReward-Image
CogView2
Kaleido-14B-S2V
WebVIA-Agent
SSVAE
VisionReward-Image-bf16
Introduction: we present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained and multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions, which are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction. Here, we present the VisionReward-Image model. Merging and Extracting Checkpoint Files: use the following command to merge the split files into a single `.tar` file and then extract it into the specified directory. Using this model: you can quickly install the Python package dependencies and run model inference via our github. > This model uses bf16-precision parameters and requires the sat (SwissArmyTransformer) library for invocation. For the fp32 version of the model, please refer to the following link: https://huggingface.co/THUDM/VisionReward-Image-bf16
MSAGPT
MSAGPT 📖 Paper: MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training. MSAGPT is a powerful protein language model (PLM) with 3 billion parameters, released in three versions (MSAGPT, MSAGPT-Sft, and MSAGPT-Dpo) that support zero-shot and few-shot MSA generation. MSAGPT achieves state-of-the-art structure prediction performance in natural-MSA-scarce scenarios. Visualized Cases: visualization of improved structure prediction compared with natural MSA. Yellow: ground truth; Purple: predictions based on MSA generated by MSAGPT; Cyan: predictions based on natural MSA. Model List: you can choose to manually download the necessary weights, then unzip them and put them into the checkpoints folder.

| Model | Type | Seq Length | Download |
|------------------|------|------------|-------------------------------------------|
| MSAGPT | Base | 16K | 🤗 Huggingface 🔨 SwissArmyTransformer |
| MSAGPT-SFT | Sft | 16K | 🤗 Huggingface 🔨 SwissArmyTransformer |
| MSAGPT-DPO | Rlhf | 16K | 🤗 Huggingface 🔨 SwissArmyTransformer |

The program interacts automatically on the command line. You can generate replies by entering the protein sequence for which you want virtual MSAs (optionally with a few MSAs as a prompt, connected by "\"), for example "PEGKQGDPGIPGEPGPPGPPGPQGARGPPG\ VTVEFVNSCLIGDMGVDGPPGQQGQPGPPG", where "PEGKQGDPGIPGEPGPPGPPGPQGARGPPG" is the main sequence and "VTVEFVNSCLIGDMGVDGPPGQQGQPGPPG" is the MSA prompt, and then pressing enter. Enter `stop` to stop the program. You can also enable offline generation by setting the `--input-source` and `--output-path` arguments; an example input file (msa_input) is provided. The code in this repository is open source under the Apache-2.0 license. If you find our work helpful, please consider citing our paper