shreyanshu09
gm3-4b
Block Diagram Global Information
This model was introduced in the paper "Unveiling the Power of Integration: Block Diagram Summarization through Local-Global Fusion", accepted at ACL 2024. The full code is available in the BlockNet GitHub repository.

The model follows the configuration specified in Donut: a Transformer-based visual encoder reads the block diagram image, and a Transformer-based text decoder generates an overall summary of the diagram. It supports both English and Korean.

Training dataset
- 41,933 samples from synthetic and real-world block diagrams in English (BD-EnKo)
- 33,101 samples from synthetic and real-world block diagrams in Korean (BD-EnKo)
- 396 samples from the real-world English block diagram dataset (CBD)
- 357 samples from the handwritten English block diagram dataset (FCA)
- 476 samples from the handwritten English block diagram dataset (FCB)

If you have any questions about this work, please contact Shreyanshu Bhushan at the following email address: [email protected].

The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
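Because the model follows the Donut encoder-decoder setup described above, inference can probably be run through the generic Donut classes in Hugging Face `transformers`. The following is a minimal sketch, not the authors' reference code: it assumes the checkpoint ships Donut-compatible processor files, and `model_id`, `task_prompt`, `summarize_block_diagram`, and `strip_special_tokens` are hypothetical names introduced here for illustration. The actual repository ID and task-start token should be taken from the BlockNet repository.

```python
import re


def strip_special_tokens(sequence: str, eos_token: str, pad_token: str) -> str:
    """Remove EOS/PAD markers and the leading task tag from decoded text."""
    text = sequence.replace(eos_token, "").replace(pad_token, "")
    # Drop only the first <...> tag, i.e. the task prompt echoed by the decoder.
    return re.sub(r"<.*?>", "", text, count=1).strip()


def summarize_block_diagram(image_path: str, model_id: str, task_prompt: str) -> str:
    """Generate a summary for one block diagram image.

    model_id and task_prompt are assumptions supplied by the caller;
    neither value is stated in this document.
    """
    # Heavy imports stay local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Visual encoder input: the diagram as a normalized pixel tensor.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Text decoder input: the task-start token that primes generation.
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            return_dict_in_generate=True,
        )

    decoded = processor.batch_decode(outputs.sequences)[0]
    return strip_special_tokens(
        decoded, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
```

A call would look like `summarize_block_diagram("diagram.png", model_id, task_prompt)`, with both string arguments filled in from the released checkpoint. The post-processing is kept in a separate helper so it can be checked without downloading any weights.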