SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

TL;DR

SchGen introduces a semantic-grounded code representation for LLM-based PCB schematic generation from natural-language prompts, achieving an 82% valid-circuit rate and 60.5% functional correctness.

cs.AI 🔴 Advanced 2026-05-29
Qinpei Luo, Ruichun Ma, Xinyu Zhang, Lili Qiu
Hardware Design, Natural Language Processing, Generative Models, Circuit Schematics, Semantic Encoding

Key Findings

Methodology

This work proposes a semantic code representation that encodes schematic editing primitives, transforming the generation task from geometry prediction into semantic matching. Because placement is expressed relatively and wiring is expressed by pin names, the approach reduces how much spatial reasoning the large language model must perform. A large-scale dataset of open-source PCB designs is constructed via an agent-human collaborative pipeline that converts schematic images into executable code. The model, GPT-oss-20b fine-tuned with chain-of-thought reasoning, substantially improves wire connectivity and functional correctness. The core components are APIs for symbol addition, pin connection, and relative positioning, which together reduce errors and make generated designs easier to interpret. Training uses supervised fine-tuning on the dataset, and ablation studies confirm the importance of relative coordinates and pin-based wiring. Experiments show that the semantic representation outperforms verbose, tool-specific formats, achieving higher validity and correctness.
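To make the primitives concrete, here is a minimal Python sketch of what a pin-name-based editing API could look like. The names `Schematic`, `add_symbol`, and `connect`, the `"REF.PIN"` endpoint syntax, and the part names are hypothetical illustrations, not the paper's actual API. Relative placement is left out here. The key idea is that a connection is stated as a pair of pin names, and nets emerge from those statements by grouping connected pins, so no coordinate ever enters the wiring.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """A schematic symbol whose pins are addressed by name, never by coordinates."""
    ref: str                 # reference designator, e.g. "U1"
    lib_id: str              # library part name (hypothetical)
    pins: tuple[str, ...]    # pin names exposed by the part


@dataclass
class Schematic:
    symbols: dict[str, Symbol] = field(default_factory=dict)
    wires: list[tuple[str, str]] = field(default_factory=list)

    def add_symbol(self, ref: str, lib_id: str, pins: tuple[str, ...]) -> None:
        """Symbol-addition primitive (hypothetical name)."""
        if ref in self.symbols:
            raise ValueError(f"duplicate reference {ref}")
        self.symbols[ref] = Symbol(ref, lib_id, pins)

    def connect(self, a: str, b: str) -> None:
        """Pin-connection primitive: endpoints are 'REF.PIN' strings."""
        for end in (a, b):
            ref, _, pin = end.partition(".")
            if ref not in self.symbols or pin not in self.symbols[ref].pins:
                raise KeyError(f"unknown pin {end}")
        self.wires.append((a, b))

    def netlist(self) -> set[frozenset[str]]:
        """Group connected pins into nets with a small union-find."""
        parent: dict[str, str] = {}

        def find(x: str) -> str:
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in self.wires:
            parent[find(a)] = find(b)
        nets: dict[str, set[str]] = {}
        for pin in list(parent):
            nets.setdefault(find(pin), set()).add(pin)
        return {frozenset(n) for n in nets.values()}


# Usage: a regulator output decoupled by two capacitors.
sch = Schematic()
sch.add_symbol("U1", "LDO_3V3", ("VIN", "VOUT", "GND"))
sch.add_symbol("C1", "C_10uF", ("1", "2"))
sch.add_symbol("C2", "C_10uF", ("1", "2"))
sch.connect("U1.VOUT", "C1.1")
sch.connect("C1.1", "C2.1")
sch.connect("U1.GND", "C1.2")
nets = sch.netlist()
```

Because `connect` rejects pins the part does not have, a misspelled pin name fails loudly instead of producing a silently dangling wire. This is one plausible reason pin-name wiring is easier to get right than coordinate wiring.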

Key Results

  • On a dataset of 1390 open-source PCB schematics, SchGen achieves an 82% valid-circuit rate and 60.5% functional correctness, surpassing baseline models that reach only 32%. It maintains high netlist accuracy even on unseen GitHub test sets, matching larger models such as GPT-5.2. Ablation experiments show that removing relative coordinates or pin-name connectivity causes significant performance drops, confirming their critical roles. Compared with larger general-purpose LLMs, SchGen, with only 20 billion parameters, achieves better wire connectivity and layout correctness, highlighting the effectiveness of the semantic encoding strategy.
  • The model is robust across diverse component types and complex layouts, with a notable reduction in wiring errors and spatial overlaps. The results support the hypothesis that structured semantic representations make generation easier to learn, especially in hardware design tasks with intricate connectivity and spatial arrangements. They also suggest that semantic-grounded code representations are a promising foundation for automated PCB schematic generation and for more efficient hardware design workflows.
  • These findings suggest that integrating semantic understanding into generative models can significantly improve the quality and reliability of hardware schematics, enabling more automated and accessible design processes. The approach also offers a scalable pathway for expanding datasets and adapting to industrial applications, ultimately contributing to the evolution of AI-assisted hardware development.
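The results above refer to netlist accuracy and wire connectivity, but this summary does not define these metrics. As an assumption for illustration only, one natural net-level score treats each net as an unordered set of `REF.PIN` names and computes precision, recall, and F1 over exactly matched nets. The function name and the example designs are hypothetical.

```python
def net_scores(reference, generated):
    """Net-level precision, recall and F1, comparing nets as unordered pin sets."""
    ref = {frozenset(n) for n in reference}
    gen = {frozenset(n) for n in generated}
    matched = len(ref & gen)
    precision = matched / len(gen) if gen else 0.0
    recall = matched / len(ref) if ref else 0.0
    # When matched > 0, precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


reference = [{"U1.VOUT", "C1.1"}, {"U1.GND", "C1.2"}]
# The generated design forgot to tie C1.2 to ground, splitting one net in two.
generated = [{"C1.1", "U1.VOUT"}, {"U1.GND"}, {"C1.2"}]
p, r, f1 = net_scores(reference, generated)
```

Because pin sets are compared, the score does not depend on how wires were drawn, and two electrically identical schematics score as equal. A real evaluation would also need to align reference designators between generated and reference designs (a generated `U1` may correspond to a reference `U2`), and this sketch assumes that alignment away.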

Significance

This research addresses a long-standing challenge in hardware design automation: generating accurate, editable PCB schematics from natural language instructions. By introducing a semantic-grounded code representation, the study reduces the complexity of spatial reasoning and uses large language models more effectively. The resulting system not only accelerates the schematic design process but also lowers the entry barrier for non-experts, democratizing hardware innovation. Its success demonstrates that structured semantic encoding can bridge the gap between high-level functional descriptions and detailed hardware schematics, marking a significant step toward end-to-end AI-driven hardware design. The implications extend to rapid prototyping, customized electronics, and integrated AI-assisted design tools, promising substantial industry impact and research advancement.

Technical Contribution

The paper introduces a novel semantic code representation that encodes schematic primitives, relative placements, and pin-level connectivity, transforming the traditional geometry-heavy generation problem into a semantics-driven matching task. This approach enables efficient learning and robust generation with smaller models. The large-scale dataset, created via an agent-human pipeline converting open-source designs into executable code, provides a rich training resource. Fine-tuning GPT-oss-20b with chain-of-thought reasoning further enhances the model’s ability to produce functionally correct schematics. The combination of structured API design, semantic abstraction, and data augmentation constitutes a new paradigm in hardware schematic generation, setting a foundation for future research in AI-assisted hardware design.
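Relative placement can be illustrated with a small resolver that turns statements such as "C1 is right of U1" into absolute grid coordinates. This sketch rests on several assumptions: the direction keywords, the default gap of 10 grid units, and the `Rel`/`resolve` names are all hypothetical, and the paper's own placement API may differ. The model only needs to emit the relationship, and the arithmetic is then performed deterministically.

```python
from dataclasses import dataclass

# Unit offsets for each hypothetical direction keyword (y grows downward here).
OFFSETS = {"right_of": (1, 0), "left_of": (-1, 0), "below": (0, 1), "above": (0, -1)}


@dataclass(frozen=True)
class Rel:
    anchor: str      # reference of the symbol this one is placed relative to
    direction: str   # one of the OFFSETS keys
    gap: int = 10    # grid units between the two symbols (assumed default)


def resolve(origin: str, rels: dict[str, Rel]) -> dict[str, tuple[int, int]]:
    """Turn relative placement statements into absolute grid coordinates."""
    pos = {origin: (0, 0)}
    pending = dict(rels)
    while pending:
        progress = False
        for ref, rel in list(pending.items()):
            if rel.anchor in pos:  # place a symbol once its anchor is placed
                dx, dy = OFFSETS[rel.direction]
                ax, ay = pos[rel.anchor]
                pos[ref] = (ax + dx * rel.gap, ay + dy * rel.gap)
                del pending[ref]
                progress = True
        if not progress:  # cycle or missing anchor
            raise ValueError(f"unresolvable placements: {sorted(pending)}")
    return pos


# C2 is listed before its anchor C1, so it is resolved on the second pass.
rels = {
    "C2": Rel("C1", "below", gap=5),
    "C1": Rel("U1", "right_of"),
    "R1": Rel("U1", "left_of"),
}
positions = resolve("U1", rels)
```

The resolver places a symbol as soon as its anchor has a position. It raises an error when some placements can never be resolved, for example because of a cycle or a missing anchor, which is the kind of check a generated script could be validated against.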

Novelty

This work is the first to formulate PCB schematic generation as a semantic matching task, leveraging a structured code representation based on relative placement and pin semantics. Unlike prior methods that rely on verbose, tool-specific formats or raw geometry, this approach abstracts the design process into a set of primitives that are easier to learn and generalize. The construction of a large dataset through an agent-human collaborative pipeline and the use of chain-of-thought reasoning during training further distinguish this work. The results demonstrate that a relatively small model, equipped with a carefully designed semantic representation, can outperform larger general-purpose LLMs, highlighting the importance of representation design in hardware AI tasks.

Limitations

  • The current model struggles with highly dense, multi-layered circuits where spatial reasoning becomes more complex, leading to wiring errors and placement inaccuracies. This is partly due to the limited spatial understanding of the model and the simplified assumptions in the relative coordinate system.
  • The dataset, although large, is primarily derived from open-source designs, which may not fully capture the complexity and variability of industrial PCB layouts, potentially limiting real-world applicability.
  • Training and inference require substantial computational resources, especially for fine-tuning and chain-of-thought prompting, which could hinder deployment in resource-constrained environments.
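Placement inaccuracies of the kind described in the first limitation can at least be detected after generation. The following minimal sketch assumes that each placed symbol has an axis-aligned bounding box (the `Box` type and the coordinates below are hypothetical) and flags every pair of symbols whose boxes intersect:

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box of a placed symbol, in grid units."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Box, b: Box) -> bool:
    # Strict inequalities: boxes that only share an edge are not counted as overlapping.
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def overlapping_pairs(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """Return every pair of symbols whose bounding boxes intersect."""
    return [
        (p, q)
        for (p, a), (q, b) in combinations(sorted(boxes.items()), 2)
        if overlaps(a, b)
    ]


boxes = {
    "U1": Box(0, 0, 10, 10),
    "C1": Box(8, 2, 12, 6),    # intrudes into U1
    "R1": Box(12, 0, 16, 4),   # touches C1's right edge only
}
pairs = overlapping_pairs(boxes)
```

The pairwise check is quadratic in the number of symbols. That cost is acceptable for a single schematic sheet, but denser designs would benefit from a spatial index. Detecting an overlap does not fix it, so a check like this complements, rather than replaces, better spatial reasoning in the model.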

Future Work

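Future work will integrate layout optimization algorithms and electrical simulation tools to move toward an end-to-end automated hardware design flow. The authors plan to expand the dataset to cover more industrial-grade, complex circuits and improve the model's generalization. They will also explore multimodal inputs (for example, combined images and text) to strengthen the model's understanding and advance intelligent hardware design. Optimizing the model architecture to reduce computational cost and improve inference speed is another important direction.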

AI Executive Summary

Designing printed circuit boards (PCBs) has long been a manual, expertise-intensive process that forms the backbone of modern electronic hardware. From consumer devices to complex AI systems, the schematic design stage is crucial but remains a bottleneck due to its reliance on specialized skills and time-consuming workflows. Traditional electronic design automation (EDA) tools, while powerful, depend heavily on verbose, tool-specific formats and geometric layouts, making automation and AI integration challenging.

Recent advances in generative AI have shown promise in digital and analog IC design, yet applying similar techniques to PCB schematic generation from natural language remains largely unexplored. Existing methods focus on high-level hardware description languages or graph-based circuit topologies, which do not directly translate to the diverse, heterogeneous components and complex wiring of PCB schematics. Moreover, these approaches often lack the ability to incorporate high-level functional requirements expressed in natural language, limiting their usability.

In response, this study introduces SchGen, a pioneering large language model tailored for PCB schematic generation driven by semantic-grounded code representations. The core innovation lies in abstracting schematic design into structured editing primitives, such as symbol addition, pin connection, and relative placement, encoded via APIs that emphasize semantics over raw geometry. This transformation converts a geometry-heavy prediction task into a semantics-driven matching problem, significantly easing the learning burden on the model.
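The difference between geometry-driven and semantics-driven wiring fits in a few lines. The sketch below assumes a simplified format in which a wire connects to a pin only if its endpoint lands exactly on that pin's coordinates (the pin positions are hypothetical). Under that rule, a single grid unit of error leaves a dangling wire, whereas a pin-name connection has no coordinate to get wrong.

```python
from typing import Optional

# Hypothetical absolute pin locations, as a geometry-heavy schematic format stores them.
PIN_XY = {"U1.VOUT": (20, 5), "C1.1": (40, 5)}
XY_PIN = {xy: pin for pin, xy in PIN_XY.items()}


def wire_endpoints(start: tuple[int, int], end: tuple[int, int]) -> list[Optional[str]]:
    """Coordinate-based wiring: a wire end attaches only to a pin at exactly that point."""
    return [XY_PIN.get(start), XY_PIN.get(end)]


exact = wire_endpoints((20, 5), (40, 5))       # both ends land on pins
off_by_one = wire_endpoints((20, 5), (41, 5))  # one grid unit off: dangling end (None)

# Pin-name wiring states the intended connection directly; no coordinate can drift.
semantic_wire = ("U1.VOUT", "C1.1")
```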

To support training, the authors developed a large-scale dataset of open-source PCB designs, converted into executable code through an agent-human collaborative pipeline. The pipeline uses multimodal LLMs such as GPT-5 to generate Python scripts that replicate schematic layouts, which are then manually corrected to ensure accuracy. The dataset comprises 1390 diverse schematics, providing rich supervision for model fine-tuning.
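The summary states that the GPT-5 drafts are manually corrected but does not describe how reviewers locate errors. As a purely hypothetical aid to that step, a pre-check can replay a draft's operations and list concrete problems, such as connections to pins that were never added, so that reviewers see the most broken drafts first. The operation format (`("add", ref, pins)` and `("connect", a, b)`) and the function names are assumptions for illustration.

```python
def precheck(ops: list[tuple]) -> list[str]:
    """Replay drafted editing operations and return a list of detected problems."""
    pins: dict[str, set[str]] = {}
    problems: list[str] = []
    for i, op in enumerate(ops):
        kind = op[0]
        if kind == "add":
            _, ref, pin_names = op
            if ref in pins:
                problems.append(f"op {i}: duplicate symbol {ref}")
            pins[ref] = set(pin_names)
        elif kind == "connect":
            for end in op[1:]:
                ref, _, pin = end.partition(".")
                if pin not in pins.get(ref, set()):
                    problems.append(f"op {i}: unknown pin {end}")
        else:
            problems.append(f"op {i}: unknown operation {kind!r}")
    return problems


def review_queue(drafts: dict[str, list[tuple]]) -> list[tuple[str, list[str]]]:
    """Order drafts for human review, most detected problems first."""
    results = [(name, precheck(ops)) for name, ops in drafts.items()]
    return sorted(results, key=lambda item: len(item[1]), reverse=True)


drafts = {
    "ldo_board": [
        ("add", "U1", ("VIN", "VOUT", "GND")),
        ("add", "C1", ("1", "2")),
        ("connect", "U1.VOUT", "C1.1"),
    ],
    "sensor_board": [
        ("add", "U1", ("VIN",)),
        ("connect", "U1.VOUT", "C9.1"),  # pin and symbol the draft never added
    ],
}
queue = review_queue(drafts)
```

A clean pre-check does not mean a draft is correct: a script can reference only valid pins and still wire the wrong ones. For that reason, the manual correction step remains necessary.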

Experimental results demonstrate that SchGen achieves an 82% valid circuit rate and 60.5% functional correctness on a comprehensive test set, outperforming baseline models and larger general-purpose LLMs. Ablation studies confirm the importance of relative coordinates and pin-based connectivity, validating the design choices. The model generalizes well to unseen data, matching the netlist accuracy of GPT-5.2 despite having only 20 billion parameters.

This work marks a significant step toward automated, natural language-driven hardware design. By focusing on semantic encoding, it overcomes the limitations of geometry-heavy representations, paving the way for more accessible and efficient PCB schematic generation. Future directions include integrating layout optimization, expanding datasets for industrial applications, and exploring multi-modal inputs to further enhance model understanding. Overall, SchGen exemplifies how innovative representation strategies can unlock new potentials in AI-assisted hardware development, promising transformative impacts across electronics industries.

Deep Dive

Abstract

Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While generative AI has advanced digital and analog IC design, PCB schematic generation from natural-language intent is largely unexplored. This paper presents SchGen, the first large language model that generates editable PCB schematics from natural-language requests. The key challenge lies in the lack of an LLM-suited representation and a large-scale dataset. Current schematic formats are dominated by verbose, tool-specific syntax and geometry-heavy descriptions, making them difficult to generate reliably. We introduce a semantically grounded code representation that encodes schematic editing primitives with relative placement and pin-name-based wiring, transforming a geometry-driven generation problem into a semantics-driven matching task amenable to LLMs. We further construct a large-scale dataset of PCB schematics paired with user prompts via a human-agent collaborative pipeline that converts open-source hardware designs into our representation. Experiments show that SchGen significantly outperforms alternative representations and even larger general-purpose LLMs on wire connectivity accuracy and functional correctness. Our results highlight the critical role of representation design in enabling generative models for complex hardware design tasks.

cs.AI cs.CL cs.LG
