Instruction set for the representation of graphs

TL;DR

IsalGraph method encodes any finite simple graph as a compact string over a nine-character instruction alphabet, suitable for graph similarity search.

cs.CL 🔴 Advanced 2026-03-12 15 views

Ezequiel Lopez-Rubio Mario Pascual-Gonzalez

AI Reader Arxiv Page Download PDF

graph representation instruction set graph similarity language models Levenshtein distance

Key Findings

Methodology

The IsalGraph method represents graph structures using a compact string over a nine-character instruction alphabet, executed by a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. Instructions either move a pointer through the CDLL or insert a node or edge into the graph. A greedy GraphToString algorithm encodes any connected graph into a string in polynomial time, while an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and valid traversal orders.

Key Results

Evaluation on five real-world graph benchmark datasets (IAM Letter LOW/MED/HIGH, LINUX, AIDS) shows that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED).
Experimental results demonstrate that IsalGraph strings provide a compact, isomorphism-invariant, and language-model-compatible sequential encoding of graph structure, with direct applications in graph similarity search, generation, and graph-conditioned language modeling.
On sparse graphs, the encoding time complexity of the IsalGraph method is polynomial, while it exhibits super-polynomial growth on dense graphs.

Significance

The IsalGraph method offers a novel compact representation of graph structures, overcoming the limitations of traditional adjacency matrices in terms of space occupancy and compatibility with sequential models. Its isomorphism-invariance and compatibility with language models make it directly applicable in graph similarity search, generation, and conditional language modeling. This method not only opens new research directions in academia but also provides new tools for graph data processing in the industry.

Technical Contribution

The IsalGraph method introduces a small virtual machine and a nine-character instruction alphabet to achieve compact graph representation. Unlike existing methods, it does not rely on a fixed ordering of nodes, avoiding the space waste of adjacency matrices, and ensures graph validity and consistency during string decoding. Additionally, the IsalGraph method provides new theoretical guarantees, ensuring all strings decode to valid graphs.

Novelty

IsalGraph is the first method to represent graph structures as compact strings over a nine-character instruction alphabet. Compared to traditional adjacency matrix representations, its innovation lies in using a virtual machine and instruction set to achieve compact graph representation, addressing the space waste issue in sparse graphs and being compatible with language models.

Limitations

The IsalGraph method exhibits super-polynomial growth in encoding time complexity when dealing with dense graphs, which may limit its application on large-scale dense graphs.
The method relies on an exhaustive-backtracking algorithm to generate canonical strings, which can lead to high computational overhead when the number of nodes is large.
While the IsalGraph method performs well on sparse graphs, its performance on dense graphs still requires further optimization.

Future Work

Future research directions include optimizing the encoding efficiency of the IsalGraph method on dense graphs, exploring more efficient exhaustive-backtracking algorithms, and applying the method to a broader range of graph datasets. Additionally, the potential of the IsalGraph method in graph generation and graph-conditioned language modeling, especially when combined with large-scale language models, can be further explored.

AI Executive Summary

Graphs are among the most expressive data structures available to scientists and engineers, widely used in fields such as molecular compounds, social networks, knowledge bases, protein interaction networks, and circuit topologies. However, encoding graph structures efficiently to support computation, generalization, and downstream learning remains a challenge. Traditional adjacency matrix methods, while widely used, suffer from severe space waste on sparse graphs and are unsuitable for sequential models like recurrent networks or Transformers. The IsalGraph method addresses these issues by representing graph structures as compact strings over a nine-character instruction alphabet. This method is implemented through a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. Instructions either move a pointer through the CDLL or insert a node or edge into the graph. Every string over the alphabet decodes to a valid graph, with no invalid states reachable. Experimental results show that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED), demonstrating its potential applications in graph similarity search, generation, and conditional language modeling. Despite the higher encoding time complexity on dense graphs, the IsalGraph method offers a new perspective on graph representation, warranting further research and optimization.

Deep Analysis

Background

Graphs, as a data structure, are widely used in various fields such as molecular compounds, social networks, and knowledge bases. Traditionally, graph representation has relied heavily on adjacency matrices, which suffer from space waste issues on sparse graphs and are unsuitable for sequential model processing. With the rise of large-scale language models, encoding graph structures as sequential data for language model processing has become a new research direction. The IsalGraph method was proposed to address this challenge by representing graph structures as compact strings over a nine-character instruction alphabet, providing a new solution.

Core Problem

Traditional graph representation methods, such as adjacency matrices, suffer from space waste and are unsuitable for sequential model processing. Particularly on sparse graphs, the space complexity of adjacency matrices is O(N^2), which is unacceptable for large-scale graph data. Additionally, adjacency matrix representation relies on a fixed ordering of nodes, breaking isomorphism invariance. Designing a compact, isomorphism-invariant graph representation method has become a pressing issue.

Innovation

The core innovation of the IsalGraph method lies in representing graph structures using a small virtual machine and a nine-character instruction alphabet. • The virtual machine comprises a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. • Instructions either move a pointer through the CDLL or insert a node or edge into the graph. • Every string over the alphabet decodes to a valid graph, with no invalid states reachable. • A greedy GraphToString algorithm encodes any connected graph into a string in polynomial time, while an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and valid traversal orders.

Methodology

The IsalGraph method achieves compact graph representation through the following steps: • Use a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. • Instructions either move a pointer through the CDLL or insert a node or edge into the graph. • Every string over the alphabet decodes to a valid graph, with no invalid states reachable. • A greedy GraphToString algorithm encodes any connected graph into a string in polynomial time, while an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and valid traversal orders.

Experiments

Experiments were conducted on five real-world graph benchmark datasets (IAM Letter LOW/MED/HIGH, LINUX, AIDS) to evaluate the correlation between the Levenshtein distance between IsalGraph strings and graph edit distance (GED). • The datasets include graphs with varying structural densities, from sparse to moderately dense. • Experimental results demonstrate that IsalGraph strings provide a compact, isomorphism-invariant, and language-model-compatible sequential encoding of graph structure, with direct applications in graph similarity search, generation, and graph-conditioned language modeling.

Results

Experimental results show that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED), particularly on sparse graphs. • On sparse graphs, the encoding time complexity of the IsalGraph method is polynomial, while it exhibits super-polynomial growth on dense graphs. • The IsalGraph method offers a new perspective on graph representation, warranting further research and optimization.

Applications

The IsalGraph method has direct applications in graph similarity search, generation, and graph-conditioned language modeling. • Its compact representation makes it suitable for processing large-scale graph data, especially when combined with large-scale language models. • The method can also be used for compressed storage and transmission of graph data, reducing storage and communication costs.

Limitations & Outlook

While the IsalGraph method performs well on sparse graphs, its performance on dense graphs still requires further optimization. • The method relies on an exhaustive-backtracking algorithm to generate canonical strings, which can lead to high computational overhead when the number of nodes is large. • The encoding time complexity exhibits super-polynomial growth when dealing with dense graphs, which may limit its application on large-scale dense graphs.

Plain Language Accessible to non-experts

Imagine you're in a kitchen cooking. Traditional adjacency matrices are like putting all the ingredients in one big pot to cook, which is simple but can waste a lot of space for some ingredients. The IsalGraph method is like using a small kitchen tool to precisely place each ingredient in the right spot and process them in a specific order. This not only saves space but also ensures each dish has a unique flavor. This method uses a small virtual machine and a nine-character instruction alphabet to achieve compact graph representation, similar to using a small tool to organize all the ingredients neatly. Each step is carefully designed to ensure the final dish is both delicious and resource-efficient. Although it may take more time to handle complex dishes, the result is worth it because it provides a whole new cooking experience.

ELI14 Explained like you're 14

Hey there! Did you know that scientists have invented a super cool way to represent graphs, like those network diagrams we see on social media? It's called the IsalGraph method, and it's like a super smart robot chef that can put all the ingredients (nodes and edges in a graph) into a small box and describe them using a special language. Imagine this is like using a secret code to record all your friends and their relationships! Plus, this method can help you find which graphs are similar, just like finding friends with similar interests. Although this method might take a bit of time to handle complex graphs, it's really amazing because it helps us better understand and process these complex networks. Isn't that cool?

Glossary

IsalGraph

IsalGraph is a method that represents any finite simple graph as a compact string over a nine-character instruction alphabet.

Used in graph similarity search, generation, and conditional language modeling.

CDLL (Circular Doubly-Linked List)

A circular doubly-linked list is a data structure that allows bidirectional traversal of the list, with the end connected to the beginning.

Used to store graph-node references.

GraphToString

GraphToString is a greedy algorithm that encodes connected graphs into strings.

Used in the IsalGraph method for graph encoding.

Levenshtein Distance

Levenshtein distance is a measure of string similarity, representing the minimum number of edit operations needed to transform one string into another.

Used to evaluate similarity between IsalGraph strings.

Graph Edit Distance (GED)

Graph edit distance is the minimum number of edit operations needed to transform one graph into another.

Used to evaluate similarity between graphs.

Greedy Algorithm

A greedy algorithm makes the optimal choice at each step as it attempts to find the overall optimal way to solve the entire problem.

Used in the IsalGraph method for graph encoding.

Exhaustive Backtracking

Exhaustive backtracking is an algorithmic strategy that explores all possible solutions to find the optimal one.

Used to generate canonical strings in the IsalGraph method.

Isomorphism Invariance

Isomorphism invariance ensures that the graph representation does not depend on the fixed ordering of nodes.

A key feature of the IsalGraph method.

Language Model Compatibility

Language model compatibility means that the graph representation can be directly processed by language models.

An important feature of the IsalGraph method.

Sparse Graph

A sparse graph is one where the number of edges is much less than the square of the number of nodes.

A primary application scenario for the IsalGraph method.

Open Questions Unanswered questions from this research

1 The encoding efficiency of the IsalGraph method on dense graphs still needs optimization. The current method exhibits super-polynomial growth in encoding time complexity on dense graphs, limiting its application on large-scale dense graphs. Improving encoding efficiency is a pressing issue.
2 The exhaustive-backtracking algorithm can lead to high computational overhead when the number of nodes is large. Designing more efficient algorithms to generate canonical strings is an important direction for future research.
3 The performance of the IsalGraph method when combined with large-scale language models deserves further investigation. Although the method is compatible with language models, its actual performance on large-scale graph data still needs validation.
4 Applying the IsalGraph method to more diverse graph datasets is also an open question. Current experiments focus mainly on five benchmark datasets, and future research can explore applications in more fields.
5 The potential of the IsalGraph method in graph generation and graph-conditioned language modeling has not been fully explored. Future research can investigate the method's application effects in these fields, especially in generating complex graph structures.

Applications

Immediate Applications

Graph Similarity Search

The IsalGraph method can be used for quickly finding similar graphs, applicable in social network analysis and chemical molecular structure comparison.

Graph Generation

Generate new graph structures using the IsalGraph method, applicable in synthetic chemistry and biological network simulation.

Graph-Conditioned Language Modeling

Combine with language models for graph data generation and analysis, applicable in natural language processing and graph data mining.

Long-term Vision

Large-Scale Graph Data Compression

Utilize the compact representation feature of the IsalGraph method to achieve efficient compression and transmission of large-scale graph data.

Intelligent Graph Data Analysis

Combine with artificial intelligence technology to achieve intelligent analysis and prediction of complex graph data, promoting scientific research and industrial applications.

Abstract

We present IsalGraph, a method for representing the structure of any finite, simple graph as a compact string over a nine-character instruction alphabet. The encoding is executed by a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. Instructions either move a pointer through the CDLL or insert a node or edge into the graph. A key design property is that every string over the alphabet decodes to a valid graph, with no invalid states reachable. A greedy \emph{GraphToString} algorithm encodes any connected graph into a string in time polynomial in the number of nodes; an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and all valid traversal orders. We evaluate the representation on five real-world graph benchmark datasets (IAM Letter LOW/MED/HIGH, LINUX, and AIDS) and show that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED). Together, these properties make IsalGraph strings a compact, isomorphism-invariant, and language-model-compatible sequential encoding of graph structure, with direct applications in graph similarity search, graph generation, and graph-conditioned language modelling

cs.CL cs.AI cs.DS

References (18)

A distance measure between attributed relational graphs for pattern recognition

A. Sanfeliu, K. Fu

1983 1090 citations ⭐ Influential

Random graphs

A. Frieze

2006 11766 citations

Graph Neural Networks: A Review of Methods and Applications

Jie Zhou, Ganqu Cui, Zhengyan Zhang et al.

2018 6644 citations View Analysis →

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee et al.

2019 111292 citations View Analysis →

Representation of the structure of graphs by sequences of instructions

Ezequiel López-Rubio

2025 1 citations View Analysis →

IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning

Kaspar Riesen, H. Bunke

2008 608 citations

Emergence of Scaling in Random Networks

B. McInnes, Jannene S. McBride, N. Evans et al.

1999 31836 citations

SimGNN: A Neural Network Approach to Fast Graph Similarity Computation

Yunsheng Bai, Haoyang Ding, Song Bian et al.

2018 388 citations View Analysis →

A Survey on Graph Representation Learning Methods

Shima Khoshraftar, A. An

2022 208 citations View Analysis →

Graph Edit Distance with General Costs Using Neural Set Divergence

Eeshaan Jain, Indradyumna Roy, Saswat Meher et al.

2024 14 citations View Analysis →

Inductive Representation Learning on Large Graphs

William L. Hamilton, Z. Ying, J. Leskovec

2017 18775 citations View Analysis →

Fast Graph Representation Learning with PyTorch Geometric

Matthias Fey, J. E. Lenssen

2019 5162 citations View Analysis →

Semi-Supervised Classification with Graph Convolutional Networks

Thomas Kipf, M. Welling

2016 33908 citations View Analysis →

Graph Attention Networks

Petar Velickovic, Guillem Cucurull, Arantxa Casanova et al.

2017 25045 citations View Analysis →

Attention is All you Need

Ashish Vaswani, Noam Shazeer, Niki Parmar et al.

2017 168849 citations View Analysis →

The igraph software package for complex network research

Gábor Csárdi, T. Nepusz

2006 11920 citations

A Comprehensive Survey on Deep Graph Representation Learning

Wei Ju, Zheng Fang, Yiyang Gu et al.

2023 276 citations View Analysis →

Exploring Network Structure, Dynamics, and Function using NetworkX

A. Hagberg, D. Schult, P. Swart et al.

2008 7669 citations

Instruction set for the representation of graphs

Key Findings

Methodology

Key Results

Significance

Technical Contribution

Novelty

Limitations

Future Work

AI Executive Summary

Deep Analysis

Background

Core Problem

Innovation

Methodology

Experiments

Results

Applications

Limitations & Outlook

Plain Language Accessible to non-experts

ELI14 Explained like you're 14

Glossary

IsalGraph

CDLL (Circular Doubly-Linked List)

GraphToString

Levenshtein Distance

Graph Edit Distance (GED)

Greedy Algorithm

Exhaustive Backtracking

Isomorphism Invariance

Language Model Compatibility

Sparse Graph

Open Questions Unanswered questions from this research

Applications

Immediate Applications

Graph Similarity Search

Graph Generation

Graph-Conditioned Language Modeling

Long-term Vision

Large-Scale Graph Data Compression

Intelligent Graph Data Analysis

Abstract

References (18)

Related Papers

Neuron-Aware Data Selection In Instruction Tuning For Large Language Models

ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Interpretable Semantic Gradients in SSD: A PCA Sweep Approach and a Case Study on AI Discourse

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection