Cross-Species Transfer Learning for Electrophysiology-to-Transcriptomics Mapping in Cortical GABAergic Interneurons

TL;DR

Using cross-species transfer learning to enhance electrophysiology-to-transcriptomics mapping accuracy in cortical GABAergic interneurons.

cs.LG · Advanced · 2026-03-12
Theo Schwider, Ramin Ramezani
cross-species transfer learning · electrophysiology · transcriptomics · neuroscience · machine learning

Key Findings

Methodology

This study utilizes the Allen Institute's Patch-seq datasets to investigate cross-species electrophysiology-to-transcriptomics mapping. By analyzing GABAergic interneurons from both mouse and human cortex, the study employs sparse PCA and a random forest as baseline models, and develops an attention-based BiLSTM model. This model operates directly on the structured IPFX feature-family representation, avoiding sparse PCA, and provides feature-family-level interpretability through learned attention weights. Finally, a cross-species transfer learning setting is evaluated, where the sequence model is pretrained on mouse data and fine-tuned on an aligned 4-class task on human data.
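The attention pooling at the heart of that model can be sketched in a few lines: each feature-family vector receives a score, the scores are softmax-normalized into weights that sum to one, and the pooled representation is the weighted sum of the family vectors. The mean-based scorer below is a hypothetical stand-in for the learned scoring network in the actual model, whose exact form is not given here:

```python
import math

def attention_pool(family_vectors):
    """Pool per-feature-family vectors with softmax attention.

    family_vectors: list of equal-length vectors, one per IPFX feature
    family. The mean-based scorer is a stand-in for the learned scorer.
    """
    scores = [sum(v) / len(v) for v in family_vectors]
    m = max(scores)                                   # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]                   # attention weights, sum to 1
    dim = len(family_vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, family_vectors))
              for i in range(dim)]
    return pooled, weights
```

Because the weights sum to one, inspecting them directly shows which feature families the model attends to, which is the source of the feature-family-level interpretability described above.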

Key Results

  • On mouse data, the random forest baseline model achieved an accuracy of 90.72%, while the attention-based BiLSTM model, after applying SMOTE, improved accuracy to 92.35%.
  • On human data, the random forest model achieved an accuracy of 75.18%, while the attention-based BiLSTM model, after applying SMOTE, reached a macro-F1 score of 67.54%.
  • Cross-species transfer learning significantly improved prediction performance on human data, with the macro-F1 score increasing from 65.80% to 67.95%.

Significance

This study confirms the reproducibility of the Gouwens et al. baseline on mouse data and demonstrates the effectiveness of sequence models on electrophysiological features. Through cross-species transfer learning, the study shows that mouse data can serve as effective auxiliary supervision for human data, enhancing human subclass prediction accuracy. This provides new perspectives in neuroscience, particularly in understanding neuronal functions and molecular characteristics across species, with significant scientific and translational implications.

Technical Contribution

The technical contributions of this paper include an attention-based BiLSTM model that processes electrophysiological features directly, without relying on sparse PCA, and provides interpretability through attention weights. The study also demonstrates the potential of cross-species transfer learning, showing that pretraining on mouse data improves performance on the limited human dataset.

Novelty

This study is the first to apply an attention-based BiLSTM model in electrophysiology-to-transcriptomics mapping, especially in the context of cross-species transfer learning. Compared to previous studies, this approach not only enhances model interpretability but also achieves significant performance improvements in mouse-to-human transfer learning.

Limitations

  • Due to the smaller and imbalanced human dataset, the model performs less well on certain rare subclasses compared to common ones.
  • Cross-species transfer learning may be affected by biological differences and experimental distribution shifts.
  • The complexity of the model may lead to higher computational costs, especially on large-scale datasets.

Future Work

Future research could explore more cross-species datasets to validate the model's generalization capabilities. Additionally, integrating other modalities, such as morphological features, could further improve prediction performance and interpretability.

AI Executive Summary

In neuroscience research, understanding the link between neuronal functional diversity and molecular characteristics has been a crucial topic. Traditional approaches often rely on single-species data, which limits our understanding of neuronal features across species. This paper proposes a novel method using cross-species transfer learning to improve the accuracy of electrophysiology-to-transcriptomics mapping, particularly in cortical GABAergic interneurons of mice and humans.

The study utilizes publicly available Patch-seq datasets from the Allen Institute, covering neurons from both mouse and human cortex. After quality control, the study analyzes 3,699 mouse visual cortex neurons and 506 human neocortical neurons. Using standardized electrophysiological features and sparse PCA, the study reproduces the major class-level separations reported in the original mouse study.

For supervised prediction, the study first employs a class-balanced random forest model as a baseline. Subsequently, an attention-based BiLSTM model is developed, which operates directly on the structured IPFX feature-family representation, avoiding sparse PCA, and providing feature-family-level interpretability through learned attention weights. This approach not only improves prediction accuracy but also enhances understanding of the model's decision-making process.

In a cross-species transfer learning setting, the study evaluates the sequence model pretrained on mouse data and fine-tuned on an aligned 4-class task on human data. Results show that transfer learning significantly improves the macro-F1 score on human data compared to a human-only training baseline, indicating that mouse data can serve as effective auxiliary supervision for human data, especially in cases of small and imbalanced datasets.
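The pretrain-then-fine-tune recipe can be illustrated with a deliberately simplified stand-in: a linear model trained by full-batch gradient descent on synthetic "mouse" and "human" data whose generating rules are similar but not identical. All names and data here are hypothetical (the paper's actual model is the attention-based BiLSTM); the sketch only shows why a model pretrained on the larger source dataset is a better starting point than a zero initialization:

```python
import random

def gd_train(X, y, w=None, lr=0.05, steps=200):
    """Full-batch gradient descent on mean squared error.
    If w is given, training continues from it (i.e. fine-tuning)."""
    n, d = len(X), len(X[0])
    if w is None:
        w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * err * xi[j] / n
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

def mse(X, y, w):
    return sum((sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / len(y)

# Toy stand-ins for the real features: a large "mouse" set and a small
# "human" set generated by similar but not identical linear rules.
rng = random.Random(0)
mouse_X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
mouse_y = [2.0 * a - 1.0 * b + 0.5 * c for a, b, c in mouse_X]
human_X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]
human_y = [1.8 * a - 1.1 * b + 0.4 * c for a, b, c in human_X]

w_pre = gd_train(mouse_X, mouse_y, steps=300)               # pretrain on mouse
w_ft = gd_train(human_X, human_y, w=list(w_pre), steps=50)  # fine-tune on human
w_scratch = gd_train(human_X, human_y, steps=50)            # human-only baseline
```

Because the mouse rule is close to the human one, the pretrained weights already fit the human data far better than a zero initialization, and fine-tuning refines them further, mirroring the mouse-to-human transfer gain reported above.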

In sum, the study confirms the reproducibility of the Gouwens et al. baseline on mouse data, shows that sequence models are effective on electrophysiological features, and demonstrates through cross-species transfer learning that mouse data can serve as auxiliary supervision that improves human subclass prediction, with scientific and translational implications for understanding neuronal functions and molecular characteristics across species.

However, the study also has limitations. Due to the smaller and imbalanced human dataset, the model performs less well on certain rare subclasses compared to common ones. Additionally, cross-species transfer learning may be affected by biological differences and experimental distribution shifts. Future research could explore more cross-species datasets to validate the model's generalization capabilities and integrate other modalities, such as morphological features, to further improve prediction performance and interpretability.

Deep Analysis

Background

In recent years, the field of neuroscience has made significant progress in understanding the connection between neuronal functional diversity and molecular characteristics. Electrophysiological recordings provide a functional view of neurons, while transcriptomics reveals molecular features. Gouwens et al. (2020) combined these two aspects using the Patch-seq technique, offering a new method for understanding neuronal types. The Patch-seq technique pairs whole-cell recordings with single-cell RNA sequencing, enabling direct mappings between physiology and transcriptomic identity. The Allen Institute's Patch-seq datasets provide a rich resource for such studies. However, existing research primarily focuses on single-species data, with limited cross-species studies, which restricts our understanding of neuronal features across different species.

Core Problem

The core problem is how to effectively map electrophysiological features to transcriptomic identity, particularly in a cross-species context. Existing methods often rely on single-species data, which may lead to insufficient generalization across different species. Additionally, human datasets are typically smaller and more imbalanced, further complicating prediction tasks. Solving this problem is crucial for understanding neuronal functions and molecular characteristics across species.

Innovation

The core innovations of this paper include:

1. Developing an attention-based BiLSTM model that can directly process electrophysiological features without relying on sparse PCA, providing interpretability through attention weights.

2. Employing cross-species transfer learning, utilizing mouse data as auxiliary supervision for human data, enhancing human subclass prediction accuracy.

3. Evaluating a cross-species transfer learning setting, where the sequence model is pretrained on mouse data and fine-tuned on an aligned 4-class task on human data.

Methodology

The methodology of this paper includes the following steps:

  • Data Collection: Utilizing the Allen Institute's Patch-seq datasets, covering neurons from both mouse and human cortex.
  • Data Preprocessing: Performing quality control on the data, using standardized electrophysiological features and sparse PCA for preliminary analysis.
  • Model Development: Developing an attention-based BiLSTM model that operates directly on the structured IPFX feature-family representation.
  • Transfer Learning: Pretraining the sequence model on mouse data and fine-tuning on an aligned 4-class task on human data.
  • Model Evaluation: Using macro-F1 score and accuracy as the main evaluation metrics.
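Macro-F1, the headline evaluation metric, averages per-class F1 with equal weight, so rare subclasses count as much as common ones; a minimal reference implementation:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 averaged with equal class weight,
    so rare subclasses count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

This is why macro-F1, rather than accuracy, is the more informative metric on the small, imbalanced human dataset: a model that ignores rare subclasses is penalized even if overall accuracy stays high.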

Experiments

The experimental design includes:

  • Datasets: Utilizing the Allen Institute's Patch-seq datasets, covering 3,699 mouse visual cortex neurons and 506 human neocortical neurons.
  • Baseline Model: Using a random forest model as a baseline to evaluate its performance on mouse and human data.
  • Evaluation Metrics: Using macro-F1 score and accuracy as the main evaluation metrics.
  • Ablation Studies: Evaluating the performance of different model variants, including the use of attention mechanisms and SMOTE.
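SMOTE, used in the ablations, oversamples minority classes by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that core interpolation step in plain Python (a real pipeline would typically use the `imbalanced-learn` implementation); the function name and parameters are illustrative:

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating each chosen
    minority sample toward one of its k nearest minority neighbours
    (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # nearest minority neighbours of x (identity check excludes x itself)
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Each synthetic point lies on the segment between a real minority sample and one of its neighbours, so the oversampled class stays inside its original region of feature space.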

Results

The results analysis shows:

  • On mouse data, the random forest baseline model achieved an accuracy of 90.72%, while the attention-based BiLSTM model, after applying SMOTE, improved accuracy to 92.35%.
  • On human data, the random forest model achieved an accuracy of 75.18%, while the attention-based BiLSTM model, after applying SMOTE, reached a macro-F1 score of 67.54%.
  • Cross-species transfer learning significantly improved prediction performance on human data, with the macro-F1 score increasing from 65.80% to 67.95%.

Applications

The application scenarios of this study include:

  • Neuroscience Research: Enhancing understanding of neuronal functions and molecular characteristics across species through cross-species transfer learning.
  • Clinical Diagnosis: Providing new molecular markers for the diagnosis of neurological diseases, helping doctors more accurately identify and classify different types of neurons.
  • Drug Development: Assisting drug developers in identifying potential drug targets through more accurate neuronal classification, accelerating the discovery and development of new drugs.

Limitations & Outlook

The limitations of this paper include:

  • Dataset Size: The smaller and imbalanced human dataset may affect the model's generalization capabilities.
  • Biological Differences: Cross-species transfer learning may be affected by biological differences and experimental distribution shifts.
  • Computational Costs: The complexity of the model may lead to higher computational costs, especially on large-scale datasets.

Future research could explore more cross-species datasets to validate the model's generalization capabilities and integrate other modalities, such as morphological features, to further improve prediction performance and interpretability.

Plain Language (accessible to non-experts)

Imagine you're in a large library, where each book represents a neuron. Each book has two types of information: its content (like the electrophysiological features of a neuron) and its catalog information (like the transcriptomic features of a neuron). Traditional methods involve reading each book's content to understand its catalog information, which requires a lot of time and effort. Now, researchers have developed a new method, like a smart librarian, who can quickly infer a book's catalog information by observing its cover and table of contents (electrophysiological features). Even better, this librarian can work in different libraries (cross-species transfer learning), allowing them to perform well in various environments. This method allows us to understand each book's complete information more quickly and accurately without having to read each one individually.

ELI14 (explained like you're 14)

Imagine you're playing a super complex game with lots of characters, each with their own skills and attributes. Now, you want to know each character's backstory, but you don't want to ask them one by one. So, you develop a super cool skill that lets you guess their backstory just by looking at their skills and attributes! Even better, this skill works not just in one game but in different games too! That's like what scientists are doing—they're looking at the electrophysiological features of neurons (like the characters' skills and attributes) to figure out the neurons' transcriptomic information (like the characters' backstory). And they can use this method across different species, just like switching characters in different games! Isn't that awesome?

Glossary

Cross-Species Transfer Learning

A machine learning approach that involves training on data from one species and then fine-tuning on data from another species to improve model generalization.

Used in this paper to transfer learning from mouse data to human data.

Electrophysiology

The study of the electrical properties of biological cells and tissues, particularly the electrical activity of neurons.

Used to measure the functional characteristics of neurons.

Transcriptomics

The study of the complete set of RNA transcripts produced by the genome, revealing the full range of gene expression.

Used to determine the molecular characteristics of neurons.

GABAergic Interneurons

A type of inhibitory neuron that uses gamma-aminobutyric acid (GABA) as a neurotransmitter.

The main type of neuron studied in this paper.

Patch-seq

A technique that combines whole-cell recordings and single-cell RNA sequencing to simultaneously obtain electrophysiological and transcriptomic data from neurons.

Used for data collection and analysis.

Sparse PCA

A principal component analysis method that improves interpretability through sparse loadings.

Used for feature dimensionality reduction and analysis.

Random Forest

An ensemble learning method that constructs multiple decision trees to improve classification performance.

Used as a baseline model for performance evaluation.

BiLSTM

Bidirectional Long Short-Term Memory network, a type of neural network that captures bidirectional dependencies in sequence data.

Used to process sequences of electrophysiological features.

Attention Mechanism

A neural network mechanism that highlights important features by assigning different weights.

Used to improve model interpretability.

SMOTE

A synthetic minority over-sampling technique that generates synthetic samples to balance datasets.

Used to address class imbalance issues.

Open Questions (unanswered questions from this research)

  1. How can we further improve the generalization capabilities of cross-species transfer learning, especially between species with significant biological differences? Existing methods may not perform well in the face of extreme biological differences, requiring the development of more robust transfer learning strategies.
  2. How can we effectively integrate other modalities, such as morphological features, to further improve the accuracy of electrophysiology-to-transcriptomics mapping? This requires the development of new multimodal fusion methods.
  3. How can we improve the training efficiency and prediction performance of models in the context of limited dataset sizes? Existing methods may not perform well on small datasets, necessitating the development of new data augmentation and model optimization techniques.
  4. How can we enhance model interpretability and transparency without increasing computational costs? Existing methods may involve trade-offs between complexity and interpretability.
  5. How can we ensure model stability and consistency under varying experimental conditions? Changes in experimental conditions may lead to fluctuations in model performance, requiring the development of more stable model architectures.

Applications

Immediate Applications

Neuroscience Research

Enhancing understanding of neuronal functions and molecular characteristics across species through cross-species transfer learning, advancing fundamental research.

Clinical Diagnosis

Providing new molecular markers for the diagnosis of neurological diseases, helping doctors more accurately identify and classify different types of neurons.

Drug Development

Assisting drug developers in identifying potential drug targets through more accurate neuronal classification, accelerating the discovery and development of new drugs.

Long-term Vision

Cross-Species Neural Networks

Developing neural network models that can generalize across different species, promoting cross-species research in biology and medicine.

Personalized Medicine

Supporting personalized medicine through deeper neuronal feature analysis, helping to formulate more effective treatment plans.

Abstract

Single-cell electrophysiological recordings provide a powerful window into neuronal functional diversity and offer an interpretable route for linking intrinsic physiology to transcriptomic identity. Here, we replicate and extend the electrophysiology-to-transcriptomics framework introduced by Gouwens et al. (2020) using publicly available Allen Institute Patch-seq datasets from both mouse and human cortex. We focus on GABAergic inhibitory interneurons to target a subclass structure (Lamp5, Pvalb, Sst, Vip) that is comparable and conserved across species. After quality control, we analyzed 3,699 mouse visual cortex neurons and 506 human neocortical neurons from neurosurgical resections. Using standardized electrophysiological features and sparse PCA, we reproduced the major class-level separations reported in the original mouse study. For supervised prediction, a class-balanced random forest provided a strong feature-engineered baseline in mouse data and a reduced but still informative baseline in human data. We then developed an attention-based BiLSTM that operates directly on the structured IPFX feature-family representation, avoiding sPCA and providing feature-family-level interpretability via learned attention weights. Finally, we evaluated a cross-species transfer setting in which the sequence model is pretrained on mouse data and fine-tuned on human data for an aligned 4-class task, improving human macro-F1 relative to a human-only training baseline. Together, these results confirm reproducibility of the Gouwens pipeline in mouse data, demonstrate that sequence models can match feature-engineered baselines, and show that mouse-to-human transfer learning can provide measurable gains for human subclass prediction.

cs.LG q-bio.NC

References (20)

Conserved cell types with divergent features in human versus mouse cortex

R. Hodge, Trygve E Bakken, Jeremy A. Miller et al.

2019 1433 citations ⭐ Influential

Integrated Morphoelectric and Transcriptomic Classification of Cortical GABAergic Cells.

N. Gouwens, S. Sorensen, Fahimeh Baftizadeh et al.

2020 409 citations ⭐ Influential

Signature morphoelectric properties of diverse GABAergic interneurons in the human neocortex

Brian R. Lee, R. Dalley, Jeremy A. Miller et al.

2023 83 citations ⭐ Influential

Random Forests

L. Breiman

2001 112635 citations ⭐ Influential

Classification of electrophysiological and morphological neuron types in the mouse visual cortex

N. Gouwens, S. Sorensen, J. Berg et al.

2019 389 citations

Sparse Principal Component Analysis

H. Zou, T. Hastie, R. Tibshirani

2006 3270 citations

Single-neuron models linking electrophysiology, morphology, and transcriptomics across cortical cell types

Anirban Nandi, Thomas Chartrand, Werner Van Geit et al.

2020 45 citations

Scaled, high fidelity electrophysiological, morphological, and transcriptomic cell characterization

Brian R. Lee, Agata Budzillo, Kristen Hadley et al.

2021 45 citations

Shared and distinct transcriptomic cell types across neocortical areas

Bosiljka Tasic, Zizhen Yao, Lucas T. Graybuck et al.

2018 1530 citations

Neuron NeuroView Neurodata Without Borders : Creating a Common Data Format for Neurophysiology

Jeffery L. Teeters, Keith B. Godfrey, R. Young et al.

2015 162 citations

Human neocortical expansion involves glutamatergic neuron diversification

J. Berg, S. Sorensen, J. Ting et al.

2021 217 citations

Decoupled Weight Decay Regularization

I. Loshchilov, F. Hutter

2017 31501 citations

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Jiankang Deng, J. Guo, S. Zafeiriou

2018 7410 citations

Patch-seq: Past, Present, and Future

M. Lipovsek, C. Bardy, C. Cadwell et al.

2021 80 citations

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba

2014 163716 citations

SMOTE: Synthetic Minority Over-sampling Technique

N. Chawla, K. Bowyer, L. Hall et al.

2002 29286 citations

Focal Loss for Dense Object Detection

Tsung-Yi Lin, Priya Goyal, Ross B. Girshick et al.

2017 30330 citations

Integration of electrophysiological recordings with single-cell RNA-seq data identifies novel neuronal subtypes

J. Fuzik, Amit Zeisel, Zoltán Máté et al.

2015 376 citations

Morpho-electric and transcriptomic divergence of the layer 1 interneuron repertoire in human versus mouse neocortex

Thomas Chartrand, R. Dalley, J. Close et al.

2022 68 citations

UMAP: Uniform Manifold Approximation and Projection

Leland McInnes, John Healy, Nathaniel Saul et al.

2018 7223 citations