SimEIT: A Scalable Simulation Framework for Generating Large-Scale Electrical Impedance Tomography Datasets

Abstract

Electrical Impedance Tomography (EIT) offers advantages over conventional imaging methods, such as X-ray and MRI, but suffers from an ill-posed inverse problem. Deep learning can alleviate this challenge, yet progress is limited by the lack of large, diverse, and reproducible datasets. We present SimEIT, a scalable framework for deterministic simulation and generation of synthetic EIT data. SimEIT enables high-throughput creation of diverse geometries and conductivity maps using parallelized finite element simulations, reproducible seeding, and automated validation. The framework provides multi-resolution, AI-ready HDF5 outputs with PyTorch integration. Demonstrated on two datasets exceeding 100,000 samples, SimEIT bridges the gap between physical simulation and AI training, supporting reliable benchmarking and development of advanced reconstruction algorithms.

Overview

Electrical impedance tomography (EIT) uses electrode arrays to inject electrical currents and measure boundary voltages under diverse stimulation patterns across a region of interest (ROI), then reconstructs the spatial conductivity distribution to characterize the internal material arrangement. EIT offers advantages over conventional imaging techniques (e.g., X-ray, MRI): it is non-invasive, low-cost, low-power, and fast. These properties enable medical applications such as breast cancer detection, stroke diagnosis, and lung function monitoring, as well as industrial process monitoring.

Despite progress, EIT research is hampered by a lack of scalable, adaptable, and open-source datasets. This scarcity of diverse training data obstructs the development of robust AI-driven reconstruction algorithms needed to solve EIT's ill-posed inverse problem. To address this critical gap, we introduce SimEIT: an open-source, parallelized framework for generating large-scale, physically consistent EIT datasets. Built on the validated EIDORS engine, SimEIT integrates flexibility, reproducibility, and scalability through several key innovations:

Modular Framework & Parallel Processing: Flexible architecture with interchangeable components enables parallel execution in the geometry generation and simulation stages, overcoming scalability bottlenecks.
Geometry-Boundary Flexibility: Parametric customization of inclusion shapes (e.g., circles, ellipses, triangles), conductivity distributions, electrode placements, and domain boundaries (e.g., spherical substrates).
Reproducible High-Throughput Synthesis: Deterministic seed control ensures batch-wise traceability for large-scale, physically accurate data generation.
AI-Ready Data Optimization: Multi-resolution ground-truth maps (e.g., 256 × 256 to 32 × 32), differential outputs, and metadata-linked HDF5 storage with PyTorch integration.
EIDORS-Based Physical Fidelity: Maintains physical consistency while supporting MATLAB and open-source Octave environments.
Open Ecosystem & Visualization: Public codebase, Hugging Face demos, configurable noise models, and visualization tools enable community-driven expansion and validation.

By democratizing large-scale EIT data synthesis, SimEIT accelerates inverse solver development, enables systematic study of the origins of ill-posedness, and establishes a foundation for reproducible AI advances in the field, helping to realize the promise of EIT across medical, industrial, and scientific domains.

Framework Architecture Overview

**Figure:** Overview of the dataset generation architecture.

The proposed framework employs a modular architecture comprising three coordinated phases: initialization, core framework execution, and post-processing, as shown in the figure above. The initialization phase establishes deterministic reproducibility through batch-specific seed initialization, base mesh configuration, and selection of the measurement pattern (e.g., adjacent or opposite current injection), while resolving baseline boundary conditions via a cached base forward solution. The core stages then run parallelized geometry generation and finite element simulation, building on adaptations of the established EIDORS forward solver. Post-processing applies multi-scale data transformation and validation protocols, ensuring physical consistency while preparing outputs for machine learning use.
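The batch-specific seeding described above can be illustrated with a minimal sketch. It is not taken from the SimEIT codebase: the function names `batch_seed` and `generate_batch`, the hashing scheme, and the use of Python instead of MATLAB/Octave are all assumptions made for illustration.

```python
import hashlib
import random
from typing import List


def batch_seed(base_seed: int, batch_index: int) -> int:
    """Derive a stable 32-bit seed for one batch from a global seed (hypothetical scheme)."""
    digest = hashlib.sha256(f"{base_seed}:{batch_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def generate_batch(base_seed: int, batch_index: int, n_samples: int) -> List[float]:
    """Stand-in for geometry generation: draws one random value per sample."""
    # Each batch owns its generator, so batches can run in parallel in any order.
    rng = random.Random(batch_seed(base_seed, batch_index))
    return [rng.random() for _ in range(n_samples)]


# Re-running a batch reproduces it exactly, independently of other batches.
first_run = generate_batch(base_seed=42, batch_index=7, n_samples=3)
second_run = generate_batch(base_seed=42, batch_index=7, n_samples=3)
other_batch = generate_batch(base_seed=42, batch_index=8, n_samples=3)
```

Because each seed depends only on the global seed and the batch index, a single failed or suspicious batch could be regenerated in isolation, which is one way to obtain the batch-wise traceability the framework aims for.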

Generated Datasets

This section describes the validation of our framework through the generation of two large-scale synthetic EIT datasets using an adjacent current injection pattern. It covers the fixed-meshing strategy that enables high-throughput simulation, the characteristics and statistical distributions of the datasets, sample diversity, and framework features such as multi-resolution support and PyTorch integration, which together address the need for scalable, open-source EIT data. The datasets were generated using MATLAB 2020b on a high-performance computing (HPC) cluster with distributed processing via SLURM. Both datasets are publicly available to support reproducible EIT research.

Dataset 1: Mixed Shapes with Diverse Geometries

The first dataset contains 100,000 samples featuring four distinct inclusion shapes: triangles, rectangles, circles, and ellipses. Each shape category has configurable geometric degrees of freedom (DOF), such as position, size, and aspect ratio. For example, a circle is defined by its center coordinates (x, y) and radius (3 DOF), while an ellipse additionally requires two axis lengths in place of the radius and a rotation angle (5 DOF). The per-shape DOF was capped at 7 to keep generation computationally tractable while preserving shape diversity.
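As a rough illustration of the DOF bookkeeping, the sketch below maps each shape to a parameter tuple and samples one inclusion. The circle (3 DOF) and ellipse (5 DOF) counts follow the text; the rectangle and triangle parameterizations, all parameter names, and the sampling ranges are assumptions and may differ from what SimEIT uses.

```python
import math
import random

# Circle and ellipse follow the text; rectangle and triangle are assumed layouts.
SHAPE_PARAMS = {
    "circle": ("cx", "cy", "radius"),
    "ellipse": ("cx", "cy", "semi_major", "semi_minor", "rotation"),
    "rectangle": ("cx", "cy", "width", "height", "rotation"),
    "triangle": ("x1", "y1", "x2", "y2", "x3", "y3"),
}
MAX_DOF = 7  # per-shape cap stated in the text
SIZE_PARAMS = {"radius", "semi_major", "semi_minor", "width", "height"}


def sample_param(rng: random.Random, name: str) -> float:
    """Draw one parameter; all ranges are illustrative, in unit-radius domain units."""
    if name == "rotation":
        return rng.uniform(0.0, math.pi)
    if name in SIZE_PARAMS:
        return rng.uniform(0.05, 0.3)
    return rng.uniform(-0.5, 0.5)  # coordinates


def sample_shape(rng: random.Random, kind: str) -> dict:
    """Return a dict with the shape kind and one value per degree of freedom."""
    params = {name: sample_param(rng, name) for name in SHAPE_PARAMS[kind]}
    return {"kind": kind, **params}


rng = random.Random(0)
ellipse = sample_shape(rng, "ellipse")
dof_per_shape = {kind: len(names) for kind, names in SHAPE_PARAMS.items()}
```

Keeping the parameter list explicit per shape makes the DOF cap easy to check whenever a new shape type is added.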

**Figure:** Statistical distributions of key parameters in the first generated EIT dataset. (a) Number of objects. (b) Type of object geometries. (c) Logarithmic conductivity values of inclusions. (d) Fractional coverage area of objects within the domain.

The statistical properties of the generated dataset confirm its controlled diversity. The number of inclusions per sample is uniformly distributed from one to four (a), ensuring balanced complexity. The distribution of shapes is varied, with rectangles and circles being most frequent (b). Inclusion conductivities follow a near-uniform logarithmic distribution (c), offering a wider range than typical phantoms. The fractional area covered by objects is right-skewed, prioritizing samples with lower object density while still including high-density cases (d).

**Figure:** Example inclusion geometries and conductivity distributions. Each domain shows unique configurations of shapes with varying sizes, orientations, arrangements, and conductivity values (color-coded).

Samples of Dataset 1:

Our framework procedurally generates EIT datasets with morphological diversity. Each sample represents a circular domain containing randomized inclusions (circles, triangles, rectangles, and ellipses) with heterogeneous sizes, orientations, and spatial distributions. Conductivity values σ follow a logarithmic distribution spanning multiple orders of magnitude, creating challenging physical scenarios. To accommodate different computational requirements, conductivity maps are generated at 256 × 256 pixel resolution and downsampled to 128 × 128, 64 × 64, and 32 × 32. A PyTorch DataLoader integrates these datasets into deep learning workflows, while built-in circular masks exclude the region outside the EIT domain boundary, ensuring that only relevant pixels are processed.
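The multi-resolution maps and circular masking can be sketched with NumPy as follows. The text does not state which downsampling method SimEIT uses, so block averaging is an assumption here, as are the helper names `downsample` and `circular_mask`, the toy conductivity values, and the choice to fill masked pixels with zero.

```python
import numpy as np


def circular_mask(size: int) -> np.ndarray:
    """Boolean mask that is True for pixels whose centers lie inside the inscribed circle."""
    coords = (np.arange(size) + 0.5) / size * 2.0 - 1.0  # pixel centers in [-1, 1]
    xx, yy = np.meshgrid(coords, coords)
    return xx**2 + yy**2 <= 1.0


def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a square image by an integer factor (assumed method)."""
    size = image.shape[0] // factor
    return image.reshape(size, factor, size, factor).mean(axis=(1, 3))


# Toy 256 x 256 conductivity map: unit background with one rectangular inclusion.
full = np.ones((256, 256))
full[100:140, 60:120] = 5.0

# Build the resolution pyramid, then zero out pixels outside the circular domain.
pyramid = {256: full}
for size in (128, 64, 32):
    pyramid[size] = downsample(full, 256 // size)
masked = {size: np.where(circular_mask(size), img, 0.0) for size, img in pyramid.items()}
```

In a training loop, the same mask would typically also be applied to the loss so that pixels outside the domain do not contribute to the gradient.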

Dataset 2: Circular Inclusions with Variations

This dataset comprises 100,000 samples containing exclusively circular inclusions. Each circle is defined by its center coordinates (x, y) and radius, resulting in 3 DOF per inclusion. The number of circles per sample varies from 1 to 4, with radii and conductivities sampled from user-defined distributions. This dataset focuses on parametric variations of circular shapes to facilitate targeted studies of how inclusion size and conductivity affect EIT reconstruction performance.
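A sample of this kind could be drawn along the following lines. This is only a sketch: the radius and conductivity ranges, the log-uniform conductivity choice, the unit-radius domain, and the function names are assumptions standing in for the user-defined distributions mentioned above.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_circles(rng, max_circles=4, radius_range=(0.05, 0.3), sigma_range=(0.01, 100.0)):
    """Sample 1..max_circles circles that lie fully inside a unit-radius domain."""
    circles = []
    for _ in range(rng.randint(1, max_circles)):
        radius = rng.uniform(*radius_range)
        # Place the center uniformly over the disk of radius (1 - radius).
        r_center = (1.0 - radius) * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        circles.append({
            "cx": r_center * math.cos(theta),
            "cy": r_center * math.sin(theta),
            "radius": radius,
            "sigma": sample_log_uniform(rng, *sigma_range),
        })
    return circles


rng = random.Random(123)
samples = [sample_circles(rng) for _ in range(200)]
```

This sketch does not prevent circles from overlapping one another; whether and how overlaps are handled in the actual datasets is not stated in the text.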

BibTeX

@article{ameen2025simeit,
  title={SimEIT: A Scalable Simulation Framework for Generating Large-Scale Electrical Impedance Tomography Datasets},
  author={Ameen, Ayman A. and Mathis-Ullrich, Franziska and Kainz, Bernhard},
  year={2025},
}