Biomedical Image Computing
Students can send new ideas and suggestions for possible Semester or Master projects to the following address:
This project aims to use large language models to predict the acceptance of academic papers by combining textual and contextual information. Leveraging text and figures from manuscripts alongside metadata (e.g., venue/journal), the project seeks to train a model that can evaluate the likelihood of acceptance based on established criteria such as relevance, novelty, clarity, and methodological soundness.
Supervisor: Gary Sarwin,
Professor: Ender Konukoglu
This project explores the potential of Vision-Language Models (VLMs) to predict surgical phases solely based on visual inputs from surgical videos, leveraging pre-existing knowledge without the need for labeled data or additional training. By inputting surgical procedure videos into a VLM, the project aims to assess the model’s ability to recognize and sequence phases of surgery based on its general visual-linguistic understanding of typical surgical steps and objects.
The project hypothesizes that VLMs can inherently identify relevant surgical contexts and transitions due to their extensive pre-training on diverse multimodal datasets. Through this work, we seek to demonstrate the feasibility of using VLMs as zero-shot predictors in specialized medical tasks, with potential applications in intraoperative decision support.
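To make the zero-shot setup concrete, a per-frame phase classifier can be sketched with an off-the-shelf vision-language model. The snippet below is a minimal sketch only: it uses CLIP (via the Hugging Face transformers library) as a stand-in for whatever VLM the project ends up using, and the listed phase names are illustrative placeholders, not the actual surgical protocol.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative phase vocabulary; a real list would come from the surgical workflow definition.
PHASES = ["incision", "tissue dissection", "tumor resection", "hemostasis", "closure"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = [f"a frame from the {p} phase of a surgical procedure" for p in PHASES]

def predict_phase(frame: Image.Image) -> str:
    """Return the most likely phase name for a single video frame (zero-shot)."""
    inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape (1, num_phases)
    return PHASES[logits.softmax(dim=-1).argmax().item()]

Applying such a classifier frame by frame and smoothing the predictions over time would give a first, training-free baseline for phase sequencing.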
Supervisor: Gary Sarwin,
Professor: Ender Konukoglu
Fourier Neural Operators (FNOs) have emerged as a powerful tool for learning solutions to Partial Differential Equations (PDEs), leveraging the Fourier transform to efficiently represent functions in frequency space. However, most FNO architectures are constrained to structured grids and struggle with complex geometries. This project aims to develop a geometry-agnostic FNO that operates on curvilinear coordinates, enabling the model to handle arbitrary geometries without the need for specialized training. The selected student will start by studying the foundational work on FNOs and their group-equivariant extensions for PDEs. The project will then focus on implementing a novel approach that embeds input data into curvilinear coordinates, allowing the operator to learn solutions irrespective of the underlying geometry. Finally, the student will compare the performance of this method against state-of-the-art techniques.
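For orientation, the core FNO building block is a spectral convolution that multiplies the lowest Fourier modes by learned complex weights. Below is a minimal 1D sketch in the spirit of Li et al. [Ref. 1]; layer sizes are placeholders, and the curvilinear-coordinate embedding that this project targets would be applied before such layers.

import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Minimal FNO block: keep the first `modes` Fourier modes and mix channels there."""
    def __init__(self, in_channels, out_channels, modes):
        super().__init__()
        self.modes = modes   # number of retained low-frequency modes
        scale = 1.0 / (in_channels * out_channels)
        self.weights = nn.Parameter(
            scale * torch.randn(in_channels, out_channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x):                # x: (batch, in_channels, n_grid)
        x_ft = torch.fft.rfft(x)         # -> (batch, in_channels, n_grid//2 + 1)
        out_ft = torch.zeros(x.shape[0], self.weights.shape[1], x_ft.shape[-1],
                             dtype=torch.cfloat, device=x.device)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights
        )
        return torch.fft.irfft(out_ft, n=x.shape[-1])   # back to physical space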
Objectives:
- Study existing FNO architectures, particularly group-equivariant FNOs for learning PDEs on non-Euclidean domains.
- Develop and implement a geometry-agnostic FNO by embedding data into curvilinear coordinates.
- Benchmark the proposed method against state-of-the-art PDE solvers, particularly in complex geometric domains.
Supervisor: Kyriakos Flouris,
Professor: Ender Konukoglu
References:
Li, Z., Kovachki, N., Azizzadenesheli, K., et al. (2021). Fourier Neural Operator for Parametric Partial Differential Equations. arXiv:2010.08895.
Li, Z., Azizzadenesheli, K., Bhattacharya, K., et al. (2022). Learning Group Invariant Operators with Fourier Neural Operators. NeurIPS.
Energy-based models (EBMs) and diffusion models share a common foundation in the Langevin equation [1]. Diffusion models [2], however, benefit from predefined data points, making the learning process more structured and efficient. This project aims to explore whether introducing a similar structure into EBMs could lead to improved performance. The selected student will dive into the theory behind both diffusion and energy-based models and work on developing a novel generative model by applying the diffusion forward process to EBMs. The goal is to investigate if this approach can yield comparable improvements in learning efficiency.
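As a small illustration of the shared foundation mentioned above, the sketch below shows unadjusted Langevin dynamics, the sampler that underlies EBM training and inspired diffusion models (cf. [1]); step size and iteration count are illustrative values only.

import torch

def langevin_sample(energy_fn, x_init, n_steps=100, step_size=1e-2):
    """Unadjusted Langevin dynamics: x <- x - (eps/2) * grad E(x) + sqrt(eps) * noise."""
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        energy = energy_fn(x).sum()
        grad, = torch.autograd.grad(energy, x)
        with torch.no_grad():
            x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()

# Example: sampling from a toy quadratic energy (a standard Gaussian up to a constant).
samples = langevin_sample(lambda x: 0.5 * (x ** 2).sum(dim=-1), x_init=torch.randn(64, 2))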
Supervisor: Kyriakos Flouris,
Professor: Ender Konukoglu
References:
[1] How to Train Your Energy-Based Models; Yang Song, Diederik P. Kingma
[2] Denoising Diffusion Probabilistic Models; Jonathan Ho, Ajay Jain, Pieter Abbeel
Description for the Master project
One promising method for cardiovascular disease prediction involves analyzing retinal images, as the retinal vasculature provides insights into cardiovascular health, including stroke risk. Researchers can use retinal images to assess overall health, and in previous work, a framework was developed that combines graph-based retinal image representations with clinical data to enhance stroke prediction using a contrastive self-supervised model. The aim of this project is to extend our previous work. In particular, we aim to make predictions interpretable, which is crucial in the context of clinically relevant machine learning models. Further, we aim to cope with incomplete data, improve our model’s architecture by proposing state-of-the-art graph encoders, and potentially fine-tune or build upon recent foundation models. Lastly, we aim to explore additional downstream tasks for cardiovascular diseases with potentially new modalities, such as brain images, that can be paired with retinal fundus image graph representations. To summarize, the focus lies on exploring the use of retinal fundus image graph representations in a contrastive learning framework. In-depth literature research will be expected. The aim should also be to contribute to a scientific publication.
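To illustrate the contrastive pairing of the two data types, below is a minimal sketch of a symmetric InfoNCE-style loss between paired graph and clinical (tabular) embeddings, in the spirit of reference [4]; the encoders producing graph_emb and tab_emb are assumed to exist elsewhere, and the temperature is a placeholder.

import torch
import torch.nn.functional as F

def cross_modal_infonce(graph_emb, tab_emb, temperature=0.1):
    """Symmetric InfoNCE loss: the i-th retinal graph embedding should match
    the i-th clinical-record embedding and no other sample in the batch."""
    g = F.normalize(graph_emb, dim=-1)            # (batch, dim)
    t = F.normalize(tab_emb, dim=-1)              # (batch, dim)
    logits = g @ t.T / temperature                # (batch, batch) similarity matrix
    targets = torch.arange(g.shape[0], device=g.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))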
Your qualifications / what we are looking for
- Knowledge of common machine learning paradigms and architectures (graph neural networks, self-supervised learning, etc.)
- Knowledge of explainable and interpretable AI is advantageous
- Excellent programming skills in Python as well as familiarity with PyTorch (and PyTorch geometric)
- Full time commitment towards the completion of your project
- Ability to work independently on challenging projects
- Ability to understand scientific papers and conduct literature research
- Prior medical knowledge is advantageous
How to apply
Please send your CV and transcript to Neda Davoudi () and Bastian Wittmann ().
Links to previous work (e.g., your GitHub profile) are highly appreciated.
Co-Supervisors: Neda Davoudi, Bastian Wittmann, and Bjoern Menze
Professor: Ender Konukoglu
References:
[1] “Retinal vasculature of different diameters and plexuses exhibit distinct vulnerability in varying severity of diabetic retinopathy” (https://www.nature.com/articles/s41433-024-03021-4)
[2] “Geometric deep learning for disease classification in OCTA images” (https://iovs.arvojournals.org/article.aspx?articleid=2790851)
[3] “A foundation model for generalizable disease detection from retinal images” (https://www.nature.com/articles/s41586-023-06555-x)
[4] “Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data” (https://arxiv.org/abs/2303.14080)
[5] “GNNExplainer: Generating Explanations for Graph Neural Networks” (https://arxiv.org/abs/1903.03894)
[6] “TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data” (https://arxiv.org/abs/2407.07582)
[7] “Link prediction for flow-driven spatial networks” (https://arxiv.org/abs/2303.14501)
[8] “The emerging role of combined brain/heart magnetic resonance imaging for the evaluation of brain/heart interaction in heart failure” (https://www.mdpi.com/2077-0383/11/14/4009)
[9] Ferrando SB et al. Stroke and Retinal microvascular changes: Neuroimaging markers of brain damage and association with retinal Optical Coherence Tomography Angiography parameters.
Background and Motivation
We are currently developing a computer-vision-based method for automated neurological assessment of patients in the neurocritical care unit at the University Hospital Zurich. The staff performs the neurological assessment of these patients manually every couple of hours to detect secondary brain injuries such as stroke or epileptic seizures. Based on pose estimation, we aim to continuously track patient movement and rapidly detect trends and abnormal patterns. The current approach leverages conventional RGB and thermal/infrared cameras separately. With this setup, we are limited to monocular 2D pose estimation for each modality.
Combining the thermal and RGB video streams for stereo vision would allow for refined movement monitoring and movement biomarker development.
Project Overview
- Preparing a test setup (outside the ICU) with an RGB and a thermal camera, including camera calibration, stream synchronization, and test data acquisition (see the calibration sketch after this list).
- Leveraging public multi-spectral depth data (e.g., the Multi-Spectral Stereo (MS2) Dataset) to develop an adaptable proof of concept.
- If successful, testing in the ICU environment. Due to patient privacy protection, this will be done with volunteers from the team instead of actual patients.
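The following is a minimal sketch of the cross-modal calibration step, assuming a checkerboard target visible to both cameras (for the thermal camera this typically requires a heated or emissivity-contrast board), time-synchronized grayscale frame pairs in synchronized_pairs, and intrinsics K_rgb, d_rgb, K_th, d_th from prior single-camera calibration; all of these names are placeholders.

import cv2
import numpy as np

# Checkerboard geometry (placeholder values).
pattern = (9, 6)            # inner corners per row and column
square = 0.025              # square size in metres
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, rgb_pts, thermal_pts = [], [], []
for rgb_img, th_img in synchronized_pairs:       # assumed: synchronized grayscale frames
    ok1, c1 = cv2.findChessboardCorners(rgb_img, pattern)
    ok2, c2 = cv2.findChessboardCorners(th_img, pattern)
    if ok1 and ok2:
        obj_pts.append(objp); rgb_pts.append(c1); thermal_pts.append(c2)

# Intrinsics are kept fixed; only the relative pose between the two cameras is estimated.
ret, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, rgb_pts, thermal_pts, K_rgb, d_rgb, K_th, d_th,
    rgb_img.shape[::-1], flags=cv2.CALIB_FIX_INTRINSIC)
# R, T give the rigid RGB-to-thermal transform, used later for stereo rectification.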
Requirements
- Programming knowledge in Python
- Familiarity with basic Computer Vision concepts
Supervising professor: Prof. Dr. Ender Konukoglu
Co-supervisor (contact): Yannick Suter ()
Neurocritical Care Unit, Department of Neurosurgery and Institute of Intensive Care Medicine, University Hospital Zurich and University of Zurich
Background: Foundation models are a breakthrough in the field of artificial intelligence. These models are characterized by their massive size, reaching billions (even trillions) of parameters, and by their ability to be adapted to a wide variety of tasks without needing to be trained from scratch. The development of these models marks a pivotal shift in AI research and application, pushing the boundaries of what machines can understand and do. However, due to their huge size, foundation models are very demanding in terms of computation, memory footprint, and bandwidth, and thus face significant computational challenges. They are typically trained on massive clusters equipped with thousands of advanced GPUs, and they usually require cloud services even for inference.
Aim: Quantization is an effective technique to reduce the stored size of foundation models and accelerate their inference. For example, a 70B Llama model needs approximately 150GB of GPU memory with 16-bit floating-point weights, which requires two A100 80GB GPUs for inference. If the model is quantized to 4 bits, the required GPU memory drops to roughly 35GB, allowing the model to fit on a single GPU with less memory. In this project, we therefore aim to further unlock the power of quantization, shrinking vision and language foundation models and accelerating their inference. We will consider vision and language foundation models such as SAM2 and LLaVA.
Methods: The project targets mixed-precision quantization methods within a quantization-aware training framework, specifically adapted to the BitNet training and QLoRA finetuning processes. Mixed-precision quantization allows different parts of the model to be quantized at different levels, enabling a tailored approach where critical components of the model retain higher precision to preserve essential information and model integrity, while less critical components are quantized more aggressively to achieve greater reductions in memory usage and computational demand. The motivation behind mixed-precision quantization lies in its potential to find Pareto-optimal points in the trade-off between model size, computational efficiency, and accuracy. By selectively applying different quantization strategies across the model, it becomes possible to maintain or even enhance the model's effectiveness while still benefiting from the efficiency gains of lower-bit quantization.
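As a concrete illustration of the memory arithmetic above, the sketch below quantizes a weight matrix to 4-bit integers with a symmetric, per-output-channel scale. It is a deliberate simplification of the mixed-precision, quantization-aware schemes the project will actually study, and the layer size is a placeholder.

import torch

def quantize_per_channel(w: torch.Tensor, n_bits: int = 4):
    """Symmetric round-to-nearest quantization of a (out_features, in_features) weight."""
    qmax = 2 ** (n_bits - 1) - 1                        # e.g. 7 for signed 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax    # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale          # int8 container here for simplicity; real kernels pack 2 values/byte

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_per_channel(w)
print((w - dequantize(q, s)).abs().mean())   # average quantization error
# Packed as true 4-bit values, such weights need ~1/4 of the memory of float16,
# which is exactly the effect that lets a 70B model fit on a single GPU.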
Materials and Resources: The candidate will join a research team with extensive experience in machine learning and computer vision. The candidate will have the opportunity to work with active researchers and to be supervised by world-leading professors and senior researchers. Access to high-performance supercomputers equipped with rich GPU resources will be possible.
Nature of the Thesis:
Literature review: 10%; model building: 70%; model validation: 10%; results analysis: 10%
Requirements:
- Familiarity with Python and PyTorch
- Knowledge of machine learning and deep learning
- Experience with training deep learning models
- Knowledge of Transformers and Mamba is a bonus
- Knowledge of PyTorch Lightning is a bonus
Co-Supervisors:
Dr. Yawei Li ()
Dr. Guolei Sun ()
Professors:
Prof. Luca Benini (), main supervisor
Prof. Ender Konukoglu ()
Institutes:
Integrated System Lab & Computer Vision Lab, D-ITET, ETH Zurich
References:
[1] Pandey, Nilesh Prasad, et al. "A practical mixed precision algorithm for post-training quantization." arXiv preprint arXiv:2302.05397 (2023).
[2] Van Baalen, Mart, et al. "Bayesian bits: Unifying quantization and pruning." Advances in neural information processing systems 33 (2020): 5741-5752.
[3] Hu, Edward J., et al. "Lora: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021).
[4] Wang, Hongyu, et al. "Bitnet: Scaling 1-bit transformers for large language models." arXiv preprint arXiv:2310.11453 (2023).
[5] Ma, Shuming, et al. "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits." arXiv preprint arXiv:2402.17764 (2024).
Introduction: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. In our recent work [1] we present a large-scale dataset of Gaussian-splatted objects and propose a masked autoencoder model to explore the potential of direct pretraining on Gaussian splats, highlighting its advantages over point-cloud-based methods. This opens a new avenue for self-supervised 3D representation learning.
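For context, masked-autoencoder pretraining on splats boils down to hiding a large fraction of Gaussian tokens and reconstructing them. Below is a minimal sketch of the random-masking step only, assuming each Gaussian has already been flattened into a feature vector (position, scale, rotation, opacity, SH coefficients); the mask ratio is a placeholder.

import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.6):
    """tokens: (batch, num_gaussians, feat_dim). Returns visible tokens and the binary mask."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)      # random score per token
    ids_shuffle = noise.argsort(dim=1)                  # random permutation of token indices
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, ids_keep, 0.0)                     # 0 = kept, 1 = masked
    return visible, mask    # the encoder sees `visible`; the decoder reconstructs the rest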
Following the approach in [2], we aim to generalize Gaussian-splat learning to the scene level. Unlike centered objects, indoor scenes pose additional challenges: 1) scene-level data contains many more Gaussians; 2) scenes include multiple objects at different scales, making them harder for models to understand; 3) there are more tasks at the scene level, such as object detection, segmentation, and multi-modal tasks.
The goal: To extend our current work to the scene level, the tasks can be categorized into the following aspects: 1) datasets [3][4], 2) an efficient backbone model [5], and 3) downstream tasks for benchmarking [6].
For the datasets, we need a large-scale dataset of trained Gaussian-splatted scenes, based on available indoor datasets such as ScanNet and ScanNet++. Since 3DGS training can take a large amount of time, this step will be done at the very beginning.
An efficient backbone model is of great importance. We observe in our work that, even at the object level, downsampled 3D Gaussian splats lose many color and geometric details, which can hinder downstream task performance. Downsampling is mainly a compromise between pretraining speed and quality: with a vanilla transformer backbone, the O(n^2·d) time complexity of self-attention is too expensive when all splats are used as input. At the scene level, however, downsampling is not an option, as sparse Gaussians cannot achieve good coverage of the scene. The goal is therefore to explore efficient backbone architectures from the literature, e.g., window-based transformers, early token fusion, and the state-space-model architecture Mamba.
To validate the pretrained model on the scene-level dataset, we benchmark it on downstream tasks. The targets are the ScanNet200 3D Semantic Label Benchmark and the 3D Semantic Segmentation benchmark on ScanNet++. To this end, a segmentation head needs to be attached to the pretrained model; we will take inspiration from the current top-performing methods.
Requirement: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g., having taken computer vision courses at ETH Zurich. Knowledge of 3D vision and pretraining mechanisms is a plus.
References:
[1] Ma, Qi, et al. "ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining." arXiv preprint arXiv:2408.10906 (2024).
[2] Irshad, Muhammad Zubair, et al. "NeRF-MAE: Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields." arXiv preprint arXiv:2404.01300 (2024).
[3] Dai, Angela, et al. "Scannet: Richly-annotated 3d reconstructions of indoor scenes." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[4] Yeshwanth, Chandan, et al. "Scannet++: A high-fidelity dataset of 3d indoor scenes." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[5] Liang, Dingkang, et al. "Pointmamba: A simple state space model for point cloud analysis." arXiv preprint arXiv:2402.10739 (2024).
[6] Wu, Xiaoyang, et al. "Point Transformer V3: Simpler Faster Stronger." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
Supervisor: Qi Ma, ;
Professor: Ender Konukoglu
Introduction: In recent years, large models trained on extensive datasets with substantial GPU resources have demonstrated remarkable generalizability. These models excel across various domains, including novel ones not represented in their training data. Within the realm of computer vision, there are two primary types of large models: vision-language models and vision foundation models. Vision-language models are trained on paired vision and language inputs, while vision foundation models are primarily trained using visual information such as images and videos. Although these models are designed for general purposes, they may not perform optimally for specific tasks like few-shot segmentation.
Few-shot segmentation aims to segment arbitrary categories given a few support samples with ground-truth masks for those classes. This approach significantly reduces annotation efforts and has wide-ranging applications, including medical image analysis and autonomous driving. While some studies suggest that large models can segment novel classes without additional support samples, this ability has not been systematically studied. Moreover, the visual information from support samples can complement the knowledge embedded in current large models. Consequently, adapting large models for few-shot semantic segmentation by incorporating additional visual information holds great promise.
The goal of the project is to explore large models for few-shot semantic segmentation. First, a rigorous study on existing models’ ability under current few-shot segmentation settings will be conducted. Second, novel algorithms built upon those models will be developed to further enhance segmentation performance on new classes.
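As a simple baseline for the planned study, a prototype-based few-shot segmentation step can be written as below: a class prototype is pooled from support features under the ground-truth masks (the features could come from a frozen large model), and query pixels are scored by cosine similarity. Shapes and the threshold are illustrative.

import torch
import torch.nn.functional as F

def prototype_segment(support_feat, support_mask, query_feat):
    """support_feat: (S, C, H, W) features of support images,
    support_mask:  (S, H, W) binary ground-truth masks for the novel class,
    query_feat:    (C, H, W) features of the query image.
    Returns an (H, W) similarity map for the novel class."""
    m = support_mask.unsqueeze(1)                                   # (S, 1, H, W)
    # Masked average pooling -> one prototype vector for the class.
    proto = (support_feat * m).sum(dim=(0, 2, 3)) / m.sum().clamp(min=1e-6)   # (C,)
    sim = F.cosine_similarity(query_feat, proto[:, None, None], dim=0)        # (H, W)
    return sim    # threshold, or compare against other class prototypes, to get a mask

# Usage sketch: mask_pred = prototype_segment(sf, sm, qf) > 0.5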
Requirement: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g., having taken computer vision courses at ETH Zurich. Knowledge of image/video semantic segmentation is a plus.
References:
[1] Lanyun Zhu, et al., “LLaFS: When Large Language Models Meet Few-Shot Segmentation”, CVPR 2024
[2] Alec Radford, et al., “Learning Transferable Visual Models From Natural Language Supervision”, ICML 2021
Supervisors: Dr. Guolei Sun, ;
Dr. Yawei Li, .ch
Professor: Ender Konukoglu
Introduction:
The objective of this project is to investigate why diffusion models achieve significantly better Fréchet Inception Distance (FID) scores compared to other generative models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Diffusion models, which generate images by gradually denoising a variable starting from pure noise, have shown remarkable performance in producing high-quality, diverse images. This project will explore the underlying mechanisms of diffusion models, such as their training dynamics, the role of the denoising process, and the architecture's capacity to capture complex data distributions, to understand the factors contributing to their superior performance in terms of FID scores. By comparing these models with GANs and VAEs through both theoretical analysis and empirical experiments, this research aims to uncover the reasons behind the enhanced image quality and realism achieved by diffusion models. [1], [2]
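For reference, the FID itself compares Gaussian fits of Inception features of real and generated images. A minimal sketch of the metric, assuming pre-extracted feature matrices and following the standard formula, is shown below.

import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """feats_*: (n_samples, 2048) Inception-v3 pool features.
    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2})."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):          # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))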
References:
[1] https://arxiv.org/abs/2208.09392
[2] https://arxiv.org/abs/2006.11239
The project can be performed as a master or semester thesis.
Supervisor:
Kyriakos Flouris,
Professor: Ender Konukoglu,
Introduction: Digital twins of transportation networks require accurate and detailed information, such as the number of lanes and turning relations, which are crucial for effective transport modeling. This project aims to enhance road network data from OpenStreetMap (OSM) using satellite images to fill in missing or incomplete information. Specifically, the goal is to develop a machine learning (ML) program to infer road network features from satellite images. The project will follow these steps:
- Define Data Requirements: Determine the necessary resolution and detail of OSM data, focusing on critical features like the number of lanes and turning relations.
- Overlap and Georeference Data: Align OSM data with satellite images, ensuring consistent georeferencing for accurate data integration.
- Extract Labels from OSM: Identify and extract labels (number of lanes, turning relations) from OSM data to use as training data for the ML model.
- Train the ML Model: Develop and train an ML model using satellite images and the extracted OSM labels to predict road features (a minimal training sketch follows this list).
- Validate the Model: Implement a validation framework to evaluate the ML model's performance against the ground truth and refine the model to improve accuracy.
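The sketch below illustrates one possible form of the training step: a standard image classifier that predicts the number of lanes from a satellite-image patch. The variables patches and lane_labels are hypothetical inputs assumed to come from the georeferencing and label-extraction steps above, and the class count is a placeholder.

import torch
import torch.nn as nn
from torchvision import models

NUM_LANE_CLASSES = 6   # e.g. OSM "lanes" tags 1-6, shifted to 0-based class indices

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_LANE_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(patches, lane_labels):
    """One supervised step: predict the number of lanes from an image patch.
    patches: (B, 3, 224, 224) satellite crops centred on road segments,
    lane_labels: (B,) integer class indices extracted from OSM."""
    optimizer.zero_grad()
    loss = criterion(model(patches), lane_labels)
    loss.backward()
    optimizer.step()
    return loss.item()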
Notes: This project is in collaboration with the ETH spin-off Transcality (https://transcality.com/). Please see their website for more information about what they do.
Requirement: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g., having taken computer vision courses at ETH Zurich. Knowledge of image classification is a plus.
Supervisor: Dr. Guolei Sun, ;
Dr. Lukas Ambühl,
References:
[1] Syed Waqas Zamir, et al., iSAID: A large-scale dataset for instance segmentation in aerial images, CVPRW 2019
[2] Yun Liu, et al., Transformer in convolutional neural networks, MIR 2024
Professor: Ender Konukoglu, ETF E113,
Introduction: Video semantic segmentation (VSS) is a pivotal task within the domain of computer vision, aiming to assign predefined categories to each frame of a video sequence. The challenge lies not only in achieving accuracy but also in maintaining temporal consistency, minimizing unwanted flickering artifacts across consecutive frames. This pursuit of stability and precision is further compounded by the contemporary landscape of computational constraints and the burgeoning demand for real-time processing capabilities, particularly on edge devices. Against this backdrop, binarization emerges as a compelling avenue for optimization, offering a pathway to significant reductions in both computational complexity and memory footprint by harnessing the power of 1-bit parameters and bitwise operations. In the forthcoming master's thesis project, our ambition is to explore the adaptability of binarization techniques within the realm of VSS, with a view towards improving efficiency without compromising segmentation performance.
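For background on the binarization side, the core operation is a sign function on the weights with a straight-through estimator (STE) for the gradients. The sketch below is a minimal illustration only; scaling factors and activation binarization, as used for example in BiMatting [2], are omitted.

import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(w) in {-1, +1}. Backward: pass gradients straight through,
    clipped to the [-1, 1] range, as is common practice."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarized on the fly during the forward pass."""
    def forward(self, x):
        return nn.functional.linear(x, BinarizeSTE.apply(self.weight), self.bias)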
Goal:
- Familiarize with VSS and binarization.
- Adopt binarization to existing VSS methods.
- Design novel algorithms for efficient VSS.
- Possibility of a submission to top AI conferences such as ICLR 2024 and CVPR 2024.
Requirement: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g., having taken computer vision courses at ETH Zurich. Knowledge of image/video semantic segmentation is a plus.
Supervisors: Dr. Guolei Sun, ;
Dr. Yawei Li, .ch
Professor: Ender Konukoglu
References:
[1] Guolei Sun, et al., “Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation”, ECCV 2022
[2] Haotong Qin, et al., “BiMatting: Efficient Video Matting via Binarization”, NeurIPS 2023
[3] Amir Gholami, et al., “A Survey of Quantization Methods for Efficient Neural Network Inference”
This project aims to revolutionize the analysis of electroencephalography (EEG) data by developing a specialized foundational model utilizing the principles of artificial intelligence. Despite the critical role of EEG in diagnosing and treating neurological disorders, challenges such as low signal-to-noise ratios and complex signal patterns hinder practical analysis. By adapting strategies from successful domains like natural language processing and computer vision, this project will build a machine learning model tailored for EEG signals. The model will undergo extensive pre-training on diverse EEG datasets to establish a robust understanding of neural activities, followed by fine-tuning for specific clinical tasks such as seizure detection and sleep stage classification. Our approach promises to enhance the accuracy, efficiency, and accessibility of EEG diagnostics, paving the way for improved patient outcomes. Validation and testing using standard performance metrics will measure the model's efficacy, setting a new standard in EEG analysis.
Keywords: EEG Analysis, Foundational Models, Large Language Models, Machine Learning, Deep Learning, Transfer Learning, Signal Processing
Description
Electroencephalography (EEG) is a fundamental tool in neuroscience, allowing us to monitor the brain's electrical activity non-invasively. It is beneficial for diagnosing and treating neurological disorders, but interpreting EEG data can be tricky due to many factors, such as low signal-to-noise ratio and the inherent complexity of EEG signals.
Large Language Models (LLMs), such as OpenAI’s GPT series, are a specific type of Foundational Model designed to understand, generate, and manipulate human language. LLMs are trained on extensive collections of text data, allowing them to learn a wide range of language patterns and nuances. This training enables them to perform various language-related tasks, from simple text generation to more complex applications like summarization, translation, and answering questions across multiple domains.
Foundational Models, a broader category, include LLMs and models designed for other data types, such as images, audio, and time-series. The common thread among all Foundational Models is their training approach: they are typically pre-trained on a large, diverse set of data to develop a broad understanding of a particular type of input, whether text, visual content, or EEG signals. After this extensive pre-training phase, these models are fine-tuned on more specific datasets or tasks to adapt their capabilities to more specialized applications.
Foundation models, known for their extensive pre-training on large datasets before being fine-tuned for specific tasks, have dramatically altered the landscape in natural language processing and computer vision. Nevertheless, their application in interpreting the complexities inherent in EEG data is still developing. This project proposes to develop a sophisticated foundational model specifically for EEG analysis, harnessing the capabilities of deep learning and AI.
Our project aims to make EEG analysis more accessible and accurate using the power of artificial intelligence. We plan to build a specialized foundational model for EEG data in this project.
Here is what we are going to do:
- Dataset Compilation: We will gather various open-source EEG datasets. We will focus on ensuring the data is diverse, covering different demographics, conditions, and ways the data was collected.
- Model Architecture Design: Next, we will develop a neural network architecture tailored for EEG data. We’ll take inspiration from successful models in other areas and adapt them to meet our needs (a minimal backbone sketch follows this list).
- Pretraining: We will train our model using the datasets we've compiled. We'll use transfer and semi-supervised learning to make the most of our data.
- Fine-Tuning: We will fine-tune our model using a smaller, more specific set of data for tasks like detecting seizures or classifying sleep stages.
- Validation and Testing: Finally, we will test our model to see how well it performs. We'll use metrics like accuracy, precision, recall, and the F1-score to evaluate and compare it to existing methods.
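As a minimal illustration of the architecture-design step above, multi-channel EEG can be cut into fixed-length patches that are embedded and fed to a standard Transformer encoder. All sizes below are placeholders, and the classification head stands in for whichever downstream task is used during fine-tuning.

import torch
import torch.nn as nn

class EEGPatchTransformer(nn.Module):
    """Toy backbone: split each channel's signal into patches, embed them, self-attend."""
    def __init__(self, n_channels=22, patch_len=200, d_model=128, n_layers=4, n_classes=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)    # e.g. seizure vs. non-seizure

    def forward(self, x):                  # x: (batch, n_channels, n_samples)
        B, C, T = x.shape
        x = x[:, :, : (T // self.patch_len) * self.patch_len]          # drop the remainder
        x = x.reshape(B, C * (x.shape[-1] // self.patch_len), self.patch_len)   # tokens
        tokens = self.encoder(self.embed(x))
        return self.head(tokens.mean(dim=1))   # mean-pool tokens, then classify

model = EEGPatchTransformer()
logits = model(torch.randn(8, 22, 2000))    # 8 windows of 22-channel EEG, 2000 samples each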
Requirements:
- Strong programming skills in Python
- Machine Learning background (courses etc.)
Related literature
- Jiang, Wei-Bang, Li-Ming Zhao, and Bao-Liang Lu. "Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI." ICLR 2024.
- Wang, Christopher, et al. "BrainBERT: Self-supervised representation learning for intracranial recordings." ICLR 2023.
- Cui, Wenhui, et al. "Neuro-gpt: Developing a foundation model for eeg." arXiv preprint arXiv:2311.03764 (2023)
- Chen, Yuqi, et al. "EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model." arXiv preprint arXiv:2401.10278 (2024).
Goal
Our goal is to make EEG analysis not only better but also easier for researchers and clinicians. This project is about bringing practical, effective AI solutions into the world of neuroscience to help improve how we understand and treat the human brain.
Contact Details
Please include your CV and transcript in the submission.
Thorir Mar Ingolfsson: https://thorirmar.com;
Yawei Li: https://yaweili.bitbucket.io/;
Xiaying Wang: https://xiaywang.github.io/;
Professors
Luca Benini, Ender Konukoglu
Introduction and Project Description
Cardiovascular diseases stand as a predominant cause of death worldwide, presenting a significant challenge to global health systems. Early detection and timely intervention in these conditions are paramount to reduce mortality rates and improve patient outcomes. Central to this early detection is the electrocardiogram (ECG), a critical diagnostic tool that provides a graphical representation of the heart's electrical activity. ECGs are instrumental in identifying various heart conditions, including arrhythmias, myocardial infarctions, and other cardiac anomalies. However, the traditional approach to ECG analysis, which relies heavily on manual examination by healthcare professionals, is not without its limitations. This process can be labor-intensive, time-consuming, and subject to human error, potentially leading to misdiagnoses or delayed treatment.
In response to these challenges, this project aims to develop a groundbreaking, automated system for detecting anomalies in ECG signals, utilizing the latest advancements in deep learning and foundational models. This system is designed to leverage the capabilities of artificial intelligence to interpret ECG data with precision and efficiency that surpasses traditional methods.
The core of this initiative involves training deep learning models on extensive datasets of ECG recordings. These models will be adept at identifying subtle patterns and deviations in the ECG signals that may indicate the presence of cardiovascular anomalies. By incorporating a wide range of ECG data, including signals from diverse patient populations and various cardiac conditions, the models will be well-equipped to recognize a broad spectrum of cardiac irregularities.
Furthermore, using foundational models in this project represents a significant innovation in medical diagnostics. These models, characterized by their extensive pre-training on large datasets, bring a heightened accuracy and adaptability to the analysis of ECG signals. Their ability to generalize from vast amounts of data makes them particularly suitable for this application, where detecting minute and often complex anomalies is critical.
The ultimate goal of this project is to create an automated ECG analysis system that is not only highly accurate but also efficient and scalable. This system can transform current practices in cardiovascular diagnostics, making early detection more accessible and reliable. In doing so, it could significantly contribute to reducing the global burden of cardiovascular diseases, ultimately saving lives and improving the quality of healthcare delivery.
Your task:
- To create a deep learning model for accurate ECG anomaly detection: Employ foundational models to analyze ECG data and identify abnormalities.
- To utilize large-scale, open-source ECG datasets for training: Leverage diverse datasets to ensure the model is robust and generalizable.
- To compare the model's performance with traditional methods: Assess the model's effectiveness in real-world scenarios against standard ECG analysis techniques.
Methodology
- Dataset Assembly: Gather and preprocess open-source ECG datasets on various cardiac conditions.
- Model Development: Design and train a deep learning model, integrating concepts from existing successful foundational models.
- Anomaly Detection Implementation: Implement the model to identify anomalies in ECG signals, such as arrhythmias, myocardial infarction indicators, etc. (a minimal baseline sketch follows this list).
- Performance Evaluation: Test the model using standard metrics like sensitivity, specificity, and ROC-AUC. Conduct a comparative analysis with conventional ECG analysis methods.
- Refinement: Based on testing results, refine the model to improve accuracy and reliability.
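To make the anomaly-detection step concrete, one simple baseline is a 1D convolutional autoencoder trained on normal beats, with the reconstruction error used as the anomaly score. The sketch below is purely illustrative and not the foundational model the project targets; segment length and layer widths are placeholders.

import torch
import torch.nn as nn

class ECGAutoencoder(nn.Module):
    """Toy 1D autoencoder for fixed-length ECG segments (e.g. single beats)."""
    def __init__(self, n_leads=1):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(n_leads, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=7, stride=2, padding=3, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(16, n_leads, kernel_size=7, stride=2, padding=3, output_padding=1),
        )

    def forward(self, x):                  # x: (batch, n_leads, n_samples)
        return self.dec(self.enc(x))

def anomaly_score(model, x):
    """Mean squared reconstruction error per segment; high error suggests an anomaly."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))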
Expected Outcomes
- A highly accurate deep learning model for ECG anomaly detection, contributing to faster and more reliable cardiac disease diagnosis.
- A comprehensive analysis of the model's performance, providing insights into the potential of foundational models in medical diagnostics.
- Publication-worthy results that can be shared with the medical and AI research communities.
Status: Available
Looking for Master Project Students
Your Profile
- Background in Engineering (Biomedical, Mechanical, Chemical…)
- Familiar with Deep Learning, experience with TensorFlow or PyTorch
- Motivation to work on a project at the intersection of Neuroscience and Engineering
- Willingness to work in an interdisciplinary team (Hardware, Machine Learning, Psychology, Neuroscience)
Reach out
If you are interested, we would love to get to know you! Please write an Email to all co-supervisors, including your motivation, CV, and transcript of records:
Co-Supervision:
- Yawei Li:
- Thorir Mar Ingolfsson:
- Andrea Cossettini:
Professors:
Luca Benini, Ender Konukoglu
Introduction and Project Description [MA,SA]
Electroencephalography (EEG) stands as a pivotal non-invasive method for capturing the brain's electrical activity, playing an indispensable role in both neurological research and clinical diagnostics. It acts as a portal to the brain's complex mechanisms, providing crucial insights for diagnosing and treating a range of neurological conditions. Yet, the analysis of EEG signals is a complex endeavor, challenging due to the intricate nature of these signals and the nuanced differentiation required between normal and abnormal brain patterns.
This project is uniquely situated at the confluence of neuroscience and the rapidly evolving domain of artificial intelligence (AI). Its objective is to innovate in EEG signal analysis, addressing the nuanced challenges inherent in these signals. This initiative recognizes a significant void in AI applications: while foundation models have revolutionized areas like image and language processing, their impact on time-series data analysis, particularly for EEG, is yet to be fully realized.
Foundation models, known for their extensive pre-training on large datasets before being fine-tuned for specific tasks, have dramatically altered the landscape in fields like natural language processing and computer vision. Nevertheless, their application in interpreting the complexities inherent in EEG data is still in its nascent stages. This project proposes to develop a sophisticated foundational model specifically for EEG analysis, harnessing the capabilities of deep learning and AI.
This model represents a major advancement in the concept of 'AI for science,' where AI is not merely a tool for automation but a collaborative force in scientific exploration. By integrating advanced computational methods with rich EEG datasets, this model is designed to go beyond traditional EEG signal interpretation limits. Utilizing large-scale, publicly available EEG datasets, the model will undergo training to identify and analyze an extensive range of EEG patterns, covering both typical and atypical brain activities.
The project aims to address the existing limitations in EEG analysis, such as the reliance on labor-intensive manual interpretation and vulnerability to subjective bias. Through the automation of the analysis process and the establishment of standardized interpretation criteria, the foundational model is expected to markedly improve the efficiency and accuracy of EEG signal analysis, thereby reducing the time and effort required for precise diagnostics.
This initiative therefore seeks to pioneer a new approach in EEG signal analysis, developing an advanced, AI-based foundational model. This model, by effectively interpreting a wide array of EEG data and patterns, aims to enhance the accuracy and efficiency of EEG analysis, contributing significantly to advancements in neurological research and clinical diagnostics.
Your task:
- To develop a foundational model specifically tailored for EEG signal analysis: The model will be designed to capture the unique characteristics of EEG data.
- To leverage open-source datasets for pretraining: Utilize extensive, publicly available EEG datasets to train the model, ensuring a comprehensive and diverse data foundation.
- To evaluate the model's performance in real-world applications: Test the model's effectiveness in tasks such as anomaly detection, pattern recognition, and predictive analysis in EEG signals.
Methodology
- Dataset Compilation: Collect and preprocess open-source EEG datasets, ensuring data quality and diversity. This includes datasets from different demographics, conditions, and acquisition settings.
- Model Architecture Design: Design a neural network architecture suitable for EEG signal analysis, possibly incorporating elements from existing successful models in other domains.
- Pretraining: Train the model on the compiled dataset, employing transfer learning and unsupervised or semi-supervised learning methods.
- Fine-Tuning: Refine the model with a smaller, more specialized dataset for specific tasks (e.g., seizure detection, sleep stage classification).
- Validation and Testing: Evaluate the model's performance using standard metrics like accuracy, precision, recall, and F1-score. Perform comparative analysis with existing models.
Expected Outcomes
- A robust foundational model pre-trained on diverse EEG data, capable of adapting to various specific EEG analysis tasks.
- Enhanced accuracy and efficiency in EEG signal interpretation, contributing to better diagnostic and therapeutic approaches in neurology.
- A significant contribution to the field of biomedical signal processing, demonstrating the potential of foundational models in healthcare.
Status: Available
Looking for Master Project Students
Your Profile
- Background in Engineering (Biomedical, Mechanical, Chemical…)
- Familiar with Deep Learning, experience with TensorFlow or PyTorch
- Motivation to work on a project at the intersection of Neuroscience and Engineering
- Willingness to work in an interdisciplinary team (Hardware, Machine Learning, Psychology, Neuroscience)
Reach out
If you are interested, we would love to get to know you! Please write an Email to all co-supervisors, including your motivation, CV, and transcript of records:
Co-Supervision:
- Yawei Li:
- Thorir Mar Ingolfsson:
- Andrea Cossettini:
Professors:
Luca Benini, Ender Konukoglu
Introduction: Open-vocabulary video semantic segmentation (OV-VSS) aims to assign a semantic label to each pixel of each frame of a video, given an arbitrary set of open-vocabulary category names. There have been a number of attempts at open-vocabulary image semantic segmentation (OV-ISS). However, OV-VSS has received little attention, due to the difficulty video understanding tasks pose in modeling local redundancy and global correlation. In this master thesis project, we plan to fill this gap by extending existing OV-ISS methods to OV-VSS. Specifically, we aim to develop an OV-VSS method that achieves high accuracy by using temporal information while remaining efficient.
Goal:
- Familiarize with OV-ISS and OV-VSS.
- Adapt existing video semantic segmentation methods to OV-ISS.
- Propose an algorithm for OV-VSS.
- Possibility of a submission to top AI conferences such as NeurIPS 2024 and ICLR 2024.
Requirement: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g., having taken computer vision courses at ETH Zurich. Knowledge of image/video semantic segmentation is a plus.
Supervisor: Dr. Guolei Sun, .ch;
Dr. Yawei Li,
Professor: Ender Konukoglu
References:
[1] Guolei Sun, et al., “Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation”, ECCV 2022
[2] Guolei Sun, et al., “Coarse-to-Fine Feature Mining for Video Semantic Segmentation”, ECCV 2022
[3] Feng Liang, et al., “Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP”, CVPR 2023
Introduction:
In statistical learning, understanding the models' bias-variance trade-off is crucial, particularly under specific assumptions. This concept is vital from a domain generalization standpoint, as it relates to the divergence between source and target distributions. In the field of medical imaging, Castro et al. [1] have highlighted this by using a causal diagram (see Fig. 5) to illustrate medical image generation, which informs the divergence and consequently the bias-variance trade-off for an optimal statistical model.
Building on Castro et al.'s framework and the principles of the Shepp-Logan phantom [2], our project aims to develop a mechanism for generating toy medical image data. This mechanism will allow us to freely define and manipulate certain assumptions, thereby enabling fast and effective assessment of our models. Familiarity with Python and PyTorch will be beneficial, as they form the basis of our development and assessment platform.
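A minimal sketch of such a toy-data generator is shown below: it starts from the Shepp-Logan phantom (available in scikit-image) and applies controllable "acquisition" factors (rotation, intensity gain, noise) that stand in for causal variables one may want to manipulate; the parameter ranges are placeholders and the actual generation mechanism will be designed in the project.

import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import rotate

def sample_toy_image(rng: np.random.Generator,
                     max_angle=15.0, gain_range=(0.8, 1.2), noise_std=0.02):
    """Draw one synthetic 'scan': phantom plus controllable domain factors."""
    img = shepp_logan_phantom()                    # (400, 400) float image in [0, 1]
    angle = rng.uniform(-max_angle, max_angle)     # e.g. patient positioning
    gain = rng.uniform(*gain_range)                # e.g. scanner calibration
    img = rotate(img, angle, mode="edge")
    img = gain * img + rng.normal(0.0, noise_std, size=img.shape)
    return img, {"angle": angle, "gain": gain}     # factors kept for causal analysis

rng = np.random.default_rng(0)
image, factors = sample_toy_image(rng)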
References:
[1] Castro, D.C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat Commun 11, 3673 (2020). https://doi.org/10.1038/s41467-020-17478-w
[2] L. A. Shepp and B. F. Logan, "The Fourier reconstruction of a head section," in IEEE Transactions on Nuclear Science, vol. 21, no. 3, pp. 21-43, June 1974, doi: 10.1109/TNS.1974.6499235.
Supervisor:
Güney Tombak,
Professor: Ender Konukoglu,
Master Thesis in Machine Learning for NMR Spectroscopy:
Background and Motivation:
Nuclear Magnetic Resonance (NMR) spectroscopy plays a vital role in molecular structure analysis, particularly for proteins. The technique's effectiveness largely depends on the strength of the external magnetic field. High-field NMR spectrometers, often operating above 600 MHz, offer better data clarity but come with high technological and financial costs. Our initial work contributed to this area by exploring deep learning to enhance lower-field NMR spectra, aiming to approach the quality of high-field data. This approach has the potential to make high-field NMR more accessible and cost-effective. The following research phase seeks to refine these methods and explore their practical applications, hoping to broaden the toolset available for scientific investigation.
Project Description:
This master thesis, "Virtual Magnetic Field Enhancement in NMR Spectroscopy," invites you to engage in this evolving research area. The project focuses on several key objectives:
- Model Development: Adapting, training, and testing machine learning models for various NMR pulse sequences, including NOESY, and scaling these models to handle larger datasets.
- Bridging Theory and Practice: Transferring insights from synthetic to experimental data is a crucial step in testing the feasibility and accuracy of our approach.
- Practical Application Testing: Demonstrating how the developed algorithm can be applied in real-world scenarios to potentially enhance the utility of NMR spectroscopy. Practical examples could involve testing enhanced spectra with established NMR data analysis workflows, such as ARTINA and NMRtist (nmrtist.org).
- Prediction Quality Optimization: Investigating the best compromise in prediction quality between low and high-field NMR spectra, which is key to ensuring the practicality of our enhancements.
- Exploration of Diffusion Models: Experimenting with emerging diffusion models to explore new possibilities in NMR spectrum analysis.
Your Role and Impact:
By participating in this project, you will contribute to an ongoing effort to advance NMR spectroscopy. Your work will help explore how machine learning can open new doors in scientific research, potentially making high-field NMR data more accessible. This project is an opportunity to challenge current technological limits and contribute to a field with broad implications for molecular biology and drug development.
We invite you to join this journey in advancing NMR spectroscopy and contributing to a meaningful scientific endeavour.
Supervisor:
Nicolas Schmid,
Professor: Ender Konukoglu, ETF E113,
Introduction:
Recent developments in the field of computer vision have highlighted the growing prominence of foundation models, particularly those like DINOv2 [1] and Segment-Anything [2], which have achieved impressive outcomes in processing natural images. Yet, the effectiveness of these models in medical imaging remains somewhat ambiguous. This project intends to bridge this gap by rigorously examining various training methodologies for these models. Our goal is to explore the most effective approaches to adapt these advanced foundation models for medical imaging, thereby enhancing their utility and potential impact in healthcare and medical research.
References:
[1] https://dinov2.metademolab.com/
[2] https://segment-anything.com/
Supervisors:
Anna Susmelj,
Ertunc Erdil,
Professor: Ender Konukoglu,
Introduction:
In contemporary computer vision applications, neural networks have demonstrated exceptional proficiency in semantic segmentation tasks across diverse application domains. However, these models are susceptible to significant performance degradation in real-world scenarios due to distributional disparities. This remains a prevalent concern in safety-critical domains such as autonomous driving and medical imaging. Recent research emphasizes the critical role of network architecture selection in addressing challenges related to domain generalization [1,2]. Specifically, transformer architectures have exhibited notably superior generalization capabilities, particularly in autonomous driving contexts. Conversely, classical convolutional networks like the widely adopted UNet continue to dominate the landscape in medical imaging applications [3]. This study aims to delve into the impact of network architecture in medical imaging contexts. Our objective is to identify a versatile architectural paradigm applicable to a wide spectrum of computer vision tasks, ranging from autonomous driving to medical imaging.
References:
[1] https://openaccess.thecvf.com/content/CVPR2022/html/Hoyer_DAFormer_Improving_Network_Architectures_and_Training_Strategies_for_Domain-Adaptive_Semantic_CVPR_2022_paper.html
[3] https://arxiv.org/abs/2004.04668
Supervisors:
Anna Susmelj,
Lukas Hoyer,
Professor: Ender Konukoglu,
Introduction:
The application of prior knowledge, in the form of exemplary shapes extracted from segmented volumetric medical images, has emerged as an appealing approach for reconstructing anatomical shapes from limited or incomplete measurements [1]. However, in certain medical contexts, essential anatomical structures may be absent from these segmentations, despite their critical role in clinical diagnostics. In this project, we aim to utilize a synthetic dataset derived from a rigid anatomical atlas as a strong prior on anatomical shape variations, via a combination of conditional diffusion models [2, 3] as the generative model and an implicit function as a smoothness prior [4].
Requirements:
- Programming knowledge of Python.
- Familiarity with PyTorch
- Good mathematical background
References:
[1] https://openreview.net/forum?id=UuHtdwRXkzw
[2] https://arxiv.org/abs/2111.05826
[3] https://arxiv.org/abs/2302.05543
[4] https://arxiv.org/pdf/2303.12865.pdf
Supervisors:
Anna Susmelj,
Kyriakos Flouris,
Professor: Ender Konukoglu,
This project aims to integrate the principles of classical generative adversarial networks (GANs) with those of quantum generative adversarial networks (QGANs) to generate realistic, high-dimensional image data. Drawing inspiration from the foundational studies in [1], [2], and [3], the objective is to assess the potential of synthesizing images using near-term quantum devices, such as noisy intermediate-scale quantum (NISQ) devices. The main part of the project will be deploying the methodologies from [1], [2], and [3] to train models on high-quality open-source datasets and/or investigating the capability of QGANs equipped with quantum circuit generators to produce images without resorting to dimensionality reduction or classical data processing techniques.
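On the classical side of the hybrid setup, the gradient penalty from the improved Wasserstein GAN training in [2] is a key ingredient. Below is a minimal PyTorch sketch of that term only; the critic and the (quantum or classical) generator are assumed to be defined elsewhere.

import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP term from [2]: penalize deviations of ||grad_x critic(x)|| from 1
    at random interpolates between real and generated samples.
    real, fake: (batch, channels, H, W) image tensors."""
    eps = torch.rand(real.shape[0], 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()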
References:
[1] H.-L. Huang, Y. Du, M. Gong, Y. Zhao, Y. Wu, C. Wang, S. Li, F. Liang, J. Lin, Y. Xu, R. Yang, T. Liu, M.-H. Hsieh, H. Deng, H. Rong, C.-Z. Peng, C.-Y. Lu, Y.-A. Chen, D. Tao, X. Zhu, and J.-W. Pan, “Experimental quantum generative adversarial networks for image generation,” Phys. Rev. Appl., vol. 16, p. 024051, 2021.
[2] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” 2017, arXiv:1704.00028v3.
[3] S. Lok Tsang, M. T. West, S. M. Erfani, and M. Usman, “Hybrid Quantum-Classical Generative Adversarial Network for High Resolution Image Generation,” 2023, arXiv:2212.11614v2.
Supervisor:
Kyriakos Flouris,
Professor: Ender Konukoglu, ETF E113,
This project aims to design an adaptable encoder model that leverages various medical imaging datasets. The model will be trained utilizing self-supervised and contrastive learning methods such as SimCLR [1] and masked autoencoders [2], emphasizing versatility and high performance to serve multiple applications in the medical imaging sector.
The project offers an opportunity to experiment with different machine learning paradigms, improve model performance, and tackle unique challenges presented by medical image datasets. The objective is to create a robust encoder model that can effectively serve as a backbone for a variety of tasks in medical imaging. Prerequisites for this project include a solid understanding of deep learning and prior experience with the PyTorch framework.
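As a concrete reference point, the SimCLR objective [1] (the NT-Xent loss over two augmented views of the same batch) can be sketched as follows; the temperature is a placeholder and the encoder producing the embeddings is assumed to exist elsewhere.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.2):
    """SimCLR loss for paired embeddings z1, z2 of shape (N, dim):
    each view must identify its partner among the 2N-1 other embeddings."""
    N = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, dim)
    sim = z @ z.T / temperature                             # (2N, 2N) similarity matrix
    self_mask = torch.eye(2 * N, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # exclude self-similarity
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(0, N)]).to(z.device)
    return F.cross_entropy(sim, targets)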
References:
[1] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020, November). A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597-1607). PMLR.
[2] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16000-16009).
Supervisors:
Güney Tombak ()
Ertunc Erdil ()
Professor: Ender Konukoglu, ETF E113,
The standard approach for estimating hemodynamic parameters involves running CFD simulations on patient-specific models extracted from medical images. Personalization of these models can be performed by integrating further data from MRA and flow MRI. In this project we aim to estimate hemodynamic parameters from flow and anatomical MRI, which can be routinely acquired in clinical practice. The flow information and the geometry will be combined in a computational mesh and processed using Graph Convolutional Neural Networks (GCNNs) or other deep learning methods. A motivated student will explore integration of the CFD model into the deep network construction. The proposed models will be trained using existing synthetic MRI datasets.
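A minimal sketch of the graph-network component, using PyTorch Geometric and assuming the computational mesh has been converted to a graph with per-node input features (e.g. coordinates and measured velocities) and per-node regression targets (e.g. pressure or wall shear stress), could look as follows; all dimensions are placeholders.

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class MeshGCN(nn.Module):
    """Per-node regression on a mesh graph (node features -> hemodynamic quantity)."""
    def __init__(self, in_dim=6, hidden=64, out_dim=1):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x, edge_index):    # x: (n_nodes, in_dim), edge_index: (2, n_edges)
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(h)              # (n_nodes, out_dim), e.g. predicted pressure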
Supervisors:
Kyriakos Flouris ()
Professor: Ender Konukoglu, ETF E113,
Reducing the time of magnetic resonance imaging (MRI) data acquisition is a long-standing goal. Shorter acquisition times would come with many benefits, e.g. higher patient comfort or enabling dynamic imaging (e.g. of the moving heart). Ultimately, they can lead to higher clinical throughput, which reduces the cost of MRI for the individual and makes MRI more widely accessible.
One possible avenue towards this goal is to under-sample the acquisition and incorporate prior knowledge to solve the resulting ill-posed reconstruction problem. This strategy has received much attention and many different methods have been proposed.
In this project we aim to understand the performance differences between these methods and analyse which components make them work. We will implement state-of-the-art reconstruction methods and perform experiments to judge their performance and robustness properties.
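To fix notation, the sketch below shows the common starting point of all these methods: retrospectively undersampling a fully sampled k-space with a Cartesian mask and computing the zero-filled baseline reconstruction. It is a single-coil simplification with placeholder parameters, not any of the referenced methods themselves.

import torch

def undersample(image: torch.Tensor, acceleration: int = 4, n_center_lines: int = 24):
    """image: (H, W) tensor. Returns masked k-space, the sampling mask, and the
    zero-filled reconstruction."""
    kspace = torch.fft.fftshift(torch.fft.fft2(image))
    H, W = kspace.shape
    mask = torch.zeros(W, dtype=torch.bool)
    mask[torch.randperm(W)[: W // acceleration]] = True        # random phase-encode lines
    center = W // 2
    mask[center - n_center_lines // 2 : center + n_center_lines // 2] = True  # keep low freqs
    kspace_us = kspace * mask.float()                            # broadcast over columns
    zero_filled = torch.fft.ifft2(torch.fft.ifftshift(kspace_us)).abs()
    return kspace_us, mask, zero_filled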
Depending on the student's interests, the project can have a different focus:
- Supervised Methods [1]
- Unsupervised Methods [2], [3], [4]
- Untrained Methods [5]
References:
[1]: https://onlinelibrary.wiley.com/doi/10.1002/mrm.28827
[2]: https://ieeexplore.ieee.org/document/8579232
[3]: https://ieeexplore.ieee.org/document/9695412
[4]: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_62
[5]: https://arxiv.org/abs/2111.10892
Supervisors:
Georg Brunner ()
Emiljo Mehillaj ()
Professor: Ender Konukoglu, ETF E113,