Self-Supervised Learning (SSL) is revolutionizing deep learning by enabling models to be trained without human-labeled data. In signal and image processing, cutting-edge systems rely on large Transformer networks pretrained on extensive signal/image datasets using methods such as contrastive or multitask learning. SSL is also increasingly explored in biosignal and medical image processing, where it supports diagnosis and disease prediction by leveraging unlabeled data, reducing the need for manual annotation. This approach is transforming these fields by improving model performance and enabling the use of large, diverse datasets, leading to more accurate and efficient medical diagnostic and analysis tools.
Despite the surge in accepted articles mentioning SSL techniques at recent top-tier conferences, several challenges hinder broader adoption in real-world applications: SSL models face complexity issues, lack standardized evaluation protocols, exhibit bias and robustness concerns, and are rarely integrated with related modalities such as text or video.
The LbM workshop addresses these challenges by fostering interactions among experts. Through two keynotes, a panel, and paper presentations, LbM aims to unite the SSL community, including experts from various modalities, to frame SSL as a groundbreaking solution for biosignals and biomedical image processing and beyond. This workshop provides a dedicated platform for discussing and advancing SSL technologies, paving the way for their broader application in diverse fields.
The effectiveness of deep learning relies heavily on the quality of the representations it uncovers from data [1]. These representations should capture essential structures within the raw input, such as intermediate concepts, features, or latent variables, which are beneficial for subsequent tasks. While supervised learning with large annotated datasets can yield valuable representations, acquiring such datasets is often expensive, time-consuming, and impractical. Additionally, annotations are often insufficiently detailed for many potential applications, and the supervised representations they produce may be biased toward the specific task they were trained on, limiting their utility in other contexts [2]. This challenge is particularly pronounced in various applications. For instance, annotation in medical data is challenging due to the need for specialized expertise, time-consuming manual processes, and potential inconsistencies in annotations. This leads to limited availability of annotated datasets, hindering the development and evaluation of machine learning models in medical applications, especially for rare conditions or specialized domains [3].
To address these challenges, researchers are exploring unsupervised [1] and self-supervised learning (SSL) [4-7]. In particular, SSL, i.e., the process of training models to produce meaningful representations using unlabeled data, is a promising solution to challenges caused by difficulties in curating large-scale annotations. Unlike supervised learning, SSL can create generalist models that can be fine-tuned for many downstream tasks without large-scale labeled datasets. While there is already a growing trend to leverage SSL in recent medical imaging AI literature, as well as a few narrative reviews [8, 9], the most suitable strategies and best practices for medical images have not been sufficiently investigated.
Moreover, biomedical signals (biosignals) are a fundamental resource in the biomedical domain, spanning many modalities, such as electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), magnetoencephalography (MEG), bioacoustics (BAC), electrooculography (EOG), electrodermal activity (EDA), and speech. With the progress of the Internet of Things (IoT) and the spread of wearable devices, their role is becoming increasingly relevant, especially in telemedicine and precision medicine [10]. Contrastive learning (e.g., SimCLR, BYOL, SwAV, CPC) is the primary choice in most works dealing with single- or multi-class pathology classification for SSL-based biosignal analysis [11, 12], combined with different data augmentation techniques [13]. Regardless of the choice of upstream and downstream tasks within SSL, it is important to introduce modules or blocks that can exploit important prior domain knowledge. In this regard, data augmentation techniques that are biologically meaningful and compatible with the medical task at hand may be more beneficial than adding complexity to a standard SSL pre-training strategy designed around weak augmentations.
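To make the above concrete, the following is a minimal NumPy sketch (not a prescribed pipeline; all function names are illustrative) of a SimCLR-style contrastive objective paired with two biologically plausible biosignal augmentations: jittering (additive sensor noise) and amplitude scaling (mimicking electrode-contact variability). A real system would apply the loss to encoder outputs rather than raw signals.

```python
import numpy as np

def jitter(x, sigma=0.05, seed=0):
    """Additive Gaussian noise, mimicking sensor noise (biologically plausible)."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, x.shape)

def amplitude_scale(x, sigma=0.1, seed=1):
    """Per-sample amplitude scaling, mimicking electrode-contact variability."""
    rng = np.random.default_rng(seed)
    return x * rng.normal(1.0, sigma, (x.shape[0], 1))

def nt_xent(z1, z2, tau=0.5):
    """SimCLR's NT-Xent loss over two batches of embeddings (rows = samples)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # drop self-pairs
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    pos = sim[np.arange(2 * n), pos_idx]               # augmented-pair logits
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - pos))

# Two augmented "views" of the same batch of 1-D signals
rng = np.random.default_rng(42)
signals = rng.normal(size=(8, 64))
loss = nt_xent(jitter(signals), amplitude_scale(signals))
```

Because jitter and scaling preserve each signal's identity, the two views of a sample remain similar, and the loss rewards matching them over unrelated samples.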
The current trend is to evaluate SSL models solely on their downstream performance, which necessitates potentially dozens or hundreds of costly fine-tunings. Another direction, currently represented by scarce literature [14], proposes to measure the quality and robustness of a given representation without downstream fine-tuning, drastically speeding up the development process. The LbM workshop will offer a forum for the SSL community to actively discuss best practices for exploiting important domain knowledge, designing biologically meaningful augmentations, and evaluating SSL models.
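As an illustration of such label-free evaluation, one proxy from this line of work scores a representation by the effective rank of its embedding matrix, computed from the entropy of the singular-value spectrum: a collapsed representation scores near 1, a well-spread one near the embedding dimension. The NumPy sketch below shows one such metric under our own assumptions, not the protocol of any particular paper.

```python
import numpy as np

def effective_rank(z, eps=1e-7):
    """Label-free representation quality proxy: exponentiated entropy of the
    normalized singular-value spectrum of the embedding matrix z (rows =
    samples). Higher values indicate less dimensional collapse."""
    s = np.linalg.svd(z, compute_uv=False)
    p = s / (s.sum() + eps) + eps      # normalized spectrum, kept positive
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(200, 32))                              # well-spread
collapsed = np.outer(rng.normal(size=200), rng.normal(size=32))   # rank-1
```

A metric like this can flag a collapsed pretraining run in seconds, where confirming the same failure through fine-tuning would cost a full downstream training cycle.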
Multimodality is certainly the use case where SSL could unleash its true potential. In the medical domain, multimodal data are often complementary to each other, meaning that each type of data (e.g., images, sensor data, text reports) can be used to extract unique latent representations and allow a better understanding of pathology at the subject level. Combining embeddings from different sources may improve the generalization capability of models and open new possibilities for deep phenotyping [15] and precision medicine. Nevertheless, achieving multimodal SSL is a long-term goal that we should tackle step by step as a community. With LbM, we hope to encourage original research on SSL for biosignals combined with other modalities (biomedical images, text descriptions).
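One simple starting point for combining embeddings from different sources is late fusion; the NumPy sketch below (encoder names and dimensions are hypothetical) L2-normalizes each modality's embedding so that no single modality dominates the joint representation, then concatenates them into one subject-level vector. Contrastive alignment across modalities, in the spirit of CLIP, would be a natural next step beyond this.

```python
import numpy as np

def late_fusion(embeddings):
    """Concatenate L2-normalized per-modality embeddings (rows = subjects)."""
    parts = [e / np.linalg.norm(e, axis=1, keepdims=True) for e in embeddings]
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(0)
ecg_emb = rng.normal(size=(10, 128))     # e.g., from a biosignal encoder
image_emb = rng.normal(size=(10, 256))   # e.g., from an imaging encoder
text_emb = rng.normal(size=(10, 64))     # e.g., from a report encoder
joint = late_fusion([ecg_emb, image_emb, text_emb])  # per-subject joint vector
```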
Another SSL problem is the data aggregation procedure. Although there is evidence that the aggregation of multiple datasets can improve the accuracy of downstream models [16], [17], domain adaptation [18], i.e., the problem of avoiding significant performance degradation due to changes in the marginal distribution of the feature space (domain shift), remains a major issue that needs further investigation. Since SSL for medical images is a promising yet nascent research area and the optimal strategies for training these models are still to be explored, we hope that the LbM workshop will stimulate researchers to discuss and explore different categories of self-supervised learning for their biosignal/medical image datasets, in addition to fine-tuning strategies and pre-trained weights.
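Before aggregating datasets, it can help to quantify the domain shift between them. One common discrepancy measure is the (biased) maximum mean discrepancy with an RBF kernel, sketched below in NumPy under the assumption that features from each site are given as row matrices; the "site" variables are illustrative stand-ins for features extracted from different acquisition sources.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    """Pairwise RBF kernel between rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def mmd(x, y, gamma=0.1):
    """Biased MMD^2 estimate: near 0 when x and y share a distribution,
    larger under domain shift (a changed marginal feature distribution)."""
    return float(rbf_kernel(x, x, gamma).mean()
                 + rbf_kernel(y, y, gamma).mean()
                 - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
site_a = rng.normal(size=(50, 4))
site_b = rng.normal(size=(50, 4))            # same distribution as site_a
site_c = rng.normal(loc=3.0, size=(50, 4))   # shifted acquisition conditions
```

A large discrepancy between two candidate datasets can signal that naive pooling will hurt, suggesting per-site normalization or an explicit domain-adaptation step before aggregation.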
Although there exist several tools that facilitate the processing of biosignals/biomedical images, the lack of data standardization is still a relevant issue, which could negatively affect the investigation of novel SSL approaches. In addition, social and technical biases in SSL models are an active field of research [18-20]. Interestingly, however, and despite the clear growing adoption of SSL, the inclusiveness and robustness of state-of-the-art (SotA) models remains a completely open question. More precisely, SSL architectures currently struggle to encompass information from diverse populations and different pathologies, making them potentially unfair or unreliable under realistic conditions [19-22]. Furthermore, the complexity of biosignals/biomedical images and videos is reflected not only in the neural architectures that must be tailored to the considered domain, but also in the extreme compute resources required to train such models, which often takes weeks or months on hundreds of high-end GPUs, each worth many thousands of euros [23]. Hence, despite an astonishing short-term jump in performance, large-scale SSL models may quickly become a major barrier for academic research, as it is already impossible for the vast majority of institutions to train them, leaving the field reliant on two or three companies. Very few attempts have been made to solve this issue [24], and we hope that the LbM workshop will foster interest around the efficiency of SSL models, which appears to be a critical topic in a world facing climate change.
The subjects proposed for the LbM workshop fit squarely within the aims of ICIP 2024: they address an emerging approach in signal processing and computer vision applied to biomedical engineering, and would complement and add value to the conference. By proposing SSL-based solutions in the areas of medical diagnostics, precision medicine, and disease prediction, the workshop will attract researchers from both computer science and biomedical engineering and generate new momentum in the field. The LbM focus on the emerging field of SSL within the scope of biomedical engineering fosters interdisciplinarity and synergies with other communities, such as medical doctors and researchers, following the example of flagship conferences such as IEEE ISBI, EMBC, and ICASSP.
References:
Planned composition: The LbM workshop will be structured as follows:
Measures to ensure diversity:


Sheikh Zayed Institute for Pediatric Surgical Innovation at Children's National Hospital, Washington, D.C.; Dept. of Biomedical Engineering, George Washington University, School of Medicine and Health Sciences, Washington, D.C. (MLingura@childrensnational.org)
