MA/MSc Internship for EUGLOH program

Title: Multimodal learning for the calibration of low-cost air pollutant sensors and their prediction

Keywords: Deep learning, multi-modal data, sensor calibration, air pollutant, domain shift

Internship Duration: 30/11/-1 - 30/11/-1

Head of the hosting team: Vincent Vigneron

Address of the host laboratory:
IBISC EA4526
Team SIAM
40 rue du Pelvoux
91020 Courcouronnes France

Supervisor 1: Vincent VIGNERON
E-mail: vincent.vigneron@univ-evry.fr
Phone: +33663568760

Supervisor 2: Jean-Philippe CONGE

Internship description:

Since COP21, atmospheric pollutant measurement systems have proliferated rapidly; combined with crowdsourcing, they make it possible to represent air quality spatially [4]. This mapping, initiated by the traditional actors of surveillance, local communities, is still in its infancy. Nevertheless, it raises questions about data uncertainty, how the data can be exploited, and the possibilities offered by new AI technologies, particularly deep learning [1], for (regulatory) air quality surveillance [5,6]. It also raises the question of sensor drift between laboratory calibration and real-world use, which many parameters can affect. The accuracy of these sensors is relative, and their measurement uncertainties are still poorly known.
Low-cost sensors (LCS) have emerged as promising tools for air quality measurement, with applications in indoor monitoring, such as schools and hospitals, and in outdoor deployments in smart cities. However, ensuring the reliability and accuracy of LCS measurements requires a crucial calibration step. This project aims to investigate the suitability of different calibration models for improving LCS predictions in the presence of pollutant interferences.
Several sensors and reference instruments have been installed at various industrial sites to evaluate calibration performance.
The outdoor evaluation assessed the performance and effectiveness of various calibration models, such as linear regression, support vector regression, and convolutional neural networks, to identify the models best suited to multivariate regression tasks in air quality monitoring [2]. Moreover, we investigated the impact of including additional predictive variables, such as meteorological data, on the calibration process [3]. The practical implications of this study extend to real-world air quality monitoring applications, where improved calibration of low-cost sensors can contribute to better decision-making and more informed policies.
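
To illustrate the kind of comparison described above, the following minimal sketch fits a linear calibration with and without a meteorological covariate. Everything in it is an assumption made for illustration: the data are synthetic, the sensor response (a raw signal that depends linearly on concentration and temperature) is hypothetical, and the helper names fit_ols, predict and rmse are not taken from the cited studies. It is not the calibration pipeline used in [2] or [3].

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(features, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, features):
    """Apply the fitted linear calibration to each feature vector."""
    return [coefs[0] + sum(c * x for c, x in zip(coefs[1:], f)) for f in features]

def rmse(pred, true):
    """Root-mean-square error between predictions and reference values."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

# Synthetic data (assumption): the raw sensor signal responds to both the
# pollutant concentration and the ambient temperature, plus Gaussian noise.
rng = random.Random(0)
reference, raw, temperature = [], [], []
for _ in range(400):
    conc = rng.uniform(5.0, 80.0)    # reference concentration, arbitrary units
    temp = rng.uniform(-5.0, 35.0)   # ambient temperature, degrees Celsius
    reference.append(conc)
    temperature.append(temp)
    raw.append(0.6 * conc + 0.8 * temp + 3.0 + rng.gauss(0.0, 1.0))

train, test = slice(0, 300), slice(300, 400)
x_raw = [[s] for s in raw]                           # sensor signal only
x_met = [[s, t] for s, t in zip(raw, temperature)]   # signal + temperature

coefs_raw = fit_ols(x_raw[train], reference[train])
coefs_met = fit_ols(x_met[train], reference[train])
rmse_raw = rmse(predict(coefs_raw, x_raw[test]), reference[test])
rmse_met = rmse(predict(coefs_met, x_met[test]), reference[test])
print(f"RMSE, signal only:          {rmse_raw:.2f}")
print(f"RMSE, signal + temperature: {rmse_met:.2f}")
```

On this synthetic data the model with the temperature covariate has the lower held-out error, because temperature was built into the simulated sensor response. Real sensors may behave differently, and nonlinear models such as support vector regression or neural networks would replace fit_ols in a fuller comparison.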

Many questions remain open: How reliable is the sensor calibration during pollution peaks? Can we use peripheral (neighboring) LCSs to improve the precision of a given LCS? How can we reconstruct partial or uncertain information from these neighboring sensors? How can we make sure that the parameters do not diverge from actual conditions? Should we also use topographic data? The internship aims to answer some of these questions through mathematical derivations or computer simulations with the available models.

Profile and skills sought
Ability to understand and develop adaptive learning algorithms, and to process, index, and exploit data in an operational system to achieve the mission described above. Programming skills: Python or C/C++. Experience with TensorFlow or PyTorch would be a plus. French is not mandatory.

Professional qualities sought: autonomy, interpersonal skills to interact with research and business teams, motivation for new technologies, and creativity to implement an innovative solution.

Techniques used during the internship:

The first step will consist of (1) running experiments to evaluate air pollutant calibration models over different databases with incomplete data, (2) evaluating the drift of sensors and models over different periods, sensors, and databases, and (3) proposing a method to handle missing data.
This work could continue as a PhD thesis by (1) comparing the performance of representations in the temporal, time-frequency, and time-scale domains, (2) applying tensor decompositions to the feature layers, and (3) studying the influence of the network structure of the underlying phenomenon on the signal representation. Industrial collaborations are possible.
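
As a rough illustration of items (2) and (3) of the first step, the sketch below fills missing samples and then tracks the calibration bias over consecutive periods. It rests on several assumptions: the data are synthetic, the drift is a hypothetical additive term growing linearly in time, missing samples are filled by plain linear interpolation (one simple baseline among many, not a method proposed by the hosting team), and the function names interpolate_gaps and bias_per_period are introduced here for illustration only.

```python
import random

def interpolate_gaps(series):
    """Fill None entries by linear interpolation between the nearest valid
    neighbours; leading and trailing gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid sample")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * series[left] + w * series[right]
    return filled

def bias_per_period(calibrated, reference, period_length):
    """Mean of (calibrated - reference) over consecutive time windows."""
    biases = []
    for start in range(0, len(reference), period_length):
        stop = start + period_length
        errors = [c - r for c, r in zip(calibrated[start:stop], reference[start:stop])]
        biases.append(sum(errors) / len(errors))
    return biases

rng = random.Random(1)
n_samples = 360

# Synthetic reference concentrations (assumption), arbitrary units.
reference = [30.0 + 10.0 * rng.random() for _ in range(n_samples)]

# Hypothetical calibrated sensor: unbiased at t = 0, then an additive drift
# of 0.02 units per time step plus measurement noise.
calibrated = [r + 0.02 * t + rng.gauss(0.0, 0.5) for t, r in enumerate(reference)]

# Drop roughly 10% of the samples at random to mimic incomplete data.
observed = [None if rng.random() < 0.1 else c for c in calibrated]

filled = interpolate_gaps(observed)
biases = bias_per_period(filled, reference, period_length=90)
print("mean bias per period:", [round(b, 2) for b in biases])
```

A bias that grows from one period to the next suggests that a calibration fitted once is losing validity and may need to be refitted or adapted. Linear interpolation is only reasonable for short gaps; longer outages would call for a model-based reconstruction.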

Bibliography:

[1] I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. The MIT Press.
[2] C. Malings, R. Tanzer, A. Hauryliuk, S. P. N. Kumar, N. Zimmerman, L. B. Kara, and A. A. Presto. 2019. Development of a general calibration model and long-term performance evaluation of low-cost sensors for air pollutant gas monitoring. Atmospheric Measurement Techniques 12, 2 (2019), 903–920. https://doi.org/10.5194/amt-12-903-2019
[3] A. Souani, A. Hucher, V. Vigneron, and H. Maaref. 2023. A calibration methodology of low-cost air pollutant sensor calibration using neural networks. In 20th International Multi-Conference on Systems, Signals & Devices (SSD).
[4] N. Zimmerman, A. Presto, S. Kumar, J. Gu, A. Ha

Possibility of PhD : Yes

Remarks concerning the PhD position: Univ Evry, Université Paris-Saclay

Research field(s) of interest to the hosting team:

Medicine and health sciences, Veterinary science
Information and Data sciences, Maths

Language(s) spoken in the host laboratory: French/English