My research focuses on physical space intelligence — enabling intelligent systems to perceive, reconstruct, and reason about the physical world through non-contact, privacy-preserving, and ubiquitous sensing modalities. I develop sensing and learning frameworks that transform sparse physical signals into structured, human-centric spatial representations, with applications in human-aware buildings, ubiquitous interaction, and non-intrusive health monitoring.
Currently, I am working on the following topics:
Thermal Array–Based Spatial Sensing Intelligence: Developing physics-guided and learning-based frameworks that elevate low-resolution thermal array measurements into 3D spatial representations for human detection, ranging, reconstruction, interaction, and building-scale occupancy analytics, with strong privacy and cost advantages.
Physics-Grounded and Interpretable Learning Models: Designing physics-learning co-models that incorporate physical principles, geometric constraints, and biomechanical structure into deep learning models to improve robustness, interpretability, and generalizability.
Multi-modal and Cross-Modal Representation Learning: Aligning and fusing heterogeneous physical signals from different modalities — including thermal, RF, WiFi, acoustic, and inertial signals — to build unified representation spaces that expose richer physical structure than any single modality alone, supported by multi-modal datasets and representation-alignment methods.
I am open to discussions and collaborations. Please feel free to contact me at zhangxie289 at gmail dot com.
Publications
If you’d like to share my work and need slide decks, feel free to contact me at zhangxie289 at gmail dot com.
Preprints
TaFall: Balance-Informed Fall Detection via Passive Thermal Sensing
Li, Chengxiao and Zhang, Xie and Zhu, Wei and Jiang, Yan and Wu, Chenshu
arXiv, 2026
@misc{liTaFallBalanceInformedFall2026,
title = {{{TaFall}}: {{Balance-Informed Fall Detection}} via {{Passive Thermal Sensing}}},
shorttitle = {{{TaFall}}},
author = {Li, Chengxiao and Zhang, Xie and Zhu, Wei and Jiang, Yan and Wu, Chenshu},
year = {2026},
month = apr,
number = {arXiv:2604.09693},
eprint = {2604.09693},
primaryclass = {cs},
publisher = {arXiv},
doi = {10.48550/arXiv.2604.09693},
urldate = {2026-04-30},
archiveprefix = {arXiv},
file = {https://arxiv.org/abs/2604.09693},
abbreviated = {Arxiv'26, under review},
thumbnail_path = {/assets/thumbnail_files/tafall.jpg}
}
Falls are a major cause of injury and mortality among older adults, yet most incidents occur in private indoor environments where monitoring must balance effectiveness with privacy. Existing privacy-preserving fall detection approaches, particularly those based on radio frequency sensing, often rely on coarse motion cues, which limits reliability in real-world deployments. We introduce TaFall, a balance-informed fall detection system based on low-cost, privacy-preserving thermal array sensing. The key insight is that TaFall models a fall as a process of balance degradation and detects falls by estimating pose-driven biomechanical balance dynamics. To enable this capability from low-resolution thermal array maps, we propose (i) an appearance-motion fusion model for robust pose reconstruction, (ii) physically grounded balance-aware learning, and (iii) pose-bridged pretraining to improve robustness. TaFall achieves a detection rate of 98.26% with a false alarm rate of 0.65% on our dataset with over 3,000 fall instances from 35 participants across diverse indoor environments. In 27-day deployments across four homes, TaFall attains an ultra-low false alarm rate of 0.00126%, and a pilot bathroom study confirms robustness under moisture and thermal interference. Together, these results establish TaFall as a reliable and privacy-preserving approach to fall detection in everyday living environments.
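As a toy illustration of the balance-degradation idea, a pose-derived balance proxy can be as simple as the horizontal offset between the body's centre of mass and the midpoint of the support base; the function and keypoint inputs below are hypothetical, not TaFall's actual biomechanical model:

```python
import numpy as np

def balance_margin(com_xy, left_foot_xy, right_foot_xy):
    # Distance from the centre-of-mass projection to the midpoint of the feet;
    # a growing margin over time is one crude signal of balance degradation.
    base = (np.asarray(left_foot_xy) + np.asarray(right_foot_xy)) / 2
    return float(np.linalg.norm(np.asarray(com_xy) - base))

# Toy check: centred mass -> zero margin; shifted mass -> positive margin
print(balance_margin((0.5, 0.0), (-0.1, 0.0), (0.1, 0.0)))  # → 0.5
```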
Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer
Zhang, Xie and Wang, Yina and Wu, Chenshu
arXiv, 2025
@misc{zhang2025unlockinginterpretabilityrfsensing,
title = {Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer},
author = {Zhang, Xie and Wang, Yina and Wu, Chenshu},
year = {2025},
eprint = {2507.21799},
publisher = {arXiv},
primaryclass = {cs.LG},
url = {https://arxiv.org/abs/2507.21799},
file = {https://arxiv.org/abs/2507.21799},
abbreviated = {Arxiv'25, under review},
thumbnail_path = {/assets/thumbnail_files/rfcrate.jpg}
}
The empirical success of deep learning has spurred its application to the radio-frequency (RF) domain, leading to significant advances in Deep Wireless Sensing (DWS). However, most existing DWS models function as black boxes with limited interpretability, which hampers their generalizability and raises concerns in security-sensitive physical applications. In this work, inspired by the remarkable advances of white-box transformers, we present RF-CRATE, the first mathematically interpretable deep network architecture for RF sensing, grounded in the principles of complex sparse rate reduction. To accommodate the unique structure of RF signals, we conduct non-trivial theoretical derivations that extend the original real-valued white-box transformer to the complex domain. By leveraging the CR-Calculus framework, we successfully construct a fully complex-valued white-box transformer with theoretically derived self-attention and residual multi-layer perceptron modules. Furthermore, to improve the model’s ability to extract discriminative features from limited wireless data, we introduce Subspace Regularization, a novel regularization strategy that enhances feature diversity, resulting in an average performance improvement of 19.98% across multiple sensing tasks. We extensively evaluate RF-CRATE against seven baselines with multiple public and self-collected datasets involving different RF signals. The results show that RF-CRATE achieves performance on par with thoroughly engineered black-box models, while offering full mathematical interpretability. More importantly, by extending CRATE to the complex domain, RF-CRATE yields substantial improvements, achieving an average classification gain of 5.08% and reducing regression error by 10.34% across diverse sensing tasks compared to CRATE. RF-CRATE is fully open-sourced at: https://github.com/rfcrate/RF_CRATE.
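For intuition only, here is one minimal way a complex-valued self-attention step can be written, scoring queries against keys with a Hermitian inner product and attending on the real part. This is an illustrative sketch under my own assumptions, not RF-CRATE's derived operator, which follows from complex sparse rate reduction:

```python
import numpy as np

def complex_attention(Q, K, V):
    # Hermitian inner product Q K^H yields complex scores; we use the real
    # part for the softmax so the attention weights are real and row-stochastic.
    scores = np.real(Q @ K.conj().T) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V  # output stays complex-valued

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
out = complex_attention(Q, K, V)
print(out.shape, np.iscomplexobj(out))  # → (2, 4) True
```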
Conference proceedings
Privacy-Preserving Non-Contact Sleep Monitoring via Multimodal Thermal-Depth Sensing
Liu, Xuan and Zhang, Xie and Wu, Chenshu
ACM International Workshop on Thermal Sensing and Computing 2025, 2025, pp. 1–6
@inproceedings{liuPrivacyPreservingNonContactSleep2025a,
title = {Privacy-{{Preserving Non-Contact Sleep Monitoring}} via {{Multimodal Thermal-Depth Sensing}}},
booktitle = {{{ACM International Workshop}} on {{Thermal Sensing}} and {{Computing}} 2025},
author = {Liu, Xuan and Zhang, Xie and Wu, Chenshu},
year = {2025},
month = nov,
series = {{{HotSense}} '25},
pages = {1--6},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3737905.3769281},
urldate = {2025-11-09},
isbn = {979-8-4007-1982-0},
abbreviated = {HotSense @ MobiCom'25},
thumbnail_path = {/assets/thumbnail_files/thermalsleep.jpg}
}
Sleep posture and movement reflect sleep quality and health. Polysomnography (PSG), the clinical standard, is costly and impractical for long-term home use due to its facility needs and contact-based sensors. Thermal infrared array (IRA) sensors offer a low-cost, privacy-preserving alternative, but their low resolution, blanket occlusion, and heat residual limit accuracy. SomnoSense, our non-contact, privacy-preserving sleep monitoring system, integrates IRA thermal data with low-cost direct time-of-flight (dToF) sensor depth data to form a 3D body profile. A lightweight fusion module, combining a random forest classifier and a state machine, addresses blanket occlusion and heat residual errors. Deep neural networks then enable posture recognition and motion-level estimation. Evaluated on a dataset of 170 minutes of synchronized IRA and depth recordings from five subjects across six postures and three environments, SomnoSense achieves 98.7% detection accuracy, 96.1% posture recognition accuracy, and a mean absolute error value of 0.718 for standardized motion levels. A one-week, 45-hour case study further demonstrates its long-term monitoring capability, revealing distinct sleep posture and motion patterns. These results highlight SomnoSense’s potential for practical sleep monitoring in real-world home environments.
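The heat-residual disambiguation can be caricatured as a tiny decision rule: depth confirms whether a 3D body profile is actually present, while thermal alone may be residual warmth. The function and states below are hypothetical illustrations; the real system fuses the modalities with a random forest classifier plus a state machine:

```python
def bed_state(depth_sees_body, thermal_sees_heat):
    # Depth confirms an actual 3D body profile; heat without a body
    # profile is treated as residual warmth left in the bedding.
    if depth_sees_body:
        return "occupied"
    if thermal_sees_heat:
        return "heat_residual"
    return "empty"

print(bed_state(False, True))  # → heat_residual
```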
ThermalEye: Fully Passive Eye Blink Detection on Smart Glasses via Low-Cost Thermal Sensing
Chen, Yuhan and Song, Jingwei and Zhang, Xie and Zhang, Jianqi and Wu, Chenshu
ACM International Workshop on Thermal Sensing and Computing 2025, 2025, pp. 34–39
@inproceedings{chenThermalEyeFullyPassive2025a,
title = {{{ThermalEye}}: {{Fully Passive Eye Blink Detection}} on {{Smart Glasses}} via {{Low-Cost Thermal Sensing}}},
shorttitle = {{{ThermalEye}}},
booktitle = {{{ACM International Workshop}} on {{Thermal Sensing}} and {{Computing}} 2025},
author = {Chen, Yuhan and Song, Jingwei and Zhang, Xie and Zhang, Jianqi and Wu, Chenshu},
year = {2025},
month = nov,
series = {{{HotSense}} '25},
pages = {34--39},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3737905.3769280},
urldate = {2025-11-09},
isbn = {979-8-4007-1982-0},
abbreviated = {HotSense @ MobiCom'25},
thumbnail_path = {/assets/thumbnail_files/thermaleye.jpg}
}
Traditional blink detection systems often rely on visible-light cameras, which are sensitive to illumination and raise privacy concerns. To overcome these limitations, we present ThermalEye, a fully passive eye monitoring system integrated into smart glasses that performs robust blink detection using a low-resolution (12 × 16 pixel) infrared thermal array. Our approach features a framework co-designed with both signal processing techniques and a deep learning model to explicitly address the key challenges of this sensing modality: low signal-to-noise ratio (SNR), spatial heterogeneity, and inter-subject variability. Evaluations show that ThermalEye achieves F1-scores of 0.89 and 0.80 at the frame and event levels, highlighting its promise for fatigue monitoring and dry eye assessment.
OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
*Yuan, Dongsheng and *Zhang, Xie and Hou, Weiying and Lyu, Sheng and Yu, Yuemin and Yu, Luca Jiang-Tao and Li, Chengxiao and Wu, Chenshu
The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025
*Equal contribution
@inproceedings{yuanOctoNetLargeScaleMultiModal2025,
title = {{{OctoNet}}: {{A Large-Scale Multi-Modal Dataset}} for {{Human Activity Understanding Grounded}} in {{Motion-Captured 3D Pose Labels}}},
shorttitle = {{{OctoNet}}},
booktitle = {The {{Thirty-ninth Annual Conference}} on {{Neural Information Processing Systems Datasets}} and {{Benchmarks Track}}},
author = {$^*$Yuan, Dongsheng and $^*$Zhang, Xie and Hou, Weiying and Lyu, Sheng and Yu, Yuemin and Yu, Luca Jiang-Tao and Li, Chengxiao and Wu, Chenshu},
year = {2025},
month = oct,
urldate = {2025-11-09},
langid = {english},
note = {$^*$Equal contribution},
abbreviated = {NeurIPS'25},
code = {https://aiot-lab.github.io/OctoNet/},
file = {https://openreview.net/pdf?id=z3TftXOizf},
thumbnail_path = {/assets/thumbnail_files/octonet.jpg}
}
We introduce OctoNet, a large-scale, multi-modal, multi-view human activity dataset designed to advance human activity understanding and multi-modal learning. OctoNet comprises 12 heterogeneous modalities (including RGB, depth, thermal cameras, infrared arrays, audio, millimeter-wave radar, Wi-Fi, IMU, and more) recorded from 41 participants under multi-view sensor setups, yielding over 67.72M synchronized frames. The data encompass 62 daily activities spanning structured routines, freestyle behaviors, human-environment interaction, healthcare tasks, etc. Critically, all modalities are annotated by high-fidelity 3D pose labels captured via a professional motion-capture system, allowing precise alignment and rich supervision across sensors and views. OctoNet is one of the most comprehensive datasets of its kind, enabling a wide range of learning tasks such as human activity recognition, 3D pose estimation, multi-modal fusion, cross-modal supervision, and sensor foundation models. Extensive experiments have been conducted to demonstrate the sensing capacity using various baselines. OctoNet offers a unique and unified testbed for developing and benchmarking generalizable, robust models for human-centric perceptual AI.
TAPOR: 3D Hand Pose Reconstruction with Fully Passive Thermal Sensing for around-Device Interactions
Zhang, Xie and Li, Chengxiao and Wu, Chenshu
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2025
@inproceedings{Zhangtapor2025,
title = {{{TAPOR}}: {{3D}} Hand Pose Reconstruction with Fully Passive Thermal Sensing for around-Device Interactions},
booktitle = {Proc. {{ACM Interact}}. {{Mob}}. {{Wearable Ubiquitous Technol}}.},
author = {Zhang, Xie and Li, Chengxiao and Wu, Chenshu},
year = {2025},
month = jun,
volume = {9},
doi = {10.1145/3729499},
articleno = {63},
code = {https://github.com/aiot-lab/TAPOR},
demo = {https://www.youtube.com/watch?v=dRiqxPZx4zk},
file = {https://arxiv.org/pdf/2501.17585},
abbreviated = {Ubicomp'25},
thumbnail_path = {/assets/thumbnail_files/tapor.jpg}
}
This paper presents the design and implementation of TAPOR, a privacy-preserving, non-contact, and fully passive sensing system for accurate and robust 3D hand pose reconstruction for around-device interaction using a single low-cost thermal array sensor. Thermal sensing using inexpensive and miniature thermal arrays emerges with an excellent utility-privacy balance, offering an imaging resolution significantly lower than cameras but far superior to RF signals like radar or WiFi. The design of TAPOR, however, is challenging, mainly because the captured temperature maps are low-resolution and textureless. To overcome the challenges, we investigate thermo-depth and thermo-pose properties, proposing a novel physics-inspired neural network that learns effective 3D spatial representations of potential hand poses. We then formulate the 3D pose reconstruction problem as a distinct retrieval task, enabling accurate hand pose determination from the input temperature map. To deploy TAPOR on IoT devices, we introduce an effective heterogeneous knowledge distillation method, reducing computation by 377×. TAPOR is fully implemented and tested in real-world scenarios, showing remarkable performance, supported by four gesture control and finger tracking case studies. We envision TAPOR to be a ubiquitous interface for around-device control and have open-sourced it at https://github.com/aiot-lab/TAPOR.
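The retrieval formulation boils down to nearest-neighbour lookup in a learned embedding space: embed the temperature map, then return the stored pose whose embedding is closest. A minimal sketch with toy embeddings and hypothetical pose labels, not TAPOR's actual representation:

```python
import numpy as np

def retrieve_pose(query_emb, bank_embs, bank_poses):
    # Nearest-neighbour retrieval: the pose whose embedding best
    # matches the query embedding is returned as the estimate.
    i = int(np.argmin(np.linalg.norm(bank_embs - query_emb, axis=1)))
    return bank_poses[i]

bank = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2-D embedding bank
poses = ["open_hand", "fist"]              # hypothetical pose labels
print(retrieve_pose(np.array([0.1, 0.9]), bank, poses))  # → fist
```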
Facial Expression Recognition with DToF Sensing
Li, Chengxiao and Zhang, Xie and Wu, Chenshu
ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5
@inproceedings{li2025facial,
title = {Facial Expression Recognition with DToF Sensing},
author = {Li, Chengxiao and Zhang, Xie and Wu, Chenshu},
booktitle = {ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {1--5},
year = {2025},
organization = {IEEE},
doi = {10.1109/ICASSP49660.2025.10887978},
url = {https://ieeexplore.ieee.org/abstract/document/10887978/},
abbreviated = {ICASSP'25},
file = {/papers/DToF_face.pdf},
thumbnail_path = {/assets/thumbnail_files/toface.jpg}
}
Facial Expression Recognition (FER) is crucial for understanding human emotions, with applications spanning from mental health assessment to marketing recommendation systems. However, existing camera-based methods raise privacy concerns, while RF-based approaches suffer from limited environmental generalizability and high cost. In this work, we propose ToFace, a FER system leveraging a low-cost ($4.8) Direct Time-of-Flight (DToF) sensor that has been available on commodity smartphones. This sensor provides an extremely low-resolution 8 × 8 depth map and a clear Field of View (FoV), significantly mitigating privacy concerns while avoiding the impact of ambient objects. Despite the benefits, the low-resolution depth map introduces significant challenges for precise expression recognition due to limited facial structure information. We first develop a physical model to extract additional spatial information from the intermediate sensor output, i.e., the transient histograms. We then propose a physics-integrated neural network to reconstruct a facial structure map comprising both depth and orientation for accurate expression recognition. We conduct real-world experiments with 12 users and compare our model with several baselines. The results demonstrate that ToFace achieves the highest recognition accuracy of 75%.
TADAR: Thermal array-based detection and ranging for privacy-preserving human sensing
Zhang, Xie and Wu, Chenshu
Proceedings of the 25th international symposium on theory, algorithmic foundations, and protocol design for mobile networks and mobile computing (MOBIHOC ’24), 2024, pp. 1–10
@inproceedings{Zhang2024TADAR,
address = {Athens, Greece},
title = {{TADAR}: {Thermal} array-based detection and ranging for privacy-preserving human sensing},
doi = {10.1145/3641512.3686357},
booktitle = {Proceedings of the 25th international symposium on theory, algorithmic foundations, and protocol design for mobile networks and mobile computing ({MOBIHOC} '24)},
publisher = {ACM},
author = {Zhang, Xie and Wu, Chenshu},
year = {2024},
pages = {1--10},
code = {https://github.com/aiot-lab/TADAR},
demo = {https://youtu.be/0hGqzSYlh4o},
file = {https://arxiv.org/pdf/2409.17742},
abbreviated = {MobiHoc'24},
thumbnail_path = {/assets/thumbnail_files/tadar.jpg}
}
Human sensing has gained increasing attention in various applications. Among the available technologies, visual images offer high accuracy, while sensing on the RF spectrum preserves privacy, creating a conflict between imaging resolution and privacy preservation. In this paper, we explore thermal array sensors as an emerging modality that strikes an excellent resolution-privacy balance for ubiquitous sensing. To this end, we present TADAR, the first multi-user Thermal Array-based Detection and Ranging system that estimates the inherently missing range information, extending thermal array outputs from 2D thermal pixels to 3D depths and empowering them as a promising modality for ubiquitous privacy-preserving human sensing. We prototype TADAR using a single commodity thermal array sensor and conduct extensive experiments in different indoor environments. Our results show that TADAR achieves a mean F1 score of 88.8% for multi-user detection and a mean accuracy of 32.0 cm for multi-user ranging, which further improves to 20.1 cm for targets located within 3 m. We conduct two case studies on fall detection and occupancy estimation to showcase the potential applications of TADAR. We hope TADAR will inspire the vast community to explore new directions of thermal array sensing, beyond wireless and acoustic sensing. TADAR is open-sourced on GitHub: https://github.com/aiot-lab/TADAR.
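As a toy illustration of the detection stage only, candidate human pixels in a low-resolution temperature map can be segmented by comparing against an ambient background; this background-subtraction sketch is my own simplification, not TADAR's pipeline, which additionally estimates the missing range dimension:

```python
import numpy as np

def human_mask(frame, background, delta=1.5):
    # Pixels noticeably warmer than the ambient background (by more
    # than delta degrees) are flagged as candidate human pixels.
    return (frame - background) > delta

bg = np.full((8, 8), 22.0)          # ambient scene around 22 °C
frame = bg.copy()
frame[2:5, 3:6] = 30.0              # a warm 3x3 blob, e.g. a person
print(human_mask(frame, bg).sum())  # → 9
```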
WiFi-Based Multi-task Sensing
Zhang, Xie and Tang, Chengpei and An, Yasong and Yin, Kang
Mobile and Ubiquitous Systems: Computing, Networking and Services, 2021, pp. 169–189
@inproceedings{zhangWiFiBasedMultitaskSensing2022,
title = {{{WiFi-Based Multi-task Sensing}}},
booktitle = {Mobile and {{Ubiquitous Systems}}: {{Computing}}, {{Networking}} and {{Services}}},
author = {Zhang, Xie and Tang, Chengpei and An, Yasong and Yin, Kang},
editor = {Hara, Takahiro and Yamaguchi, Hirozumi},
year = {2021},
pages = {169--189},
publisher = {Springer International Publishing},
address = {Cham},
doi = {10.1007/978-3-030-94822-1_10},
isbn = {978-3-030-94822-1},
langid = {english},
file = {https://arxiv.org/pdf/2111.14619},
code = {https://github.com/Zhang-xie/Wimuse},
abbreviated = {MobiQuitous'21},
thumbnail_path = {/assets/thumbnail_files/wimuse.jpg}
}
WiFi-based sensing has attracted immense attention over recent years. The rationale is that the signal fluctuations caused by humans carry information about human behavior, which can be extracted from the channel state information of WiFi. Still, the prior studies mainly focus on single-task sensing (STS), e.g., gesture recognition, indoor localization, and user identification. Since the fluctuations caused by gestures are highly coupled with body features and the user’s location, we propose a WiFi-based multi-task sensing model (Wimuse) to perform gesture recognition, indoor localization, and user identification tasks simultaneously. However, these tasks have different difficulty levels (i.e., imbalance issue) and need task-specific information (i.e., discrepancy issue). To address these issues, the knowledge distillation technique and task-specific residual adaptor are adopted in Wimuse. We first train the STS model for each task. Then, to solve the imbalance issue, the extracted common feature in Wimuse is encouraged to get close to the counterpart features of the STS models. Further, for each task, a task-specific residual adaptor is applied to extract the task-specific compensation feature, which is fused with the common feature to address the discrepancy issue. We conduct comprehensive experiments on three public datasets, and the evaluation suggests that Wimuse achieves state-of-the-art performance with average accuracies of 85.20%, 98.39%, and 98.725% on the joint task of gesture recognition, indoor localization, and user identification, respectively.
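The two fixes described above each reduce to a few lines: a feature-matching distillation loss that pulls the shared feature toward each single-task teacher, and a residual adaptor whose output is fused with the common feature. A minimal sketch with illustrative names, not Wimuse's implementation:

```python
import numpy as np

def distill_loss(common_feat, teacher_feats):
    # Imbalance fix: pull the shared feature toward every
    # single-task teacher's feature (mean-squared error per teacher).
    return sum(float(np.mean((common_feat - t) ** 2)) for t in teacher_feats)

def fused_feature(common_feat, adaptor):
    # Discrepancy fix: a task-specific residual adaptor adds a
    # compensation term on top of the shared feature.
    return common_feat + adaptor(common_feat)

f = np.ones(4)
print(distill_loss(f, [f, f]))              # → 0.0
print(fused_feature(f, lambda x: 0.5 * x))  # → [1.5 1.5 1.5 1.5]
```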
Journal articles
WiFi-based Cross-Domain Gesture Recognition via Modified Prototypical Networks
Zhang, Xie and Tang, Chengpei and Yin, Kang and Ni, Qingqian
IEEE Internet of Things Journal, 2021, pp. 1–1
@article{zhangWiFibasedCrossDomainGesture2021,
title = {{{WiFi-based Cross-Domain Gesture Recognition}} via {{Modified Prototypical Networks}}},
author = {Zhang, Xie and Tang, Chengpei and Yin, Kang and Ni, Qingqian},
year = {2021},
journal = {IEEE Internet of Things Journal},
pages = {1--1},
doi = {10.1109/JIOT.2021.3114309},
file = {/papers/WiGr_zhang.pdf},
code = {https://github.com/Zhang-xie/WiGr},
abbreviated = {IoTJ'21},
thumbnail_path = {/assets/thumbnail_files/wigrIoT.jpg}
}
Numerous deep learning studies have achieved remarkable advances in WiFi-based human gesture recognition (HGR) using channel state information (CSI). However, since the CSI patterns of the same gesture change across domains (i.e., users, environments, locations, and orientations), recognition accuracy might degrade significantly when applying the trained model to new domains. To overcome this problem, we propose a WiFi-based cross-domain gesture recognition system (WiGr) which has a domain-transferable mapping to construct an embedding space where the representations of samples from the same class are clustered, and those from different classes are separated. The key insight of WiGr is using the similarity between the query sample representation and the class prototypes in the embedding space to perform the gesture classification, which avoids the influence of cross-domain changes in CSI patterns. Meanwhile, we present a dual-path prototypical network (Dual-Path PN) which consists of a deep feature extractor and a dual-path (i.e., Path-A and Path-B substructures) recognizer. The trained feature extractor can extract the gesture-related domain-independent features from CSI, namely, the domain-transferable mapping. In addition, WiGr implements cross-domain HGR based on only a pair of WiFi devices without retraining in the new domain. We conduct comprehensive experiments on three data sets, one collected by ourselves and two public ones. The evaluation suggests that WiGr achieves 86.8%–92.7% in-domain recognition accuracy and 83.5%–93% cross-domain accuracy under the four-shot condition.
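The prototype-matching step above can be sketched in a few lines: average the support embeddings per class to get prototypes, then assign each query to its nearest prototype. This is a plain-numpy toy with random embeddings; WiGr's actual Dual-Path PN learns the embedding with a deep feature extractor:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    # Class prototype = mean embedding of that class's support samples
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    # Assign each query to the nearest prototype (Euclidean distance)
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy example: two well-separated classes with 4-D embeddings
rng = np.random.default_rng(0)
sup = np.concatenate([rng.normal(0, 0.1, (5, 4)), rng.normal(3, 0.1, (5, 4))])
lab = np.array([0] * 5 + [1] * 5)
p = prototypes(sup, lab, 2)
q = np.array([[0.05, 0.0, 0.1, -0.02], [2.9, 3.1, 3.0, 2.95]])
print(classify(q, p))  # → [0 1]
```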
WiGR: A Practical Wi-Fi-Based Gesture Recognition System with a Lightweight Few-Shot Network
Hu, Pengli and Tang, Chengpei and Yin, Kang and Zhang, Xie
Applied Sciences, 2021, pp. 3329
@article{huWiGRPracticalWiFiBased2021,
title = {{{WiGR}}: {{A Practical Wi-Fi-Based Gesture Recognition System}} with a {{Lightweight Few-Shot Network}}},
shorttitle = {{{WiGR}}},
author = {Hu, Pengli and Tang, Chengpei and Yin, Kang and Zhang, Xie},
year = {2021},
journal = {Applied Sciences},
volume = {11},
number = {8},
pages = {3329},
publisher = {Multidisciplinary Digital Publishing Institute},
doi = {10.3390/app11083329},
urldate = {2021-05-26},
copyright = {http://creativecommons.org/licenses/by/3.0/},
langid = {english},
abbreviated = {Appl. Sci.'21},
file = {/papers/WiGr_hu.pdf},
thumbnail_path = {/assets/thumbnail_files/WiGR.jpg}
}
Wi-Fi sensing technology based on deep learning has contributed many breakthroughs in gesture recognition tasks. However, most methods concentrate on single-domain recognition with high computational complexity while rarely investigating cross-domain recognition with lightweight performance, which cannot meet the requirements of high recognition performance and low computational complexity in an actual gesture recognition system. Inspired by few-shot learning methods, we propose WiGR, a Wi-Fi-based gesture recognition system. The key structure of WiGR is a lightweight few-shot learning network that introduces some lightweight blocks to achieve lower computational complexity. Moreover, the network can learn a transferable similarity evaluation ability from the training set and apply the learned knowledge to the new domain to address domain shift problems. In addition, we built a CSI-Domain Adaptation (CSIDA) data set that includes channel state information (CSI) traces with various domain factors (i.e., environment, users, and locations) and conducted extensive experiments on two data sets (CSIDA and SignFi). The evaluation results show that WiGR can reach 87.8–94.8% cross-domain accuracy, and the parameters and the calculations are reduced by more than 50%. Extensive experiments demonstrate that WiGR can achieve excellent recognition performance using only a few samples and is thus a lightweight and practical gesture recognition system compared with state-of-the-art methods.
Awards & Honors
🎖️ The 13th HLF Travel Grant Award
🎖️ The 13th Heidelberg Laureate Forum (HLF) Young Researcher
🎖️ MobiSys’26 Rising Star
🎖️ Alibaba Cloud Most Scalable AI Solution Award (single-team award, 5000 USD) at the Global AI Challenge 2025
🏛️ Invited Reviewer: APSIPA Transactions on Signal and Information Processing; IEEE Internet of Things Journal (IoTJ); Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (IMWUT); International Conference on Parallel and Distributed Systems (ICPADS’24); IEEE Transactions on Mobile Computing (TMC)
🏛️ Distinguished Reviewer, ACM Transactions on Internet of Things (TIOT)
Talks
🎙️ Hands-on Tutorial on Thermal Sensing | invited talk 🚩 HotSense @ MobiCom 2025, 📅 2025-11-08, 📍 Hong Kong 🔍 Thermal array sensing for resolving the paradox of ubiquitous thermal signals yet scarce perception systems.
🎙️ TAPOR | conference presentation 🚩 UbiComp 2025, 📅 2025-10-15, 📍 Espoo, Finland 🔍 3D Hand Pose Reconstruction with Fully Passive Thermal Sensing for around-Device Interactions.
🎙️ TADAR | conference presentation 🚩 MobiHoc 2024, 📅 2024-10-16, 📍 Athens, Greece 🔍 Thermal Array-based Detection and Ranging for Privacy-Preserving Human Sensing.
Teaching
🧑🏫 COMP7310, Artificial Intelligence of Things, Spring 2026
🧑🏫 COMP3270, Artificial Intelligence, Fall 2024
🧑🏫 COMP3314, Machine Learning, Fall 2023
🧑🏫 CS3230, Operating Systems, Fall 2022
Rejection Letters
Every paper eventually finds its home, but acceptance is never the end of the research.
Xie ZHANG
(Now) Ph.D. Computer Science, HKU
(2022) M.S. Pattern Recognition and Intelligence Systems, SYSU
(2019) B.S. Software Engineering, SYSU
News
4, May 2026
Received the 13th HLF Travel Grant Award
25, April 2026
Honored to be recognized as a MobiSys'26 Rising Star
15, April 2026
Chosen as a Young Researcher for the 13th Heidelberg Laureate Forum (HLF)
23, March 2026
Successfully completed my Ph.D. defense!
8, November 2025
The first ACM International Workshop on Thermal Sensing and Computing (HotSense'25) was successfully held at MobiCom'25.