About Me

Hello and welcome to my homepage!

I was born in 2002 and am one of the many newcomers in the field of Artificial Intelligence. My primary research interests lie in Computer Vision, but I am also passionate about integrating multimodal learning to tackle real-world problems.

I aim to simplify complex challenges by optimizing how problems are defined, and I’m committed to deepening my theoretical understanding of deep learning to gain broader perspectives.

I’m always open to collaboration in any form — feel free to reach out to me via email at 2021252439@qq.com.

Education

  • 2020.09 - 2024.06, Bachelor of Science in Data Science and Big Data Technology, Beijing Institute of Technology, Zhuhai, China.

Skills

🔹 Programming Languages: Experienced in Python development; have a basic understanding of Java, C++, and R.

🔹 Development Environment: Experienced in Linux development; comfortable with terminal-based workflows and scripting.

🔹 Deep Learning Frameworks: Familiar with mainstream frameworks such as PyTorch; well-versed in popular algorithms in Computer Vision (CV) and Natural Language Processing (NLP).

🔹 Model Lifecycle: Skilled in the full pipeline of model development, including design, training, lightweight optimization (e.g., quantization, pruning), and deployment.

🔹 Machine Learning Methods: Solid understanding of machine learning, deep learning, and reinforcement learning techniques.

🔹 Data Handling: Capable of building web crawlers and performing data cleaning, analysis, and visualization.

Projects

Det-SAM2

Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Zhiting Wang, Qiangong Zhou, Zongyang Liu

Tech Report Project

We developed a real-time video stream inference pipeline based on SAM2, which integrates an object detection model to automatically provide conditional prompts for long video segmentation. This is the first fully functional self-prompted SAM2 framework for long videos. It supports advanced features such as online addition of new categories during inference and preloading a memory bank. Furthermore, the pipeline maintains constant GPU and CPU memory usage throughout, enabling efficient inference on arbitrarily long videos.

Key words: Video Segmentation, SAM2
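
As a rough illustration of the self-prompting loop described above, here is a minimal sketch in Python. `ObjectDetector`-style and `sam2_predictor`-style objects and all of their method names (`init_state`, `add_box_prompt`, `track_frame`, `prune_memory`) are hypothetical placeholders, not the Det-SAM2 or SAM2 API; the point is only to show how detector boxes replace manual prompts and how the memory bank stays bounded for arbitrarily long streams.

```python
# A minimal sketch of the self-prompting idea behind Det-SAM2 (not the released code).
# `detector` and `sam2_predictor` are hypothetical wrappers standing in for a
# detection model and the SAM2 video predictor; their methods are placeholders.

def self_prompted_segmentation(frames, detector, sam2_predictor, prompt_every=30):
    """Segment an arbitrarily long video by letting a detector supply box prompts."""
    state = sam2_predictor.init_state()           # per-video inference state / memory bank
    for idx, frame in enumerate(frames):          # frames can be a generator (streaming)
        if idx % prompt_every == 0:
            # Detector output (boxes + class ids) becomes conditional prompts for SAM2,
            # so new object categories can be added online without human clicks.
            for box, cls_id in detector(frame):
                sam2_predictor.add_box_prompt(state, frame_idx=idx,
                                              obj_id=cls_id, box=box)
        # Propagate masks for the current frame using the (bounded) memory bank.
        masks = sam2_predictor.track_frame(state, frame_idx=idx, frame=frame)
        sam2_predictor.prune_memory(state, max_frames=16)  # keep GPU/CPU memory constant
        yield idx, masks
```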

HiLight CLIP-ViP-PLUS

HiLight: Technical Report on the Motern AI Video Language Model

Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

Tech Report Project

Building on CLIP-ViP, we first design a VideoEncoder with finer-grained modality alignment by introducing a contrastive loss between visual patches and text tokens. We then build a dual-tower vision encoder composed of the improved CLIP-ViP and Long-CLIP, which extracts both video and image features. The fused visual features are fed into the Gemma-2B language model, enabling video-based dialogue.

Key words: Video Language Model, CLIP-ViP, Long-CLIP, Gemma-2B
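
Below is a minimal sketch of the kind of patch-token contrastive objective described above, assuming batched patch embeddings from the video tower and token embeddings from the text side; the shapes and the max-over-patches aggregation are illustrative assumptions, not the HiLight training code.

```python
import torch
import torch.nn.functional as F

def patch_token_contrastive_loss(patch_feats, token_feats, temperature=0.07):
    """Fine-grained patch-token contrastive (InfoNCE) loss sketch.

    patch_feats: (B, P, D) video patch embeddings (e.g. from the CLIP-ViP tower)
    token_feats: (B, T, D) text token embeddings
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    token_feats = F.normalize(token_feats, dim=-1)

    # For every (text i, video j) pair, each text token picks its best-matching
    # patch; token scores are then averaged to give a pairwise similarity.
    sim = torch.einsum('btd,npd->bntp', token_feats, patch_feats)  # (B, B, T, P)
    pair_sim = sim.max(dim=-1).values.mean(dim=-1)                 # (B, B)

    logits = pair_sim / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: match each text to its video and each video to its text.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```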

FreeV

FreeV: Free Lunch in MultiModal Diffusion U-ViT

Qiangong Zhou, Youyu Zhou, Yahong Wang

Paper Project

We propose FreeV, an adaptation of the FreeU strategy—originally designed for U-Net—into the Transformer-based U-ViT architecture. FreeV significantly improves generation quality without additional training or fine-tuning. The key insight is to balance the contributions from the backbone, skip connections, and fused features within U-ViT, maximizing its strengths while addressing its limitations in feature fusion.

Key words: U-ViT, Diffusion Model
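
A minimal sketch of the re-weighting idea at a U-ViT long-skip fusion point is shown below, assuming the concatenate-then-linear skip fusion used in U-ViT. The scaling factors and their placement follow the FreeU spirit and are illustrative, not the exact FreeV formulation. Because only two scalars rescale existing activations, no additional training or fine-tuning is needed.

```python
import torch

def freev_skip_fusion(backbone_tokens, skip_tokens, skip_linear, b=1.1, s=0.9):
    """Re-weight backbone vs. skip contributions before U-ViT's long-skip fusion.

    backbone_tokens: (B, N, D) tokens from the deeper backbone path
    skip_tokens:     (B, N, D) tokens carried over by the long skip connection
    skip_linear:     nn.Linear(2 * D, D), the fusion layer already present in U-ViT
    b, s:            illustrative scaling factors (assumed values, not from the paper)
    """
    # Amplify the backbone contribution and attenuate the skip contribution,
    # then let the existing linear layer fuse the concatenated tokens as usual.
    fused = torch.cat([b * backbone_tokens, s * skip_tokens], dim=-1)
    return skip_linear(fused)
```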

Optimal_Use_of_Attention_Mechanisms

Optimal Use of Attention Mechanisms: Comparative Study in U-Net for Image Segmentation Tasks

Qiangong Zhou, Guanzhang Su, Jiayi Chen, Yuangen Chen, Youyu Zhou

Paper

We find that while attention modules can enhance channel-wise feature weighting, they may interfere with the backbone’s convolutional feature extraction; they do, however, effectively complement the encoder-decoder structure by improving high-level semantic representation.

Key words: Attention Mechanism, U-Net, Image Segmentation
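
For context, the channel-attention modules compared in the study are of the squeeze-and-excitation kind sketched below; the reduction ratio and the placement suggested in the closing comment are illustrative assumptions based on the finding above, not prescriptions from the paper.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative sketch)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global spatial context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # excitation: per-channel weights
        )

    def forward(self, x):
        return x * self.fc(x)                              # re-weight channels, keep spatial map

# One reading of the finding above: such re-weighting helps most on the decoder/skip
# path (re-weighting encoder features before fusion) rather than inside backbone
# convolutional blocks.
```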

Curriculum Vitae

You can download my Chinese CV in PDF format from the following link: Download CV