Selected Projects

March 2022 - April 2023, Remote
Mentored by Dr. Huazheng Wang and Dr. Mengdi Wang
Reproduced the code of the paper "Reinforcement Online Learning to Rank with Unbiased Reward Shaping". [code link]
Formulated off-policy learning to rank (LTR) with biased feedback under a general click model as a Markov Decision Process, bridging off-policy LTR and offline reinforcement learning (RL).
Proposed CUORL, a Click model-agnostic Unified Off-policy LTR method that can use any offline RL algorithm as a plug-in solver, and instantiated it with CQL (Conservative Q-Learning).
Conducted extensive experiments on real-world LTR datasets under different click models to validate the effectiveness of the algorithm. [code link]
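A minimal sketch of what "LTR as an MDP" can look like: one logged ranking session becomes an episode in which each position is a time step, the action is the document placed there, and the reward is the observed click. The state design here (position plus documents already shown) and the names `Transition` and `session_to_transitions` are hypothetical illustrations, not CUORL's actual formulation; a real state would carry query-document features. Transitions of this shape are what an offline RL solver such as CQL would consume.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical state: (current position, documents already ranked above it).
State = Tuple[int, Tuple[str, ...]]


@dataclass
class Transition:
    state: State
    action: str          # document placed at this position
    reward: float        # 1.0 if the logged user clicked it, else 0.0
    next_state: State
    done: bool           # True at the last position of the ranking


def session_to_transitions(ranking: List[str], clicks: List[int]) -> List[Transition]:
    """Turn one logged (ranking, clicks) session into an episode of MDP transitions."""
    if len(ranking) != len(clicks):
        raise ValueError("ranking and clicks must have the same length")
    episode = []
    for pos, (doc, click) in enumerate(zip(ranking, clicks)):
        state = (pos, tuple(ranking[:pos]))
        next_state = (pos + 1, tuple(ranking[: pos + 1]))
        episode.append(Transition(state, doc, float(click), next_state, pos == len(ranking) - 1))
    return episode


if __name__ == "__main__":
    for tr in session_to_transitions(["d3", "d1", "d7"], [0, 1, 0]):
        print(tr)
```

Because the click model only enters through how rewards are generated in the logs, this kind of episode construction does not itself depend on the click model, which is the intuition behind a click-model-agnostic formulation.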

November 2021 - January 2022, USTC
Mentored by Dr. Jie Wang
Trained an agent to play the Atari game Pong with the Proximal Policy Optimization (PPO) algorithm.
The agent reached an average reward of 20 points (the maximum is 21) after training for 14,310 epochs on an RTX 3060 GPU.
Used an actor-critic architecture, whose critic provides a value baseline that reduces gradient variance, together with the clipping technique that limits the size of each policy update.
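The two ingredients named above can be sketched in NumPy: advantages computed from the critic's value estimates, and the clipped surrogate objective. Using Generalized Advantage Estimation (GAE) and the defaults `gamma=0.99`, `lam=0.95`, `clip_eps=0.2` are assumptions for illustration; the project's actual advantage estimator and hyperparameters are not stated.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Advantages from critic values via GAE (an assumed choice of estimator).

    dones[t] = 1 means the episode ended after step t, so no bootstrapping past it.
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]
        # One-step TD error: the critic's value acts as a baseline.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        adv[t] = gae
    return adv


def ppo_clip_objective(ratio, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), usually computed as exp(logp_new - logp_old).
    """
    ratio = np.asarray(ratio, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes any incentive to move the ratio outside [1-eps, 1+eps].
    return np.minimum(unclipped, clipped).mean()


if __name__ == "__main__":
    print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, [0.0, 0.0], gamma=1.0, lam=1.0))
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

With `ratio=[1.5, 0.5]` and `advantages=[1, -1]`, both terms are clipped (to 1.2 and -0.8), so the objective is 0.2: pushing the ratio further in either direction gains nothing.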

September 2021 - November 2021, USTC
Mentored by Dr. Jie Wang
Reproduced Deep Q-Network (DQN) and its variants (Double DQN, Dueling DQN) in PyTorch to play Atari games.
Reduced the correlation between training samples by applying experience replay.
Improved training stability by using a fixed target network.
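The techniques listed above can be sketched framework-free: a replay buffer sampled uniformly at random (breaking temporal correlation), TD targets computed from a separate, periodically synced target network (vanilla DQN takes the max over the target net's Q-values; Double DQN picks the action with the online net and evaluates it with the target net), and the dueling aggregation of value and advantage streams. NumPy arrays stand in for network outputs here; the buffer capacity and `gamma` are hypothetical, and the PyTorch networks themselves are not shown.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size buffer; uniform sampling decorrelates consecutive frames."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = zip(*batch)
        return (np.stack(s), np.array(a), np.array(r, dtype=np.float32),
                np.stack(s_next), np.array(done, dtype=np.float32))

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, dones, q_next_target, q_next_online=None, gamma=0.99):
    """TD targets using Q-values of next states from the frozen target network.

    If q_next_online is given, use the Double DQN rule instead of the plain max.
    (The target network would be copied from the online one every C steps.)
    """
    if q_next_online is None:
        next_v = q_next_target.max(axis=1)
    else:
        a_star = q_next_online.argmax(axis=1)
        next_v = q_next_target[np.arange(len(a_star)), a_star]
    return rewards + gamma * (1.0 - dones) * next_v


def dueling_q(value, advantage):
    """Dueling aggregation: Q = V + A - mean(A) over actions."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```

Subtracting the mean advantage makes the split between `V` and `A` identifiable; without it, any constant could be shifted between the two streams.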

September 2021 - November 2021, USTC
Mentored by Dr. Lixiang Tan
On an 8-core CPU with 8 threads, the speedup was stable at around 3x when the number of FFT points was large ($2^{20}$ or more).
Analyzed the FFT algorithm theoretically and found that the butterfly operations within each stage are independent of one another and can therefore be parallelized.
Added appropriate OpenMP parallelization directives to maximize parallel efficiency.
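The independence noted above is easiest to see in an iterative radix-2 Cooley-Tukey FFT: after the bit-reversal permutation, every butterfly within one stage reads and writes a disjoint pair of elements, so a whole stage can run at once. The sketch below expresses that with NumPy vectorization (one array operation per stage); in an OpenMP implementation, the same loop over butterflies is what a `#pragma omp parallel for` would split across threads. The function name `fft_radix2` is hypothetical and the project's OpenMP code itself is not shown.

```python
import numpy as np


def fft_radix2(x):
    """Iterative radix-2 FFT; each stage's butterflies are computed together."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversal permutation puts inputs in the order the butterflies expect.
    rev = np.array([int(bin(i)[2:].zfill(bits)[::-1], 2) for i in range(n)])
    x = x[rev]
    size = 2
    while size <= n:
        half = size // 2
        twiddles = np.exp(-2j * np.pi * np.arange(half) / size)
        # Each row is one block of `half` butterflies; rows and columns are all
        # independent within this stage, which is what makes it parallelizable.
        blocks = x.reshape(-1, size)
        even = blocks[:, :half]
        odd = blocks[:, half:] * twiddles
        x = np.concatenate([even + odd, even - odd], axis=1)
        size *= 2
    return x.reshape(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sig = rng.standard_normal(1024)
    print(np.allclose(fft_radix2(sig), np.fft.fft(sig)))
```

Only the stages are sequential ($\log_2 N$ of them); the $N/2$ butterflies inside a stage have no data dependencies, which bounds the achievable speedup by synchronization and memory bandwidth rather than by the algorithm.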

April 2021 - November 2021, USTC
Mentored by Dr. Wei Lu
Reduced the distortion measurement error to around 0.5% (requirement: 3%) and extended the measurement bandwidth to 1 kHz to 100 kHz.
Applied window functions to reduce spectral leakage; weighing effectiveness against feasibility, chose the Hanning window.
Designed an algorithm that accurately detects the center spectral line by adding the energy of nearby spectral lines.
Developed an LCD display to visualize the relevant data and the input analog signals.
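A minimal sketch of the windowing and energy-summation ideas above, assuming the task is estimating total harmonic distortion (THD) from one FFT frame. A Hanning window confines each tone's energy to a few bins, so summing the power of the lines around each peak recovers the tone's energy even when its frequency falls between bins. The energy-weighted centroid used to locate the center line, the `half_width=3` neighborhood, and the function names are hypothetical choices, not the project's actual algorithm.

```python
import numpy as np


def line_energy(power, k, half_width):
    """Sum the power of bin k and its neighbours within +/- half_width bins."""
    lo = max(k - half_width, 0)
    hi = min(k + half_width + 1, len(power))
    return power[lo:hi].sum()


def analyze(x, fs, n_harmonics=5, half_width=3):
    """Return (estimated fundamental frequency in Hz, estimated THD ratio)."""
    n = len(x)
    power = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2
    k_peak = int(np.argmax(power[1:])) + 1  # skip the DC bin
    bins = np.arange(max(k_peak - half_width, 0), min(k_peak + half_width + 1, len(power)))
    # Energy-weighted centroid of the neighbouring lines gives a fractional center bin.
    k_center = (bins * power[bins]).sum() / power[bins].sum()
    e_fund = line_energy(power, k_peak, half_width)
    e_harm = 0.0
    for h in range(2, n_harmonics + 1):
        k_h = int(round(h * k_center))
        if k_h + half_width >= len(power):
            break  # harmonic lies beyond the Nyquist frequency
        e_harm += line_energy(power, k_h, half_width)
    return k_center * fs / n, np.sqrt(e_harm / e_fund)


if __name__ == "__main__":
    fs, n = 102400, 4096                      # 25 Hz bin spacing
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 1010 * t) + 0.01 * np.sin(2 * np.pi * 2020 * t)
    print(analyze(x, fs))                     # fundamental sits between bins 40 and 41
```

With a rectangular window, the same between-bin tone would spread noticeable energy far from the peak and bias both the fundamental and harmonic sums; the Hanning window's fast sidelobe decay is what makes the small neighborhood sufficient.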