Selected Projects

Offline Reinforcement Learning to Rank

  • March 2022 - April 2023, Remote

  • Mentored by Dr. Huazheng Wang and Dr. Mengdi Wang

  • Reproduced the code of the paper "Reinforcement Online Learning to Rank with Unbiased Reward Shaping". [code link]

  • Formulated off-policy learning to rank (LTR) with biased feedback under general click models as a Markov Decision Process (MDP), bridging off-policy LTR and offline reinforcement learning.

  • Proposed CUORL, a Click model-agnostic Unified Off-policy LTR method that can use any offline RL algorithm as a plug-in solver, and instantiated it with Conservative Q-Learning (CQL).

  • Conducted extensive experiments on real-world LTR datasets under different click models to validate the effectiveness of the algorithm. [code link]

Playing Pong via Proximal Policy Optimization

  • November 2021 - January 2022, USTC

  • Mentored by Dr. Jie Wang

  • Trained an agent to play the Atari game Pong using the Proximal Policy Optimization (PPO) algorithm.

  • After 14,310 training epochs on an RTX 3060 GPU, the agent reached an average reward of 20 points, out of a maximum of 21.

  • Used an actor-critic architecture to reduce the variance of policy-gradient estimates and the PPO clipping technique to keep policy updates stable.
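The two ideas in the last bullet can be sketched with plain NumPy: a critic's value estimates turn rewards into advantages, and the clipped surrogate objective limits how far the policy ratio can move. This is a minimal sketch, not the project's code; the function names, the use of Generalized Advantage Estimation (GAE), and the defaults `gamma=0.99`, `lam=0.95`, `clip_eps=0.2` are all assumptions.

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation from critic values (assumed technique).

    Assumes one trajectory segment with no episode terminations inside it.
    """
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error built from the critic's value estimates
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective (minimize with gradient descent)."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum is a pessimistic bound: pushing the ratio outside
    # [1 - eps, 1 + eps] earns no extra objective, which keeps updates small.
    return -np.mean(np.minimum(unclipped, clipped))
```

In an actual training loop these quantities would be computed with an autodiff framework so gradients reach the actor and critic; NumPy is used here only to show the arithmetic.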

Deep Q-Networks Reproduction

  • September 2021 - November 2021, USTC

  • Mentored by Dr. Jie Wang

  • Reproduced Deep Q-Network (DQN) and its variants (Double DQN, Dueling DQN) in PyTorch to play Atari games.

  • Reduced the correlation between training samples by applying experience replay.

  • Improved training stability by using a fixed target network.
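Experience replay, the fixed target network, and the Double DQN target rule from this project can be illustrated without a deep-learning framework. The sketch below is an assumption-laden illustration, not the reproduced code: `ReplayBuffer`, `td_targets`, and `sync_target` are hypothetical names, and the "network" parameters are a plain dict of NumPy arrays standing in for PyTorch modules.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling mixes transitions from different time steps,
        # breaking the correlation between consecutive frames.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

def td_targets(rewards, dones, next_q_target, next_q_online=None, gamma=0.99):
    """Bootstrapped targets r + gamma * Q_target(s', a*) for a batch.

    next_q_target: (B, A) Q-values of s' from the frozen target network.
    next_q_online: optional (B, A) Q-values of s' from the online network; if
    given, apply the Double DQN rule (online net selects a*, target net scores it).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)  # no bootstrap past terminal states
    if next_q_online is None:
        next_values = next_q_target.max(axis=1)
    else:
        best_actions = next_q_online.argmax(axis=1)
        next_values = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * not_done * next_values

def sync_target(online_params):
    """Hard update every C steps: copy the online weights into a frozen target copy.

    online_params is a hypothetical dict of NumPy arrays; in PyTorch the
    equivalent is target_net.load_state_dict(online_net.state_dict()).
    """
    return {name: w.copy() for name, w in online_params.items()}
```

Because the target copy changes only every C steps, the regression targets stay fixed between syncs instead of shifting with every gradient step.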

Implementing Parallel FFT Algorithms via OpenMP

  • September 2021 - November 2021, USTC

  • Mentored by Dr. Lixiang Tan

  • On an 8-core CPU with 8 threads, the speedup stabilized at around 3x when the number of FFT points was large (2^20 or more).

  • Analyzed the FFT algorithm theoretically and found that the butterfly operations within each stage are independent of one another and can therefore be parallelized.

  • Added appropriate OpenMP parallelization directives to maximize parallel efficiency.
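The independence of butterflies within a stage is easiest to see in an iterative radix-2 decimation-in-time FFT. The sketch below is in Python rather than the C/OpenMP used in the project, so it is an illustration only: `fft_radix2` and `bit_reverse_permutation` are hypothetical names, and NumPy vectorization stands in for the per-stage loop that an OpenMP `parallel for` would split across threads.

```python
import numpy as np

def bit_reverse_permutation(n):
    """Index order that places input samples for an in-place radix-2 DIT FFT."""
    bits = n.bit_length() - 1
    idx = np.arange(n)
    rev = np.zeros(n, dtype=np.int64)
    for b in range(bits):
        rev |= ((idx >> b) & 1) << (bits - 1 - b)
    return rev

def fft_radix2(x):
    """Iterative radix-2 FFT; the length must be a power of two."""
    x = np.asarray(x, dtype=np.complex128)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = x[bit_reverse_permutation(n)]
    size = 2
    while size <= n:  # log2(n) stages, which must run one after another
        half = size // 2
        w = np.exp(-2j * np.pi * np.arange(half) / size)  # twiddle factors for this stage
        # Within a stage, the n/2 butterflies read and write disjoint index pairs,
        # so they are independent. NumPy evaluates them all at once here; in C this
        # is the loop that would carry "#pragma omp parallel for".
        blocks = a.reshape(-1, size)
        top = blocks[:, :half].copy()
        bottom = blocks[:, half:] * w
        blocks[:, :half] = top + bottom
        blocks[:, half:] = top - bottom
        a = blocks.reshape(n)
        size *= 2
    return a
```

Only the work inside a stage is parallel; the stages themselves form a sequential dependency chain, which is one reason a speedup well below the thread count is plausible.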

Signal Distortion Measurement Device Design

  • April 2021 - November 2021, USTC

  • Mentored by Dr. Wei Lu

  • Reduced the distortion measurement error to around 0.5% against a 3% requirement, and extended the measurement bandwidth to 1 kHz to 100 kHz.

  • Applied window functions to reduce spectral leakage; weighing effectiveness against feasibility, chose the Hanning window.

  • Designed an algorithm to accurately measure each central spectral line by adding back the energy leaked into nearby spectral lines.

  • Developed an LCD interface to display the measured data and the input analog signals.
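A Hanning window plus summing the energy of the bins around each spectral peak is enough to estimate total harmonic distortion (THD) even when the signal frequency falls between FFT bins. The sketch below assumes THD is defined as sqrt(sum of harmonic energies / fundamental energy); the names `band_energy` and `thd`, the `span=2` bins on each side (covering the Hann main lobe), and the 5-harmonic limit are assumptions, not the device's actual firmware.

```python
import numpy as np

def band_energy(power, guess_bin, span=2):
    """Locate the local peak near guess_bin, then sum power over peak +/- span bins."""
    lo, hi = max(guess_bin - span, 0), min(guess_bin + span + 1, len(power))
    peak = lo + int(np.argmax(power[lo:hi]))
    lo, hi = max(peak - span, 0), min(peak + span + 1, len(power))
    # Summing neighbouring bins recovers energy the window spread off the peak,
    # so the estimate barely depends on where the tone falls between bins.
    return power[lo:hi].sum()

def thd(signal, fs, f0, n_harmonics=5, span=2):
    """THD = sqrt(sum_{h>=2} E_h / E_1) from a Hann-windowed power spectrum."""
    n = len(signal)
    power = np.abs(np.fft.rfft(signal * np.hanning(n))) ** 2  # window suppresses leakage
    bin_width = fs / n
    energies = []
    for h in range(1, n_harmonics + 1):
        guess = int(round(h * f0 / bin_width))
        if guess >= len(power):  # harmonic lies above the Nyquist frequency
            break
        energies.append(band_energy(power, guess, span))
    return float(np.sqrt(sum(energies[1:]) / energies[0]))
```

For example, a 1003 Hz tone with a third harmonic at 5% amplitude, sampled at 100 kHz for 10,000 points (10 Hz bins, so neither tone sits exactly on a bin), yields an estimate close to 0.05.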