Markov Decision Processes and Reinforcement Learning

We study both the theory and application of Markov decision processes. Our theoretical work has focused on developing successive approximation algorithms and on dimension reduction of the functional space. In parallel, we study machine learning techniques to tackle large-scale dynamic optimization problems; in particular, we use reinforcement learning for problems involving multiple decision makers, i.e., multi-agent reinforcement learning.


Research topics:

  1. New successive approximation algorithms for Markov decision processes
  2. Sampling-based optimization for high-dimensional Markov decision processes
  3. Dimension reduction for Markov decision processes
  4. Multi-agent reinforcement learning in collaborative and competitive environments
  5. Deep reinforcement learning for large-scale dynamic optimization
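Successive approximation here refers to repeatedly applying the Bellman operator until the value function converges. As a minimal sketch (the MDP below is a randomly generated toy, not one of our research problems), value iteration in NumPy:

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; transitions and rewards are random
# placeholders purely for illustration.
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))  # P[a, s, s']
P /= P.sum(axis=2, keepdims=True)                # rows sum to 1
R = rng.random((n_states, n_actions))            # R[s, a]
gamma = 0.9                                      # discount factor

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')
    Q = R + gamma * np.einsum("aij,j->ai", P, V).T
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:  # stop at the fixed point
        V = V_new
        break
    V = V_new
```

Because the Bellman operator is a gamma-contraction, the iterates converge geometrically to the unique optimal value function.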


Example applications:

  1. Dynamic pricing of multiple products with sales milestones
  2. Infrastructure management with multiple HVAC systems
  3. Search and rescue optimization for a fleet of drones
  4. Bayesian control charts for multivariate processes
  5. Dynamic control of service systems



Optimal paths for two drones on a search and rescue mission


The drones are stationed in the upper corner, where they can be charged. They must return to the station before their batteries are depleted, while monitoring the area for search and rescue in a coordinated fashion.

Reinforcement learning for building management

Three agents control the HVAC units in a large retail store, with the objective of minimizing total energy spending while maintaining a comfortable atmosphere in the building throughout the day.



Financial Engineering

We study financial engineering problems that involve stochastic asset dynamics. We have studied the pricing of non-trivial derivatives such as swing options and callable convertible bonds, developed statistical arbitrage algorithms such as optimal pairs trading, and built risk-rationing models.


Research topics:

  1. Analysis of first passage times
  2. Lattice models for multiple non-stationary underlying assets
  3. Finite difference methods and their variants
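As a minimal illustration of a lattice method, the following sketches a Cox-Ross-Rubinstein binomial tree for a European call option; all parameter values are hypothetical and chosen only for the example:

```python
import math

def crr_call(S0, K, r, sigma, T, n):
    """Price a European call on a Cox-Ross-Rubinstein binomial lattice."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor per step
    d = 1 / u                              # down factor per step
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)               # one-step discount factor
    # Payoffs at maturity: node j has j up-moves and n - j down-moves.
    values = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]
    # Backward induction: discounted risk-neutral expectation at each node.
    for step in range(n, 0, -1):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(step)]
    return values[0]

price = crr_call(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, n=500)
```

With these inputs the lattice price converges toward the Black-Scholes value as the number of steps grows; the same backward-induction structure extends to the multi-asset and non-stationary lattices listed above.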


Example applications:

  1. Optimal thresholds for pairs trading
  2. Valuation of real options with long maturities
  3. Valuation of game options
  4. Optimal risk-rationing models for large investment companies
  5. Valuation of options with an optimally managed portfolio



Optimal Pairs Trading


On the left, the price processes of two correlated assets (Coke and Pepsi) are shown. An Ornstein-Uhlenbeck (OU) process is constructed from the spread between the two, and its first passage time leads to the optimal trading policy.
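A minimal sketch of this idea, with illustrative (not fitted) parameters: simulate an OU spread by the Euler-Maruyama scheme, open a position when the spread deviates past a threshold, and close it at the first passage time back through the mean. The threshold rule below is a simplified stand-in for the optimal policy.

```python
import numpy as np

# OU spread X_t:  dX = kappa * (mu - X) dt + sigma dW  (illustrative parameters)
kappa, mu, sigma = 5.0, 0.0, 0.3
dt, n_steps = 1e-3, 50_000
rng = np.random.default_rng(42)

x = np.empty(n_steps)
x[0] = mu
for t in range(1, n_steps):
    x[t] = (x[t - 1] + kappa * (mu - x[t - 1]) * dt
            + sigma * np.sqrt(dt) * rng.standard_normal())

# Threshold policy: open when the spread deviates by more than delta,
# close at the first passage back through the mean.
delta = 0.2
position, entry, pnl = 0, 0.0, 0.0
for xt in x:
    if position == 0 and abs(xt - mu) > delta:
        position = -np.sign(xt - mu)   # short the spread when high, long when low
        entry = xt
    elif position != 0 and (xt - mu) * (entry - mu) <= 0:  # crossed the mean
        pnl += position * (xt - entry)
        position = 0
```

Each closed round trip captures (at least) the deviation delta; choosing delta optimally is where the first-passage-time analysis enters.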