Reinforcement learning is a very effective branch of machine learning. AlphaGo used it, and many quantitative trading systems use it as well.

In reinforcement learning, the agent is a system built around an environment (env) that ties together state, action, and reward. The environment's step determines how an action affects the environment, and the reward that env returns for each action, or the reward determined at the end of each episode, is used to work out the agent's action function: the mapping from state to action.
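The loop described above can be sketched in a few lines of Python. This is only an illustration: LineWalkEnv, always_right, run_episode and the reward numbers are made up for the example, and the (state, reward, done, info) return value follows the classic Gym convention rather than any particular trading system.

```python
class LineWalkEnv:
    """Hypothetical toy environment: walk along a line from 0 to a goal."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); it changes the state directly.
        self.position += action
        self.steps += 1
        reached = self.position == self.goal
        reward = 10 if reached else -1  # the reward is decided by the environment
        done = reached or self.steps >= self.max_steps
        return self.position, reward, done, {}


def always_right(state):
    """Hypothetical action function: maps a state to an action."""
    return 1


def run_episode(env, policy):
    """Run one episode and return the sum of the per-step rewards."""
    state = env.reset()
    total_reward, done = 0, False
    while not done:
        action = policy(state)
        state, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(LineWalkEnv(), always_right))  # 8
```

A learning algorithm would replace always_right with something that improves from the rewards it sees; the loop itself stays the same.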

Reinforcement learning has many well-established routines. In particular, OpenAI's Gym defines a standard protocol for reinforcement learning environments, which greatly reduces the difficulty and complexity of reinforcement learning. It also makes clear that the focus of reinforcement learning is on the algorithms, and each of the different algorithms is worth studying.
But several problems are serious:
1. The environment of every intelligent learning system is both very important and specific to that system. This specificity is a key difficulty that has to be dealt with, using deep learning, when the system is put to use.
2. Gym's definition of env rests on a very important assumption that is often overlooked in practice.

The assumption is that the environment depends on the agent and changes because of it. In other words, the environment is something the agent controls: an action changes the state. That is why step in Gym takes an action as input and returns a state.

But in the real world there is very little in the environment that "people" can change. The big problem is how to adapt to the environment, that is, how to take the most favorable action within the larger environment. That action has no effect on the environment as a whole (at least I don't think it does), so step should not go from action to state; it should go from state to action. How much impact does an automated trading system have on the market? If you use step(action) to generate the state, isn't that a bit self-deceiving? An inaccuracy in the basic primitives is fatal.
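One way to picture this objection is an environment whose states are simply replayed from market data, so that the action only affects the reward. The sketch below is an assumption made for illustration, not a recommended design: ReplayMarketEnv, rollout and the price numbers are all invented, and the Gym-style step signature is kept only to show that the action never enters the state transition.

```python
class ReplayMarketEnv:
    """Hypothetical environment that replays a fixed price series."""

    def __init__(self, prices):
        self.prices = list(prices)

    def reset(self):
        self.t = 0
        return self.prices[self.t]

    def step(self, action):
        # action: +1 (long), -1 (short), 0 (flat). It only affects the reward.
        change = self.prices[self.t + 1] - self.prices[self.t]
        reward = action * change
        self.t += 1
        done = self.t == len(self.prices) - 1
        # The next state comes from the data, not from the action.
        return self.prices[self.t], reward, done, {}


def rollout(env, policy):
    """Run one pass over the data; return the visited states and total reward."""
    state = env.reset()
    states, total, done = [state], 0, False
    while not done:
        state, reward, done, _info = env.step(policy(state))
        states.append(state)
        total += reward
    return states, total


prices = [100, 102, 101, 105]  # made-up numbers
long_states, long_pnl = rollout(ReplayMarketEnv(prices), lambda s: 1)
short_states, short_pnl = rollout(ReplayMarketEnv(prices), lambda s: -1)
print(long_states == short_states)  # True: same states whatever the agent does
print(long_pnl, short_pnl)  # 5 -5
```

Two opposite policies see exactly the same sequence of states, which is the sense in which the mapping runs from state to action rather than from action to state.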

3. The Short and Long problem. Should a trading environment be supervised or unsupervised? When money is at stake, of course nobody chooses unsupervised, because nobody wants to lose money. So let's be clear: everyone uses supervision, because you need to make money.
The question is what the standard should be. Short and Long is a basic question in finance, and the first thing to do is to define these two criteria. But once that standard is set and followed strictly, will it be profitable?
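To make the question concrete, here is a toy sketch. Everything in it is an assumption for illustration: the labelling rule (Long after a rise, Short otherwise), the function names label and follow_labels, and both price series are invented. It only shows that the same strictly followed standard can gain on one series and lose on another.

```python
def label(prev_price, price):
    """Hypothetical rule: Long (+1) after a rise, Short (-1) otherwise."""
    return 1 if price > prev_price else -1


def follow_labels(prices):
    """Trade one unit per step exactly as labelled; return total profit."""
    profit = 0
    for t in range(1, len(prices) - 1):
        position = label(prices[t - 1], prices[t])
        profit += position * (prices[t + 1] - prices[t])
    return profit


trending = [100, 101, 102, 103, 104, 105]  # made-up prices
choppy = [100, 102, 101, 103, 102, 104]    # made-up prices
print(follow_labels(trending))  # 4
print(follow_labels(choppy))    # -6
```

The choppy series ends higher than it starts, yet following the labels strictly loses money on it.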

When I worked on an earlier version of the trading system, I used Short and Long as the core standard. So when I started on a new version, I took this question to a senior consultant, a "trader". At first he looked at me in surprise; maybe the question was too low-level for him to want to answer. But at my urging he gave a definite answer: "Not always!" He asked me, "Why?" I was lost in thought, so he picked up a pen and began to explain on paper. During the ten-plus minutes of his explanation I looked very attentive, but my mind was elsewhere, because I already knew what he was going to say. "So what does it come down to in the end?" "Profitability depends on the trader's decisiveness, perseverance, and luck." ... Okay, your high salary is justified, but I really can't teach a computer "decisiveness", "perseverance", and "luck"! So every time I see an Agent that uses Short and Long as its standard, I want to see how it teaches a computer "decisiveness", "perseverance", and "luck".

Every time I sit down and think it over, I feel I am back at the starting point again: which way should I choose? In fact, writing a piece of software is not complicated. The hard part is knowing whether you are on the right path. How complex can the code be? Deep learning and reinforcement learning have their difficulties, but how hard can they be? The difficulty is not in the theory and the fundamentals; it is in whether you are headed in the right direction.

