Introduction
Dear readers, before we venture any further into trading AI and technical indicators, it's crucial to understand that the information provided in this article is for educational and informational purposes only. We are not offering any financial or stock recommendations, nor are we providing any investment advice.
These days, the finance and trading world is getting a big boost from Artificial Intelligence, and many people are excited about using AI programs to help them trade and understand how the markets work. The project at hand focuses on creating an AI system specifically for trading, with the aim of making trading a bit smarter and easier for everyone involved.
For this project, which I named Trado AI, we are using a technique known as Deep Reinforcement Learning. If you're familiar with machine learning, you may recognize this term. It is a field of Artificial Intelligence dedicated to training neural network agents to complete tasks through trial and error in a designated environment. A noteworthy example is AlphaGo, created by Google DeepMind. If you haven't heard of it, it's definitely worth a look. AlphaGo is an AI trained using Deep Reinforcement Learning to play the complex game of Go. AlphaGo is exposed to a Go environment where it can take specific actions, and it receives rewards for the good actions it takes. After millions of training iterations, the AI learned to play the game from scratch at a superhuman level. Our use of Reinforcement Learning follows the same idea as the AlphaGo example but differs in the environment and strategies used.
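To make the reinforcement-learning loop concrete before we build anything, here is a minimal sketch of the observe-act-reward cycle using one of Gym's built-in toy environments (this is not our trading setup, and the reset/step signatures shown follow the classic Gym API; newer Gym/Gymnasium releases changed them slightly):

import gym

# A toy environment just to illustrate the observe -> act -> reward loop
env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # a random action, standing in for the agent's policy
    observation, reward, done, info = env.step(action)  # the environment returns the next state and a reward
    if done:  # episode finished, start again
        observation = env.reset()

env.close()

Our agent will follow the same loop, except the environment will be a custom one built around stock price data.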
Our environment is the stock market, and our strategy is to train a Deep RL agent that can decide when to buy, sell, or hold a stock. That's enough introduction, so let's get started!
Things Needed
Here are the things we need for this project:
- Google Colab: I'm using Google Colab for the whole project. The great advantage of Google Colab is that you don't need to install every library yourself, since many of them come pre-installed. You can also use your own local system or other services like SageMaker Studio Lab.
- Dataset: For creating the environment, we need a dataset. I'm using the Reliance dataset available here. It contains the stock's price data in a one-minute time frame.
- Gym: Gym is an open-source environment library created by OpenAI. It has a collection of environments that can be used for RL tasks. It also supports custom environments.
- Stable Baselines3 and sb3_contrib: Stable Baselines3 is an open-source Reinforcement Learning library that provides implementations of different Reinforcement Learning algorithms. sb3_contrib is a companion package to Stable Baselines3 that contains experimental reinforcement learning algorithms.
- Numpy: None of us can imagine a Machine Learning project without Numpy. An all-in-one mathematical library for scientific computing.
- Pandas: Pandas is the library we use for loading and managing our CSV dataset.
- Sklearn: For this project, sklearn is used for scaling our dataset.
The Approach
- Preprocess the dataset: First, we'll preprocess the dataset by removing unwanted columns, adding technical indicators, scaling, and so on.
- Custom Gym Environment: Using the preprocessed dataset, we create a custom trading environment with Gym that maps some of the features of the real stock market.
- Testing Environment: We'll check whether the environment behaves correctly.
- Train Agent: Then we train an agent using PPO (Proximal Policy Optimization). PPO is one of the state-of-the-art RL algorithms and can be used for a wide range of RL tasks. For this project, we use a variant of PPO called RecurrentPPO, which uses an LSTM policy network. This helps the agent capture the time-series nature of stock data.
- Testing the Agent: Finally, we test the agent to see whether it learned a policy for making good trades.
Data Preprocessing
from google.colab import drive
drive.mount('/content/drive')
!pip install sb3_contrib shimmy gym stable_baselines3
import numpy as np
import pandas as pd
import gym
from gym import spaces
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv
from sklearn.preprocessing import MinMaxScaler
from sb3_contrib import RecurrentPPO
dataset = pd.read_csv('/content/drive/MyDrive/Trado AI/Datasets/Reliance_dataset.csv')
dataset.reset_index(drop=True, inplace=True)
keep_mask = (dataset.index + 1) % 5 == 0  # Keep every 5th row, turning the 1-minute data into a 5-minute timeframe
dataset = dataset[keep_mask]
Calculating Technical Indicators
Calculating MACD (Moving Average Convergence Divergence)
# Define a function to calculate MACD and its signal line
def calculate_macd(dataset, short_window, long_window, signal_window):
    # Calculate the Exponential Moving Average (EMA) for the short and long windows
    dataset['EMA_short'] = dataset['Close'].ewm(span=short_window, adjust=False).mean()  # Short EMA
    dataset['EMA_long'] = dataset['Close'].ewm(span=long_window, adjust=False).mean()    # Long EMA
    # Calculate MACD by subtracting the long EMA from the short EMA
    dataset['MACD'] = dataset['EMA_short'] - dataset['EMA_long']  # MACD line
    # Calculate the signal line (another EMA) of the MACD
    dataset['Signal'] = dataset['MACD'].ewm(span=signal_window, adjust=False).mean()  # Signal line
    # Remove the intermediate EMA values, keeping only the MACD and Signal columns
    dataset.drop(['EMA_short', 'EMA_long'], axis=1, inplace=True)
Calculating RSI (Relative Strength Index)
# Define a function to calculate RSI (Relative Strength Index)
def calculate_rsi(dataset, window):
    # Calculate the differences between consecutive closing prices
    diff = dataset['Close'].diff()
    # Separate gains (positive differences) and losses (negative differences)
    gain = diff.where(diff > 0, 0)    # Positive differences as gains
    loss = -diff.where(diff < 0, 0)   # Negative differences as losses (made positive)
    # Calculate average gains and losses over a rolling window
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    # Relative Strength (RS): average gains divided by average losses
    rs = avg_gain / avg_loss
    # Apply the RSI formula
    rsi = 100 - (100 / (1 + rs))
    # Add the calculated RSI values as a new column in the dataset
    dataset['RSI'] = rsi
dataset.rename(columns={"close":"Close", "open":"Open", "high":"High", "low":"Low"}, inplace=True)
Logarithmic Transformation of Dataset
cleaned_df = dataset.copy()  # Copy the dataset
eps = 0.001
# Apply a logarithmic transformation to the price columns
cleaned_df['Open'] = np.log(cleaned_df.pop('Open') + eps)
cleaned_df['High'] = np.log(cleaned_df.pop('High') + eps)
cleaned_df['Low'] = np.log(cleaned_df.pop('Low') + eps)
cleaned_df['Close'] = np.log(cleaned_df.pop('Close') + eps)
calculate_macd(cleaned_df, 12, 26, 9)
calculate_rsi(cleaned_df, 14)
cleaned_df['MACD_Signal_diff'] = cleaned_df['MACD'] - cleaned_df['Signal']
cleaned_df.dropna(inplace=True)
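Before building the environment, it's worth a quick sanity check (not part of the original pipeline) that the dataframe now contains the columns the environment will expect:

# Quick look at the columns and the first few rows of the preprocessed data
print(cleaned_df.columns.tolist())
print(cleaned_df[['Close', 'MACD', 'Signal', 'MACD_Signal_diff', 'RSI']].head())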
Custom Trading Environment Using Gym
When trading in the real market, a few key factors come into play:
- Available money for trade
- Maximum shares that can be bought with the money
- Transactional cost
- Profit/loss
Based on these factors, let's design a custom trading environment:
import gym
import numpy as np

INITIAL_BALANCE = np.log(5000 + eps)  # Initial balance of 5000, converted to the log range

class TradingEnv(gym.Env):
    def __init__(self, df, initial_balance=INITIAL_BALANCE, max_shares=1, transaction_cost_percentage=0.00000000000000001):
        super(TradingEnv, self).__init__()
        self.df = df  # Dataset
        self.current_step = 0  # Variable for iterating through each step
        self.max_steps = len(df) - 1  # Total number of steps / length of the dataset
        self.initial_balance = initial_balance  # Starting balance
        self.max_shares = max_shares  # Number of shares that can be bought with the balance
        self.transaction_cost_percentage = transaction_cost_percentage  # Cost for each transaction
        self.action_space = gym.spaces.Discrete(3)  # 3 actions: Buy (0), Sell (1), Hold (2)
        # Observation space where the model can observe the data
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float32)
        self.balance = self.initial_balance  # Running balance
        self.bought = False
        self.sold = False
        self.hold = False
        self.shares_held = 0
        self.stock_price = 0
        self.total_profit = 0

    def reset(self):
        """Reset the environment each time the agent completes an iteration through the dataset"""
        self.current_step = 0
        self.balance = self.initial_balance
        self.shares_held = 0
        self.total_profit = 0
        self.bought = False
        self.sold = False
        self.hold = False
        self.stock_price = self.df['Close'].iloc[self.current_step]
        return self._get_observation()

    def step(self, action):
        """Method for taking an action"""
        self.current_step += 1  # Move to the next row of the dataset
        self.stock_price = self.df['Close'].iloc[self.current_step]
        reward = 0.0
        if action == 0:  # Buy
            self._buy_shares()
        elif action == 1:  # Sell
            self._sell_shares()
        else:  # Hold
            self.hold = True
            reward = 0.001  # Small positive reward for holding when that is the sensible choice
        # Only calculate the reward when the agent performs a buy or sell action
        if (self.bought or self.sold) and self.hold == False:
            reward = self._calculate_reward()
        done = self.current_step >= self.max_steps
        observation = self._get_observation()
        return observation, reward, done, {}

    def _get_observation(self):
        """The whole observation the agent can interact with.
        Includes prices and technical indicators"""
        observation = np.array([
            self.df['Open'].iloc[self.current_step],
            self.df['Low'].iloc[self.current_step],
            self.df['High'].iloc[self.current_step],
            self.df['Close'].iloc[self.current_step],
            self.df['MACD'].iloc[self.current_step],
            self.df['Signal'].iloc[self.current_step],
            self.df['MACD_Signal_diff'].iloc[self.current_step],
            self.df['RSI'].iloc[self.current_step],
            self.balance,
            self.shares_held,
            self.stock_price,
        ], dtype=np.float32)
        return observation

    def _buy_shares(self):
        """Environment conditions for a buy action"""
        try:
            max_shares_affordable = int(self.balance / (self.stock_price * (1 + self.transaction_cost_percentage)))
            shares_to_buy = np.random.randint(1, max_shares_affordable + 1)  # Choose a random number of shares to buy
            transaction_cost = shares_to_buy * self.stock_price * self.transaction_cost_percentage
            self.balance -= (shares_to_buy * self.stock_price) + transaction_cost
            self.shares_held += shares_to_buy
            self.bought = True
            self.sold = False
            self.hold = False
        except:
            self.bought = False

    def _sell_shares(self):
        """Environment conditions for a sell action"""
        if self.shares_held > 0:
            shares_to_sell = np.random.randint(1, self.shares_held + 1)  # Choose a random number of shares to sell
            self.balance += (shares_to_sell * self.stock_price) * (1 - self.transaction_cost_percentage)
            self.shares_held -= shares_to_sell
            self.sold = True
            self.bought = False
            self.hold = False
        else:
            self.sold = False

    def _calculate_reward(self):
        """Reward calculation based on the agent's action"""
        current_balance = self.balance + (self.shares_held * self.stock_price)
        profit = current_balance - self.initial_balance  # Profit relative to the initial balance
        reward = (profit - self.total_profit) * 200  # Reward scaled by a factor of 200
        self.total_profit = profit  # Update the running profit
        return reward

    def render(self):
        """Print the current state to the screen"""
        print("Stock Price:", np.exp(self.stock_price) - eps)
        print("Account Balance:", np.exp(self.balance) - eps)
        print("Number of Shares:", self.shares_held)
        print("Profit:", np.exp(self.total_profit) - eps)
        print("--------------------------")
- Initially, the agent takes random actions (0 for buy, 1 for sell, 2 for hold) since it doesn't yet know what to do. Based on the action taken, the agent receives a reward that can be positive or negative, determined by the change in profit: if the profit is negative (i.e., a loss), the agent gets a negative reward; if the profit is positive, the agent gets a positive reward (a small worked example follows this list).
- The reward is only calculated for buy and sell actions.
- A small reward is given for holding when holding is the sensible choice.
- With each action the agent takes, the account balance, shares, and holdings are updated, except for the hold action.
- In each iteration, the agent receives the 5-minute prices from the dataset by interacting with the environment.
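To make the reward calculation concrete, here is a small worked example that mirrors the logic of _calculate_reward (the numbers are made up for illustration, not taken from the dataset):

import numpy as np

eps = 0.001
initial_balance = np.log(5000 + eps)   # log-space starting balance, as in the environment

# Suppose the agent bought one share at a log-price of 6.28
balance_after_buy = initial_balance - 6.28
shares_held = 1

# A few steps later the log-price has risen to 6.30
stock_price = 6.30
current_balance = balance_after_buy + shares_held * stock_price
profit = current_balance - initial_balance   # change relative to the starting balance
reward = (profit - 0.0) * 200                # previous total_profit was 0, scaled by 200
print(round(reward, 2))                      # 4.0 -> a positive reward for a profitable position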
env = TradingEnv(cleaned_df)  # Initialize the environment
observation = env.reset()     # Reset the environment
done = False

# Testing the environment with some manual actions
def test_env(action_):
    action = action_
    next_observation, reward, done, info = env.step(action)
    print("Reward:", reward)
    print("Action:", action)
    env.render()

test_env(0)  # Buy
test_env(1)  # Sell
.....
Reward: 0.0
Action: 0
Stock Price: 538.5999999999999
Account Balance: 9.282311765109982
Number of Shares: 1
Profit: 0.999
--------------------------
Reward: -0.16723934541928998
Action: 1
Stock Price: 537.7
Account Balance: 4991.645019411402
Number of Shares: 0
Profit: 0.9973290042164796
--------------------------
Training the RecurrentPPO Agent
model = RecurrentPPO('MlpLstmPolicy', env, verbose=1, ent_coef=0.70, device='cuda')
model.learn(1000000)
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 7.41e+04     |
|    ep_rew_mean          | 49           |
| time/                   |              |
|    fps                  | 271          |
|    iterations           | 7813         |
|    time_elapsed         | 3676         |
|    total_timesteps      | 1000064      |
| train/                  |              |
|    approx_kl            | 9.518117e-06 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.1         |
|    explained_variance   | -0.2         |
|    learning_rate        | 0.0003       |
|    loss                 | -0.624       |
|    n_updates            | 78120        |
|    policy_gradient_loss | 4.98e-06     |
|    value_loss           | 0.0854       |
------------------------------------------
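Before the manual test in the next section, a quick quantitative sanity check is possible with the evaluate_policy helper we imported earlier. This is only a sketch: it evaluates on the same data the agent was trained on (not a proper out-of-sample test), and a single episode walks the entire dataset, so it can take a while:

# Rough sanity check of the learned policy on the training environment
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1, deterministic=True)
print("Mean episode reward:", mean_reward, "+/-", std_reward)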
Testing the Agent
env = TradingEnv(cleaned_df)
observation = env.reset()
observation_arr = []
action_arr = []

for i in range(50):  # 50 steps * 5 minutes
    action, _ = model.predict(observation)
    obs, rewards, done, _ = env.step(action)
    observation_arr.append(obs)
    action_arr.append(action)
    observation = obs  # Use the latest observation for the next prediction
    print({"Rewards": rewards, "action": action})
Stock Price: 320.8499999999999
Account Balance: 5380.210045281541
Number of Shares: 0
Profit: 1.0750419938479097
--------------------------
import matplotlib.pyplot as plt  # Needed for plotting (not imported earlier)

# Combine the observations into a single array
combined_observations = np.array(observation_arr)

# Create the figure
plt.figure(figsize=(12, 6))

# Plot price movements (observation index 0)
plt.plot(combined_observations[:, 0], label='Price Movements', linewidth=2)

# Plot a dot for each action taken
for t, action in enumerate(action_arr):
    if action == 0:  # Buy: green
        plt.scatter(t, combined_observations[t, 0], color='green', s=100)
    elif action == 1:  # Sell: red
        plt.scatter(t, combined_observations[t, 0], color='red', s=100)
    elif action == 2:  # Hold: blue
        plt.scatter(t, combined_observations[t, 0], color='blue', s=100)

plt.xlabel('Time step')
plt.ylabel('Price')
plt.title('Price Movements with Actions')
plt.legend()
plt.grid()
plt.show()
model.save('/content/drive/MyDrive/Trado AI/Models/TradoV3')
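To reuse the trained agent later, for example in a fresh Colab session, the saved model can be loaded back with RecurrentPPO.load. A minimal sketch, assuming the same file path and the environment defined above:

# Load the saved agent and attach the trading environment again
loaded_model = RecurrentPPO.load('/content/drive/MyDrive/Trado AI/Models/TradoV3', env=env)

# Run a single prediction just to confirm the loaded model works
obs = env.reset()
action, _states = loaded_model.predict(obs, deterministic=True)
print("Predicted action:", action)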