Let us worry about your assignment instead!

We Helped With This R Programming Homework: Have A Similar One?

SOLVED
Category: Programming
Subject: R | R Studio
Difficulty: College
Status: Solved
More Info: Answers To Statistics Homework

Assignment Description

 

There are 20 states, s0 to s19. In any state the agent can take only three actions: move up, move down, or move left (it cannot move right). The reward for entering any state is 0 (zero), except for state s0, the goal, which has a reward of +10, and state s1, a punishing end state, which has a reward of -10. s3 is the Start State, s0 is the Goal State, and s1 is an End State. Both s0 and s1 are also “absorbing states”: an agent can move into either of those two states, but then the game ends, so the agent can never move out of either of them.

If an agent is in any one of the 20 states (except for states s0 and s1), that agent can attempt to take any one of three actions: move up, move down, or move left (but the agent cannot move right). However, the “physics” of this “Race to Goal” ‘world’ are such that it is impossible to pass through the four outer wall boundaries or perimeters of this world. For example, if the agent is in state s8 and attempts to move left, the agent simply bounces off the perimeter wall and remains in state s8. Or if the agent is in state s18 and attempts to move up, the agent remains in state s18.

Furthermore, the heavy red border walls on portions of the interior of this ‘world’ are also impenetrable. So, for example, if the agent is in state s14 and attempts to move left, the agent bounces off the heavy red border boundary and simply remains in state s14.
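
Before modifying the template, it helps to see how this “bounce” physics and the reward exceptions are typically encoded in the RACE.env() style used in the code further below: the function starts with next_state <- state, so any action you do not explicitly define (a move into an outer wall or a red interior wall) simply leaves the agent where it is. The sketch below illustrates the idea only; the two example transitions (s3 up to s7, s3 left to s2) are assumptions borrowed from the demo grid, and your modified grid diagram determines the real neighbours of each state. The demo file wraps state names as state("s3"); plain strings are used here for brevity.

# Minimal sketch of the encoding idea (NOT the full solution):
RACE.env.sketch <- function(state, action) {
  next_state <- state                        # default: bounce off a wall, stay put
  # Example legal moves (illustrative only -- check your grid diagram):
  if (state == "s3" && action == "up")   next_state <- "s7"
  if (state == "s3" && action == "left") next_state <- "s2"
  # No line is written for a blocked move (e.g. s3 + "down" if s3 sits on
  # the bottom wall), so the agent simply remains in s3 for that action.

  # Rewards: 0 everywhere, except for entering the goal s0 (+10)
  # and the punishing end state s1 (-10):
  reward <- 0
  if (next_state == "s0" && state != "s0") reward <- 10
  if (next_state == "s1" && state != "s1") reward <- -10

  list("NextState" = next_state, "Reward" = reward)
}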

You have just been hired by Alphabet Inc., the parent company of Google, as a Reinforcement Learning Data Scientist at an annual salary of $205,000. On your first day of work, your supervisor hands you this assignment with the following questions to answer:

1) Using the ReinforcementLearning package in R, and modifying the attached RR.R file that we demonstrated in video lectures 17 & 18 of the Udemy course Reinforcement Learning with R, write R code to create a new Race-to-Goal environment which models the complete “physics” of our modified environment, including all states, actions, and rewards. You will also need to write R code to define the state and action sets, to run the sampleExperience() function to generate simulated data, and to run the ReinforcementLearning() function to output your modified RACE.model. Also see the attached Race_to_Goal_template.R file to help you get started.

Several tips: Use my attached Race_to_Goal_template.R file to get you started. MAKE SURE to use the same set.seed(1234) command that you see on line 44 of that file each time, just before you run the sampleExperience() function, so that you get the same outputted data! Use the same reinforcement learning parameter values in the control structure that you see “hard-wired” into lines 56-60 of the .R file. The pattern is sketched just below.
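
For reference, the pattern those tips describe looks roughly like this (a sketch only; the attached template file holds the authoritative line numbers, and the parameter values below are the ones repeated in the demo code further down this page):

# Run the seed command EVERY time, immediately before sampling:
set.seed(1234)
data <- sampleExperience(N = 1000,           # N = 1000 for question 2
                         env = RACE.env,
                         states = states,
                         actions = actions)

# Parameter values "hard-wired" in the template's control structure:
control <- list(alpha = 0.1,                 # learning rate
                gamma = 0.5,                 # discount factor
                epsilon = 0.1)               # exploration factor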

As a check, note that my solution file used exactly 38 separate lines of code to define all possible state-action-new-state triples inside the modified RACE.env() function. You do not need to have more lines, and you cannot have fewer lines and adequately define all of the possible state-action-new-state triples, in my opinion.

 

 

2) Run all of the code using exactly N=1000 resamples in the sampleExperience() function. Generate: (1) the optimal policy, i.e., the “best” complete set of state-action pairs given that you are in any particular state on the grid and must take an action to leave that state (exclude the absorbing states s0 and s1 from the optimal policy); and (2) the optimal “plan”, i.e., the sequence of specific states to move through, beginning in the Start State s3, to reach the Goal State s0. MAKE SURE THAT YOU RUN set.seed(1234) each time just before you run the sampleExperience() function to collect your data.
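
One way to read off the optimal policy from the fitted model is sketched below. It assumes the RACE.model object produced by ReinforcementLearning() as in the template; depending on your version of the ReinforcementLearning package, the policy extractor is exported as computePolicy() (newer releases) or policy() (older releases), and the policy also appears in the print(RACE.model) output.

# Sketch: extract the learned policy (best action per state)
pol <- computePolicy(RACE.model)             # or policy(RACE.model) on older versions
pol[setdiff(names(pol), c("s0", "s1"))]      # report it without the absorbing states

# The state-action values behind that policy (assumed stored as $Q on the model):
RACE.model$Q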

 

What is the optimal policy? Your answer would include every state-action pair in the policy except for states s0 and s1. Note that in my solution there are no impossible state-action pairs such as moving right when the agent is in state s11. There are also no counter-intuitive state-action pairs such as moving up from state s8.

What is the optimal plan? This would include the entire sequence of states traversed in moving from the Start State s3 until you are in the Goal State s0.
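
A simple way to produce that sequence is to start in s3 and repeatedly follow the learned policy through your RACE.env() function until an absorbing state is reached. The loop below is only a sketch (it is not part of the assignment template) and assumes pol is the named policy vector extracted above.

# Sketch: trace the optimal plan from the Start State s3
current <- "s3"
plan <- current
for (i in 1:50) {                            # safety cap on the path length
  if (current %in% c("s0", "s1")) break      # stop once an absorbing state is entered
  step <- RACE.env(current, pol[[current]])  # take the policy's action for this state
  current <- as.character(step$NextState)
  plan <- c(plan, current)
}
plan                                         # the sequence of states from s3 to the goal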

 

3) Run the code again this time using exactly N=5000 resamples in the sampleExperience() function. Make sure that you run set.seed(1234) just before you run the sampleExperience() function.

Do any of the state-action pairs from the original optimal policy change compared to N=1000? Look at each one carefully. If so, which one(s)?

Look at the values in the outputted State-Action function Q matrix. Have the numbers’ magnitudes changed from when N=1000? If so, why do you think this is so?

4) Run the code again this time using exactly N=10000 resamples in the sampleExperience() function. Make sure that you run set.seed(1234) just before you run the sampleExperience() function.

Do any of the state-action pairs from the original optimal policy change compared to N=1000 and also compared to the optimal policy when N=5000? Look at each one carefully. If so, which one(s)?

Look at the values in the outputted State-Action function Q matrix. Have the numbers’ magnitudes changed from when N=1000 and when N=5000? If so, why do you think this is so?
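
To compare the three runs side by side, one approach is to wrap the sampling-and-fitting step in a small helper and call it for each N, resetting the seed each time as required. The helper name fit_for_N is made up for this sketch, and it assumes RACE.env, states, actions, and control are already defined as in the template; computePolicy() and the $Q field on the fitted model are as noted under question 2.

# Sketch: fit the model for several sample sizes and compare
fit_for_N <- function(N) {
  set.seed(1234)                             # same seed before EVERY sampling run
  d <- sampleExperience(N = N, env = RACE.env,
                        states = states, actions = actions)
  ReinforcementLearning(d, s = "State", a = "Action",
                        r = "Reward", s_new = "NextState",
                        control = control)
}
models <- lapply(c(1000, 5000, 10000), fit_for_N)
lapply(models, computePolicy)                # do any state-action pairs change?
lapply(models, function(m) range(m$Q))       # how do the Q magnitudes grow with N?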

Assignment Code




# Load the package
library(ReinforcementLearning)

# Here we define our "Race-to-Goal" environment
RACE.env <- function(state, action) {
  next_state <- state
  ## define all possible state-action-next state triples
  if(state == state("s0") && action == "right") next_state <- state("s1")
  if(state == state("s0") && action == "up") next_state <- state("s4")
  ## Note: no need to define being in s0 and choosing action
  ## to move left as the next_state would still be s0.
  ## You only need to define possible movements to a new state.
  if(state == state("s1") && action == "right") next_state <- state("s2")
  if(state == state("s1") && action == "left") next_state <- state("s0")
  if(state == state("s1") && action == "up") next_state <- state("s5")
  if(state == state("s2") && action == "right") next_state <- state("s3")
  if(state == state("s2") && action == "left") next_state <- state("s1")
  if(state == state("s2") && action == "up") next_state <- state("s6")
  if(state == state("s3") && action == "left") next_state <- state("s2")
  if(state == state("s3") && action == "up") next_state <- state("s7")
  if(state == state("s4") && action == "down") next_state <- state("s0")
  if(state == state("s4") && action == "right") next_state <- state("s5")
  if(state == state("s5") && action == "down") next_state <- state("s1")
  if(state == state("s5") && action == "left") next_state <- state("s4")
  if(state == state("s5") && action == "right") next_state <- state("s6")
  if(state == state("s6") && action == "left") next_state <- state("s5")
  if(state == state("s6") && action == "right") next_state <- state("s7")
  if(state == state("s6") && action == "down") next_state <- state("s2")
  if(state == state("s7") && action == "up") next_state <- state("s11")
  if(state == state("s7") && action == "left") next_state <- state("s6")
  if(state == state("s7") && action == "down") next_state <- state("s3")
  if(state == state("s8") && action == "up") next_state <- state("s12")
  if(state == state("s8") && action == "right") next_state <- state("s9")
  if(state == state("s9") && action == "left") next_state <- state("s8")
  if(state == state("s9") && action == "right") next_state <- state("s10")
  if(state == state("s9") && action == "up") next_state <- state("s13")
  ## There is no need to define action movements out of state s10
  ## or out of state s12 as those are end (or absorbing) states.
  if(state == state("s11") && action == "up") next_state <- state("s15")
  ## But you do need to define actions into states s10 or s12:
  if(state == state("s11") && action == "left") next_state <- state("s10")
  if(state == state("s11") && action == "down") next_state <- state("s7")
  if(state == state("s13") && action == "right") next_state <- state("s14")
  if(state == state("s13") && action == "down") next_state <- state("s9")
  if(state == state("s13") && action == "left") next_state <- state("s12")
  if(state == state("s14") && action == "right") next_state <- state("s15")
  if(state == state("s14") && action == "left") next_state <- state("s13")
  if(state == state("s14") && action == "down") next_state <- state("s10")
  if(state == state("s15") && action == "up") next_state <- state("s19")
  if(state == state("s15") && action == "down") next_state <- state("s11")
  if(state == state("s15") && action == "left") next_state <- state("s14")
  if(state == state("s16") && action == "right") next_state <- state("s17")
  if(state == state("s17") && action == "left") next_state <- state("s16")
  if(state == state("s17") && action == "right") next_state <- state("s18")
  if(state == state("s18") && action == "left") next_state <- state("s17")
  if(state == state("s18") && action == "right") next_state <- state("s19")
  if(state == state("s19") && action == "down") next_state <- state("s15")
  if(state == state("s19") && action == "left") next_state <- state("s18")
  
  ## define rewards in each state
  ## make them all 0 initially:
  reward <- 0
  ## Then define the exceptions: Entering
  ## Goal state s12 has reward of +10;
  ## There are only two ways to enter s12,
  ## from s13 or from s8:
  if (next_state == state("s12") && (state == state("s13"))) reward <- 10
  if (next_state == state("s12") && (state == state("s8"))) reward <- 10
  ## Negative Reward End state s10 has reward of -10.
  ## Can enter state s10 from s9, from s11, and from s14
  if (next_state == state("s10") && (state == state("s9"))) reward <- -10
  if (next_state == state("s10") && (state == state("s11"))) reward <- -10
  if (next_state == state("s10") && (state == state("s14"))) reward <- -10
  
  ## Function returns a list of next_state and reward
  out <- list("NextState" = next_state, "Reward" = reward)
  return(out)
}

# Define state and action sets
states <- c("s0", "s1", "s2", "s3", "s4",
            "s4", "s5", "s6", "s7", "s8",
            "s9", "s10", "s11", "s12", "s13",
            "s14", "s15", "s16", "s17", "s18", "s19")
states # twenty states
actions <- c("up", "down", "left", "right")
actions # four actions

# Sample N = 5000 random sequences from the
# Race-to-Goal environment function above.
# Data format must be (s,a,r,s_new) tuples,
# each as rows in a dataframe structure.

# Set seed for replicability
set.seed(1234)
# ?sampleExperience
data <- sampleExperience(N = 5000, 
                         env = RACE.env, 
                         states = states, 
                         actions = actions)

# Show first 250 records of data
head(data, 250)

## Performing Reinforcement Learning

# Define reinforcement learning parameters
control <- list(alpha = 0.1, # low learning rate
                gamma = 0.5, # middle discount factor
                # epsilon only relevant when sampling
                # new experience based on known rewards
                epsilon = 0.1) # low exploration factor
control

# Perform reinforcement learning
# ?ReinforcementLearning
RACE.model <- ReinforcementLearning(data, 
                                    s = "State", 
                                    a = "Action", 
                                    r = "Reward", 
                                    s_new = "NextState",
                                    actionSelection = "random",
                                    control = control)

# Print result
print(RACE.model)

#--------------------------------
# We had already run an initial data set
# and found a policy

# Now we "fine-tune" the existing policy
# with a new data set and we deliberately
# choose "epsilon-greedy" action selection
# Define reinforcement learning parameters
control <- list(alpha = 0.1, # low learning rate
                gamma = 0.5, # middle discount factor
                # epsilon only relevant when sampling
                # new experience based on existing policy
                epsilon = 0.1) # low exploration factor
control

# Set seed for replicability
set.seed(123)

# Sample N = 5000 sequences from the environment 
# using epsilon-greedy action selection
data_new <- sampleExperience(N = 5000, 
                             # use same environment
                             env = RACE.env, 
                             states = states, 
                             actions = actions, 
                             # note we are using the
                             # existing model from before
                             model = RACE.model, 
                             actionSelection = "epsilon-greedy", 
                             control = control)

# View first 250 records
head(data_new, 250)

# Update the existing policy using new training data
model_new <- ReinforcementLearning(data_new, 
                                   s = "State", 
                                   a = "Action", 
                                   r = "Reward", 
                                   s_new = "NextState", 
                                   control = control,
                                   model = RACE.model)

# Print result
print(model_new)
## State-Action function Q
## (Q matrix values omitted)
##
## Reward (last iteration)
## [1] 4410

summary(model_new)
## Model details
## Learning rule:           experienceReplay
## Learning iterations:     2
## Number of states:        20
## Number of actions:       4
## Total Reward:            4410
## 
## Reward details (per iteration)
## Min:                     -470
## Max:                     4410
## Average:                 1970
## Median:                  1970
## Standard deviation:      3450.681

# Plot reinforcement learning curve
plot(model_new)


Assignment Code



# Load the package
library(ReinforcementLearning)

# Define your modified "Race-to-Goal" environment
RACE.env <- function(state, action) {
  next_state <- state
  ## define all possible state-action-next state triples
  
  # <Your modified code goes here>
  
  ## define rewards in each state
  ## make them all 0 initially:
  reward <- 0
  ## Then define the exceptions: Entering
  ## Goal state s0 has reward of +10:
  
  # <Your modified code goes here>
  
  ## Negative Reward End State s1 has reward of -10.
  
  # <Your modified code goes here>  
  
  ## Function returns a list of next_state and reward
  out <- list("NextState" = next_state, "Reward" = reward)
  return(out)
}

# Define state and action sets

# <Your modified code for 20 states goes here>  

# <Your modified code for the 3 actions goes here>  

# Sample N = 1000 (originally) random sequences from the
# Race-to-Goal environment function above.
# Data format must be (s,a,r,s_new) tuples,
# each as rows in a dataframe structure.

# Set seed for replicability
set.seed(1234)
data <- # <Your code goes here>
  # Use the sampleExperience() function

# Show first 250 records of data
head(data, 250)

## Perform Reinforcement Learning

# Define reinforcement learning parameters
# Use same parameters unchanged
control <- list(alpha = 0.1, # low learning rate
                gamma = 0.5, # middle discount factor
                # epsilon only relevant when sampling
                # new experience based on known rewards
                epsilon = 0.1) # low exploration factor

# Perform reinforcement learning
RACE.model <- # <Your code goes here>
  # Use the ReinforcementLearning() function

# Print result
print(RACE.model)

Frequently Asked Questions

Is it free to get my assignment evaluated?

Yes. No hidden fees. You pay for the solution only, and all the explanations about how to run it are included in the price. It takes up to 24 hours to get a quote from an expert. In some cases, we can help you faster if an expert is available, but you should always order in advance to avoid any risk. You can place a new order here.

How much does it cost?

The cost depends on many factors: how far away the deadline is, how hard/big the task is, if it is code only or a report, etc. We try to give rough estimates here, but it is just for orientation (in USD):

Regular homework: $20 - $150
Advanced homework: $100 - $300
Group project or a report: $200 - $500
Mid-term or final project: $200 - $800
Live exam help: $100 - $300
Full thesis: $1000 - $3000

How do I pay?

Credit card or PayPal. You don't need to create or have a PayPal account in order to pay by credit card. PayPal offers you "buyer's protection" in case of any issues.

Why do I need to pay in advance?

We have no way to request money after we send you the solution. PayPal works as a middleman, which protects you in case of any disputes, so you should feel safe paying using PayPal.

Do you do essays?

No, unless it is a data analysis essay or report. This is because essays are very personal and it is easy to see when they are written by another person. This is not the case with math and programming.

Why are there no discounts?

Because we don't want to lie. In services like this, no real discount is ever possible, since the price is set in advance knowing that a "discount" will be offered. For example, if we wanted to charge $100, we could claim the price is $200 and then, because you are special, offer a 50% discount. That is how scam websites operate. We set honest prices instead, so there is no need for fake discounts.

Do you do live tutoring?

No, it is simply not how we operate. How often do you meet a great programmer who is also a great speaker? Rarely. That is why we encourage our experts to write down explanations instead of having a live call. It is often enough to get you started; analyzing and running the solutions is a big part of learning.

What happens if I am not satisfied with the solution?

Another expert will review the task, and if your claim is reasonable, we refund the payment and often block the freelancer from our platform. Because we are so strict with our experts, the ones who work with us can be trusted to deliver high-quality assignment solutions on time.

Customer Feedback

"Thanks for explanations after the assignment was already completed... Emily is such a nice tutor! "

Order #13073
