STAT 19000: Project 8 — Fall 2021
Motivation: A key component to writing efficient code is writing functions. Functions allow us to repeat and reuse coding steps that we used previously, over and over again. If you find you are repeating code over and over, a function may be a good way to reduce lots of lines of code!
Context: We’ve been learning about and using functions all year! Now we are going to learn more about some of the terminology and components of a function, as you will certainly need to be able to write your own functions soon.
Scope: r, functions
Dataset(s)
The following questions will use the following dataset(s):
- /depot/datamine/data/goodreads/csv/interactions_subset.csv
Questions
Question 1
Read the interactions_subset.csv file into a data.frame called interactions. We have provided you with the function get_probability_of_review below.
After reading in the data, run the code below, and add comments explaining what the function is doing at each step.
# A function that, given a dataset (interactions_dataset), a user ID (userID), and a
# minimum rating (min_rating), returns a value (probability_of_reviewing).
get_probability_of_review <- function(interactions_dataset, userID, min_rating) {
  # FILL IN EXPLANATION HERE
  user_data <- subset(interactions_dataset, user_id == userID)
  # FILL IN EXPLANATION HERE
  read_user_data <- subset(user_data, is_read == 1)
  # FILL IN EXPLANATION HERE
  read_user_min_rating_data <- subset(read_user_data, rating >= min_rating)
  # FILL IN EXPLANATION HERE
  probability_of_reviewing <- mean(read_user_min_rating_data$is_reviewed)
  # Return the result
  return(probability_of_reviewing)
}
get_probability_of_review(interactions_dataset = interactions, userID = 5000, min_rating = 3)
Provide 1-2 sentences explaining overall what the function is doing and what arguments it requires.
You may want to use fread from the data.table package to read in the data:
library(data.table)
interactions <- fread("/path/to/dataset")
Your kernel may crash! As it turns out, the |
Relevant topics: function, subset
- R code used to solve this problem.
- Modified get_probability_of_review with comments explaining each step.
- 1-2 sentences explaining overall what the function is doing.
- Number and name of arguments for the function, get_probability_of_review.
Question 2
We want people that use our function to be able to get results even if they don’t provide a minimum rating value.
Modify the function get_probability_of_review so min_rating has the default value of 0. Test your function as follows.
get_probability_of_review(interactions_dataset = interactions, userID = 5000)
In R (and in many other languages), you can provide the arguments out of order, as long as you write the argument name on the left of the equals sign and the value on the right. For example, the following will still work.
get_probability_of_review(userID = 5000, interactions_dataset = interactions)
In addition, you don’t have to provide the argument names when you call the function. However, if you leave the names out, you must pass the arguments in the order in which the function defines them.
get_probability_of_review(interactions, 5000)
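The three calling styles described in this question can be sketched on a small toy function. The function count_at_least and the vector scores are hypothetical names made up for this illustration; they are not part of the project code or the Goodreads data.

```r
# Hypothetical toy function (not part of the project): min_value defaults to 0.
count_at_least <- function(values, min_value = 0) {
  # Count how many elements are greater than or equal to min_value
  sum(values >= min_value)
}

# Made-up example values
scores <- c(0, 2, 3, 5)

# min_value is omitted, so the default of 0 is used
default_call <- count_at_least(scores)

# Named arguments may be given in any order
named_call <- count_at_least(min_value = 3, values = scores)

# Unnamed arguments are matched by position, so order matters
positional_call <- count_at_least(scores, 3)
```

Here the named and positional calls return the same count, because both end up with values = scores and min_value = 3.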
- Code used to solve this problem.
- Output from running the code.
Question 3
Our function may not be the most efficient. However, we can reduce the code a little bit! Modify our function so we only use the subset function once, rather than 3 times.
Test your modified function on userID 5000. Do you get the same results as above?
Now, instead of using subset, just use regular old indexing in your function. Do your results agree with both versions above?
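As a rough sketch of the two filtering styles mentioned here, the example below applies the same three conditions once with subset and once with bracket indexing. The data frame toy and its columns id, flag, and score are made up for illustration and are not the Goodreads columns.

```r
# Hypothetical toy data (not the Goodreads data)
toy <- data.frame(
  id    = c(1, 1, 2, 2),
  flag  = c(1, 0, 1, 1),
  score = c(4, 5, 2, 3)
)

# One subset() call, with the conditions joined by &
with_subset <- subset(toy, id == 2 & flag == 1 & score >= 3)

# The same filter with bracket indexing; the trailing comma keeps all columns
with_index <- toy[toy$id == 2 & toy$flag == 1 & toy$score >= 3, ]
```

One difference worth keeping in mind: when a condition evaluates to NA for some row, subset drops that row, while bracket indexing returns a row of NA values for it. The toy data has no missing values, so the two results match here.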
- Code used to solve this problem.
- Output from running the code.
Question 4
Run the code below. Explain what happens, and why it is happening.
head(read_user_min_rating_data)
Google "Scoping in R", and read.
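A minimal sketch of the scoping behavior in question, using a hypothetical function double_it that is not part of the project: a variable created inside a function body lives only while the function runs and is not defined in the calling environment afterwards.

```r
# Hypothetical toy function: `doubled` is created inside the function body
double_it <- function(x) {
  doubled <- x * 2  # local to this function call
  return(doubled)
}

# The returned value is available to the caller
result <- double_it(21)

# exists() reports whether a name is defined; the local variable is not
# visible out here, assuming nothing else in the session is named `doubled`
visible_outside <- exists("doubled")
```

Only the value passed to return reaches the caller; the name doubled itself does not.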
- The results of running the R code.
- 1-2 sentences explaining what happened.
- 1-2 sentences explaining why it is happening.
Question 5
Apply our function to the interactions dataset to get, for a sample of 10 users, the probability of reviewing books given that they liked the book.
Save this probability to a vector called prob_review.
To do so, determine a minimum rating (min_rating) value when calculating that probability. Provide 1-2 sentences explaining why you chose this value.
You can use the function |
You can pick any 10 users you want to compose your sample.
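One possible way to apply a function to several users and collect the results in a vector is sketched below with unique and sapply. This is an assumption about a workable approach, not necessarily the function the hint above refers to. The data frame toy and the helper share_flagged are hypothetical and stand in for the real dataset and function.

```r
# Hypothetical toy data and helper (not the Goodreads data)
toy <- data.frame(
  id   = c("a", "a", "b", "b", "c"),
  flag = c(1, 0, 1, 1, 0)
)

# Share of rows with flag == 1 for a single id
share_flagged <- function(dataset, one_id) {
  mean(dataset$flag[dataset$id == one_id])
}

# Distinct ids to loop over
ids <- unique(toy$id)

# Call the helper once per id; sapply collects the results in a named vector
shares <- sapply(ids, function(one_id) share_flagged(toy, one_id))
```

Because ids is a character vector, sapply names each element of shares after the id it was computed for.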
- R code used to solve this problem.
- The results of running the R code.
- 1-2 sentences explaining why you chose this particular minimum rating value.
Question 6
Change the minimum rating value, and re-calculate the probability for your selected 10 users.
Make 1 (or more) plot(s) to compare the results you got with the different minimum rating value. Write 1-2 sentences describing your findings.
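One option for such a comparison is a grouped bar plot, sketched below with base R. The numbers are made-up placeholders, not results from the Goodreads data, and the names prob_low, prob_high, and comparison are hypothetical.

```r
# Made-up placeholder probabilities for three hypothetical users
prob_low  <- c(0.10, 0.25, 0.40)  # e.g. computed with a lower minimum rating
prob_high <- c(0.20, 0.30, 0.55)  # e.g. computed with a higher minimum rating

# One row per minimum rating setting, one column per user
comparison <- rbind(prob_low, prob_high)

# beside = TRUE draws the two bars for each user next to each other
barplot(
  comparison,
  beside      = TRUE,
  names.arg   = c("user 1", "user 2", "user 3"),
  legend.text = c("lower min_rating", "higher min_rating"),
  ylab        = "Probability of reviewing"
)
```

Placing the bars side by side makes it easier to see, user by user, how the probability shifts when the minimum rating changes.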
- R code used to solve this problem.
- The results of running the R code.
- 1-2 sentences comparing the results for question (5) and (6).
Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.