TDM 10100: Project 10 — 2023
Motivation: As we have learned, functions are foundational to more complex programs and behaviors.
There is an entire programming paradigm based on functions called functional programming.
Context:
We will apply functions to entire vectors of data using tapply and sapply. We learned how to create functions, and now the next step is to use them on a series of data.
Dataset(s)
The project will use the following dataset(s):
- /anvil/projects/tdm/data/restaurant/orders.csv
- /anvil/projects/tdm/data/restaurant/vendors.csv
The read.csv() function treats a comma (`,`) as the field separator by default. You can also load the
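As a minimal sketch of that default behavior, the snippet below writes a tiny made-up CSV to a temporary file and reads it back. The file contents and the name toy are hypothetical stand-ins, not the real restaurant data; on Anvil you would pass one of the dataset paths listed above instead.

```r
# Hypothetical two-row CSV; the values are made up for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,vendor_tag_name",
             "1,\"Burgers,Fries\"",
             "2,Salads"), tmp)

# read.csv() uses sep = "," by default, so no separator argument is needed.
# The comma inside the quoted field is kept as part of the value.
toy <- read.csv(tmp)

dim(toy)   # number of rows and columns
head(toy)  # first few rows
```

Checking dim() and head() right after loading is a quick way to confirm that the columns were split where you expected.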
Questions
Question 1 (2 pts)
Please load the datasets into data frames named orders and vendors.
There are many websites that explain how to use grep and grepl (the l stands for logical) to search for patterns. See, for example: statisticsglobe.com/grep-grepl-r-function-example
- Use the grepl function and the subset function to make a new data frame from vendors, containing only the rows with "Fries" in the column called vendor_tag_name.
- Now use the grep function and row indexing to make a data frame from vendors that (as before) contains only the rows with "Fries" in the column called vendor_tag_name.
- Verify that your data frames in questions 1a and 1b are the same size.
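The difference between the two functions can be seen on a small made-up data frame: grepl returns one TRUE/FALSE per element, while grep returns the positions of the matches. The data frame toy and its values are hypothetical; only the column name vendor_tag_name comes from the project.

```r
# Hypothetical data, not taken from vendors.csv.
toy <- data.frame(
  id = 1:4,
  vendor_tag_name = c("Burgers,Fries", "Salads", "Fries,Shakes", NA),
  stringsAsFactors = FALSE
)

# grepl() gives a logical vector, which subset() can use directly.
with_grepl <- subset(toy, grepl("Fries", vendor_tag_name))

# grep() gives the matching row positions, which work as a row index.
with_grep <- toy[grep("Fries", toy$vendor_tag_name), ]

# Both approaches should select the same rows.
dim(with_grepl)
dim(with_grep)
```

Note that the missing value is simply not matched by either function, so it is left out of both results.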
Question 2 (2 pts)
- In the data frame vendors, there are two types of delivery_charge values: 0 (which represents free delivery) and 0.7 (which represents non-free delivery). Make a table that shows how many of each type of value there are in the delivery_charge column.
- Please use the prop.table function to convert these counts into percentages.
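As a rough illustration of how table and prop.table fit together, here is a sketch on a short made-up vector. The name charges and its values are hypothetical, not the real delivery_charge column.

```r
# Hypothetical delivery charges, for illustration only.
charges <- c(0, 0.7, 0.7, 0, 0.7)

# table() counts how many times each distinct value occurs.
counts <- table(charges)

# prop.table() divides each count by the total, giving proportions that sum to 1.
props <- prop.table(counts)

# Multiply by 100 to express the proportions as percentages.
round(100 * props, 1)
```

prop.table returns proportions between 0 and 1, so the multiplication by 100 is only needed if you want the answer on a percentage scale.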
Question 3 (2 pts)
- Consider only the vendors with vendor_category_id == 2. Among these vendors, find the percentages of the delivery_charge column that are 0 (free delivery) and 0.7 (non-free delivery).
- Now consider only the vendors with vendor_category_id == 3, and again find the percentages of the delivery_charge column that are 0 (free delivery) and 0.7 (non-free delivery).
Question 4 (1 pt)
- Solve questions 3a and 3b again, but this time, solve these two questions with one application of the tapply command, which provides the answers to both questions. (It is fine to give only the counts here, in question 4a, and convert the counts to percentages in question 4b.)
- Now (instead) use a user-defined function inside the tapply to convert your answer from counts into percentages.
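The general shape of a tapply call is tapply(values, groups, function): the values are split by group and the function is applied to each piece. Below is a sketch on a made-up data frame named toy (hypothetical values); it shows one way to pass either a built-in function or a user-defined one.

```r
# Hypothetical data, not taken from vendors.csv.
toy <- data.frame(
  vendor_category_id = c(2, 2, 2, 2, 3, 3),
  delivery_charge    = c(0, 0.7, 0.7, 0.7, 0, 0.7)
)

# One tapply() call: split delivery_charge by category, then count within each group.
counts_by_cat <- tapply(toy$delivery_charge, toy$vendor_category_id, table)

# Same split, but with a user-defined function that returns proportions.
props_by_cat <- tapply(toy$delivery_charge, toy$vendor_category_id,
                       function(x) prop.table(table(x)))

counts_by_cat
props_by_cat
```

Because the function returns a whole table for each group rather than a single number, the result is a list with one element per category, which can be accessed by name, for example counts_by_cat[["2"]].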
Question 5 (1 pt)
- Starting with your solution to question 4a, now use the sapply command to convert your answer from counts into percentages. Your solution should agree with the percentages that you found in question 4b.
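sapply applies a function to each element of a list and tries to simplify the result. A sketch on made-up data follows: it rebuilds a per-category list of counts with tapply and then hands that list to sapply. The data frame toy and its values are hypothetical.

```r
# Hypothetical data, not taken from vendors.csv.
toy <- data.frame(
  vendor_category_id = c(2, 2, 2, 2, 3, 3),
  delivery_charge    = c(0, 0.7, 0.7, 0.7, 0, 0.7)
)

# Counts per category, stored as a list with one table per category.
counts_by_cat <- tapply(toy$delivery_charge, toy$vendor_category_id, table)

# sapply() runs prop.table() on each table in the list.
props <- sapply(counts_by_cat, prop.table)

props
```

In this toy example every category contains both values, so sapply can simplify the result to a matrix with one column per category. If the groups had tables of different lengths, sapply would return a list instead.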
Project 10 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project10.ipynb
- R code and comments for the assignment
  - firstname-lastname-project10.R
- Submit files through Gradescope
Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.