
# We Helped With This R Studio Programming Homework: Have A Similar One?

| Category | Programming |
|---|---|
| Subject | R / R Studio |
| Difficulty | Graduate |
| Status | Solved |
| More Info | Statistics Homework Help |

## Short Assignment Requirements

## Assignment Code

```
#------------------------------------------
#------------------------------------------
############## Homework # 4 ##############
#------------------------------------------
#------------------------------------------
# Directions:
# In Assignment 4, you will transform a sample of
# 2500 Trump tweets for analysis, drawn from data with over
# 30,000 tweets. You will use the preprocessCorpus function
# to simplify the transformation and then demonstrate an
# understanding of the transformation from a corpus to
# a DTM or TDM.
#------------------------------------------
######### Preliminary Code #########
#------------------------------------------
#------------------------------------------
## Get/Set Your Working Directory
#------------------------------------------
#------------------------------------------
## Load Packages (libraries)
#------------------------------------------
#------------------------------------------
######### Solutions #########
#------------------------------------------
#------------------------------------------
# 1. First, import the data from the .csv file (trump_tweets.csv)
# as a dataframe named trumpt. Identify the ID column and the column with text
# that you would like to analyze, and rename them or create them as necessary
# to create the Corpus from a DataframeSource.
#
#------------------------------------------
# ANSWER #
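# A possible sketch (assumes the .csv has columns named "id" and
# "text"; check names(trumpt) and adjust if the real columns differ):
trumpt <- read.csv("trump_tweets.csv", stringsAsFactors = FALSE)
# DataframeSource() expects the first two columns to be doc_id and text
names(trumpt)[names(trumpt) == "id"] <- "doc_id"
trumpt <- trumpt[, c("doc_id", "text")]
str(trumpt)  # confirm the structure before building the corpus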
#------------------------------------------
# 2. Run a command using the set.seed function and your birthday,
# as in HW #3. Using HW #3 as a guide, use the sample()
# function to create a sample of 2500 tweets without replacement.
# Name your subset of tweets "trump_sub". Then, remove the
# original dataframe.
#------------------------------------------
# ANSWER #
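# A possible sketch -- the seed below is a placeholder birthday
# (YYYYMMDD); substitute your own, as in HW #3:
set.seed(19900101)
trump_sub <- trumpt[sample(nrow(trumpt), 2500, replace = FALSE), ]
rm(trumpt)  # remove the original dataframe to free memory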
#------------------------------------------
# 3. Create the Corpus object named trumpcorp. Then, use the preprocessCorpus()
# function, which you loaded into your workspace, to cleanse the corpus.
# Use lemmatization, keep hashtags, remove SMART stopwords
# and do not preserve intraword punctuation.
#------------------------------------------
# ANSWER #
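# A possible sketch. preprocessCorpus() is the course-provided helper,
# so the argument names shown are assumptions -- check the function's
# definition for the real ones:
library(tm)
trumpcorp <- VCorpus(DataframeSource(trump_sub))
# trumpcorp <- preprocessCorpus(trumpcorp,
#                               lemmatize = TRUE,       # assumed arg name
#                               keepHashtags = TRUE,    # assumed arg name
#                               stopwords = "SMART",    # assumed arg name
#                               intrawordPunct = FALSE) # assumed arg name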
#------------------------------------------
# 4. Create a Document Term Matrix, named trumpDTM, with default settings.
# View the high-level DTM information. How many Terms are in your
# DTM?
#------------------------------------------
# ANSWER #
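# A possible sketch using tm's defaults:
trumpDTM <- DocumentTermMatrix(trumpcorp)
trumpDTM  # printing shows documents, terms, sparsity, and weighting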
#------------------------------------------
# 5. Now, apply a maximum term length of 15 and a minimum term length of 4 to the DTM.
# How does this impact the number of terms? How many terms are there in the DTM with
# min and max term lengths? How does this impact the size of your DTM?
#------------------------------------------
# ANSWER #
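# A possible sketch -- in tm, control = list(wordLengths = c(min, max))
# bounds the term length:
trumpDTM <- DocumentTermMatrix(trumpcorp,
                               control = list(wordLengths = c(4, 15)))
trumpDTM               # compare the term count to the default DTM
object.size(trumpDTM)  # compare the object size as well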
#------------------------------------------
# 6. Use the appropriate functions to determine
# which terms appear more than 100 times in your DTM,
# and which appear more than 250 times. What do you notice about
# these terms? Use the appropriate function to find terms associated
# with the terms appearing more than 250 times. What does
# Donald Trump talk about most on Twitter, based on your sample?
# In your opinion, are these the most important terms?
# Why or why not? Explain.
#------------------------------------------
# ANSWER #
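# A possible sketch; note that lowfreq is an inclusive lower bound,
# and the corlimit of 0.25 is an illustrative choice:
findFreqTerms(trumpDTM, lowfreq = 100)            # terms appearing 100+ times
freq250 <- findFreqTerms(trumpDTM, lowfreq = 250) # terms appearing 250+ times
freq250
findAssocs(trumpDTM, freq250, corlimit = 0.25)    # associated terms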
#------------------------------------------
# 7. Next, consider the sparsity of the DTM. Based on the
# sparsity, should you remove sparse
# terms? Explain. If you remove terms with .995
# sparsity, how many terms remain? If you remove terms
# with .99 sparsity, how many remain? Which would you
# use and why? Explain.
#
#------------------------------------------
# ANSWER #
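# A possible sketch:
trumpDTM  # the printed summary reports the overall sparsity
trumpDTM995 <- removeSparseTerms(trumpDTM, 0.995)
trumpDTM99  <- removeSparseTerms(trumpDTM, 0.99)
trumpDTM995  # terms remaining at the .995 threshold
trumpDTM99   # terms remaining at the .99 threshold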
#------------------------------------------
# 8. TF-IDF weighting is probably the most popular weighting used in text mining.
# Apply TF-IDF weighting to your matrix (the one with sparse terms removed).
# Do not use normalization.
# View the 50 highest-weighted terms. In your opinion, do these seem
# to be the most important terms in your sample of Trump tweets?
# Why or why not?
#------------------------------------------
# ANSWER #
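# A possible sketch; trumpSparse re-creates the sparse-reduced matrix
# from #7 (shown at .99 -- use whichever threshold you chose there):
trumpSparse <- removeSparseTerms(trumpDTM, 0.99)
trumpTfidf  <- weightTfIdf(trumpSparse, normalize = FALSE)
# rank terms by their summed TF-IDF weight and view the top 50
head(sort(colSums(as.matrix(trumpTfidf)), decreasing = TRUE), 50)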
#------------------------------------------
```