Assignment Description

DATA SCIENCE TOOLS & TECHNIQUES

 

Competency 4032.1.1: Database Management System - The graduate performs data analysis using database systems and query language.

Competency 4032.1.2: Data Wrangling  - The graduate manages data using wrangling tools, techniques, and methods.

Competency 4032.1.3: Data Analysis - The graduate performs data analysis using software tools and techniques.

Task 1: Estimating Population Size

 

Introduction:

 

The United States collects and analyzes demographic data from the U.S. population. The U.S. Census Bureau provides annual estimates of the population size of each U.S. state and region. Many important decisions are made using the estimated population dynamics, including the investments in new infrastructure, such as schools and hospitals; establishing new job training centers; opening or closing schools and senior centers; and adjusting the emergency services to the size and characteristics of the demographics of metropolitan and other areas, states, or the country as a whole. The census data and estimates are publicly available on the U.S. census website.

 

As a professional in the data analytics industry, you should know how to use tools that support the different stages and methods of analyzing data. These tools include environments that support performing data scraping, wrangling data, or applying various analyses.

 

For this project, you will use Python to scrape the web links from the HTML code of the U.S. Census Bureau’s Population Estimates web page, use SQL to spot differences in the population size, and use linear regression in R to predict the size of the population of your state in 2020.

 

The goal is to demonstrate your skill sets with Python, SQL, and R to support various data analytics processes.

 

You will use versions of Python, SQL, and R of your choosing that you will indicate in the attached “Student Submission Form.” You will also include the names of the files that house your responses to the task prompts, the code used, and all input and output files you used in your analyses.

 

Requirements:

 

Your submission must be your original work. No more than a combined total of 30% of the submission and no more than a 10% match to any one individual source can be directly quoted or closely paraphrased from sources, even if cited correctly. Use the Turnitin Originality Report available in Taskstream as a guide for this measure of originality.

 

You must use the rubric to direct the creation of your submission because it provides detailed criteria that will be used to evaluate your work. Each requirement below may be evaluated by more than one rubric aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the course.

 

Submit a completed copy of the attached “Student Submission Form” that includes the following elements:

1.    versions of the programming environments for Python, SQL, and R used for the task

2.    an inventory of the code, input, and output files used in each part

Submit one zipped folder with three subfolders that include the code, input, and output files from each part of the task. Place the completed “Student Submission Form” in the main folder. Place the responses to the task prompts from each part in one PDF file for each part, and include these PDF files in the respective subfolders.

Part I: Python

Develop a web link scraper program in Python that extracts all of the unique web links that point to other web pages from the HTML code of the “Current Estimates” web link and that writes them to a comma-separated values (CSV) file as absolute uniform resource identifiers (URIs).

A.    Explain how the Python program extracts the web links from the HTML code of the “Current Estimates” web link.

B.    Explain the criteria you used to determine if a link is a locator to another HTML page. Specify the code segment that executes this action as part of your explanation.

C.    Explain how the program ensures that relative links are saved as absolute URIs in the output file. Specify the code segment that executes this action as part of your explanation.

D.    Explain how the program ensures that there are no duplicated links in the output file. Specify the code that executes this action as part of your explanation.

E.     Provide the Python code you wrote to extract all the unique web links from the HTML code of the “Current Estimates” web link that point out to other HTML pages.

F.     Provide the HTML code of the “Current Estimates” web page.

G.    Provide the CSV file that your script created.

H.    Test your script and provide a screenshot of the successfully executed results.
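The steps in prompts A through E can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the required solution: the `LinkCollector` class, the `extract_unique_links` and `write_links_csv` helpers, the sample HTML, and the `example.com` base URL are all hypothetical. It assumes that "a link to another page" means an `http` or `https` link that does not resolve back to the page itself, which is only one possible criterion.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_unique_links(html, base_url):
    """Return absolute http(s) links, de-duplicated, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    page_url, _ = urldefrag(base_url)
    seen = set()
    unique = []
    for href in parser.hrefs:
        # urljoin turns a relative link into an absolute URI.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # Assumed criterion: keep only web links that leave the current page.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute == page_url:
            continue
        # The set makes the duplicate check cheap.
        if absolute not in seen:
            seen.add(absolute)
            unique.append(absolute)
    return unique


def write_links_csv(links, handle):
    """Write one link per row under a single 'url' header."""
    writer = csv.writer(handle)
    writer.writerow(["url"])
    for link in links:
        writer.writerow([link])


# Hypothetical input; a real run would read the saved page's HTML instead.
SAMPLE_HTML = """
<a href="/data.html">Data</a>
<a href="more.html">More</a>
<a href="https://example.com/data.html">Data again</a>
<a href="#top">Top</a>
<a href="mailto:someone@example.com">Mail</a>
"""
BASE_URL = "https://example.com/a/page.html"

links = extract_unique_links(SAMPLE_HTML, BASE_URL)
buffer = io.StringIO()
write_links_csv(links, buffer)
csv_text = buffer.getvalue()
```

In this sketch the in-page anchor and the `mailto:` link are dropped, and the two references to `data.html` collapse into one entry. Writing to a real file would mean passing a handle opened with `newline=""` instead of the `StringIO` buffer.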

Part II: SQL

I.      Calculate the mathematical difference in the population size estimates for each U.S. state the Census Bureau provided in two consecutive years using the most current data and the latest historical datasets for the national total population. Provide the SQL code and resulting table in your submission.

J.     Write a code to join the two tables on the year and state fields into one SQL table that identifies the absolute differences (in whole rounded hundreds) in the estimates of 10,000 individuals or more between the two datasets. If the earlier estimates are larger than 10,000, the cells should indicate a negative value. Provide a screenshot of your tested code showing successful execution.

K.    Explain how you prepared the data and how the datasets were imported into two SQL tables. Provide a screenshot of the successfully executed SQL code.

L.     Export the data from the SQL table into a CSV file, with rows representing the states and columns representing the years that both datasets estimate, that only shows the differences between the datasets (in whole rounded tens of thousands) that exceed 10,000 individuals.
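The join described in prompt J can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the table names `current_est` and `historical_est`, the column names, and every population value are made-up placeholders, not Census Bureau figures. It also assumes one reading of the sign rule, namely that the difference is negative when the earlier (historical) estimate is the larger one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE current_est (state TEXT, year INTEGER, population INTEGER);
    CREATE TABLE historical_est (state TEXT, year INTEGER, population INTEGER);
    """
)

# Made-up placeholder rows, not real census data.
conn.executemany(
    "INSERT INTO current_est VALUES (?, ?, ?)",
    [("State A", 2010, 1_000_000),
     ("State B", 2010, 2_000_000),
     ("State C", 2010, 3_000_000)],
)
conn.executemany(
    "INSERT INTO historical_est VALUES (?, ?, ?)",
    [("State A", 2010, 1_012_340),
     ("State B", 2010, 1_995_000),
     ("State C", 2010, 2_950_000)],
)

# Join on state and year, keep gaps of 10,000 or more,
# and round the signed difference to whole hundreds.
query = """
SELECT c.state,
       c.year,
       CAST(ROUND((c.population - h.population) / 100.0) AS INTEGER) * 100
           AS diff_hundreds
FROM current_est AS c
JOIN historical_est AS h
  ON c.state = h.state AND c.year = h.year
WHERE ABS(c.population - h.population) >= 10000
ORDER BY c.state
"""
rows = conn.execute(query).fetchall()
conn.close()
```

With these placeholder rows, State B is filtered out because its gap is below 10,000, and State A comes back negative because its historical figure is larger. Other SQL engines may need a different rounding or casting expression.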

Part III: R

M.    Create a linear regression analysis with R to predict the size of the population for the state you live in for 2020 based on the Current Estimates Data dataset.

N.    Explain how you prepared the data and how the dataset was imported into R, including a screenshot of your results.

O.    Using the estimates for the most recent year in the dataset, create an R script to display a histogram (using one million as the interval size) of the current estimated population size of your state. Provide a screenshot of your results.

P.     Create an R script that will tabulate a statistical description of the estimated 2020 data. Provide a screenshot of your results.

Q.    Predict the population size of your state using a linear regression. Provide a screenshot of your results.

R.    Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
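The arithmetic behind the linear regression in prompts M and Q can be sketched as an ordinary least-squares fit of population against year. The task itself calls for R, where `lm()` would normally do this; the Python below only illustrates the calculation. The `fit_line` helper is hypothetical, and the yearly values are made-up placeholders, not census estimates.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up placeholder series for one state, not real census data.
years = [2015, 2016, 2017, 2018, 2019]
population = [1_000_000, 1_010_000, 1_020_000, 1_030_000, 1_040_000]

slope, intercept = fit_line(years, population)

# Extrapolate the fitted line one year past the last observation.
predicted_2020 = slope * 2020 + intercept
```

Because the placeholder series grows by exactly 10,000 per year, the fitted slope is 10,000 and the 2020 prediction continues that trend. Real estimates will not fall on a perfect line, so the residuals are worth inspecting before trusting the extrapolation.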

1.    IWP Task 1 Rubric

2.    U.S. Census Bureau: Current Estimates

IWP Task 1 (0417)

Rubric levels, listed in this order under each aspect: Not Evident, Approaching Competence, Competent.

A. Python: How the Program Extracts Links

The explanation does not address how the Python program extracts the web links from the HTML code of the “Current Estimates” web link.

The explanation addresses how the Python program extracts the web links from the HTML code of the “Current Estimates” web link but contains inaccuracies.

The explanation accurately explains how the Python program extracts the web links from the HTML code of the “Current Estimates” web link.

B. Python: Criteria Used

The criteria used to determine if a link is a locator to another HTML page are not explained.

The submission explains the criteria used to determine if a link is a locator to another HTML page, but either the explanation contains inaccuracies, or it does not specify the correct code segment that executes this action.

The submission explains the criteria used to determine if a link is a locator to another HTML page, specifying the code segment that executes this action.

C. Python: Relative Links

The criteria in the program that translate and ensure that relative links are saved as absolute URIs in the output file are not explained.

The submission explains the criteria in the program that translate and ensure that relative links are saved as absolute URIs in the output file, but either the explanation contains inaccuracies, or it does not specify the correct code segment that executes this action.

The submission explains the criteria in the program that translate and ensure that relative links are saved as absolute URIs in the output file, specifying the code segment that executes this action.

D. Python: Duplicated Links

The submission does not explain how the program ensures that there are no duplicated links in the output file.

The submission explains how the program ensures that there are no duplicated links in the output file, but either it does not specify the code segment that executes this action, or the explanation contains inaccuracies.

The submission explains how the program ensures that there are no duplicated links in the output file, specifying the code segment that executes this action.

E. Python: Functioning Python Code

The submission does not provide Python code.

The submission provides functioning Python code, but the code does not extract all the unique web links from the HTML code of the “Current Estimates” web link that point out to other HTML pages.

The submission provides the functioning Python code written to extract all the unique web links from the HTML code of the “Current Estimates” web link that point out to other HTML pages.

F. Python: HTML Code

The submission does not provide the HTML code of the “Current Estimates” web page.

The submission provides HTML code, but it is not for the “Current Estimates” web page.

The submission provides the HTML code of the “Current Estimates” web page.

G. Python: CSV File

The CSV file is not provided.

Not applicable.

The CSV file that the script created is provided.

H. Python: Screenshot of Results

A screenshot of the successfully executed results is not provided.

Not applicable.

A screenshot of the successfully executed results from the written script is provided.

I.  SQL: Differences in Population Size Estimates

The submission does not identify the differences in population size.

The submission calculates the difference in population size and provides SQL code, but either the differences identified or the SQL code provided contains inaccuracies, or the code provided does not match the given data outcome.

The submission calculates the difference in population size and provides accurate and logical SQL code that created the new data table.

J. SQL: Joining Tables

The submission does not provide SQL code that joins the two tables.

Not applicable.

The submission provides SQL code that joins the two tables and provides a screenshot to show successful execution of the code.

K. SQL: How Data Was Prepared

The submission does not explain how the data was prepared and how the datasets were imported into the SQL tables.

The submission explains how the data was prepared and how the datasets were imported into the SQL tables, but either the explanation contains inaccuracies, or a screenshot of the successfully executed SQL code was not provided.

The submission explains how the data was prepared and how the datasets were imported into the SQL tables. The submission includes a screenshot of the successfully executed SQL code.

L. SQL: CSV File

The data from the SQL table is not exported into a CSV file.

The data from the SQL table is exported into a CSV file, but it does not show the computed differences between the datasets, or the dataset contains errors.

The data from the SQL table is correctly exported into a CSV file that shows that the differences between the datasets are accurately computed.

M. R: Linear Regression Analysis

A linear regression analysis with R is not created.

A linear regression analysis is created with R to predict the size of the population for the state selected for the year 2020 based on the Current Estimates Data dataset, but the regression contains errors.

A linear regression analysis is created with R for use in predicting the size of the population for the state selected for the year 2020 based on the Current Estimates Data dataset.

N. R: How Data Was Prepared

The submission does not explain how the data was prepared and imported into R.

The submission explains how the data was prepared and imported into R, but it does not include a screenshot of the results, or the explanation contains inaccuracies.

The submission explains how the data was prepared and imported into R and includes a screenshot of the results.

O. R: Histogram

An R script using the estimates for the most recent year in the dataset is not created.

An R script is created using the estimates for the most recent year in the dataset, but it does not display a histogram (using one million as the interval size) of the latest estimated population size of the selected state, or the submission does not include a screenshot of the results.

An R script is accurately created, using the estimates for the most recent year in the dataset, to display a histogram (using one million as the interval size) of the latest estimated population size of the selected state. The submission includes a screenshot of the results.

P. R: Statistical Description

An R script that will tabulate a statistical description of the estimated 2020 data using the summary method in R is not created.

An R script is created that will tabulate a statistical description of the estimated 2020 data, but either it did not use the summary method in R, or the submission does not provide a screenshot of accurate results.

An R script is accurately created that will tabulate a statistical description of the estimated 2020 data using the summary method in R. The submission includes a screenshot of accurate results.

Q. R: Population Size of State

The submission does not predict the population size of the selected state.

The submission proposes a prediction of the population size of the selected state but does not use data to support the prediction or does not provide a screenshot of the results.

The submission proposes a data-supported prediction of the population size of the selected state based on the data points for each year from the most current data dataset for the year 2020 using linear regression. The submission includes a screenshot of the results.

R. https://lrps.wgu.edu/provision/71484321

The submission does not include both in-text citations and a reference list for sources that are quoted, paraphrased, or summarized.

The submission includes in-text citations for sources that are quoted, paraphrased, or summarized, and a reference list; however, the citations and/or reference list is incomplete or inaccurate.

The submission includes in-text citations for sources that are properly quoted, paraphrased, or summarized and a reference list that accurately identifies the author, date, title, and source location as available.

https://lrps.wgu.edu/provision/27641407

Content is unstructured, is disjointed, or contains pervasive errors in mechanics, usage, or grammar. Vocabulary or tone is unprofessional or distracts from the topic.

Content is poorly organized, is difficult to follow, or contains errors in mechanics, usage, or grammar that cause confusion. Terminology is misused or ineffective.

Content reflects attention to detail, is organized, and focuses on the main ideas as prescribed in the task or chosen by the student. Terminology is pertinent, is used correctly, and effectively conveys the intended meaning. Mechanics, usage, and grammar promote accurate interpretation and understanding.

 
