Subject: R | RStudio
Silver nanoparticle-based surface-enhanced Raman spectroscopy (SERS) is a label-free, non-invasive technique that can be used for many applications, including the detection of oral squamous cell cancer (OSCC) using saliva. You will analyse the Cancer dataset available on Blackboard. These data contain a total of 180 SERS spectra that were acquired from the saliva of 90 normal healthy individuals and 90 confirmed oropharyngeal cancer patients. The group label (1 = Cancer, 0 = Control) is stored in the first column of the dataset. Each SERS spectrum is made up of 593 Raman shifts (cm⁻¹) ranging between 800 and 1800 cm⁻¹. The exact shift values are stored in the first row of the dataset. Each value in a spectrum represents the intensity of reflected light measured at that shift and is used to identify the chemical groups of molecules present in the saliva sample under investigation. For example, if a patient has OSCC, molecules from specific chemical groups will be present in the saliva and there will be prominent peaks (reflected light) at the relevant Raman shifts.

Your goal as a researcher is to use these data to investigate how well OSCC status can be predicted from the SERS spectra. You will produce a report describing the analyses you carry out, the results of those analyses, and a discussion and interpretation of the results. The following points should be considered in your report and expanded upon in any direction that you wish.

1. Explore and describe the dataset. Make use of plots, summary statistics, etc.

2. Find and report the 5 Raman shifts at which the differences between the two groups are the highest, and run a K-nearest neighbours model to predict OSCC status given these 5 predictors. Use cross-validation to select the number of nearest neighbours K. Report the average test error rate, sensitivity and specificity of the classification algorithm, averaged over 100 independent runs of splitting the data into train and test sets. How do the results differ if you use 10 shifts instead? 15?

3. Run a penalised logistic classifier with the Lasso penalty to automatically select both the Raman shifts and the number of them that contribute the most to the differentiation between the two groups. You will need to include the argument family = "binomial" in the glmnet R function in order to make it fit a logistic regression model rather than a linear one. Use a cross-validation algorithm of your choice to select the value of the parameter λ. Report the average test error rate, sensitivity and specificity of the classification algorithm, averaged over 1000 independent runs of splitting the data into train and test sets. Compare and discuss the differences in results between the two approaches.

The R code should be appended at the end of the report. The report (including appendices and figures) should not exceed 15 sides of A4 using font size 11pt. Marks will be given for technical content, implementation and written presentation (see the assessment criteria below for the breakdown of marks in the different categories).
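The K-nearest neighbours task could be approached along the following lines. This is only a minimal sketch under stated assumptions, not a model answer. The real Cancer dataset is not reproduced here, so the code builds a synthetic stand-in with the same shape (180 spectra, 593 shifts). The helper names top_shifts and one_run are hypothetical. "Highest difference" is taken to mean the largest absolute difference in group means, the test fraction is set to 30%, and K is chosen by leave-one-out cross-validation. The brief fixes none of these choices. Shifts and K are selected on the training portion only, so the test set is not used for selection.

```r
library(class)  # knn() and knn.cv(); a recommended package shipped with R

set.seed(1)

# SYNTHETIC stand-in for the real dataset (assumption, for illustration only):
# 180 spectra x 593 shifts, 90 controls (0) followed by 90 cancers (1).
n_per_group <- 90
shifts <- seq(800, 1800, length.out = 593)
y <- factor(rep(c(0, 1), each = n_per_group), levels = c(0, 1))
X <- matrix(rnorm(2 * n_per_group * length(shifts)), ncol = length(shifts))
colnames(X) <- round(shifts, 1)
# Invented signal: cancers have raised intensity at five arbitrary shifts.
signal_cols <- c(50, 150, 250, 350, 450)
X[y == "1", signal_cols] <- X[y == "1", signal_cols] + 1.5

# Rank shifts by absolute difference in group means (one possible criterion)
# and return the column indices of the m largest.
top_shifts <- function(X, y, m) {
  d <- abs(colMeans(X[y == "1", , drop = FALSE]) -
           colMeans(X[y == "0", , drop = FALSE]))
  order(d, decreasing = TRUE)[seq_len(m)]
}

# One random train/test split: choose shifts and K on the training set only,
# then evaluate on the held-out test set.
one_run <- function(X, y, m, k_grid = seq(1, 15, by = 2), test_frac = 0.3) {
  test_idx <- sample(nrow(X), size = round(test_frac * nrow(X)))
  X_tr <- X[-test_idx, , drop = FALSE]; y_tr <- y[-test_idx]
  X_te <- X[test_idx, , drop = FALSE];  y_te <- y[test_idx]

  cols <- top_shifts(X_tr, y_tr, m)

  # Leave-one-out CV misclassification rate for each candidate K.
  cv_err <- sapply(k_grid, function(k) {
    mean(knn.cv(X_tr[, cols, drop = FALSE], y_tr, k = k) != y_tr)
  })
  best_k <- k_grid[which.min(cv_err)]

  pred <- knn(X_tr[, cols, drop = FALSE], X_te[, cols, drop = FALSE],
              y_tr, k = best_k)
  c(error       = mean(pred != y_te),
    sensitivity = mean(pred[y_te == "1"] == "1"),  # true positive rate
    specificity = mean(pred[y_te == "0"] == "0"))  # true negative rate
}

# Average the three metrics over 100 random splits for 5, 10 and 15 shifts.
n_runs <- 100
results <- sapply(c(5, 10, 15), function(m) {
  rowMeans(replicate(n_runs, one_run(X, y, m)))
})
colnames(results) <- paste0("m=", c(5, 10, 15))
print(round(results, 3))
```

With the real data, X and y would instead come from the file (labels in the first column, shift values in the first row). KNN is distance-based, so it may also be worth standardising the selected shifts before fitting.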
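For the Lasso-penalised logistic classifier, a comparable sketch is given below. It uses cv.glmnet from the glmnet package with family = "binomial", as the brief requires. The data are again a synthetic stand-in rather than the real spectra, and the helper name lasso_run is hypothetical. The 30% test fraction, 10-fold cross-validation, misclassification error as the CV criterion and the use of lambda.min are all assumptions. Only 20 repeated splits are run to keep the example quick, whereas the brief asks for 1000.

```r
library(glmnet)  # assumed to be installed: install.packages("glmnet")

set.seed(2)

# SYNTHETIC stand-in for the real dataset (assumption, for illustration only):
# 180 spectra x 593 shifts, 90 controls (0) followed by 90 cancers (1).
n_per_group <- 90
n_shifts <- 593
y <- factor(rep(c(0, 1), each = n_per_group), levels = c(0, 1))
X <- matrix(rnorm(2 * n_per_group * n_shifts), ncol = n_shifts)
# Invented signal: cancers have raised intensity at five arbitrary shifts.
signal_cols <- c(50, 150, 250, 350, 450)
X[y == "1", signal_cols] <- X[y == "1", signal_cols] + 1.5

# One random train/test split: tune lambda by CV on the training set,
# then evaluate the fitted lasso logistic model on the test set.
lasso_run <- function(X, y, test_frac = 0.3) {
  test_idx <- sample(nrow(X), size = round(test_frac * nrow(X)))
  X_tr <- X[-test_idx, , drop = FALSE]; y_tr <- y[-test_idx]
  X_te <- X[test_idx, , drop = FALSE];  y_te <- as.character(y[test_idx])

  # alpha = 1 gives the lasso penalty; 10-fold CV scores each lambda
  # by misclassification error.
  cv_fit <- cv.glmnet(X_tr, y_tr, family = "binomial", alpha = 1,
                      type.measure = "class", nfolds = 10)

  # Predicted class labels at the lambda with the lowest CV error.
  pred <- as.vector(predict(cv_fit, newx = X_te, s = "lambda.min",
                            type = "class"))

  # Non-zero coefficients (intercept dropped) are the selected shifts.
  beta <- as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]

  c(error       = mean(pred != y_te),
    sensitivity = mean(pred[y_te == "1"] == "1"),
    specificity = mean(pred[y_te == "0"] == "0"),
    n_selected  = sum(beta != 0))
}

# Average over repeated random splits (20 here for speed).
n_runs <- 20
avg <- rowMeans(replicate(n_runs, lasso_run(X, y)))
print(round(avg, 3))
```

The n_selected entry reports how many shifts the lasso keeps on average, which can be set against the fixed 5, 10 or 15 shifts of the KNN approach when discussing the two methods.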