Module 6: Applied Data Analytics Capstone

Project Presentation Instructions

The narrated presentation should include slides on the following:

Introduction (one slide)

Data (two slides)

Methods (one slide)

Analysis (two to three slides)

Conclusion (one slide)

INTRODUCTION

First summarize the purpose of the presentation and the data being analyzed. Then summarize the questions posed in the analysis of the data and the conclusions formed from the analysis. Finally, briefly outline what is contained in the rest of the presentation.

DATA

Describe the most important data used for analysis in this section. This should include the types of data, the record count, any pre-processing needed, whether there were missing values and what steps were taken to handle them, and any need for formatting, normalizing, or transforming categorical values into quantitative variables.

METHODS

Describe the methods you used to gather the data and the setup used for analysis.

ANALYSIS AND RESULTS

Include in this section what was analyzed and the conclusions you drew from the analysis. Insert any charts you created from the data in this section. The analysis should be based on techniques that have been covered in class, including descriptive statistics, correlations, ANOVA, linear regression, and multiple linear regression.

CONCLUSION

Restate the questions you raised in the Introduction, as well as the most relevant results from the analysis. If your presentation contains more than one set of data or analysis, this is the place to compare the different results as needed. Include any questions or recommendations for additional data as needed.

I have attached the files; please make the slides from these files according to the instructions. You will not need anything else, as the files are already attached.

Data Analytics on COVID-19 Variants

Project Report

Bs 2/14/2022

Table of Contents

Introduction

Data

Methods

Analysis

Conclusion

Appendix

Table of Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Introduction:

In this report, we analyze a dataset of COVID-19 variants. First, we apply data preprocessing to the dataset. Then we apply exploratory data analysis to view the data in graphical form. Finally, we apply a linear regression model to predict values in the dataset.

First, we import the Python libraries and load the dataset. Then we apply data preprocessing techniques, which convert raw data into meaningful data, and we check the quality of the data before applying machine learning. To do this, we remove the irrelevant data and then the duplicate records. Next, we check whether the data type of each column is correct. Then we standardize the data, scaling the values so that the mean is 0 and the standard deviation is 1. We then check the dataset for outliers and remove any that are present. After removing the outliers, we handle the missing data and normalize the data. Finally, we encode the categorical data as numeric data, using label encoding or one-hot encoding. In this way, we clean the raw data into an understandable format.

The next step is exploratory data analysis, in which we view charts and graphs of the dataset. We plot histograms, bar charts, a heat map, line graphs, and box plots, and we build frequency tables. These graphs help us view the different types of data in the dataset.

The next step is to apply a machine learning model to the dataset. We apply a linear regression model to predict values in the dataset and compare the actual and predicted values. Finally, we compute the model accuracy and draw a scatter plot.
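The regression step can be sketched as follows. This is a minimal illustration on made-up numbers, not the report's actual model: the arrays `x` and `y` are hypothetical, NumPy's `polyfit` stands in for whichever regression library the report used, and the R^2 score is assumed as the measure of "accuracy".

```python
import numpy as np

# Hypothetical toy values; not taken from the COVID variants dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])    # target (actual values)

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept

# R^2 = 1 - SS_res / SS_tot, a common goodness-of-fit score for regression.
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Show actual and predicted values side by side.
for actual, pred in zip(y, predicted):
    print(f"actual={actual:.2f}  predicted={pred:.2f}")
print(f"R^2 = {r_squared:.4f}")
```

Plotting `x` against `y` and overlaying `predicted` would give the scatter plot mentioned above.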

Data:

The data describes COVID-19 variants. It consists of 100416 rows and 6 columns. It records the location where each variant was found and the date on which it was found. It also records the number of sequences, the percentage of sequences, and the total number of sequences. There are 24 types of COVID-19 variants in our dataset, listed below:

1. Alpha

2. B.1.1.277

3. B.1.1.302

4. B.1.1.519

5. B.1.160

6. B.1.177

7. B.1.221

8. B.1.258

9. B.1.367

10. B.1.620

11. Beta

12. Delta

13. Epsilon

14. Eta

15. Gamma

16. Iota

17. Kappa

18. Lambda

19. Mu

20. Omicron

21. S:677H.Robin1

22. S:677P.Pelican

23. Non who

24. Others

Here, we tell the description of data of each variab

Exploratory Data Analysis (EDA) and Linear Regression

Bs 3/14/2022

Assignment 3

Part A:

First, we import the libraries, read the CSV file of COVID variants, and print the first five rows of the dataset, as follows:
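A minimal sketch of this step, assuming pandas is the library in use. The rows are made up for illustration, the column names `location`, `date` and `variant` are assumptions (only `num_sequences`, `perc_sequences` and `num_sequences_total` are named in this report), and an in-memory string stands in for the real CSV file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; these rows are invented.
csv_text = """location,date,variant,num_sequences,perc_sequences,num_sequences_total
Country A,2021-01-04,Alpha,10,50.0,20
Country A,2021-01-04,Beta,6,30.0,20
Country A,2021-01-04,Delta,4,20.0,20
Country B,2021-01-04,Alpha,3,30.0,10
Country B,2021-01-04,Beta,2,20.0,10
Country B,2021-01-04,Delta,5,50.0,10
"""

# With a real file this would be pd.read_csv("<path to the CSV file>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first five rows
print(df.shape)    # (number of rows, number of columns)
```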

Descriptive statistics:

Then we analyze the descriptive statistics of the data frame. Descriptive statistics give a statistical description of the dataset. We can calculate sum(), mean(), count(), and so on for each column of the dataset. Here, we calculate the sum() of the dataset.

Here, we calculate the mean() of the dataset.

Here, we find the statistical description of the categorical data in the dataset.

Here, we find the statistical description of all the attributes of the dataset.
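The four summaries above might be produced along these lines. This is a sketch on a small made-up data frame; the values and the `location` and `variant` column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data frame; the values are invented for illustration.
df = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "variant": ["Alpha", "Delta", "Alpha", "Delta"],
    "num_sequences": [10, 30, 5, 15],
    "perc_sequences": [25.0, 75.0, 25.0, 75.0],
    "num_sequences_total": [40, 40, 20, 20],
})

numeric = df.select_dtypes(include="number")

totals = numeric.sum()    # column sums
means = numeric.mean()    # column means

# Categorical columns: count, unique, top (most frequent value), freq.
categorical_summary = df[["location", "variant"]].describe()

# All columns at once; statistics that do not apply to a column are NaN.
full_summary = df.describe(include="all")

print(totals, means, categorical_summary, full_summary, sep="\n\n")
```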

Histograms:

The histogram of the COVID Variants Dataset plots the numeric data: num_sequences, perc_sequences and num_sequences_total. The histogram of the COVID Variants Dataset is shown below:

Bar charts:

The bar chart of the COVID Variants Dataset is shown below:

Heat maps:

The heat map of the COVID Variants Dataset is shown below:

Line graphs:

The line graph of the COVID Variants Dataset is shown below:

Box plots:

The box plot of the COVID Variants Dataset is shown below:

Frequency tables:

This is the frequency table of the variants in the COVID Variants Dataset. It gives the frequency of each variant.

This frequency table covers the locations where variants were found. It gives the frequency of each location.
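Frequency tables like these can be built with pandas `value_counts`. The sketch below uses invented rows and assumes the columns are named `variant` and `location`.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
df = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B",
                 "Country B", "Country B", "Country C"],
    "variant": ["Alpha", "Delta", "Alpha", "Omicron", "Delta", "Alpha"],
})

# Number of rows per variant, most frequent first.
variant_counts = df["variant"].value_counts()

# Number of rows per location, most frequent first.
location_counts = df["location"].value_counts()

print(variant_counts)
print(location_counts)
```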

Data Preprocessing

Assignment

Kunal Shah 3/7/2022

Data Preprocessing:

Data preprocessing is the technique used to convert raw data into an understandable format. We need to check the quality of the data before applying machine learning algorithms. Here, we discuss the steps of data preprocessing.

Import Libraries and Read Data:

First, we import the libraries, read the .csv file of COVID variants, and print the first five rows of the dataset, as follows:

Remove irrelevant Data:

Now we remove the irrelevant data. This is also known as feature selection. We select the features that carry information about the data and drop the other features. The selected features are shown below:
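Selecting features amounts to keeping a chosen list of columns. The sketch below is illustrative only: the report does not state which columns were kept, so the `feature_columns` list and the extra `source_note` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "source_note" is an invented irrelevant column.
df = pd.DataFrame({
    "location": ["Country A", "Country B"],
    "variant": ["Alpha", "Delta"],
    "num_sequences": [10, 5],
    "source_note": ["n/a", "n/a"],
})

# Assumed list of features to keep; the report's actual choice is not shown.
feature_columns = ["location", "variant", "num_sequences"]

selected = df[feature_columns]   # equivalent to df.drop(columns=["source_note"])

print(selected.columns.tolist())
```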

Remove Duplicate Records:

Now we remove the duplicate records from the dataset. You can see that there are no duplicate records in the dataset. The length of the dataset is 100416.
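A duplicate check of this kind can be sketched with pandas as follows. The toy rows are invented and deliberately contain one duplicate so that the effect is visible; in the report's dataset the count of duplicates is zero.

```python
import pandas as pd

# Hypothetical toy data; the last row repeats the first one on purpose.
df = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country A"],
    "variant": ["Alpha", "Delta", "Alpha", "Alpha"],
    "num_sequences": [10, 30, 5, 10],
})

# duplicated() flags every row that repeats an earlier row.
n_duplicates = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduped = df.drop_duplicates()

print(n_duplicates, len(df), len(deduped))
```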

Check Data types:

After this, we check whether the data types in the dataset are correct. You can see that each variable has the correct data type.
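Checking data types could look like the sketch below. The rows are invented, the `date` column name is an assumption, and converting it to a datetime type is shown only as an example of a fix that might be needed.

```python
import pandas as pd

# Hypothetical toy data; dates start out as plain strings.
df = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-18"],
    "variant": ["Alpha", "Delta"],
    "num_sequences": [10, 30],
    "perc_sequences": [25.0, 75.0],
})

print(df.dtypes)   # one data type per column

# Example fix: parse the date strings into a datetime column.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

date_is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
counts_are_numeric = pd.api.types.is_numeric_dtype(df["num_sequences"])
print(date_is_datetime, counts_are_numeric)
```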

Standardize the data:

Now we standardize the data. Standardization rescales the values in the dataset so that each column has a mean of 0 and a standard deviation of 1. The standardized data is shown below:
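Standardization can be written directly as (value - mean) / standard deviation. The sketch below applies it to invented numbers and assumes the population standard deviation (`ddof=0`); the report does not say which library or variant was used.

```python
import pandas as pd

# Hypothetical toy values for the three numeric columns.
numeric = pd.DataFrame({
    "num_sequences": [10, 30, 5, 15],
    "perc_sequences": [25.0, 75.0, 25.0, 75.0],
    "num_sequences_total": [40, 40, 20, 20],
})

# z = (x - mean) / std, computed column by column.
standardized = (numeric - numeric.mean()) / numeric.std(ddof=0)

print(standardized)
print(standardized.mean())        # approximately 0 for every column
print(standardized.std(ddof=0))   # approximately 1 for every column
```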

Investigate the Outliers:

Now we check the dataset for outliers. An outlier is a data point that lies far from the other values in the dataset. To find the outliers, we draw a box plot.

Next, we apply the z-score technique to find the indices of the outliers. These are the indices of the outliers present in our dataset.
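The z-score technique can be sketched as follows. The values are invented, and the cutoff of 3 is an assumed, commonly used threshold; the report does not state the threshold it applied.

```python
import pandas as pd

# Hypothetical toy values: nineteen ordinary points and one extreme point.
values = pd.Series(list(range(10, 29)) + [500], name="num_sequences")

# z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std(ddof=0)

threshold = 3.0   # assumed cutoff
outlier_index = z_scores[z_scores.abs() > threshold].index.tolist()

# Keep only the rows that are not flagged as outliers.
without_outliers = values.drop(index=outlier_index)

print(outlier_index)
print(len(values), len(without_outliers))
```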

Missing Data:

Now we handle the missing data in the dataset. You can see that there is no missing data in the dataset.
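A missing-value check can be sketched as below. The toy rows are invented and contain one missing value on purpose; dropping the affected row is shown only as one possible way of handling it, since the report's dataset has nothing missing.

```python
import pandas as pd

# Hypothetical toy data with one missing percentage.
df = pd.DataFrame({
    "variant": ["Alpha", "Delta", "Omicron"],
    "num_sequences": [10, 30, 5],
    "perc_sequences": [25.0, None, 12.5],
})

# Count missing values per column and in total.
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())

# One possible treatment: drop rows that contain any missing value.
cleaned = df.dropna()

print(missing_per_column)
print(total_missing, len(cleaned))
```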

Normalize the data:

Then we normalize the data. Normalization is a technique that converts the numeric columns to a common scale. In machine learning, the values of some features can be many times larger than those of others. Here, you can see the normalized data.
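One common form of normalization is min-max scaling, which maps each column onto the range 0 to 1. The sketch below assumes that form and uses invented numbers; the report does not specify which normalization it applied.

```python
import pandas as pd

# Hypothetical toy values on very different scales.
numeric = pd.DataFrame({
    "num_sequences": [10, 30, 5, 15],
    "num_sequences_total": [4000, 4000, 2000, 3000],
})

# Min-max scaling: (x - min) / (max - min), column by column.
normalized = (numeric - numeric.min()) / (numeric.max() - numeric.min())

print(normalized)
```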

Encoding Categorical Data:

Finally, we use Label Encoder to encode the data, converting categorical data into numeric data. We split the data into x and y. The x value contains the feature data frame, as follows:

The y value contains the target value that is as follows:
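The encoding and the split into x and y can be sketched as below. Two points are assumptions: pandas category codes are used here in place of the Label Encoder named above (both assign integer codes in sorted order of the labels), and `num_sequences` is picked as a hypothetical target because the report does not show which column y holds.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
df = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "variant": ["Alpha", "Delta", "Beta", "Alpha"],
    "num_sequences": [10, 30, 5, 15],
})

encoded = df.copy()
for column in ["location", "variant"]:
    # Map each distinct label to an integer code (labels sorted alphabetically).
    encoded[column] = encoded[column].astype("category").cat.codes

target_column = "num_sequences"            # assumed target
x = encoded.drop(columns=[target_column])  # feature data frame
y = encoded[target_column]                 # target values

print(x)
print(y)
```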
