Project: Pokemon Go! Analytics

General Instructions

1. This is the second group project. It is worth 100 points in total. There are 20 extra points for extended work and 10 extra points for the 5 teams with the best predictions.
2. Submit (1) your project report, (2) Python code(s), and (3) data files (CSV, Excel, JSON) in Blackboard. DO NOT email your project. Late submissions will not be accepted.

1. Introduction

Pokemon Go! became a very famous augmented reality (AR) game in the summer of 2016. In this project, we want to understand the success of this mobile game. Specifically, the purposes of this project are (1) to do web scraping using BeautifulSoup, (2) to construct a Pandas dataframe, (3) to explore/visualize the numeric data using matplotlib or seaborn, and finally (4) to use sklearn to build machine learning models that predict the app's rating counts (ios_all_ratings and android_total_ratings). The 5 best teams will get extra credit. Finally, for more extra credit, you can (5) analyze the app's screenshot images using deep learning with tensorflow.

2. Data Description

For this project, I have downloaded the app pages of Pokemon Go! from the Google Play Store and the Apple App Store, covering July 21, 2016 to October 31, 2016.

The webpages were downloaded every ten minutes. This means that there are 144 (=24x6) HTML files for a given day and a given platform.

Once you extract the ZIP file, you will see 103 date folders under the "data" folder. Each date folder contains the HTML files downloaded on that date. Each HTML file name is formatted as "HH_MM_pokemon_PLATFORM.html", where HH is the hour, MM is the minute, and PLATFORM is either "android" or "ios". Note that due to intermittent connection errors, some HTML files may be missing or incomplete.
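The naming scheme above can be turned into a timestamp with the standard library alone. The following is a minimal sketch, not part of the assignment: the helper names parse_snapshot_path and expected_slots are invented for illustration, and the path is assumed to be relative to the "data" folder.

```python
from datetime import date, datetime, timedelta
from pathlib import PurePosixPath


def parse_snapshot_path(path):
    """Split 'YYYY-MM-DD/HH_MM_pokemon_PLATFORM.html' into (datetime, platform)."""
    p = PurePosixPath(path)
    day = p.parent.name                          # e.g. '2016-07-21'
    hh, mm, _app, platform = p.stem.split("_")   # '00', '00', 'pokemon', 'android'
    stamp = datetime.strptime(f"{day} {hh}:{mm}", "%Y-%m-%d %H:%M")
    return stamp, platform


def expected_slots(first, last):
    """Every ten-minute timestamp from the start of `first` to the end of `last`."""
    start = datetime.combine(first, datetime.min.time())
    n_days = (last - first).days + 1
    return [start + timedelta(minutes=10 * i) for i in range(n_days * 144)]


stamp, platform = parse_snapshot_path("2016-07-21/00_00_pokemon_android.html")
slots = expected_slots(date(2016, 7, 21), date(2016, 10, 31))
print(stamp, platform, len(slots) // 144)
```

Comparing the timestamps actually found on disk against expected_slots is one way to list the snapshots lost to connection errors.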

3. Project Instructions

Please follow the steps below to parse, organize, explore, and predict.

Web Scraping

The first step is to extract various values from the raw HTML files. You can use BeautifulSoup or other Python modules.

1. From all the iOS pages (ending with "_ios.html"), extract (i) number of customer ratings in the Current Version (let's call it ios_current_ratings); (ii) number of customer ratings in All Versions (ios_all_ratings); and (iii) file size in MB (ios_file_size). For example, the extracted values should be: 4688, 106508, 110 for "2016-07-21/00_00_pokemon_ios.html" file. Note that there are 3 values from iOS.

2. From all the Android pages (ending with "_android.html"), extract (i) average rating (on a scale from 1.0 to 5.0) (android_avg_rating); (ii) number of total ratings (android_total_ratings); (iii) number of ratings for 1-5 stars (android_ratings_1, android_ratings_2, ... , android_ratings_5); (iv) file size in MB (android_file_size). For example, the extracted numbers should be: 3.9, 1281802, 199974, 71512, 117754, 165956, 726597, 58 for the "2016-07-21/00_00_pokemon_android.html" file. Note that there are 8 values from Android.
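As a rough illustration of the extraction step, the sketch below pulls three of the Android values out of a small HTML fragment with BeautifulSoup. The fragment and its class names (score, reviews-num, file-size) are invented for this example; the real store pages use different markup, so every lookup has to be replaced after inspecting the saved files. The helpers text_by_class, to_number, and parse_android are also hypothetical names.

```python
import re

from bs4 import BeautifulSoup

# Invented markup for illustration only; inspect the downloaded pages to find
# the tags and class names that actually hold these values.
SAMPLE_ANDROID_HTML = """
<div class="score">3.9</div>
<span class="reviews-num">1,281,802</span>
<div class="file-size"> 58M </div>
"""


def text_by_class(soup, class_name):
    """Return the stripped text of the first element with this class, or None."""
    node = soup.find(class_=class_name)
    return node.get_text(strip=True) if node is not None else None


def to_number(text, cast=int):
    """Convert text such as '1,281,802' or '58M' to a number; None if absent."""
    if text is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return cast(match.group()) if match else None


def parse_android(html):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "android_avg_rating": to_number(text_by_class(soup, "score"), float),
        "android_total_ratings": to_number(text_by_class(soup, "reviews-num")),
        "android_file_size": to_number(text_by_class(soup, "file-size")),
    }


print(parse_android(SAMPLE_ANDROID_HTML))
```

Returning None for a missing element, rather than raising, keeps the loop over all files running when a page was not downloaded properly.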

Data Organization

The next step is to organize the extracted values, so that we can do some data exploration. As we have time series data, we will organize the data by datetime (Note that datetime is a Python data type).

1. Using the extracted values from the previous step, create a dictionary where each key is a datetime object and each value is a dictionary with the values extracted from the iOS and Android HTML files for that time. For example, for the "2016-07-21/00_00_pokemon_android.html" and "2016-07-21/00_00_pokemon_ios.html" files, the key should be datetime(2016, 7, 21, 0, 0, 0) and the value should be:
{'ios_current_ratings': 4688, 'ios_all_ratings': 106508, 'ios_file_size': 110,
 'android_avg_rating': 3.9, 'android_total_ratings': 1281802,
 'android_ratings_1': 199974, 'android_ratings_2': 71512,
 'android_ratings_3': 117754, 'android_ratings_4': 165956,
 'android_ratings_5': 726597, 'android_file_size': 58}

2. Convert the dictionary into a Pandas dataframe where the index is the datetime and the columns are the names of the 11 extracted iOS/Android values.

3. Save the dataframe into three formats (JSON, CSV, Excel). The file names are data.json, data.csv, and data.xlsx.
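The three organization steps can be sketched as follows. The first record uses the example values given above; the second record, with every value missing, is a hypothetical stand-in for a snapshot whose download failed. build_frame and save_frame are illustrative names, and writing data.xlsx assumes an Excel engine such as openpyxl is installed.

```python
from datetime import datetime

import pandas as pd

COLUMNS = [
    "ios_current_ratings", "ios_all_ratings", "ios_file_size",
    "android_avg_rating", "android_total_ratings",
    "android_ratings_1", "android_ratings_2", "android_ratings_3",
    "android_ratings_4", "android_ratings_5", "android_file_size",
]

records = {
    # Values from the worked example in the project description.
    datetime(2016, 7, 21, 0, 0, 0): dict(zip(COLUMNS, [
        4688, 106508, 110, 3.9, 1281802,
        199974, 71512, 117754, 165956, 726597, 58,
    ])),
    # Hypothetical failed download: all 11 values are missing.
    datetime(2016, 7, 21, 0, 10, 0): dict.fromkeys(COLUMNS),
}


def build_frame(data):
    """Dictionary keyed by datetime -> dataframe indexed by datetime."""
    frame = pd.DataFrame.from_dict(data, orient="index").sort_index()
    frame.index.name = "datetime"
    return frame


def save_frame(frame, stem="data"):
    """Write data.json, data.csv and data.xlsx (the last needs an Excel engine)."""
    frame.to_json(f"{stem}.json", date_format="iso")
    frame.to_csv(f"{stem}.csv")
    frame.to_excel(f"{stem}.xlsx")


df = build_frame(records)
print(df.shape)
```

Calling save_frame(df) writes the three files into the current working directory.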

Data Exploration

Now that we have Pandas dataframe ready, we can start exploring the data.

1. Use the describe() method to find the count/mean/std/min/25%/50%/75%/max values for each of the 11 variables.
2. Use the scatter_matrix() method to find pairs of variables with high correlations (either positive or negative).
3. For the identified pairs, calculate Pearson's correlation coefficients. You can use the corrcoef() function in the numpy module for this.
4. Use matplotlib or other tools to create time series graphs for each of the 11 variables.
a. It is your decision whether to put all time series in one graph or to have an individual graph for each time series. Also, use your judgment on whether to combine Android and iOS data in the same graph.
b. As the files were collected every 10 minutes, there are multiple values for a given date. Thus, the X-axis should incorporate both dates and times.
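The exploration steps above might look like the sketch below. The dataframe here is synthetic (one day of made-up, steadily growing counts) and only stands in for the real data; plot_each_series is a hypothetical helper, and it imports matplotlib lazily so that the numeric part runs even where plotting is unavailable.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataframe: one day of ten-minute snapshots.
index = pd.date_range("2016-07-21", periods=144, freq="10min")
rng = np.random.default_rng(0)
step = np.arange(144)
df = pd.DataFrame(
    {
        "ios_all_ratings": 106508 + 12 * step + rng.integers(0, 5, 144),
        "android_total_ratings": 1281802 + 150 * step + rng.integers(0, 40, 144),
    },
    index=index,
)

# Step 1: count/mean/std/min/quartiles/max for every column.
summary = df.describe()

# Step 3: Pearson's r for one pair; drop rows with missing values first,
# because a single NaN would make the coefficient NaN.
pair = df[["ios_all_ratings", "android_total_ratings"]].dropna()
r = np.corrcoef(pair["ios_all_ratings"], pair["android_total_ratings"])[0, 1]
print(summary.loc[["count", "mean", "std"]])
print(round(r, 3))


def plot_each_series(frame):
    """Steps 2 and 4: scatter matrix, then one time series panel per column."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    pd.plotting.scatter_matrix(frame, figsize=(6, 6))
    axes = frame.plot(subplots=True, sharex=True, figsize=(10, 3 * frame.shape[1]))
    for ax, name in zip(axes, frame.columns):
        ax.set_ylabel(name)
    plt.tight_layout()
    return axes
```

Because the index is a DatetimeIndex, the time series panels get an X-axis that carries both date and time, as item 4b asks.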

Prediction Model

At this point, I am sure you are familiar with the data. Now let's build machine learning models of the success of the Pokemon Go! app. People often use the number of ratings (ios_all_ratings and android_total_ratings) as a proxy for app success.

1. Build the two best regression models you can (one for iOS and one for Android) using sklearn with cross-validation. Try adding and removing variables among the 11. Of course, you can create your own variables if you want. Try various algorithms in the module: LinearRegression, Ridge, Lasso, etc.

2. Submit your predicted values of ios_all_ratings and android_total_ratings.
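One possible way to compare the listed algorithms with cross-validation is sketched below. Everything in it is an assumption made for illustration: the target is synthetic, the only feature is elapsed time in days, the alpha values are arbitrary, and TimeSeriesSplit is chosen here (the assignment does not prescribe a splitter) so that each fold trains on earlier snapshots and tests on later ones.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic data: three days of ten-minute snapshots with near-linear growth.
rng = np.random.default_rng(0)
days = np.arange(432) / 144.0
X = days.reshape(-1, 1)                       # single feature: elapsed days
y = 106508 + 1700 * days + rng.normal(0, 20, size=days.size)

cv = TimeSeriesSplit(n_splits=5)              # train on the past, test on the future
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
}

# Mean absolute error per model, averaged over the folds (lower is better).
mae = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()

best = min(mae, key=mae.get)
final_model = candidates[best].fit(X, y)      # refit the winner on all data
next_value = final_model.predict([[days[-1] + 1 / 144]])[0]
print(best, {name: round(err, 1) for name, err in mae.items()})
```

The same loop can be repeated with android_total_ratings as the target and with additional columns in X to see whether extra variables reduce the cross-validated error.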

Deep Learning

This is an optional task for extra points. We want to understand the screenshots of the app.

1. Identify all unique screenshots from iOS and Android pages. Note that you can use the URLs to distinguish different images. Also note that there are multiple images in each app page.

2. Download the screenshot images from iOS and Android webpages.

3. For each image, use tensorflow to extract the tags with the corresponding probabilities.
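The first step, finding the unique screenshot URLs, can be sketched with the standard library's HTML parser as an alternative to BeautifulSoup. The class name "screenshot" and the example.com URLs are invented; the real pages mark screenshots differently, so the filter must be adapted after inspecting them. ScreenshotCollector and unique_screenshots are hypothetical names, and the download and tensorflow steps are not covered here.

```python
from html.parser import HTMLParser


class ScreenshotCollector(HTMLParser):
    """Collect the src of every <img> tag that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.css_class in classes and attr.get("src"):
            self.urls.append(attr["src"])


def unique_screenshots(pages, css_class="screenshot"):
    """Return the sorted set of screenshot URLs seen across all pages."""
    seen = set()
    for html in pages:
        collector = ScreenshotCollector(css_class)
        collector.feed(html)
        seen.update(collector.urls)
    return sorted(seen)


# Two invented snapshots that share one image and differ in another.
page_a = '<img class="screenshot" src="https://example.com/a.png"><img class="icon" src="https://example.com/icon.png">'
page_b = '<img class="screenshot" src="https://example.com/a.png"/><img class="screenshot" src="https://example.com/b.png"/>'

urls = unique_screenshots([page_a, page_b])
print(len(urls), urls)
```

Keeping the iOS and Android pages in separate calls gives the two per-platform counts that the report asks for.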

Project submission and report

1. Submit (1) your project report, (2) Python code(s), and (3) data files (CSV, Excel, JSON) in Blackboard.
2. Please cite your sources properly when you use code written by others.
3. For the project report, I expect the following:
a. High-level description of your code
b. For Data Exploration, present results, numbers, graphs, etc. with your interpretations
c. For Prediction Model, describe how you came up with your regression model. Also, report your two predicted values.
d. For (optional) Deep Learning, report the number of unique screenshots for iOS and Android and report the tags/probabilities for each image. Finally, please submit the downloaded images.

