Project: Pokemon Go! Analytics

General Instructions

1. This is the second group project. The total is 100 points. There are 20 extra points for extended work and 10 extra points for the 5 teams with the best predictions.
2. Submit (1) your project report, (2) Python code(s), and (3) data files (CSV, Excel, JSON) in Blackboard. DO NOT email your project. Late submissions will not be accepted.

1. Introduction

Pokemon Go! became a very famous augmented reality (AR) game in the summer of 2016. In this project, we want to understand the success of this mobile app game. Specifically, the purposes of this project are (1) to do web scraping using BeautifulSoup, (2) to construct a Pandas dataframe, (3) to explore/visualize the numeric data using matplotlib or seaborn, and finally (4) to use sklearn to build machine learning models to predict the app's review counts. The 5 best teams will get extra credit. Finally, for more extra credit, you can (5) analyze the app's screenshot images using deep learning with tensorflow.

2. Data Description

For this project, I have downloaded the app pages of Pokemon Go! from the Google Play Store and the Apple App Store from July 21, 2016 to October 31, 2016.

The webpages were downloaded every ten minutes. This means that there are 144 (=24x6) HTML files for a given day and a given platform.

Once you extract the ZIP file, you will see 103 date folders under the "data" folder. Each date folder contains the HTML files downloaded on that date. Each HTML file name is formatted as "HH_MM_pokemon_PLATFORM.html", where HH is the hour, MM is the minute, and PLATFORM is either "android" or "ios". Note that due to intermittent connection errors, some HTML files may not have been downloaded properly.
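As a starting point, here is a minimal sketch of how the folder and file naming convention above could be turned into a timestamp and a platform label. The function name `parse_snapshot_path` and the regular expression are my own illustration, not part of the assignment, and the sketch assumes the date folders are named "YYYY-MM-DD" as in the examples in this document.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Matches file names such as "00_00_pokemon_ios.html".
FILE_RE = re.compile(r"^(\d{2})_(\d{2})_pokemon_(android|ios)\.html$")


def parse_snapshot_path(path):
    """Return (timestamp, platform) for '.../YYYY-MM-DD/HH_MM_pokemon_PLATFORM.html'.

    Hypothetical helper: assumes the parent folder is named YYYY-MM-DD.
    """
    p = PurePosixPath(path)
    match = FILE_RE.match(p.name)
    if match is None:
        raise ValueError(f"unexpected file name: {p.name}")
    hour, minute, platform = match.groups()
    day = datetime.strptime(p.parent.name, "%Y-%m-%d")
    return day.replace(hour=int(hour), minute=int(minute)), platform


stamp, platform = parse_snapshot_path("2016-07-21/00_00_pokemon_ios.html")
print(stamp, platform)
```

Because some downloads failed, it is probably safer to parse whatever files exist than to assume all 144 files per day and platform are present.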

3. Project Instructions

Please follow the steps below to parse, organize, explore, and predict.

Web Scraping

The first step is to extract various values from the raw HTML files. You can use BeautifulSoup or other Python modules.

1. From all the iOS pages (ending with "_ios.html"), extract (i) number of customer ratings in the Current Version (let's call it ios_current_ratings); (ii) number of customer ratings in All Versions (ios_all_ratings); and (iii) file size in MB (ios_file_size). For example, the extracted values should be: 4688, 106508, 110 for "2016-07-21/00_00_pokemon_ios.html" file. Note that there are 3 values from iOS.

2. From all the Android pages (ending with "_android.html"), extract (i) average rating (on a scale between 1.0 and 5.0) (android_avg_rating); (ii) number of total ratings (android_total_ratings); (iii) number of ratings for 1-5 stars (android_ratings_1, android_ratings_2, ... , android_ratings_5); (iv) file size in MB (android_file_size). For example, the extracted numbers should be: 3.9, 1281802, 199974, 71512, 117754, 165956, 726597, 58 for the "2016-07-21/00_00_pokemon_android.html" file. Note that there are 8 values from Android.
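The extraction step could look roughly like the sketch below. Please note that the HTML snippet, the tag names, and the class names (`score`, `reviews-num`, `file-size`) are invented for illustration; the real pages use their own markup, so you need to inspect them and adjust the selectors. The helpers `to_number` and `text_of` are hypothetical names as well.

```python
import re
from bs4 import BeautifulSoup


def to_number(text):
    """Pull the first number out of text such as '1,281,802' or '110 MB'."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # no digits found, e.g. a page that failed to download
    cleaned = match.group(0).replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)


def text_of(soup, tag, css_class):
    """Return the text of the first matching element, or None if it is absent."""
    element = soup.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup: NOT the real structure of the app store pages.
sample_html = """
<div class="score">3.9</div>
<span class="reviews-num">1,281,802</span>
<div class="file-size">58M</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
values = {
    "android_avg_rating": to_number(text_of(soup, "div", "score")),
    "android_total_ratings": to_number(text_of(soup, "span", "reviews-num")),
    "android_file_size": to_number(text_of(soup, "div", "file-size")),
}
print(values)
```

Returning `None` for a missing element, instead of raising an error, lets an incomplete page become a row with missing values that you can handle later in Pandas.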

Data Organization

The next step is to organize the extracted values, so that we can do some data exploration. As we have time series data, we will organize the data by datetime (Note that datetime is a Python data type).

1. Using the extracted values from the previous step, create a dictionary, where the key is a datetime object and the value is a dictionary with the extracted values from the iOS and Android HTML files. For example, for the case of the "2016-07-21/00_00_pokemon_android.html" file and the "2016-07-21/00_00_pokemon_ios.html" file, the key should be datetime(2016, 7, 21, 0, 0, 0) and the value should be:
{'ios_current_ratings': 4688, 'ios_all_ratings': 106508,
 'ios_file_size': 110, 'android_avg_rating': 3.9,
 'android_total_ratings': 1281802,
 'android_rating_1': 199974, 'android_rating_2': 71512,
 'android_rating_3': 117754, 'android_rating_4': 165956,
 'android_rating_5': 726597, 'android_file_size': 58}

2. Convert the dictionary into a Pandas dataframe where the index is the datetime and the columns are the names of the 11 extracted iOS/Android values.

3. Save the dataframe into three formats (JSON, CSV, Excel). The file names are data.json, data.csv, and data.xlsx.
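The three steps above can be sketched as follows, using only the example values given in this document for 2016-07-21 00:00. The variable names are my own, and the column names follow the dictionary example (android_rating_1 and so on). The Excel export is left as a comment because it needs an Excel writer package such as openpyxl to be installed.

```python
from datetime import datetime

import pandas as pd

key = datetime(2016, 7, 21, 0, 0, 0)

# Values extracted from the iOS page for this timestamp.
ios_values = {"ios_current_ratings": 4688, "ios_all_ratings": 106508, "ios_file_size": 110}

# Values extracted from the Android page for the same timestamp.
android_values = {
    "android_avg_rating": 3.9,
    "android_total_ratings": 1281802,
    "android_rating_1": 199974,
    "android_rating_2": 71512,
    "android_rating_3": 117754,
    "android_rating_4": 165956,
    "android_rating_5": 726597,
    "android_file_size": 58,
}

# Merge both platforms under the same datetime key.
records = {}
records.setdefault(key, {}).update(ios_values)
records.setdefault(key, {}).update(android_values)

# One row per datetime, one column per extracted value.
df = pd.DataFrame.from_dict(records, orient="index").sort_index()
df.index.name = "datetime"

df.to_csv("data.csv")
df.to_json("data.json", orient="index", date_format="iso")
# df.to_excel("data.xlsx")  # requires an Excel writer such as openpyxl
print(df.shape)
```

Using `setdefault` means a timestamp with only one platform available still gets a row; the missing columns simply become NaN in the dataframe.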

Data Exploration

Now that we have the Pandas dataframe ready, we can start exploring the data.

1. Use the describe() method to find the count/mean/std/min/25%/50%/75%/max values for each of the 11 variables.
2. Use the scatter_matrix() method to find pairs of variables with high correlations (either positive or negative).
3. For the identified pairs, calculate Pearson's correlation coefficients. You can use the corrcoef() function in the numpy module for this.
4. Use matplotlib or other tools to create time series graphs for each of the 11 variables.
a. It is up to you whether to put all time series in one graph or to draw an individual graph for each time series. Also, use your judgment on whether to combine the Android and iOS data.
b. As the files were collected every 10 minutes, there are multiple values for a given date. Thus, the X-axis should incorporate both dates and times.
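To illustrate the summary and correlation steps, here is a small sketch on synthetic numbers. The values below are made up and are not taken from the real pages; only the two column names come from this project. In your own work, `df` would be the dataframe built in the Data Organization step.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT real measurements), one row every 10 minutes.
index = pd.date_range("2016-07-21", periods=6, freq="10min")
df = pd.DataFrame(
    {
        "ios_all_ratings": [100, 110, 125, 130, 150, 160],
        "android_total_ratings": [1000, 1050, 1090, 1130, 1210, 1230],
    },
    index=index,
)

# count/mean/std/min/25%/50%/75%/max for every column.
summary = df.describe()

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(df["ios_all_ratings"], df["android_total_ratings"])[0, 1]

print(summary)
print(f"Pearson r = {r:.3f}")
```

For the visual steps, `pandas.plotting.scatter_matrix(df)` draws the pairwise scatter plots and `df.plot()` draws time series with the datetime index on the X-axis; both rely on matplotlib being installed.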

Prediction Model

At this point, I am sure you are familiar with the data. Now let's build a machine learning model of the success of the Pokemon Go! app. People often use the number of ratings (ios_all_ratings and android_total_ratings) as a proxy for app success.

1. Build the two best regression models you can (one for iOS and one for Android) with sklearn, using cross-validation. Try adding/removing variables among the 11. Of course, you can create your own variables if you want. Try various algorithms in the module: LinearRegression, Ridge, Lasso, etc.

2. Submit your predicted values of ios_all_ratings and android_total_ratings.
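One possible way to compare algorithms with cross-validation is sketched below on synthetic data. The feature matrix, the target, and the alpha values are invented for illustration, and the choice of `TimeSeriesSplit` (which always trains on earlier rows and tests on later ones) is my assumption for time-ordered data, not a requirement of the project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic stand-in: 200 time-ordered rows, 3 features, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
}

# Each fold trains on the past and validates on the rows that follow it.
cv = TimeSeriesSplit(n_splits=5)

# Mean R^2 across the folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

With the real data you would build X from the variables you decide to keep and run this comparison twice, once with ios_all_ratings as the target and once with android_total_ratings.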

Deep Learning

This is an optional task for extra points. We want to understand the screenshots of the app.

1. Identify all unique screenshots from iOS and Android pages. Note that you can use the URLs to distinguish different images. Also note that there are multiple images in each app page.

2. Download the screenshot images from iOS and Android webpages.

3. For each image, use tensorflow to extract the tags with the corresponding probabilities.
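For the first step, a set is a simple way to keep only unique image URLs across many pages. The sketch below uses the standard library HTML parser and two invented pages with example.com URLs. Real app pages also contain icons and other images, so you will need a page-specific rule (which I do not show here) to keep only the screenshots.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def unique_image_urls(pages):
    """Return the set of distinct image URLs found in a list of HTML strings."""
    seen = set()
    for html in pages:
        collector = ImageSrcCollector()
        collector.feed(html)
        seen.update(collector.sources)
    return seen


# Hypothetical pages: the second repeats one image and adds a new one.
page_a = '<img src="https://example.com/shot1.png"><img src="https://example.com/shot2.png">'
page_b = '<img src="https://example.com/shot2.png"/><img src="https://example.com/shot3.png"/>'

urls = unique_image_urls([page_a, page_b])
print(sorted(urls))
```

You can run this separately over the iOS pages and the Android pages to get the two counts requested in the report.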

Project submission and report

1. Submit (1) your project report, (2) Python code(s), and (3) data files (CSV, Excel, JSON) in Blackboard.
2. Please cite your sources properly when you use others' code.
3. For the project report, I expect the following:
a. High-level description of your code
b. For Data Exploration, present results, numbers, graphs, etc. with your interpretations
c. For Prediction Model, describe how you came up with your regression model. Also, report your two predicted values.
d. For (optional) Deep Learning, report the number of unique screenshots for iOS and Android and report the tags/probabilities for each image. Finally, please submit the downloaded images.
