Ask Question, Ask an Expert

+61-413 786 465

info@mywordsolution.com

Ask Statistics and Probability Expert

Detailed Question: Have to use random forest prediction by using R or python programming langauage.

In this project, you are asked to study the general topic of time-series data mining, and specifically for time-series data trend prediction. Note that this is not a new topic in the literature, as studies were already around even way before the official advent of data mining research (e.g., in the literature of control theory or pattern recognition). On the other hand, in the literature of data mining, time-series data mining is considered as one of the advanced topics and has many important and hot applications in the real-world such as e-commerce, stock analysis, and weather forecast.

The specific problem in this project is about the time-series data trend prediction. The specific application scenario is in e-commerce. You are given a real dataset obtained from a real-world e-commerce application where there were 1000 products and 31490 customers (i.e., buyers) who bought these products. Of these 1000 products there are 100 key products (popular products). Also these 1000 products are in 15 categories. The specific data are given in the seven tables and the specific details of these tables are given below. The time window of this dataset is in 118 days with data documentation for each day. Hence, the time unit is one day where the timeline goes from the 0-th day to the 117-th day (17 weeks less one day in total). Now you are asked to do the sale quantity prediction for the 100 key products for each day between the 118-th day and the 146-th day (29 days).

-buyer_basic_info.txt: the basic attribute information of the buyers; in particular, the column names of this table are "buyer_id", "registration_time", "seller_level", "buyer_level", "age", and "gender". If we do not know the gender of a buyer, we set this buyer's gender attribute as -1.

-buyer_historical_category15_quantity.txt: the consumption quantities in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption quantity in the 1st category", ..., and "consumption quantity in the 15th category". The 15 categories are the ones of the products the customers bought in this dataset.

-buyer_historical_category15_money.txt: the consumption amounts in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption amount in the 1st category", ..., and "consumption amount in the 15th category".

-product_features.txt: the basic attribute information of the products; in particular, the column names of this table are "product_id", "attribute_1", "attribute_2", and "original price".

-Key_product_IDs.txt: the key product IDs

-trade_info_training.txt: the trade information between the key products and the buyers from the 0-th day to the 117-th day; in particular, the column names of this table are "product_id", "buyer_id", "trade_time", "trade_quantity", and "trade_price".

-product_distribution_training_set.txt: there are 119 columns, where the 1-st column shows the "product_id" and the 2-nd to the 119-th columns show the "quantities" of the key products from the 0-th day to the 117-th day; for example, the element at the 5-th row and the 10-th column in this table shows the quantity of the 5-th product at the 8-th day.

students are asked to do the prediction for the overall sale quantity of the 100 key products for each day of the time window from the 118-th day to the 146-th day, and also for each key product for each day of the time window.

You are given 10 minutes for the presentation. In the presentation, you must give the following information:

-Explain conceptually what time-series data mining is about
-Showcase the specific problem and the specific method you have implemented or developed as a solution to the problem you are given
-Demonstrate your implementation results in the prediction

The second phase is for the coding part of the project and concerns with the implementation of a time-series prediction method that you either take from the literature or you have developed by yourself as the result of your research in the first phase. You may use any programming language to implement the method and you may also use any existing libraries.

The first two phases begin at the beginning of the semester, and the due date of turning in the coding results is 24 April . Please make sure to follow the format requirement as the text output file specified here. The file puts each prediction as one line where the first prediction is for the overall prediction and each subsequent prediction is for a key product. Each prediction output line begins with the key product id where the overall prediction id is 0. There is a space between the prediction and the key product id. Then there is a space between a pair of the predictions of two neighboring days. The prediction lines in the output file begin with the first line as the overall prediction where the product id is 0, and then the first key product prediction with the smallest product id (i.e., 1), all the way to the last line as the prediction for the last key product prediction (i.e., id = 964). Also note that for undergrad students your output file only has one line prediction just for the overall prediction beginning with the product id = 0.

What you need to turn in: you shall turn in a zipped package containing the source code of your implementation of the prediction method with appropriate comments and documentations in the code, a README file to explain how to compile and run your code under what specific environment, and a text file containing the output matrix following exactly the format requirement stated above.

Statistics and Probability, Statistics

  • Category:- Statistics and Probability
  • Reference No.:- M91756056
  • Price:- $70

Priced at Now at $70, Verified Solution

Have any Question?


Related Questions in Statistics and Probability

You are considering an investment in a 40-year security the

You are considering an investment in a 40-year security. The security will pay $25 a year at the end of each of the first three years. The security will then pay $30 a year at the end of each of the next 20 years. The no ...

A gambler plans to play the casino dice game called craps

A gambler plans to play the casino dice game called craps, and he plans to place a bet on the "pass line." Let A be the event of winning. Based on the rules used in almost all casinos, P(A) = 244/495. Describe the event ...

Taylor found that 8 of the recipients of loans form a

Taylor found that 8% of the recipients of loans form a particular mortgage lender default within 3 years. If he takes a random sample of 736 customers who received loans 3 years ago, what is the average number of custome ...

In a multiple choice exam there are 5 questions and 4

In a multiple choice exam, there are 5 questions and 4 choices for each question (a, b, c, d). Nancy has not studied for the exam at all and decides to randomly guess the answers. What is the probability that: (a) the fi ...

Jamie dimon changed the business model for jpmorgan chase

Jamie Dimon changed the business model for JPMorgan Chase in 2008. In the process, the bank gave enormous trading authority to one individual. What are the ERM strengths and weaknesses of this strategy?

You play the following game against your friend you have 2

You play the following game against your friend. You have 2 urns and 4 balls. One of the balls is black and the other 3 are white. You can place the balls in the urns any way that you'd like, including leaving an urn emp ...

The monthly sales demand for a new product is uncertain but

The monthly sales demand for a new product is uncertain, but it is considered to be adequately described by a normal random variable with mean 50,000 units and variance 100,000,000. (a) A factory to manufacture the new p ...

How do you calculate true population proportion if your

How do you calculate true population proportion if your defective sample is too small? For example, if you have 50 and defect is 4. Please look below for more details

Assume that the class consists of 45 percent freshmen 20

Assume that the class consists of 45 percent freshmen, 20 percent sophomores, 20 percent juniors, and 15 percent seniors. Assume further that 50 percent of the freshmen, 50 percent of the sophomores, 25 percent of the ju ...

Peter and maryam take turns throwing one dart at the

Peter and Maryam take turns throwing one dart at the dartboard. Peter hits the bullseye with probability 1/5 and Maryam hits the bullseye with probability 1/3. Each throw is independent of the last. Whoever hits the bull ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As