Write and execute a MapReduce program to find the top 100 trending songs from the stream data, on a daily basis, for the week of December 25-31. Although this is a real-time streaming problem, you may use all the data up to the (n−1)th day to calculate your output for the nth day, i.e. you may consider all the stream data up to and including 24 December in your program to find the trending songs for 25 December, and so on.
A stream is a record of a user playing a song. Each stream is represented as a tuple with the following attributes:
(song ID, user ID, timestamp, hour, date)
Each tuple consists of the song ID of the streamed song, the user ID of the user who streamed it, the Unix timestamp of the stream, the hour of streaming, and the date of streaming.
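A minimal Java sketch of parsing one stream tuple into a typed record (Java 16+ for `record`). The layout of the sample file is not described here, so this assumes one comma-separated record per line in the attribute order listed above, with the date in ISO `yyyy-MM-dd` form; the class name `StreamRecordParser` and the sample values are hypothetical. Malformed lines, including a header row, are skipped rather than aborting the run.

```java
import java.time.LocalDate;
import java.util.Optional;

public class StreamRecordParser {

    // One stream tuple: (song ID, user ID, timestamp, hour, date).
    // ASSUMPTION: IDs are kept as strings; the date column is ISO-8601 (yyyy-MM-dd).
    public record StreamRecord(String songId, String userId, long timestamp, int hour, LocalDate date) {}

    // ASSUMPTION: comma-separated fields in the order above.
    // Returns empty for headers or malformed lines instead of throwing.
    public static Optional<StreamRecord> parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 5) {
            return Optional.empty();
        }
        try {
            return Optional.of(new StreamRecord(
                    f[0].trim(),
                    f[1].trim(),
                    Long.parseLong(f[2].trim()),    // Unix timestamp (seconds)
                    Integer.parseInt(f[3].trim()),  // hour of streaming
                    LocalDate.parse(f[4].trim()))); // date of streaming
        } catch (RuntimeException e) { // NumberFormatException, DateTimeParseException
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows: one valid record and one header line.
        System.out.println(parse("s42,u7,1514160000,0,2017-12-25"));
        System.out.println(parse("song_id,user_id,timestamp,hour,date"));
    }
}
```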
A sample data file containing stream records from the original dataset has been uploaded.
The program should work for any data file that contains the attributes listed above. The attached file is only a sample (it does not contain a huge dataset), but the program should still meet the requirements above on the full data.
Hi there
Greetings from Jaydeep.
I have read the complete description and also looked at the attached file, which contains the raw data (song ID, user ID, timestamp details, etc.).
I can write a MapReduce program in Java on the Hadoop framework for this sample.
For a large dataset, we have to move the files to HDFS; then we can run the MapReduce program and find the trending songs as a list ranked by their frequency count.
(The ranking is based on song play counts.)
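One way such a job could be decomposed, sketched in plain Java so it runs without Hadoop on the classpath. The map step fans each stream out to every target day after its date, keyed by `targetDate<TAB>songId` with value 1; the shuffle groups equal keys; the reduce step sums the 1s; a final ranking pass keeps the top k per day. In a real job, `map` and `reduce` would become `Mapper` and `Reducer` subclasses from `org.apache.hadoop.mapreduce`, the input would first be copied into HDFS (for example with `hdfs dfs -put`), and the ranking pass would likely be a second job. The fan-out key design, the class and method names, and the 2017 dates are assumptions, not part of the original proposal.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TrendingMapReduceSketch {

    // ASSUMPTION: the target week; the task does not state the year.
    static final LocalDate FIRST = LocalDate.of(2017, 12, 25);
    static final LocalDate LAST = LocalDate.of(2017, 12, 31);

    public record Play(String songId, LocalDate date) {}

    // Map: emit ("targetDay\tsongId", 1) for every target day strictly after the
    // play's date, so each play counts toward every window that includes it.
    public static List<Map.Entry<String, Long>> map(Play p) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (LocalDate t = FIRST; !t.isAfter(LAST); t = t.plusDays(1)) {
            if (p.date().isBefore(t)) {
                out.add(Map.entry(t + "\t" + p.songId(), 1L));
            }
        }
        return out;
    }

    // Reduce: sum the 1s collected for one key.
    public static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Local driver: map -> shuffle (sort and group by key) -> reduce -> rank.
    public static Map<String, List<String>> run(List<Play> plays, int k) {
        Map<String, List<Long>> shuffled = new TreeMap<>();
        for (Play p : plays) {
            for (Map.Entry<String, Long> kv : map(p)) {
                shuffled.computeIfAbsent(kv.getKey(), key -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce each key, then regroup the counts by target day.
        Map<String, Map<String, Long>> perDay = new TreeMap<>();
        shuffled.forEach((key, values) -> {
            String[] dayAndSong = key.split("\t", 2);
            perDay.computeIfAbsent(dayAndSong[0], d -> new HashMap<>())
                  .put(dayAndSong[1], reduce(values));
        });
        // Ranking pass: highest count first, ties broken by song ID.
        Map<String, List<String>> top = new TreeMap<>();
        perDay.forEach((day, counts) -> top.put(day, counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList())));
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical sample plays.
        List<Play> plays = List.of(
                new Play("s1", LocalDate.of(2017, 12, 23)),
                new Play("s2", LocalDate.of(2017, 12, 24)),
                new Play("s2", LocalDate.of(2017, 12, 30)));
        run(plays, 100).forEach((day, songs) -> System.out.println(day + " " + songs));
    }
}
```

The fan-out emits each stream up to seven times, which is the cost of computing all seven daily windows in a single pass; the alternative is one counting job per target day with a date filter in the mapper.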