1 min readMay 14, 2020
Hi, your answer is there in following calculation.
Even we have an ideal movie-genre dataset (40K samples), where all genres are equal in numbers. And each movie has an average of 2 genres. Then each genre will occur around (40000*2)/16 = 5000 times. We still have an imbalanced dataset because the network is seeing each genre only 12.5% of the time. In this case, the network just learns to predict no genre at all.
(5000/40000)*100 => 50/4 => 12.5 %
If in 40K samples, each genre is occurring around 5K times.