Mislabeled emotions. Apparently 30% of the labels in Google's GoEmotions dataset, "a human-labeled dataset of 58K Reddit comments categorized according to 27 emotions", are incorrect.

The page has numerous funny examples.

"daaaaaamn girl! -- mislabeled as ANGER"

"My man! -- mislabeled as NEUTRAL, likely because labelers don't know what this phrase means"

"Yay, cold McDonald's. My favorite. -- mislabeled as LOVE"

In short, the labelers didn't understand profanity, English idioms (the labelers were probably Mechanical Turk workers in other countries), sarcasm, US politics and culture, or Reddit memes.

30% of Google's emotions dataset is mislabeled

#solidstatelife #ai #supervisedlearning #emotions #sentimentanalysis
