Classifying Toxic Comments
Our trainees face real-world challenges through Kaggle competitions. This time they were challenged to build models that classify toxic comments. We always like to participate in Kaggle challenges because they stimulate knowledge sharing and offer a unique experience during the traineeship.
Yu Ri, data science trainee, how was the Kaggle day for you?
‘It is a healthy competition where everyone wants to win. The Kaggle day is not only fun but you also learn a lot from your team members since they are experienced with different modelling techniques.’
Chiel, you organized the Kaggle challenge together with David. What do you think of the challenge?
‘We let the trainees participate because it stimulates knowledge sharing of all of their different technical skills. We make sure every team consists of trainees with different skills so each member complements the team in their own way.’
About the Kaggle Challenge
This time the contestants were required to build a multi-headed model capable of detecting different types of toxicity, such as threats, obscenity, insults and identity-based hate, with the aim of improving on Jigsaw's current models. The goal of this Kaggle challenge is to make online discussion more respectful and productive. The following are some examples of questions the participants tried to answer:
- What is the probability of each type of toxicity for each comment?
- Do some types of toxic messages have longer sentences?
- Do the toxic messages use more exclamation marks?
- Does a toxic message contain more spelling mistakes?
- Do they contain less or no punctuation at all?
- Are some swear words used more often than others?
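Several of these questions come down to simple counts per comment. As a rough illustration only, and not the approach any of the teams actually used, the sketch below computes a few such counts in plain Python. The function name `comment_features` and the feature names are hypothetical and chosen just for this example.

```python
import string


def comment_features(text: str) -> dict:
    """Compute a few simple surface features for one comment.

    Hypothetical helper for illustration; the feature set is an assumption,
    not something prescribed by the challenge.
    """
    words = text.split()  # naive whitespace tokenization
    n_chars = len(text)
    # Count characters that are ASCII punctuation marks.
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        "n_words": len(words),
        "n_exclamations": text.count("!"),
        "n_punctuation": n_punct,
        # Share of characters that are punctuation; 0.0 for empty comments.
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    for comment in ["You are awful!!!", "thanks for the helpful edit"]:
        print(comment, comment_features(comment))
```

Comparing these values between toxic and non-toxic comments would give a first impression of whether, for example, exclamation marks or missing punctuation are useful signals.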