About This Hackathon

Welcome to Week 11 of the Weekly MachineHack Hackathon series! This week’s challenge is to develop a model that identifies whether a text input contains hateful content, using the training data provided. This is a central problem within the broader need to moderate inappropriate and hateful content on the internet.

Dataset Description:

Train.csv: The training dataset with text and corresponding labels.
Test.csv: The dataset for which you will generate predictions.
Submission.csv: The format for submitting your predictions.

Participation and Benefits:

Skill Level: This challenge is designed for participants with experience in text classification and natural language processing.
Community Engagement: Join our Telegram group to connect with other participants, seek advice, and share insights.
Recognition: All participants will receive a MachineHack certificate, and top performers will be highlighted on the leaderboard.
Live Walkthrough: A live session will be held on 24th September 2024 at 5:30 PM IST to guide you through the challenge and provide expert tips.

Submission and Evaluation:

Submission Format: Submit your predictions in the format of the provided Submission.csv file.
Evaluation Metric: Submissions will be evaluated on the F1 score, the harmonic mean of precision and recall for the classification task.
Leaderboard: Track your performance and strive to be at the top of the leaderboard.
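
To make the evaluation metric concrete, here is a minimal sketch of how an F1 score can be computed from true and predicted labels. It assumes a binary task in which the label 1 marks hateful text; this page does not state the label encoding or the averaging mode (binary, macro, or weighted) used on the leaderboard, so treat both as assumptions and check Submission.csv and the platform before relying on it. The function name `f1_score_binary` is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for a single positive class: harmonic mean of precision and recall.

    Assumption: `positive` (default 1) is the label for hateful text.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Example: 2 true positives, 1 false positive, 1 false negative.
score = f1_score_binary([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F1 ignores true negatives, a model that labels everything as non-hateful scores 0.0 no matter how high its accuracy is, which is why accuracy alone is a poor guide on imbalanced data.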

How to Approach the Challenge:

Data Preprocessing: Clean the text data by handling missing values, removing noise, and normalizing text (e.g., lowercasing, stemming/lemmatization).
Feature Engineering: Extract meaningful features such as word embeddings (e.g., Word2Vec, GloVe), TF-IDF vectors, and n-grams.
Modeling Techniques: Experiment with models including Logistic Regression, Naive Bayes, Support Vector Machines (SVM), and advanced models like BERT, RoBERTa, or other transformer-based architectures.
Validation and Tuning: Use techniques such as k-fold cross-validation to assess model performance and fine-tune hyperparameters.
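
The preprocessing, feature extraction, and validation steps above can be sketched with the Python standard library alone. This is an illustrative outline, not the starter notebook: the function names (`normalize`, `ngram_counts`, `k_fold_indices`) are hypothetical, and the cleaning rules assume English-language text, which this page does not confirm. In practice, libraries such as scikit-learn provide equivalents (for example `TfidfVectorizer` and `StratifiedKFold`).

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
# Assumption: English text. This pattern also strips non-ASCII letters,
# so it would need to change for other languages.
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def normalize(text):
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = (text or "").lower()  # treat a missing value as empty text
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def ngram_counts(text, n_max=2):
    """Count word n-grams for n = 1..n_max on the normalized text."""
    tokens = normalize(text).split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        val_idx = indices[fold::k]  # every k-th sample, offset by the fold number
        val_set = set(val_idx)
        train_idx = [i for i in indices if i not in val_set]
        yield train_idx, val_idx


cleaned = normalize("Check THIS out!! https://example.com/a  now")
features = ngram_counts("A b a")
folds = list(k_fold_indices(10, k=5))
```

The fold generator here does not shuffle or stratify. If the hateful class is rare, a stratified split that keeps the class ratio similar in every fold usually gives more stable F1 estimates.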

A starter notebook will be available, providing a basic framework for data preprocessing and initial modeling.

Getting Started:

Register Now: Ensure you're registered to participate and receive all necessary updates.
Download the Dataset: Access the dataset from the MachineHack platform to begin working on your solution.
Join the Community: Connect with fellow participants and mentors via our Telegram group for support and collaboration.

Support and Resources:

For any questions or assistance, contact our support team at support@machinehack.com. Stay updated by subscribing to our newsletter for the latest news and announcements.

We’re excited to see your innovative approaches to identifying and addressing hate speech. Good luck and happy hacking! 🚀


Hate Speech Identification
