Hate Speech Identification
About This Hackathon
<p>Welcome to Week 11 of the Weekly MachineHack Hackathon series! This week’s challenge is to develop a model that identifies whether a text input contains hateful content, using the training data provided. This is a key problem within the broader task of moderating inappropriate and hateful internet content.</p><p><strong>Dataset Description:</strong></p><ul><li>Train.csv: The training dataset with text and corresponding labels.</li><li>Test.csv: The dataset for which you will generate predictions.</li><li>Submission.csv: The format for submitting your predictions.</li></ul><p><strong>Participation and Benefits:</strong></p><ul><li>Skill Level: This challenge is designed for participants with experience in text classification and natural language processing.</li><li>Community Engagement: Join our Telegram group to connect with other participants, seek advice, and share insights.</li><li>Recognition: All participants will receive a MachineHack certificate, and top performers will be highlighted on the leaderboard.</li><li>Live Walkthrough: A live session will be held on <strong>24th September 2024 at 5:30 PM IST</strong> to guide you through the challenge and provide expert tips.</li></ul><p><strong>Submission and Evaluation:</strong></p><ul><li>Submission Format: Submit your predictions in the format of the provided Submission.csv file.</li><li>Evaluation Metric: Submissions will be evaluated on the <strong>"F1 Score"</strong>, which balances precision and recall for the classification task.</li><li>Leaderboard: Track your performance and strive to be at the top of the leaderboard.</li></ul><p><strong>How to Approach the Challenge:</strong></p><ol><li>Data Preprocessing: Clean the text data by handling missing values, removing noise, and normalizing text (e.g., lowercasing, stemming/lemmatization).</li><li>Feature Engineering: Extract meaningful features such as word embeddings (e.g., Word2Vec, GloVe), TF-IDF vectors, and n-grams.</li><li>Modeling Techniques: Experiment with classical models such as Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), as well as advanced models like BERT, RoBERTa, or other transformer-based architectures.</li><li>Validation and Tuning: Use techniques such as k-fold cross-validation to assess model performance and fine-tune hyperparameters.</li></ol><p>A starter notebook will be available to help you get started, providing a basic framework for data preprocessing and initial modeling.</p><p><strong>Getting Started:</strong></p><ul><li>Register Now: Ensure you're registered to participate and receive all necessary updates.</li><li>Download the Dataset: Access the dataset from the MachineHack platform to begin working on your solution.</li><li>Join the Community: Connect with fellow participants and mentors via our Telegram group for support and collaboration.</li></ul><p><strong>Support and Resources:</strong></p><p>For any questions or assistance, contact our support team at support@machinehack.com. Stay updated by subscribing to our newsletter for the latest news and announcements.</p><p>We’re excited to see your innovative approaches to identifying and addressing hate speech. Good luck and happy hacking! 🚀</p>
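<p>The data preprocessing and n-gram steps listed under "How to Approach the Challenge" could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the official starter notebook. The function names <code>normalize</code> and <code>word_ngrams</code> are hypothetical, and the choice of what counts as noise (URLs, @-mentions, punctuation) is an assumption that may not suit the actual dataset.</p>

```python
import re

# Assumed noise patterns; adjust after inspecting the real training text.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s']")


def normalize(text):
    """Lowercase the text, drop URLs, mentions and punctuation, collapse whitespace."""
    if text is None:  # treat a missing value as an empty string
        return ""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def word_ngrams(text, n=2):
    """Return the word n-grams of the normalized text as space-joined strings."""
    tokens = normalize(text).split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    sample = "Check THIS out!! https://example.com @user"
    print(normalize(sample))
    print(word_ngrams("You are NOT welcome here", n=2))
```

<p>Stemming or lemmatization is left out here because it normally relies on a third-party library. Features such as these n-grams would then be fed into a vectorizer and a classifier of your choice.</p>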
Key Information
- Category: Hackathon
- Difficulty Level: Intermediate
- Status: Expired
- Start Date: 2024-09-19T14:30:00Z
- End Date: 2024-10-15T23:23:59Z
- Current Participants: 172
Prizes and Awards
Knowledge
Rules and Guidelines
<ul><li>Participants are required to provide the code for their work.</li><li>The output of the code should match the submission file that achieved the participant's "Best Score".</li></ul>
Evaluation Criteria
<p>Submissions will be evaluated using the "F1 Score" metric, computed between the submitted file and the ground truth.</p>
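<p>For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary task. It assumes labels are encoded as 0 and 1 with 1 as the hateful class, and that the binary (positive-class) variant of F1 is used. The listing does not state the label encoding or the averaging mode, so check a local score against the leaderboard before relying on it. The function name <code>f1_score_binary</code> is hypothetical.</p>

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:  # no true positives: precision or recall is zero or undefined
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0]
    preds = [1, 0, 0, 1, 1]
    # Here tp=2, fp=1, fn=1, so precision = recall = 2/3.
    print(round(f1_score_binary(truth, preds), 4))
```

<p>Because F1 ignores true negatives, a model that rarely predicts the positive class can have high accuracy and still score poorly on this metric.</p>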
Quick Summary
Hate Speech Identification is an intermediate-level hackathon that has now expired. It has 172 participants. Prizes include: Knowledge. The event ran from 2024-09-19T14:30:00Z to 2024-10-15T23:23:59Z. Registration is free and open to all skill levels.
