Entity Recognition Challenge
About This Hackathon
<h3>Welcome to Week 19 of the Weekly MachineHack Hackathon Series!</h3><p>This week’s challenge focuses on <strong>extracting key entities from product descriptions written in the Persian language</strong>. The task is to develop a solution that identifies the main product being discussed in the title and description, drawn from data provided by a mart. Participants must submit the extracted product corresponding to each entry in the test file.</p><h3><strong>Participation and Benefits</strong></h3><ul><li><strong>Skill Level</strong>:<br>Ideal for participants with experience in natural language processing (NLP), entity recognition, and working with multilingual text data.</li><li><strong>Community Engagement</strong>:<br>Join our Telegram group to collaborate with peers, ask questions, and share insights.</li><li><strong>Recognition</strong>:<br>All participants will receive a MachineHack certificate, and top performers will be highlighted on the leaderboard.</li></ul><h3><strong>Submission and Evaluation</strong></h3><ul><li><strong>Submission Format</strong>:<br>Submit the extracted product names in the provided Submission.csv file, ensuring that they align with the respective title/description in the test set.</li><li><strong>Evaluation Metric</strong>:<br>Submissions will be evaluated based on <strong>accuracy</strong> in correctly identifying the primary product.</li><li><strong>Leaderboard</strong>:<br>Track your ranking in real-time and compete for the top spot!</li></ul><h3><strong>How to Approach the Challenge</strong></h3><h4><strong>Data Preprocessing</strong></h4><ul><li><strong>Handle Text in Persian</strong>:<br>Ensure the dataset encoding supports Persian (UTF-8).</li><li><strong>Tokenization</strong>:<br>Use language-specific tokenizers such as Hazm or Parsivar to process Persian text effectively.</li><li><strong>Cleaning</strong>:<br>Remove irrelevant words, stopwords, and symbols that may not contribute to the product 
identification.</li></ul><h4><strong>Feature Engineering</strong></h4><ul><li><strong>Title and Description Alignment</strong>:<br>Look for product references that appear in both the title and the description; a term shared by both is a strong candidate for the main product.</li><li><strong>Keyword Extraction</strong>:<br>Apply techniques like TF-IDF or attention mechanisms to highlight key phrases.</li></ul><h4><strong>Modeling Techniques</strong></h4><ul><li><strong>Entity Recognition</strong>:<br>Use Named Entity Recognition (NER) models for Persian, such as ParsBERT fine-tuned for NER.</li><li><strong>Transformers</strong>:<br>Experiment with transformer-based architectures like BERT or RoBERTa trained on Persian data for accurate entity extraction.</li></ul><h4><strong>Validation and Tuning</strong></h4><ul><li><strong>Manual Validation</strong>:<br>Spot-check the extracted entities against the source titles and descriptions to confirm they name the main product.</li><li><strong>Hyperparameter Tuning</strong>:<br>Tune model hyperparameters using grid search or Bayesian optimization.</li></ul><h3><strong>Resources and Support</strong></h3><ul><li><strong>Starter Notebook</strong>:<br>A starter notebook will be available to help you begin with data exploration and model prototyping. 
It is accessible to premium users.</li><li><strong>Expert Guidance</strong>:<br>A live walkthrough session will be held on <strong>9th December 2024</strong> at <strong>4:00 PM IST</strong> to provide tips and strategies for the challenge.</li></ul><h3><strong>Getting Started</strong></h3><ol><li><strong>Register Now</strong>:<br>Ensure you’re registered to receive updates and access challenge materials.</li><li><strong>Download the Dataset</strong>:<br>Access the dataset from the MachineHack platform and begin your analysis.</li><li><strong>Join the Community</strong>:<br>Collaborate and exchange ideas with other participants via our <a target="_blank" rel="noopener noreferrer" href="https://t.me/joinchat/NJLxnlWiz9lFnEJU20Sccw">Telegram group</a>.</li></ol><h3><strong>Support and Queries</strong></h3><p>For any questions, reach out to our support team at support@machinehack.com.</p><p>We’re excited to see your innovative solutions for this Persian-language NLP challenge! Best of luck!</p>
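<p>The preprocessing and title/description alignment ideas above could be prototyped roughly as follows. This is a minimal sketch using only the Python standard library, not the organizers' method: the regex-based tokenizer is a simple stand-in for Hazm or Parsivar, the stopword list is a tiny illustrative sample, and the names <code>normalize</code>, <code>tokenize</code>, and <code>shared_terms</code> are hypothetical.</p>

```python
import re
import unicodedata
from collections import Counter

# Arabic letters that often appear in Persian text, mapped to their Persian forms.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
})

ZWNJ = "\u200c"  # zero-width non-joiner, used inside many Persian words

# Illustrative sample only (assumption); a real run would use a fuller list.
STOPWORDS = {"و", "در", "به", "از", "با", "برای", "که"}


def normalize(text: str) -> str:
    """Unify character variants and replace punctuation with spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(ARABIC_TO_PERSIAN)
    # Drop tatweel (kashida) and Arabic diacritics.
    text = re.sub("[\u0640\u064b-\u0652]", "", text)
    # Keep word characters and ZWNJ; everything else becomes one space.
    text = re.sub(rf"[^\w{ZWNJ}]+", " ", text)
    return text.strip()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer over normalized text (stand-in for Hazm/Parsivar)."""
    return normalize(text).split()


def shared_terms(title: str, description: str, top_k: int = 3) -> list[str]:
    """Rank title tokens by how often they also occur in the description."""
    title_tokens = list(
        dict.fromkeys(t for t in tokenize(title) if t not in STOPWORDS)
    )
    counts = Counter(tokenize(description))
    # sorted() is stable, so ties keep their order of appearance in the title.
    ranked = sorted(
        (t for t in title_tokens if counts[t] > 0),
        key=lambda t: -counts[t],
    )
    return ranked[:top_k]
```

<p>The terms returned by <code>shared_terms</code> are only candidates. They could serve as a baseline or as extra features alongside a fine-tuned NER model.</p>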
Key Information
- Category: Hackathon
- Difficulty Level: Advanced
- Status: Expired
- Start Date: 2024-12-05T14:00:00Z
- End Date: 2025-01-19T23:59:59Z
- Current Participants: 82
Prizes and Awards
Knowledge
Rules and Guidelines
<ul><li>The participants are required to provide the code for the work done.</li><li>The output of the code should match the submission file with the "Best Score" achieved by the participant.</li></ul>
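<p>Since the code output has to match the submitted file, it may help to generate the file directly from the predictions. The sketch below assumes one prediction per test row, in test-set order, and a single column named <code>product</code>; that column name and the helper <code>write_submission</code> are hypothetical, so check the header of the provided Submission.csv before using it.</p>

```python
import csv


def write_submission(predictions, path, column="product"):
    """Write one prediction per row, in test-set order, as UTF-8 CSV.

    The default column name is an assumption, not the official header.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        for value in predictions:
            writer.writerow([value])
```

<p>Writing with an explicit UTF-8 encoding keeps the Persian text intact when the file is reopened.</p>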
Evaluation Criteria
<p>N/A</p>
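<p>The hackathon description names accuracy as the evaluation metric. A local check on a held-out split could look like the sketch below. It assumes an exact string match after trimming surrounding whitespace; the platform's actual matching rule is not stated, and the function name <code>accuracy</code> is hypothetical.</p>

```python
def accuracy(predicted, expected):
    """Fraction of rows where the predicted product equals the expected one.

    Assumes exact match after stripping whitespace (an assumption, since the
    official scoring rule is not published on this page).
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return matches / len(expected)
```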
Quick Summary
Entity Recognition Challenge is an advanced-level hackathon that has now expired. It has 82 participants. Prizes include: Knowledge. The event ran from 2024-12-05T14:00:00Z to 2025-01-19T23:59:59Z. Registration is free and open to all skill levels.
