About This Hackathon


Welcome to the MLDS 2025 Hackathon!

Problem Statement:

We’re excited to launch a unique challenge in the lead-up to MLDS 2025 (https://mlds.analyticsindiamag.com/) that will test your skills in fine-tuning small language models (SLMs). This hackathon focuses on multi-class classification: your task is to fine-tune an SLM to accurately classify data into multiple categories using the provided dataset.

Participation and Benefits

Skill Level:
Ideal for participants experienced in LLM fine-tuning, classification tasks, and exploring deep learning-based NLP solutions.

Community Engagement:
Be part of the MLDS community: engage with peers in our Telegram group (https://t.me/joinchat/NJLxnlWiz9lFnEJU20Sccw), ask questions, and share insights during the competition.

Recognition:

All participants will receive a MachineHack certificate of participation.
The top 3 performers will earn not only bragging rights but also exclusive MLDS 2025 tickets, giving them access to one of the largest gatherings of machine learning and data science professionals.

Submission and Evaluation

Submission Format:
Before uploading to the portal, confirm that your fine-tuned SLM and its dependencies load and run with the provided test script (https://colab.research.google.com/drive/1xm40olEtRp01c6C5-yzC3mPCnrL5BPlQ?usp=sharing), then submit the model. Model files are accepted in .safetensors and .json formats.
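
As a quick local check on the accepted formats, the sketch below flags any file whose extension is not .safetensors or .json. The helper name unexpected_files and the sample file listing are hypothetical, and this is only an assumption about how the format rule could be checked before upload; it does not replace running the provided test script.

```python
from pathlib import PurePath

# Extensions the portal is stated to accept.
ALLOWED_SUFFIXES = {".safetensors", ".json"}


def unexpected_files(filenames):
    """Return the names whose extension is not in ALLOWED_SUFFIXES, sorted."""
    return sorted(
        name for name in filenames
        if PurePath(name).suffix.lower() not in ALLOWED_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical listing of an exported model directory.
    files = ["config.json", "model.safetensors", "vocab.txt"]
    print(unexpected_files(files))
```

An empty result means every file matches one of the two accepted extensions.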

Evaluation Metric:
Submissions will be evaluated based on classification accuracy, rewarding precise and consistent predictions.
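
For local validation, accuracy can be computed as the fraction of predictions that match the true labels. The function below is a minimal sketch of that standard definition; how the leaderboard computes its score in detail is not stated here, so treat this as an approximation of it.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

Because accuracy weights every example equally, a model can score well while ignoring rare classes, so it is worth inspecting per-class results alongside it.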

Leaderboard:
Track your ranking live and aim for the top spot on the leaderboard!

How to Approach the Challenge

Note: Train your model to predict the "label_model" column in the training file, following the inference approach used in this script (https://colab.research.google.com/drive/1xm40olEtRp01c6C5-yzC3mPCnrL5BPlQ?usp=sharing).

Data Preprocessing

Text Cleaning: Remove unnecessary characters, noise, and symbols for cleaner input to your LLM.
Tokenization: Use LLM-specific tokenizers like Hugging Face’s AutoTokenizer for efficient encoding.
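
The cleaning step could look like the sketch below, which uses only the Python standard library. Which kinds of noise actually occur in the dataset is not stated, so the specific rules (Unicode normalization, tag stripping, entity decoding, control-character removal, whitespace collapsing) are assumptions, and clean_text is a hypothetical helper name. Tokenization would follow separately, using the tokenizer that matches the chosen model.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")                           # HTML/XML-style tags
_CTRL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # control characters
_WS_RE = re.compile(r"\s+")                                 # runs of whitespace


def clean_text(text: str) -> str:
    """Normalize one raw text field before it is passed to the tokenizer."""
    text = unicodedata.normalize("NFKC", text)  # unify compatibility characters
    text = _TAG_RE.sub(" ", text)               # drop markup tags
    text = html.unescape(text)                  # decode entities such as &amp;
    text = _CTRL_RE.sub(" ", text)              # drop stray control characters
    return _WS_RE.sub(" ", text).strip()        # collapse and trim whitespace
```

Aggressive cleaning can remove signal the model would otherwise use, so it is worth comparing validation accuracy with and without each rule.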

Feature Engineering

Label Encoding: Ensure proper encoding of class labels for seamless integration with model outputs.
Handling Imbalanced Data: Consider techniques like oversampling or weighted loss functions to address class imbalances.
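
Both points can be sketched in plain Python. The helper names below are hypothetical, and the weighting scheme (total samples divided by number of classes times class count, often called "balanced" weighting) is one common choice rather than a requirement of the challenge. The resulting list can be passed to a weighted loss function in the training framework of your choice.

```python
from collections import Counter


def build_label_maps(labels):
    """Map each distinct label to a stable integer id, and back."""
    classes = sorted(set(labels))
    label2id = {label: idx for idx, label in enumerate(classes)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def balanced_class_weights(labels, label2id):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(label2id)
    weights = [0.0] * n_classes
    for label, idx in label2id.items():
        weights[idx] = n_samples / (n_classes * counts[label])
    return weights
```

Sorting the class names before assigning ids keeps the mapping identical between training and inference, as long as the same set of labels is used.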

Modeling Techniques

Fine-Tuning LLMs: Use models such as BERT, RoBERTa, or GPT for multi-class classification, fine-tuned on your dataset.
Transfer Learning: Leverage pre-trained weights to kickstart training and improve generalization.
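
A fine-tuning run with the Hugging Face transformers library might be structured as in the sketch below. Everything specific here is an assumption: the checkpoint name distilbert-base-uncased, the output directory, the hyperparameter defaults, and the FineTuneConfig and fine_tune names are all illustrative, and the datasets are assumed to be tokenized already. Check the provided test script for the model types it actually supports.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    # All defaults are illustrative assumptions, not values from the hackathon.
    model_name: str = "distilbert-base-uncased"
    output_dir: str = "slm-finetuned"
    learning_rate: float = 2e-5
    batch_size: int = 16
    num_train_epochs: int = 3
    weight_decay: float = 0.01


def fine_tune(cfg, train_dataset, eval_dataset, label2id):
    """Fine-tune a pre-trained model for multi-class classification.

    Both datasets are assumed to be tokenized already, with input_ids,
    attention_mask and integer labels columns.
    """
    # Imported here so the config can be used without the heavy dependencies.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    id2label = {idx: label for label, idx in label2id.items()}
    tokenizer = AutoTokenizer.from_pretrained(cfg.model_name)
    # Pre-trained weights are loaded; only the classification head starts fresh.
    model = AutoModelForSequenceClassification.from_pretrained(
        cfg.model_name,
        num_labels=len(label2id),
        id2label=id2label,
        label2id=label2id,
    )
    args = TrainingArguments(
        output_dir=cfg.output_dir,
        learning_rate=cfg.learning_rate,
        per_device_train_batch_size=cfg.batch_size,
        per_device_eval_batch_size=cfg.batch_size,
        num_train_epochs=cfg.num_train_epochs,
        weight_decay=cfg.weight_decay,
        report_to="none",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=DataCollatorWithPadding(tokenizer),
    )
    trainer.train()
    # Writes config.json plus the weights in .safetensors format.
    model.save_pretrained(cfg.output_dir, safe_serialization=True)
    return trainer
```

Storing the label mappings in the model config means predictions can be decoded back to class names at inference time without a separate lookup file.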

Validation and Tuning

Cross-Validation: Implement robust k-fold validation for consistent performance.
Hyperparameter Tuning: Experiment with parameters like learning rate, batch size, and epochs to optimize results.
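
The sketch below shows one way to set up both steps with the standard library only: a stratified k-fold splitter that keeps class proportions similar across folds, and a generator that enumerates hyperparameter combinations. Stratification is an added assumption that tends to help with imbalanced classes; the function names and default values are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_indices, val_indices) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the folds.
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


def hyperparameter_grid(**options):
    """Yield one dict per combination of the given option lists."""
    names = list(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))
```

A typical loop trains once per fold for each configuration from hyperparameter_grid(learning_rate=[...], batch_size=[...], epochs=[...]) and compares the mean validation accuracy across folds.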

Getting Started

Download the Dataset:
Once the competition starts, the training dataset will be available for download.

Join the Community:
Collaborate, brainstorm, and troubleshoot with fellow participants in our Telegram group.

Support and Queries

For assistance, feel free to reach out to our team at support@machinehack.com.
Wishing you the best!

MLDS 2025 | Sequence Classification
