DS@GT Applied Research and Competitions
The Data Science @ Georgia Tech (DS@GT) Applied Research Competitions (ARC) is a student-run research group focused on machine learning, information retrieval, and data-driven scientific modeling through participation in competitive research challenges. The group organizes participation in the Conference and Labs of the Evaluation Forum (CLEF) but also leverages platforms like Kaggle and PACE for training and internal skill development. Our code is available on GitHub.

We recently wrapped up our Spring 2025 semester with 22 accepted working notes papers and 45 authors at CLEF 2025. We are now preparing for the Fall 2025 Interest Group and the Spring 2026 competition season. In addition to CLEF 2026, we plan to participate in TREC 2025, MediaEval 2025, and NTCIR-19.
Membership
ARC operates as a project group within the broader Data Science @ Georgia Tech student organization. Participants are expected to be DS@GT members and to adhere to its general guidelines.
- Eligibility: Open to all Georgia Tech students (undergraduate, graduate including OMSCS/OMSA, and PhD) and to alumni with student status (e.g., enrolled in a for-credit seminar). Both on-campus and online students participate actively.
- Requirements: Members must be part of the parent DS@GT organization (including paying dues). Active participation, especially in the Fall, is crucial for Spring team placement.
- Minimum Technical Expectations: Proficiency in Python (SciPy stack: NumPy, Pandas, Matplotlib) and Git version control. Familiarity with ML concepts is highly beneficial. Prior experience with an ML/IR project of non-trivial complexity, or software engineering experience, is expected. Completion of, or enrollment in, a project-heavy course (e.g., ML, DL) is recommended.
Group Structure and Schedule
Recordings of our Fall 2024 Interest Group can be found below and provide an overview of the group’s structure and expectations.
The group operates on a two-semester academic cycle.
- Fall Semester: Interest Group & Preparation
  - Focus: Introduction to competitive data science (Kaggle) and research competitions (CLEF).
  - Activities: Weekly meetings, EDA assignments, paper discussions, internal Kaggle competition, foundational skills training (e.g., PACE usage).
  - Outcome: Formation of motivated and prepared teams for the Spring semester.
  - Time Commitment: ~2-3 hours/week (equivalent to a 1-unit seminar).
- Spring Semester: Competition Execution & Publication
  - Focus: Deep dive into specific CLEF tasks within dedicated teams.
  - Activities: Team-based research and development, model building, experimentation, result submission, writing and submitting working notes papers.
  - Outcome: Competition submissions, published papers, presentation of work.
  - Time Commitment: ~100-150+ hours total (equivalent to a 2-3 unit course), varies by role and project intensity.
Team Structure
- Size: Typically 3-5 members per team (including the lead). No more than 5 people work on a single task, because dividing the work among more people becomes too complex.
- Composition: Aim for a mix of skills and experience levels where possible, fostering inclusivity and knowledge sharing (e.g., pairing experienced members with newer ones).
- Roles:
  - Task Lead: Responsible for managing a task team in the Spring. Duties include registering the team, defining the technical approach/plan, conducting weekly meetings, delegating tasks, tracking progress, reporting updates, leading paper writing/submission, and confirming team members. Requires significant time commitment and technical/project management skills.
  - Task Member: Responsible for actively contributing (coding, analysis, experiments), attending weekly meetings, reporting progress, contributing to the paper, and potentially presenting updates. Expected to have relevant technical skills (Python, Git, ML/data analysis basics) and commit sufficient time.
- Goal: One team per competition task, with no overlap in tasks between teams.