EDTT Dataset

A Clinically-Grounded and Multimodal Feature Annotated Dataset of Pro-Eating Disorder Content on TikTok

About

TikTok has emerged as a key venue for the circulation of pro-eating disorder (pro-ED) content. Identifying such content remains challenging as it is often deliberately elusive and conveyed through multimodal cues that go beyond text. Existing datasets of ED-related TikTok posts rely primarily on hashtags, descriptions, or titles for identification, leading to potentially irrelevant or non-representative samples. To address this gap, we introduce EDTT, the first clinically grounded dataset of pro-ED TikTok posts annotated for multimodal features, curated through text-based retrieval and recommendation-based surfacing.

For more information, refer to our abstract, which was presented at the 47th Annual Meeting & Scientific Sessions of the Society of Behavioral Medicine and published in the 2026 SBM Annual Meeting Supplement issue of Annals of Behavioral Medicine (Abstract citation ID: kaag012.427).

🏆 2026 Runner-Up Trainee Digital Health Abstract Award, Digital Health SIG, Society of Behavioral Medicine

Our Curation Methods


Data Dictionary

Item | Definition | Availability
--- | --- | ---
Post ID | TikTok post identifier (URL structure: https://www.tiktok.com/@user/type/videoid) | private
Type | Video or photo slideshow | public
Post Description | Description of the post | private
Transcript | Transcription of spoken words or song lyrics playing over the post | private
On Screen Text | Text overlaid on the screen of the post | private
Summary | Summary of what's happening in the post | public
EDE Subscale Items | Relevant eating disorder psychopathology items based on the Eating Disorder Examination (EDE) subscales | public
EDE Diagnostic Items | Relevant eating disorder psychopathology items based on diagnostic features described in the Eating Disorder Examination (EDE) | public
Visual Elements | Visual content annotations | public
Audio Elements | A description of the post's audio | public
Audio Tempo (BPM) | The tempo of the post's audio (in BPM) | public
Audio Valence | A heuristic measure of emotional positivity or pleasantness of the music (0 = negative/sad, 1 = positive/happy) | public
Audio Valence Label | <0.33 = low; 0.33–0.66 = medium; >0.66 = high | public
Audio Arousal | A heuristic measure of energy, intensity, or emotional activation of the music (0 = calm/low-energy, 1 = energetic/high-intensity) | public
Audio Arousal Label | <0.33 = low; 0.33–0.66 = medium; >0.66 = high | public
Harmonic Ratio | Proportion of harmonic sound vs. percussive sound (higher values = more melodic; lower values = less melodic) | public
Mode Major | major = 1; minor = 0 | public
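
The label thresholds listed for Audio Valence and Audio Arousal can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name `label_from_score` is hypothetical, and it assumes scores lie in the range 0 to 1 and that the boundary values 0.33 and 0.66 fall in the medium bin, as the dictionary's "0.33–0.66 = medium" suggests.

```python
def label_from_score(score: float) -> str:
    """Map a 0-1 valence or arousal score to a low/medium/high label.

    Hypothetical helper: thresholds follow the data dictionary
    (<0.33 = low; 0.33-0.66 = medium; >0.66 = high). Treating the
    boundary values as "medium" is an assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < 0.33:
        return "low"
    if score <= 0.66:
        return "medium"
    return "high"


# Usage example with made-up scores (not values from the dataset).
example_scores = [0.10, 0.33, 0.50, 0.66, 0.90]
example_labels = [label_from_score(s) for s in example_scores]
```

The same function applies to both variables, since the dictionary gives identical cut-offs for valence and arousal.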

Dataset


Explore the Data


Private Variables

The current, public-facing EDTT dataset is available for immediate download and use. To preserve privacy and comply with the TikTok Research API Terms of Service, the full unpublished dataset includes private variables that are not publicly available. To protect the privacy of TikTok users, no identifiable information is included in the dataset, and no attempts were made to identify individual users or access private data. Please contact us if you have any questions.

How to Cite

Dataset

Shaveet, E., Viranda, T., & Choudhury, T. (2026). EDTT Dataset [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.18895017

Abstract

Shaveet, E., Viranda, T., & Choudhury, T. (2026). EDTT: A Clinically-Grounded and Multimodal Feature Annotated Dataset of Pro-Eating Disorder Content on TikTok. In 2026 SBM Annual Meeting Abstracts Supplement. Annals of Behavioral Medicine, 60(Supplement 1), kaag012.427. https://doi.org/10.1093/abm/kaag012

What's Next?

We have a preprint out describing our next project. Check it out on arXiv.