

YouTube NBA Collection


This is a collection of 61,340 YouTube video pages about the NBA. Each video page is saved as an XML file that contains all the information about the video (title, tags, description, length, comments, ...). The video pages were crawled on 3 March 2009, so each page reflects the online information about its video up to that date.

Associated with the data collection is a list of 40 topics with their relevance assessments. In addition, about 250,000 user profiles are provided. These profiles cover all users who interacted with the videos on the web, either by posting a video or by commenting on one.

For more information about the collection, please refer to the paper published at LREC 2010.





NBA Collection


A collection of 61,340 XML files. Each XML file represents a YouTube video page for an NBA video. The XML file name is the video ID, and the file contains all the information about the video:
1. Title, tags, description.
2. Titles and links to the top 20 related videos
3. Titles and links to the first video responses (up to 20)
4. Up to the first 500 comments on the video with the user ID of each comment
5. Numeric metadata, such as video length, rating, and the numbers of views, comments, favorites, and responses.
6. Other metadata, such as posting data, downloading data, posting user ID
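
As a rough illustration of how one video page could be read, here is a minimal Python sketch. The XML schema is not described on this page, so the sketch deliberately avoids relying on specific element names: it simply collects the text of every leaf element, grouped by tag. The element names in SAMPLE (video, title, tag, comment, user, text) are hypothetical and only serve to demonstrate the function; the real files may use different names. The helper names leaf_fields and load_video_page are likewise illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def leaf_fields(root):
    """Map each leaf element's tag to the list of its non-empty text values."""
    fields = defaultdict(list)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:  # leaf element that carries text
            fields[elem.tag].append(text)
    return dict(fields)


def load_video_page(path):
    """Parse one video page; the file name (minus any extension) is the video ID."""
    path = Path(path)
    return path.stem, leaf_fields(ET.parse(path).getroot())


# Hypothetical sample; the real element names in the collection may differ.
SAMPLE = """<video>
  <title>Example dunk</title>
  <tags><tag>nba</tag><tag>dunk</tag></tags>
  <comments>
    <comment><user>u1</user><text>Nice</text></comment>
    <comment><user>u2</user><text>Wow</text></comment>
  </comments>
</video>"""

fields = leaf_fields(ET.fromstring(SAMPLE))
```

Inspecting the keys of the returned dictionary on a few real files is probably the quickest way to discover the actual element names before writing field-specific code.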

Topics + Relevance Assessments


A list of 40 topics on the NBA, with the relevance assessments for each topic.
Topics are saved in an XML file; each topic has a number as its identifier, a title, a description, and a narrative.
The relevance assessments are stored in a text file that lists all relevant documents for each topic.
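
To show how the relevance assessments might be used to evaluate a retrieval run, here is a minimal Python sketch. The exact layout of the text file is not specified on this page; the sketch assumes, purely for illustration, one whitespace-separated "topic_number video_id" pair per line. The function names parse_relevance and precision_at_k and the sample IDs are hypothetical.

```python
from collections import defaultdict


def parse_relevance(lines):
    """Build {topic_id: set of relevant video IDs} from 'topic video_id' lines."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        topic_id, video_id = parts[0], parts[1]  # assumed column order
        relevant[topic_id].add(video_id)
    return dict(relevant)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked video IDs that are relevant."""
    if k <= 0:
        return 0.0
    top_k = ranked_ids[:k]
    return sum(1 for vid in top_k if vid in relevant_ids) / k


# Made-up assessments and a made-up ranking for topic "1".
sample = ["1 vidA", "1 vidB", "2 vidC", ""]
qrels = parse_relevance(sample)
score = precision_at_k(["vidA", "vidX", "vidB", "vidY"], qrels["1"], k=4)
```

If the actual file uses a different column layout, only the two index positions in parse_relevance would need to change.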

User Profiles


245,545 user profiles. The profiles cover all user IDs that appear in the collection, that is, users who posted videos or commented on them.
Each profile is saved in an XML file and contains information about the user, such as age, location, interests, joining date, number of subscribers, number of videos uploaded, etc.

List of video IDs


This is a text file that lists all the video IDs in the collection. The file also indicates which video pages in the collection have since been removed from the web (YouTube), either for a terms-of-use violation or by the user. The list identifies more than 12k of the 61k video pages in the collection as removed from the web.
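
As an illustration of how the list could be used to separate videos that are still online from removed ones, here is a minimal Python sketch. How removal is marked in the file is not described on this page; the sketch assumes, hypothetically, that each line starts with the video ID and that removed videos carry an extra token "removed" on the same line. Both the marker and the function name split_by_availability are assumptions.

```python
def split_by_availability(lines, removed_marker="removed"):
    """Split video IDs into (available, removed) lists.

    Assumes each line is 'video_id [flags...]' and that a removed video
    has `removed_marker` among its flags (a hypothetical convention).
    """
    available, removed = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        video_id, flags = parts[0], parts[1:]
        if removed_marker in flags:
            removed.append(video_id)
        else:
            available.append(video_id)
    return available, removed


# Made-up IDs for demonstration only.
sample = ["vidA", "vidB removed", "vidC", "", "vidD removed"]
available, removed = split_by_availability(sample)
removed_share = len(removed) / (len(available) + len(removed))
```

On the real list, the same ratio would give the share of removed pages, which the figures above put at roughly one fifth of the collection (more than 12k out of 61k).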



W. Magdy, J. Min, J. Leveling, and G. J. F. Jones. Building a Domain-Specific Document Collection for Evaluating Metadata Effect on Information Retrieval. In Proceedings of LREC 2010.






Last Modified: March 2010