Data Descriptions:

File organization:

  KuaiLive
  ├── streamer.csv          
  ├── room.csv
  ├── user.csv
  ├── comment.csv
  ├── gift.csv
  ├── like.csv
  ├── click.csv
  ├── negative.csv
  └── title_embeddings.npy

1. Descriptions of the fields in streamer.csv

The file streamer.csv contains comprehensive information about all streamers, including streamer_id and associated side information, such as age, country, and a set of binary features.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| streamer_id | The ID of the streamer. | int64 | 56006 |
| gender | The gender of the streamer. | string | M |
| age | The age range of the streamer. | string (range) | 24-30 |
| country | The country where the streamer resides. | string | China |
| device_brand | The brand of the device used by the streamer. | string | APPLE |
| device_price | The price range of the device. | string (range) | 5000+ |
| live_operation_tag | The operational category of the streamer. | string | Relationship |
| fans_user_num | The number of users who have followed the streamer. | string (range) | 10000-100000 |
| fans_group_fans_num | The number of fans from the streamer’s fans group. | string (range) | 0-10 |
| follow_user_num | The number of users followed by the streamer. | string (range) | 1000-10000 |
| first_live_timestamp | The date when the streamer started their first live room. | timestamp | 2018-02-04 |
| accu_live_cnt | The total number of live rooms the streamer has hosted. | string (range) | 100-500 |
| accu_live_duration | The cumulative duration of all live rooms, in milliseconds. | string (range) | 500000000-1000000000 |
| accu_play_cnt | The total number of times the streamer’s live rooms have been viewed. | string (range) | 100000-500000 |
| accu_play_duration | The cumulative duration for which the streamer’s live rooms have been watched, in milliseconds. | string (range) | 50000000000-100000000000 |
| reg_timestamp | The date when the streamer registered on the platform. | timestamp | 2014-07-03 |
| onehot_feat0 | A feature with binary values (e.g., 0 or 1). | int64 | 1 |
| onehot_feat1 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat2 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat3 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat4 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat5 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat6 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
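
Many fields in streamer.csv are stored as range strings (for example 24-30 or 5000+) rather than as numbers. The following is a minimal sketch for turning such strings into numeric bounds. It assumes the ranges only take the forms seen in the Example column (`low-high`, `low+`, or a single number such as `0`); `parse_range` is a hypothetical helper and not part of the dataset.

```python
from typing import Optional, Tuple


def parse_range(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse a range string into (lower, upper) bounds.

    Assumed formats (inferred from the Example column, not an official spec):
      "24-30" -> (24, 30)
      "5000+" -> (5000, None)   open-ended upper bound
      "0"     -> (0, 0)         single value
    Anything else returns (None, None).
    """
    text = value.strip()
    if text.endswith("+") and text[:-1].isdigit():
        return int(text[:-1]), None
    if text.isdigit():
        return int(text), int(text)
    low, sep, high = text.partition("-")
    if sep and low.isdigit() and high.isdigit():
        return int(low), int(high)
    return None, None


print(parse_range("24-30"))
print(parse_range("5000+"))
print(parse_range("500000000-1000000000"))
```

Returning `None` for an open upper bound keeps the "no known limit" case distinct from a real number, which may matter when bucketing or sorting by these fields.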

2. Descriptions of the fields in room.csv

The file room.csv contains detailed information about all live rooms, including basic fields such as the date, room ID, and corresponding streamer, as well as temporal data and categorical descriptors.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| p_date | The date of the live room. | int64 | 20250525 |
| live_id | The ID of the live room. | int64 | 7336601 |
| streamer_id | The ID of the streamer. | int64 | 252634 |
| live_type | The type of the live room, represented as a categorical code. | int64 | 1 |
| start_timestamp | The start time of the live room. | int64 | 1748131180878 |
| end_timestamp | The end time of the live room. | int64 | 1748275200000 |
| live_content_category | The content category of the live room. | string | shop |
| live_name_id | The ID associated with the live room title, used to index the encoded title embeddings. | int64 | 0 |
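
The date and time fields in room.csv are stored as integers. The sketch below shows one way to decode them, assuming that p_date is a YYYYMMDD integer and that start_timestamp and end_timestamp are Unix epoch values in milliseconds (the unit is inferred from the 13-digit examples and is not stated in the field descriptions). The helper names are hypothetical.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def parse_p_date(p_date: int) -> datetime:
    """Parse a YYYYMMDD integer such as 20250525 into a datetime at midnight."""
    return datetime.strptime(str(p_date), "%Y%m%d")


# Values taken from the Example column of the room.csv fields.
start_ms, end_ms = 1748131180878, 1748275200000
duration_ms = end_ms - start_ms

print(parse_p_date(20250525).date())
print(ms_to_utc(start_ms).isoformat())
print(duration_ms / 3_600_000, "hours")
```

The conversion is done in UTC so that the result does not depend on the machine's local time zone; shift to a local zone afterwards if the analysis needs it.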

3. Descriptions of the fields in user.csv

The file user.csv contains comprehensive information about all users, including the user_id and side information such as age, country, and a set of binary features.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 22733 |
| age | The age range of the user. | string (range) | 18-23 |
| gender | The gender of the user. | string | M |
| country | The country where the user resides. | string | China |
| device_brand | The brand of the device used by the user. | string | DESKTOP |
| device_price | The price range of the device. | string (range) | 0 |
| reg_timestamp | The date when the user registered on the platform. | timestamp | 2023-05-03 |
| fans_num | The number of fans who have followed the user. | string (range) | 0-10 |
| follow_num | The number of users followed by the user. | string (range) | 10-100 |
| first_watch_live_timestamp | The date when the user first watched a live room. | timestamp | 2023-05-03 |
| accu_watch_live_cnt | The total number of live rooms the user has watched. | string (range) | 0-50000 |
| accu_watch_live_duration | The cumulative duration of all live rooms the user has watched, in milliseconds. | string (range) | 0-1000000000 |
| is_live_streamer | A binary indicator showing whether the user is a live streamer (1 = yes, 0 = no). | int64 | 0 |
| is_photo_author | A binary indicator showing whether the user is a photo content author (1 = yes, 0 = no). | int64 | 0 |
| onehot_feat0 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat1 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat2 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat3 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat4 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat5 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
| onehot_feat6 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |

4. Descriptions of the fields in comment.csv

The file comment.csv records user interactions in the form of comments. Each record corresponds to a single commenting event and includes the associated user, live room, streamer, and timestamp.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 23154 |
| live_id | The ID of the live room. | int64 | 7865151 |
| streamer_id | The ID of the streamer. | int64 | 433825 |
| timestamp | The timestamp when this interaction occurred. | int64 | 1746374400819 |

5. Descriptions of the fields in gift.csv

The file gift.csv records user interactions in the form of sending gifts. Each record corresponds to a single gifting event and includes the associated user, live room, streamer, timestamp, and the price of the gift.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 11504 |
| live_id | The ID of the live room. | int64 | 6086847 |
| streamer_id | The ID of the streamer. | int64 | 114419 |
| timestamp | The timestamp when this interaction occurred. | int64 | 1746374441260 |
| gift_price | The total price of gifts sent during this interaction. | int64 | 2 |
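
Because each row of gift.csv is a single gifting event, per-streamer totals have to be computed by aggregation. The sketch below sums gift_price per streamer_id using only the standard library. It assumes the file is comma-delimited with a header row that matches the field names above; `total_gift_price_by_streamer` is a hypothetical helper, and the sample input is illustrative only.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: the first row is the documented example row,
# the other two rows are made up for this sketch.
SAMPLE_GIFT_CSV = """user_id,live_id,streamer_id,timestamp,gift_price
11504,6086847,114419,1746374441260,2
11504,6086847,114419,1746374450000,5
22733,6086847,114419,1746374460000,1
"""


def total_gift_price_by_streamer(csv_file) -> dict:
    """Sum gift_price per streamer_id from a file-like object with a header row."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[int(row["streamer_id"])] += int(row["gift_price"])
    return dict(totals)


print(total_gift_price_by_streamer(io.StringIO(SAMPLE_GIFT_CSV)))
```

To run this against the real file, pass an opened file handle (for example `open("gift.csv", newline="")`) in place of the in-memory sample.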

6. Descriptions of the fields in like.csv

The file like.csv records user interactions in the form of likes. Each record corresponds to a single liking event and includes the associated user, live room, streamer, and timestamp.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 5222 |
| live_id | The ID of the live room. | int64 | 541927 |
| streamer_id | The ID of the streamer. | int64 | 244121 |
| timestamp | The timestamp when this interaction occurred. | int64 | 1746374414059 |

7. Descriptions of the fields in click.csv

The file click.csv contains records of user interactions in the form of click-to-watch behavior. Each record corresponds to a single click-to-watch event and includes the user ID, the associated live room and streamer, the timestamp of the interaction, and the watch time.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 8505 |
| live_id | The ID of the live room. | int64 | 9342705 |
| streamer_id | The ID of the streamer. | int64 | 392199 |
| timestamp | The timestamp when this interaction occurred. | int64 | 1746374400022 |
| watch_live_time | The user’s watch time for this interaction, in milliseconds. | int64 | 2852 |

8. Descriptions of the fields in negative.csv

The file negative.csv contains records of all exposures that were presented to users but not clicked. Each record includes the corresponding user, live room, streamer, and the timestamp of the exposure.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | The ID of the user. | int64 | 9810 |
| live_id | The ID of the live room. | int64 | 10816308 |
| streamer_id | The ID of the streamer. | int64 | 17452 |
| timestamp | The timestamp when this exposure occurred. | int64 | 1746525926498 |
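
Since click.csv holds clicked exposures and negative.csv holds exposures that were not clicked, the two files can be combined into a single labeled set. The sketch below shows one possible way to do this (label 1 for clicks, label 0 for unclicked exposures); this labeling scheme is an assumption about how the data might be used, not something the dataset prescribes. It also assumes comma-delimited files with header rows, and `labeled_exposures` is a hypothetical helper.

```python
import csv
import io

# Illustrative one-row samples built from the documented example rows.
SAMPLE_CLICK_CSV = """user_id,live_id,streamer_id,timestamp,watch_live_time
8505,9342705,392199,1746374400022,2852
"""
SAMPLE_NEGATIVE_CSV = """user_id,live_id,streamer_id,timestamp
9810,10816308,17452,1746525926498
"""


def labeled_exposures(click_file, negative_file) -> list:
    """Merge clicks (label 1) and unclicked exposures (label 0), sorted by timestamp."""
    rows = []
    for label, source in ((1, click_file), (0, negative_file)):
        for rec in csv.DictReader(source):
            rows.append({
                "user_id": int(rec["user_id"]),
                "live_id": int(rec["live_id"]),
                "streamer_id": int(rec["streamer_id"]),
                "timestamp": int(rec["timestamp"]),
                "label": label,
            })
    rows.sort(key=lambda r: r["timestamp"])
    return rows


examples = labeled_exposures(io.StringIO(SAMPLE_CLICK_CSV), io.StringIO(SAMPLE_NEGATIVE_CSV))
for row in examples:
    print(row)
```

Only the four fields shared by both files are kept; watch_live_time exists only for clicks and would need separate handling if it is used as a feature or target.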

9. Descriptions of the fields in title_embeddings.npz

The file title_embeddings.npz contains embeddings of live room titles, obtained by encoding the titles using the bge-base-zh-v1.5 model and applying PCA for dimensionality reduction to 128 dimensions. The embeddings are indexed by live_name_id.
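
A minimal sketch of looking up a title embedding with NumPy follows. The file tree lists the file with a .npy extension while this section names it .npz, so the loader handles both. Two assumptions are made: for a .npz archive the first stored array is the embedding matrix (the key name is not documented), and live_name_id is a 0-based row index into that matrix. Both helper names are hypothetical.

```python
import numpy as np


def load_title_embeddings(path: str) -> np.ndarray:
    """Load the title embedding matrix from a .npy or .npz file.

    np.load returns an ndarray for .npy and an NpzFile archive for .npz.
    For .npz, this sketch takes the first stored array (assumption).
    """
    loaded = np.load(path)
    if isinstance(loaded, np.ndarray):
        return loaded
    with loaded:  # NpzFile can be used as a context manager
        return loaded[loaded.files[0]]


def title_vector(embeddings: np.ndarray, live_name_id: int) -> np.ndarray:
    """Return the embedding row for a live_name_id (assumed 0-based row index)."""
    return embeddings[live_name_id]
```

A typical call would be `title_vector(load_title_embeddings("title_embeddings.npy"), 0)`, where the path is a placeholder for wherever the file is stored. Given the description above, each returned vector is expected to have 128 dimensions; checking `embeddings.shape` after loading is a quick way to confirm this.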