Data Descriptions:
File organization:
KuaiLive
├── streamer.csv
├── room.csv
├── user.csv
├── comment.csv
├── gift.csv
├── like.csv
├── click.csv
├── negative.csv
└── title_embeddings.npy
1. Descriptions of the fields in streamer.csv
The file streamer.csv contains comprehensive information about all streamers, including streamer_id and associated side information such as age, country, and a set of binary features.
Field Name | Description | Type | Example |
---|---|---|---|
streamer_id | The ID of the streamer. | int64 | 56006 |
gender | The gender of the streamer. | string | M |
age | The age range of the streamer. | string (range) | 24-30 |
country | The country where the streamer resides. | string | China |
device_brand | The brand of the device used by the streamer. | string | APPLE |
device_price | The price range of the device. | string (range) | 5000+ |
live_operation_tag | The operational category of the streamer. | string | Relationship |
fans_user_num | The number of users who have followed the streamer. | string (range) | 10000-100000 |
fans_group_fans_num | The number of fans from the streamer’s fans group. | string (range) | 0-10 |
follow_user_num | The number of users followed by the streamer. | string (range) | 1000-10000 |
first_live_timestamp | The date when the streamer started their first live room. | timestamp | 2018-02-04 |
accu_live_cnt | The total number of live rooms the streamer has hosted. | string (range) | 100-500 |
accu_live_duration | The cumulative duration of all live rooms, in milliseconds. | string (range) | 500000000-1000000000 |
accu_play_cnt | The total number of times the streamer’s live rooms have been viewed. | string (range) | 100000-500000 |
accu_play_duration | The cumulative time for which the streamer’s live rooms have been viewed, in milliseconds. | string (range) | 50000000000-100000000000 |
reg_timestamp | The date when the streamer registered on the platform. | timestamp | 2014-07-03 |
onehot_feat0 | A feature with binary values (e.g., 0 or 1). | int64 | 1 |
onehot_feat1 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat2 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat3 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat4 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat5 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat6 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
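
Many of the fields above are stored as range strings rather than numbers. The following is a minimal sketch of how such values might be converted to numeric bounds. It assumes that only the three formats seen in the Example columns occur (`low-high`, `low+`, and a single number); `parse_range` is a hypothetical helper, not part of the dataset, and the tables do not state whether the bounds are inclusive.

```python
from typing import Optional, Tuple


def parse_range(value: str) -> Tuple[int, Optional[int]]:
    """Parse a bucketed field such as '24-30', '5000+', or '0'.

    Returns (low, high). `high` is None for open-ended buckets like '5000+'.
    Assumption: no other formats (e.g. missing values) appear in the column.
    """
    text = value.strip()
    if text.endswith("+"):
        # Open-ended bucket: only a lower bound is given.
        return int(text[:-1]), None
    if "-" in text:
        low, high = text.split("-", 1)
        return int(low), int(high)
    # A single number is treated as a degenerate range.
    number = int(text)
    return number, number


print(parse_range("24-30"))  # (24, 30)
print(parse_range("5000+"))  # (5000, None)
print(parse_range("0"))      # (0, 0)
```

Keeping the lower and upper bounds separate makes it possible to order the buckets or to derive a midpoint later, without committing to one numeric encoding up front.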
2. Descriptions of the fields in room.csv
The file room.csv contains detailed information about all live rooms, including basic fields such as the date, the room ID, and the corresponding streamer, as well as temporal data and categorical descriptors.
Field Name | Description | Type | Example |
---|---|---|---|
p_date | The date of the live room. | int64 | 20250525 |
live_id | The id of the live room. | int64 | 7336601 |
streamer_id | The id of the streamer. | int64 | 252634 |
live_type | The type of the live room, represented as a categorical code. | int64 | 1 |
start_timestamp | The start time of the live room. | int64 | 1748131180878 |
end_timestamp | The end time of the live room. | int64 | 1748275200000 |
live_content_category | The content category of the live room. | string | shop |
live_name_id | The id associated with the live room title, used to index the encoded title embeddings. | int64 | 0 |
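
The start_timestamp and end_timestamp examples have the magnitude of Unix epoch times in milliseconds. The sketch below converts them to datetimes and computes a room's duration. Two things here are assumptions rather than facts stated in the tables: that the values are Unix epoch milliseconds, and that p_date follows China Standard Time (UTC+8). The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: calendar dates such as p_date are expressed in UTC+8.
CST = timezone(timedelta(hours=8))


def ms_to_datetime(ms: int, tz: timezone = CST) -> datetime:
    """Convert an epoch timestamp in milliseconds to an aware datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(tz)


def room_duration_hours(start_ms: int, end_ms: int) -> float:
    """Length of a live room in hours, from its start and end timestamps."""
    return (end_ms - start_ms) / 3_600_000


# Values taken from the Example column of room.csv.
start = ms_to_datetime(1748131180878)
print(start.strftime("%Y%m%d"))  # 20250525
print(round(room_duration_hours(1748131180878, 1748275200000), 2))  # 40.01
```

Under the UTC+8 assumption, the converted start date agrees with the p_date example in the same row, which is some evidence for that choice of time zone.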
3. Descriptions of the fields in user.csv
The file user.csv contains comprehensive information about all users, including the user_id and side information such as age, country, and a set of binary features.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The ID of the user. | int64 | 22733 |
age | The age range of the user. | string (range) | 18-23 |
gender | The gender of the user. | string | M |
country | The country where the user resides. | string | China |
device_brand | The brand of the device used by the user. | string | DESKTOP |
device_price | The price range of the device. | string (range) | 0 |
reg_timestamp | The date when the user registered on the platform. | timestamp | 2023-05-03 |
fans_num | The number of fans who have followed the user. | string (range) | 0-10 |
follow_num | The number of users followed by the user. | string (range) | 10-100 |
first_watch_live_timestamp | The date when the user first watched a live room. | timestamp | 2023-05-03 |
accu_watch_live_cnt | The total number of live rooms the user has watched. | string (range) | 0-50000 |
accu_watch_live_duration | The cumulative duration of all live rooms the user has watched, in milliseconds. | string (range) | 0-1000000000 |
is_live_streamer | A binary indicator showing whether the user is a live streamer (1 = yes, 0 = no). | int64 | 0 |
is_photo_author | A binary indicator showing whether the user is a photo content author (1 = yes, 0 = no). | int64 | 0 |
onehot_feat0 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat1 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat2 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat3 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat4 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat5 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
onehot_feat6 | A feature with binary values (e.g., 0 or 1). | int64 | 0 |
4. Descriptions of the fields in comment.csv
The file comment.csv records user interactions in the form of comments. Each record corresponds to a single commenting event and includes the associated user, live room, streamer, and timestamp.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The id of the user. | int64 | 23154 |
live_id | The id of the live room. | int64 | 7865151 |
streamer_id | The id of the streamer. | int64 | 433825 |
timestamp | The timestamp when this interaction occurred. | int64 | 1746374400819 |
5. Descriptions of the fields in gift.csv
The file gift.csv records user interactions in the form of sending gifts. Each record corresponds to a single gifting event and includes the associated user, live room, streamer, timestamp, and the price of the gift.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The id of the user. | int64 | 11504 |
live_id | The id of the live room. | int64 | 6086847 |
streamer_id | The id of the streamer. | int64 | 114419 |
timestamp | The timestamp when this interaction occurred. | int64 | 1746374441260 |
gift_price | The total price of gifts sent during this interaction. | int64 | 2 |
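
As an illustration of working with these records, the sketch below sums gift_price per streamer using only the standard library. It assumes the CSV file is comma-delimited with a header row whose column names match the field names above. The rows in `SAMPLE_GIFT_CSV` are made up for the example, and `total_gift_price_by_streamer` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the gift.csv layout; the values are illustrative only.
SAMPLE_GIFT_CSV = """user_id,live_id,streamer_id,timestamp,gift_price
11504,6086847,114419,1746374441260,2
11504,6086847,114419,1746374450000,5
22733,7336601,252634,1746374460000,1
"""


def total_gift_price_by_streamer(csv_file):
    """Sum gift_price over all gifting events, grouped by streamer_id."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[int(row["streamer_id"])] += int(row["gift_price"])
    return dict(totals)


totals = total_gift_price_by_streamer(io.StringIO(SAMPLE_GIFT_CSV))
print(totals)  # {114419: 7, 252634: 1}
```

For the real file, the same function could be called on a handle opened with `open("KuaiLive/gift.csv", newline="")`, assuming the directory layout shown under File organization.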
6. Descriptions of the fields in like.csv
The file like.csv records user interactions in the form of likes. Each record corresponds to a single liking event and includes the associated user, live room, streamer, and timestamp.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The id of the user. | int64 | 5222 |
live_id | The id of the live room. | int64 | 541927 |
streamer_id | The id of the streamer. | int64 | 244121 |
timestamp | The timestamp when this interaction occurred. | int64 | 1746374414059 |
7. Descriptions of the fields in click.csv
The file click.csv contains records of user interactions in the form of click-to-watch behavior. Each record corresponds to a single click-to-watch event and includes the user ID, the associated live room and streamer, the timestamp of the interaction, and the watch time.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The id of the user. | int64 | 8505 |
live_id | The id of the live room. | int64 | 9342705 |
streamer_id | The id of the streamer. | int64 | 392199 |
timestamp | The timestamp when this interaction occurred. | int64 | 1746374400022 |
watch_live_time | The user’s watch time for this interaction, in milliseconds. | int64 | 2852 |
8. Descriptions of the fields in negative.csv
The file negative.csv contains records of all exposures that were presented to users but not clicked. Each record includes the corresponding user, live room, streamer, and the timestamp of the exposure.
Field Name | Description | Type | Example |
---|---|---|---|
user_id | The id of the user. | int64 | 9810 |
live_id | The id of the live room. | int64 | 10816308 |
streamer_id | The id of the streamer. | int64 | 17452 |
timestamp | The timestamp when this exposure occurred. | int64 | 1746525926498 |
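
Because click.csv and negative.csv share the same key columns, the two files can be combined into a single labelled set of exposures. The sketch below shows one way to do this, treating clicks as label 1 and unclicked exposures as label 0. This labelling is a common setup for click prediction and is an assumption here, not a protocol prescribed by the dataset. The sample rows reuse the Example values from the tables, `labelled_exposures` is a hypothetical helper, and the files are assumed to be comma-delimited with a header row.

```python
import csv
import io

# One toy row per file, taken from the Example columns above.
SAMPLE_CLICK_CSV = """user_id,live_id,streamer_id,timestamp,watch_live_time
8505,9342705,392199,1746374400022,2852
"""
SAMPLE_NEGATIVE_CSV = """user_id,live_id,streamer_id,timestamp
9810,10816308,17452,1746525926498
"""


def labelled_exposures(click_file, negative_file):
    """Merge clicks (label 1) and unclicked exposures (label 0).

    Returns (user_id, live_id, streamer_id, timestamp, label) tuples,
    sorted by timestamp.
    """
    samples = []
    for label, handle in ((1, click_file), (0, negative_file)):
        for row in csv.DictReader(handle):
            samples.append((
                int(row["user_id"]),
                int(row["live_id"]),
                int(row["streamer_id"]),
                int(row["timestamp"]),
                label,
            ))
    samples.sort(key=lambda sample: sample[3])
    return samples


samples = labelled_exposures(
    io.StringIO(SAMPLE_CLICK_CSV), io.StringIO(SAMPLE_NEGATIVE_CSV)
)
for sample in samples:
    print(sample)
```

Sorting by timestamp keeps the merged records in chronological order, which is convenient if the data is later split into training and evaluation periods by time.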
9. Descriptions of the fields in title_embeddings.npy
The file title_embeddings.npy contains embeddings of live room titles, obtained by encoding the titles with the bge-base-zh-v1.5 model and applying PCA to reduce them to 128 dimensions. The embeddings are indexed by live_name_id.
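
The lookup by live_name_id can be sketched as follows. The internal layout of the embeddings file is not described above, so this example assumes it holds a single 2-D array of shape (number of titles, 128) in which row i belongs to live_name_id i. If the file is an archive containing several arrays, the sketch takes the first one, which is also an assumption. Both function names are hypothetical, and a small in-memory array stands in for the real file.

```python
import io

import numpy as np


def load_title_embeddings(source):
    """Load the title embedding matrix from a path or file-like object.

    Assumption: the file stores one 2-D array of shape (num_titles, 128).
    """
    loaded = np.load(source)
    if hasattr(loaded, "files"):
        # np.load returns an archive object for .npz files; take the first
        # stored array (assumption about the archive's contents).
        loaded = loaded[loaded.files[0]]
    return np.asarray(loaded)


def embeddings_for_rooms(embeddings, live_name_ids):
    """Select one embedding row per live_name_id (row i <-> live_name_id i)."""
    ids = np.asarray(live_name_ids, dtype=np.int64)
    return embeddings[ids]


# Toy stand-in for the real file: 5 titles with 128 dimensions each.
toy = np.arange(5 * 128, dtype=np.float32).reshape(5, 128)
buffer = io.BytesIO()
np.save(buffer, toy)
buffer.seek(0)

embeddings = load_title_embeddings(buffer)
room_vectors = embeddings_for_rooms(embeddings, [0, 3])
print(room_vectors.shape)  # (2, 128)
```

With the real data, the live_name_id column of room.csv would supply the indices, so each live room can be paired with the embedding of its title.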