请教 query - Database版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Database版 - 请教 query

相关主题
● 我一直以为beijing是个mm	● MySQL Question
● sql面试题1	● mysql challenge
● 用SSIS EXPORT 到 EXCEL 2010 有2000个COLUMN,可能吗	● Re: question about CAST and Multiset
● 问一个query	● ORA0-01422
● 问一个linq to sql编程问题	● [转载] Hehe,不好意思,another question:)
● generate unique integer ID from columns in SQL table (转载	● Access database size
● BJ，那个COLUMNSTORE INDEX是不是BITMAP INDEX啊	● how to get the result in the middle of resultset?
● 请教个VBA问题	● what is optimizer statistics in Oracle?

相关话题的讨论汇总
话题: hobby话题: hobbies话题: users话题: age话题: table

进入Database版参与讨论

1

(共1页)

g*********n 发帖数: 43	1 Two tables, the first one "Users" saves the user name and some properties of the user, age, gender, . . and the last column is "hobbies", to save the hobbies for this particular user. Since hobbies can be quite diverse, in this table we put an id of the hobbies. name age gender hobby_id A 20 M 1 B 45 F 1 C 52 M 2 . . . And we define another table to map this id into real hobbies: hobby_id hobby 1 swimming 1 pingpong 1 soccer . . . other hobbies associated with id "1" 2 badminton 2 travel 2 pingpong (a hobby can be associated with multiple id) . . . other hobbies associated with id "2" 3 TV 3 movie . . . Requirement: 1. for all combinations of (age, gender), return how many users have the corresponding age and gender. This can be achieved by "select age, gender, count() from Users group by 1, 2;" 2. for all combinations of (age, hobby), return how many users have the corresponding age and hobby. Theoretically, this can be achieved by select age, hobbies.hobby, count() from Users join hobbies on Users.hobby_id = hobbies.hobby_id group by 1,2; But the problem is, the join is just too expensive: (1) the Users table has 200k rows (2) each hobby_id may map to 200 hobbies thus the join will result in 200k * 200 = 40 million rows and causes connection timeout. It looks like we cannot use one single query to return all the results, and have to break down the query into smaller ones?
d***e 发帖数: 793	2 Can you do this? select a.age, hobbies.hobby, a.ct ( select age,hobby_id, count(*) ct from Users group by 1,2; )a join hobbies on a.hobby_id = hobbies.hobby_id
g*********n 发帖数: 43	3 Good point. The subquery select age,hobby_id, count() ct from Users group by 1,2; can compress table Users into a, then we join "a" and "hobbies". This indeed helps. But the thing is, even after the compression, table a still has 25k rows:), and the join is still 25k 200 = 5 million rows. 【在 d**e 的大作中提到】 : Can you do this? : select a.age, hobbies.hobby, a.ct : ( : select age,hobby_id, count() ct : from Users : group by 1,2; : )a : join hobbies : on a.hobby_id = hobbies.hobby_id
i*****w 发帖数: 75	4 I am a bit confused by the question: If multiple hobbies can share the same hobby_id, then, for name A, it's hobby_id =1 in the user table, does that mean person A has all the hobbies defined in the hobby table with hobby_id =1? If so, then person A and B have exactly the same hobbies? Based on your design, then when you join your user table and the hobby table by hobby_id, if one hobby_id has n hobbies, then for a given user, you will have n rows of data, which caused the CROSS JOIN effect. Since you mentioned that one hobby can belong to multiple hobby_id, so it makes me feel like that you actually wanted to find Hobby Groups. For example, Person A and B share the same hobbies of swimming, pingpong and soccer, so you may try to store the data like below: User table: User Age Gender HobbyGroupID A 20 M 100 B 45 F 100 C 52 M 101 Then have a HobbyGroup table: HobbyGroupID HobbyGroupTypeDes 100 Hobby Group 100 101 Hobby Group 101 and a Hobby table: HobbyID HobbyDescription 1 Swimming 2 Pingpong 3 Soccer 4 Travel Then you can have a bridge table: HobbyGroupID HobbyID 100 1 100 2 100 3 101 1 101 3 Note that HobbyID 1 and 3 are in multiple HobbyGroups in the previous bridge table. Doing this way, you can group by in the user table, then make the calculations in the bridge table, and finally put these two together. Just my 2 cents. of 【在 g*********n 的大作中提到】 : Two tables, the first one "Users" saves the user name and some properties of : the user, age, gender, . . and the last column is "hobbies", to save the : hobbies for this particular user. Since hobbies can be quite diverse, in : this table we put an id of the hobbies. : name age gender hobby_id : A 20 M 1 : B 45 F 1 : C 52 M 2 : . . . : And we define another table to map this id into real hobbies:
p********n 发帖数: 11	5 HobbyGroup is kind of no much use. Why not simply create a user table and a hobby table, and create a (mapping) table to link a user with hobbies? User Hobby UserHobby
P********8 发帖数: 12	6 discretizing age might help, such as 21-25,26-30..... anyway it still depends on what's the business requirements. 【在 g********n 的大作中提到】 : Good point. The subquery : select age,hobby_id, count() ct : from Users : group by 1,2; : can compress table Users into a, then we join "a" and "hobbies". : This indeed helps. But the thing is, even after the compression, table a : still has 25k rows:), and the join is still 25k * 200 = 5 million rows.

1

(共1页)

进入Database版参与讨论

相关主题
● what is optimizer statistics in Oracle?	● 问一个linq to sql编程问题
● a question for JDBC	● generate unique integer ID from columns in SQL table (转载
● about joins	● BJ，那个COLUMNSTORE INDEX是不是BITMAP INDEX啊
● Question 3: merging data, getting rid of dup?	● 请教个VBA问题
● 我一直以为beijing是个mm	● MySQL Question
● sql面试题1	● mysql challenge
● 用SSIS EXPORT 到 EXCEL 2010 有2000个COLUMN,可能吗	● Re: question about CAST and Multiset
● 问一个query	● ORA0-01422

相关话题的讨论汇总
话题: hobby话题: hobbies话题: users话题: age话题: table

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)