某大公司面试题 - JobHunting版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

JobHunting版 - 某大公司面试题

相关主题
● most clicked urls in the last 5 mins, 1hr, 24 hrs?	● 问一道面试题，现在好像很流行这种题
● 新鲜面试题	● 面试题分享：把数字翻译成字符串
● 在线紧急求助一道system design面试题，面经内附	● 面试题
● 贡献一下：本版上搜集的 Google 面试题	● 贡献两道google面试题
● 请教一个面试题，在大量URL中找一个有wildcard的pattern	● Data scientist / Machine Learning Engineer 相关面试题
● 面试题，大规模url求重复讨论	● CS 面试题总结(1)
● 请问一个面试题："System described. Blank page returned. How would you test it?"	● Amazon Second phone
● G面试题请教	● 贴华人版程序员简历，大家帮忙拍砖成印度版

相关话题的讨论汇总
话题: results话题: click话题: search话题: url话题: search2

进入JobHunting版参与讨论

1

(共1页)

a********r 发帖数: 218	1 Open google.com, you type some words in the edit box to search something, it will return lots of search results. Among these returning results (that is , website link), you may only CLICK some results that are interesting to you . The system will record the "CLICK"action. Finally, you will have the search results (i.e. url) and "CLICK" informatin at hand. Question: how do you find the similarity of these searching? That's exactly what the interviewer told me, he didn't want to supply any additional information. 请大牛指点，多谢！
k****n 发帖数: 369	2 so you have a set of (query => list of search results with "Click" info), and he want to find the similarities between queries, right? The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar to Qb. Furthermore, the Query and URLs can form a graph, with edges defined as E(Q, URL) where Query lead to a URL. The URLs can be connected if they are returned in one query using certain weighting strategy. After this simplification, we can run an all-source shortest path in this graph in O(n^3) time, and the distance between Qs can be treated as the (reverse of) similarities of the queries. Another way. To vectorize the queries in a big "URL" space, each URL is a dimension in the vector space, define some weighting strategy. Then calcu -late the similaries between the URL vectors. There must also exist other solutions, since this is a very open problem. it is you 【在 a********r 的大作中提到】 : Open google.com, you type some words in the edit box to search something, it : will return lots of search results. Among these returning results (that is : , website link), you may only CLICK some results that are interesting to you : . The system will record the "CLICK"action. Finally, you will have the : search results (i.e. url) and "CLICK" informatin at hand. : Question: how do you find the similarity of these searching? : That's exactly what the interviewer told me, he didn't want to supply any : additional information. : 请大牛指点，多谢！
a********r 发帖数: 218	3 @Kevinn, Thanks for your solution. The interviewer particularly mentioned: some search results clicked, some not. That is, for each query, the search results can be divided into two groups: "clicked" and "non-clicked". How do you handle this? assign more weight to"clicked"? similar 【在 k****n 的大作中提到】 : so you have a set of (query => list of search results with "Click" info), : and he want to find the similarities between queries, right? : The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar : to Qb. : Furthermore, the Query and URLs can form a graph, with edges defined as : E(Q, URL) where Query lead to a URL. The URLs can be connected if they : are returned in one query using certain weighting strategy. : After this simplification, we can run an all-source shortest path : in this graph in O(n^3) time, and the distance between Qs can be : treated as the (reverse of) similarities of the queries.
k*j 发帖数: 153	4 i think i was asked the same question long time ago when i interviewed with them my answer was like: for two searches, you get something like the following, each is a link, and they are in order. search 1: ABCDEF search 2 : FABEDG and for each link in the search1, you find its position in search2, and then calculate the absolute difference of two positions. for example, for link "A", it is position 0 in search 1, and position 1 in search2. the diff is abs(1-0)=1. another example, for link "F", in seach1, it is position5, for search2, it is position0, so diff is abs(0-5) = 5. you compute the abs diff for each link in seach1, if can't find it in search2, you put N, N is the length of the array results.(in this case is 6). You then sum up all the differences to get a total score to indicate the similarity between these two searches. The smaller the more similar they are. we can also improve the results by using the "click" information, we can put more weight on the "clicked" links when summing up the score. you can tune this parameter using some training sets

1

(共1页)

进入JobHunting版参与讨论

相关主题
● 贴华人版程序员简历，大家帮忙拍砖成印度版	● 请教一个面试题，在大量URL中找一个有wildcard的pattern
● g电面，新鲜面经	● 面试题，大规模url求重复讨论
● 一道大数据的题，讨论一下	● 请问一个面试题："System described. Blank page returned. How would you test it?"
● 面试: Take home project	● G面试题请教
● most clicked urls in the last 5 mins, 1hr, 24 hrs?	● 问一道面试题，现在好像很流行这种题
● 新鲜面试题	● 面试题分享：把数字翻译成字符串
● 在线紧急求助一道system design面试题，面经内附	● 面试题
● 贡献一下：本版上搜集的 Google 面试题	● 贡献两道google面试题

相关话题的讨论汇总
话题: results话题: click话题: search话题: url话题: search2

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)