a********r 发帖数: 218 | 1 Open google.com, you type some words in the edit box to search something, it
will return lots of search results. Among these returning results (that is
, website link), you may only CLICK some results that are interesting to you
. The system will record the "CLICK"action. Finally, you will have the
search results (i.e. url) and "CLICK" informatin at hand.
Question: how do you find the similarity of these searching?
That's exactly what the interviewer told me, he didn't want to supply any
additional information.
请大牛指点, 多谢! | k****n 发帖数: 369 | 2 so you have a set of (query => list of search results with "Click" info),
and he want to find the similarities between queries, right?
The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar
to Qb.
Furthermore, the Query and URLs can form a graph, with edges defined as
E(Q, URL) where Query lead to a URL. The URLs can be connected if they
are returned in one query using certain weighting strategy.
After this simplification, we can run an all-source shortest path
in this graph in O(n^3) time, and the distance between Qs can be
treated as the (reverse of) similarities of the queries.
Another way. To vectorize the queries in a big "URL" space, each URL is
a dimension in the vector space, define some weighting strategy. Then calcu
-late the similaries between the URL vectors.
There must also exist other solutions, since this is a very open problem.
it
is
you
【在 a********r 的大作中提到】 : Open google.com, you type some words in the edit box to search something, it : will return lots of search results. Among these returning results (that is : , website link), you may only CLICK some results that are interesting to you : . The system will record the "CLICK"action. Finally, you will have the : search results (i.e. url) and "CLICK" informatin at hand. : Question: how do you find the similarity of these searching? : That's exactly what the interviewer told me, he didn't want to supply any : additional information. : 请大牛指点, 多谢!
| a********r 发帖数: 218 | 3 @Kevinn,
Thanks for your solution.
The interviewer particularly mentioned: some search results clicked, some
not. That is, for each query, the search results can be divided into two
groups: "clicked" and "non-clicked". How do you handle this? assign more
weight to"clicked"?
similar
【在 k****n 的大作中提到】 : so you have a set of (query => list of search results with "Click" info), : and he want to find the similarities between queries, right? : The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar : to Qb. : Furthermore, the Query and URLs can form a graph, with edges defined as : E(Q, URL) where Query lead to a URL. The URLs can be connected if they : are returned in one query using certain weighting strategy. : After this simplification, we can run an all-source shortest path : in this graph in O(n^3) time, and the distance between Qs can be : treated as the (reverse of) similarities of the queries.
| k*j 发帖数: 153 | 4 i think i was asked the same question long time ago when i interviewed with
them
my answer was like:
for two searches, you get something like the following, each is a link, and
they are in order.
search 1: ABCDEF
search 2 : FABEDG
and for each link in the search1, you find its position in search2, and then
calculate the absolute difference of two positions. for example, for link "A", it is
position 0 in search 1, and position 1 in search2. the diff is abs(1-0)=1.
another example, for link "F", in seach1, it is position5, for search2, it is position0,
so diff is abs(0-5) = 5. you compute the abs diff for each link in seach1, if can't find it in search2,
you put N, N is the length of the array results.(in this case is 6). You then sum
up all the differences to get a total score to indicate the similarity between these two searches.
The smaller the more similar they are.
we can also improve the results by using the "click" information, we can put
more weight on the "clicked" links when summing up the score. you can tune
this parameter using some training sets |
|