GSEA questions - Biology board - MITBBS (未名空间) archive

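This page is an excerpt and archive of the corresponding thread on MITBBS (未名空间). Posts less than a week old are shown up to 50 characters, and older posts up to 500 characters; visit the original thread for the full text.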

c****1 posts: 1095	1 1. The default “Metric for ranking genes” is “Signal2Noise”. I ran GSEA with this option and got good results. But when I went through the manual, I found this note: “The default metric for ranking genes is the signal-to-noise ratio. To use this metric, your phenotype file must define at least two categorical phenotypes and your expression dataset must contain at least three (3) samples for each phenotype. If you are using a continuous phenotype or your expression dataset contains fewer than three samples per phenotype, you must choose a different ranking metric.” For NGS data we usually have two replicates per condition, which would mean we cannot use signal-to-noise as the metric for ranking genes. Does that hold in real practice? 2. Is it better to use the whole transcriptome or a filtered gene list (based on a p-value cutoff) as input?
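For readers unfamiliar with the metric quoted from the manual, here is a minimal Python sketch of a per-gene signal-to-noise score, taken here as the difference of the class means divided by the sum of the class standard deviations. The function name `signal_to_noise` and the `min_samples` parameter are illustrative assumptions and not part of GSEA. GSEA itself applies further adjustments to the standard deviations, which are omitted here. The sample-count check only mirrors the three-samples-per-phenotype rule in the note above.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b, min_samples=3):
    """Score one gene as (mean_a - mean_b) / (sd_a + sd_b).

    class_a and class_b hold the expression values of the gene in the
    samples of each phenotype. min_samples is a hypothetical guard that
    mirrors the manual's requirement of three samples per phenotype.
    """
    if len(class_a) < min_samples or len(class_b) < min_samples:
        raise ValueError(
            f"need at least {min_samples} samples per phenotype, "
            f"got {len(class_a)} and {len(class_b)}"
        )
    noise = stdev(class_a) + stdev(class_b)  # sample standard deviations
    if noise == 0:
        raise ValueError("both phenotypes have zero variance for this gene")
    return (mean(class_a) - mean(class_b)) / noise


# Means are 12 and 6, both standard deviations are 2, so the score is 1.5.
example_score = signal_to_noise([10, 12, 14], [4, 6, 8])
```

With two replicates per condition the call raises `ValueError`, which is the situation the question describes; a difference-based metric such as a log2 ratio does not depend on a per-class standard deviation.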
c****1 posts: 1095	2 Bump. Does nobody here know about this?
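f******k posts: 856	3 I can only speak from my own trial-and-error experience; I find the explanations on the GSEA website quite confusing. 1. I mainly analyze my own RNA-seq data, and I have at most three samples (control, sample 1, sample 2), so I usually choose the log2 ratio metric. Also, your replicates need to be merged (for example, averaged) before you use them as input; GSEA does not accept replicates as input, and the Q&A section states this explicitly. 2. I have the same question. I use a filtered list as input myself, but because a filtered list contains far fewer genes than the whole transcriptome, I have to adjust the minimum size. With the default of 15 it often fails, and I have to lower it to 10 or even 5 to get results. I think the whole transcriptome can also be used as input, because GSEA compares the samples against each other and does its own statistical analysis, and a filtered list actually reduces the possibilities for GSEA to assemble gene sets. But for genes in the transcriptome that are obviously junk or false positives, I think it is still better to filter them out first so that the GSEA results are cleaner. I am not an expert in this area, though, so I will wait for an expert to answer.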
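As an illustration of the log2 ratio approach mentioned in this post, the sketch below ranks genes by the log2 ratio of mean treated expression to mean control expression. Everything named here (`log2_ratio_ranking`, the sample labels, and the `pseudocount` used to avoid dividing by zero) is an assumption made for the example and is not GSEA's own implementation.

```python
from math import log2
from statistics import mean


def log2_ratio_ranking(expression, treated, control, pseudocount=1.0):
    """Rank genes by log2(mean treated / mean control), highest first.

    expression maps gene -> {sample name: expression value}.
    treated and control are lists of sample names.
    pseudocount is a hypothetical guard against zero expression values.
    """
    ranked = []
    for gene, values in expression.items():
        treated_mean = mean([values[s] for s in treated])
        control_mean = mean([values[s] for s in control])
        ratio = (treated_mean + pseudocount) / (control_mean + pseudocount)
        ranked.append((gene, log2(ratio)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


expression = {
    "GeneA": {"ctl": 3, "trt": 7},
    "GeneB": {"ctl": 7, "trt": 3},
    "GeneC": {"ctl": 5, "trt": 5},
}
ranking = log2_ratio_ranking(expression, treated=["trt"], control=["ctl"])
# ranking == [("GeneA", 1.0), ("GeneC", 0.0), ("GeneB", -1.0)]
```

A list like this, one gene and one score per row, is the kind of pre-ranked input that can be built from the whole transcriptome without applying a p-value cutoff first.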
c****1 posts: 1095	4 Discussion is welcome, and thanks for your input. Regarding the number of samples, it is actually best to input all the biological replicates. The merging of replicates that you describe really applies to technical replicates. In the phenotype labels field, you can assign the replicates that belong to the same group to one phenotype. The website has this note: “Samples. Each sample must have a unique identifier. If you have technical replicates, you generally want to remove them by averaging or some other data reduction technique. For example, assume you have five tumor samples and five control samples each run three times (three replicate columns) for a total of 30 data columns. You would average the three replicate columns for each sample and create a dataset containing 10 data columns (five tumor and five control).” [In reply to f******k, post 3]
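The averaging described in the quoted note can be sketched in a few lines of Python. The function name `average_technical_replicates` and the sample identifiers are hypothetical; the example only reproduces the arithmetic of the note (30 data columns collapsed to 10) for a single gene.

```python
from statistics import mean


def average_technical_replicates(sample_ids, values):
    """Collapse technical replicate columns into one value per sample.

    sample_ids[i] names the biological sample that column i belongs to,
    and values[i] is the expression of one gene in that column.
    Returns {sample id: mean of its replicate columns}, in first-seen order.
    """
    if len(sample_ids) != len(values):
        raise ValueError("sample_ids and values must have the same length")
    grouped = {}
    for sample, value in zip(sample_ids, values):
        grouped.setdefault(sample, []).append(value)
    return {sample: mean(runs) for sample, runs in grouped.items()}


# Five tumor and five control samples, each run three times: 30 columns.
samples = [f"tumor{i}" for i in range(1, 6)] + [f"control{i}" for i in range(1, 6)]
column_ids = [sample for sample in samples for _ in range(3)]
one_gene = list(range(30))  # placeholder expression values for one gene
collapsed = average_technical_replicates(column_ids, one_gene)
# collapsed has 10 entries; collapsed["tumor1"] is the mean of 0, 1, 2, i.e. 1
```

Biological replicates are not collapsed this way; they stay as separate columns and are grouped through the phenotype labels instead.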
G******n posts: 289	5 You can put the replicates in directly; there is no need to merge them into a mean or average first. [In reply to f******k, post 3]
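To make "putting the replicates in directly" concrete, the sketch below builds the text of a categorical phenotype labels (CLS) file that assigns each replicate column to its group. It assumes the three-line categorical CLS layout (a counts line, a line of class names, then one label per sample); the helper name `make_cls` and the group names are made up for the example, so check the layout against the GSEA data format documentation before relying on it.

```python
def make_cls(labels):
    """Build categorical CLS text from one phenotype label per sample column.

    labels must be listed in the same order as the sample columns of the
    expression dataset. Classes are named in order of first appearance.
    """
    if any(" " in label or "\t" in label for label in labels):
        raise ValueError("class labels must not contain whitespace")
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    header = f"{len(labels)} {len(classes)} 1"
    class_line = "# " + " ".join(classes)
    label_line = " ".join(labels)
    return "\n".join([header, class_line, label_line]) + "\n"


# Two biological replicates per group, kept as four separate columns.
cls_text = make_cls(["control", "control", "treated", "treated"])
# cls_text is:
# 4 2 1
# # control treated
# control control treated treated
```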
d*******e posts: 1649	6 1. First make sure you have enough samples. For example, if you are comparing 2 groups with 2 samples each, so only 4 samples in total, GSEA is not a good fit because the error can be large. If you must run it anyway, it does not matter much which method you use. Do not merge samples. 2. Do not filter the transcriptome. Instead, you can choose differently defined gene sets (the .gmt files) to control which genes in the transcriptome are compared. [In reply to c****1, post 1]
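The interaction between a filtered gene list and the minimum gene set size raised in this thread can be checked with a short Python sketch. It assumes the tab-delimited GMT layout (set name, description, then gene symbols) and a size filter applied to each set after restricting it to the genes present in the dataset. The function names are hypothetical, the default of 15 is the minimum size mentioned in post 3, and the default of 500 for the maximum is an assumption.

```python
def parse_gmt(text):
    """Parse GMT text into {gene set name: set of gene symbols}."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def sets_passing_size_filter(gene_sets, dataset_genes, min_size=15, max_size=500):
    """Keep gene sets whose overlap with the dataset is within the size limits."""
    dataset_genes = set(dataset_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & dataset_genes
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# A made-up gene set of 20 genes, tested against a full and a filtered list.
all_genes = [f"G{i}" for i in range(20)]
gmt_text = "SET_A\tna\t" + "\t".join(all_genes) + "\n"
gene_sets = parse_gmt(gmt_text)
kept_whole = sets_passing_size_filter(gene_sets, all_genes)         # SET_A kept
kept_filtered = sets_passing_size_filter(gene_sets, all_genes[:8])  # SET_A dropped
```

With only 8 of the 20 genes left after filtering, the set falls below the minimum of 15 and is excluded, which is consistent with having to lower the minimum size to 10 or 5 when the input is a filtered list.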
