Three methods are commonly used at present:
- Compass
- scFEA
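- scMetabolism: essentially pulls the metabolism-related gene sets out of KEGG and Reactome and scores them with an AUCell- or GSVA-like method. It is fairly simple and only supports human data; mouse data must first be converted to human orthologs.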
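The ortholog conversion mentioned for mouse data can be sketched roughly as follows. This is only an illustration and not part of scMetabolism: the function `mouse_to_human`, the toy expression values, and the two-entry ortholog table are all made up for the example, and `Gm12345` is a placeholder for a gene with no mapping. A real table would come from a curated ortholog resource.

```python
from typing import Dict, List


def mouse_to_human(expr: Dict[str, List[float]],
                   orthologs: Dict[str, str]) -> Dict[str, List[float]]:
    """Re-key a gene -> per-cell expression table from mouse to human symbols.

    Mouse genes without an entry in `orthologs` are dropped; when several
    mouse genes map to the same human gene, their values are summed.
    """
    converted: Dict[str, List[float]] = {}
    for mouse_gene, values in expr.items():
        human_gene = orthologs.get(mouse_gene)
        if human_gene is None:
            continue  # no known ortholog: skip this gene
        if human_gene in converted:
            converted[human_gene] = [
                a + b for a, b in zip(converted[human_gene], values)
            ]
        else:
            converted[human_gene] = list(values)
    return converted


# Hypothetical toy data: three mouse genes measured in two cells.
mouse_expr = {"Hk2": [1.0, 2.0], "Pkm": [3.0, 0.0], "Gm12345": [5.0, 5.0]}
# Hypothetical ortholog table (illustration only).
toy_orthologs = {"Hk2": "HK2", "Pkm": "PKM"}
human_expr = mouse_to_human(mouse_expr, toy_orthologs)
```

Summing many-to-one mappings is one possible choice; taking the mean or the maximum would be equally defensible, depending on the downstream scoring method.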
References
Compass
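- Published in Cell in 2021; the most credible of the three
A basic Compass run needs only three parameters: the expression matrix (--data), the number of processes (--num-processes), and the species (--species; human and mouse are currently supported, and once the species is set, the gene names matching that species are used automatically during the computation). The remaining parameters can be left at their defaults at first and adjusted later as the study requires.
Note that Compass is slow, taking roughly 30 minutes per cell, whereas scFEA[2], another tool for single-cell metabolic prediction, is much faster: predicting a 100-cell test dataset with 8 threads takes only about 1 minute. For larger datasets, Compass can therefore use micropooling (--microcluster-size) to partition the cells into clusters and represent each cluster by its mean; similar methods such as metaCell can also be used here. Depending on the research question, you should also consider whether cells of different phenotypes need to be micropooled separately.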
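A minimal sketch of assembling such a call from Python, using only the flags named above. The helper `build_compass_command` and the file name `expression.tsv` are hypothetical, and the species value `homo_sapiens` is my assumption about how the human option is spelled; check `compass --help` for the exact values.

```python
import shlex
from typing import List, Optional


def build_compass_command(data_path: str,
                          num_processes: int,
                          species: str,
                          microcluster_size: Optional[int] = None) -> List[str]:
    """Build the argument list for a basic Compass run (does not execute it)."""
    cmd = [
        "compass",
        "--data", data_path,
        "--num-processes", str(num_processes),
        "--species", species,
    ]
    # Micropooling is optional; only add the flag when a size is given.
    if microcluster_size is not None:
        cmd += ["--microcluster-size", str(microcluster_size)]
    return cmd


# Hypothetical input file and settings.
cmd = build_compass_command("expression.tsv", 8, "homo_sapiens",
                            microcluster_size=10)
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)` once the environment is set up; building it separately makes it easy to log exactly what was run.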
- Github
- install
python3.9 -m pip install git+https://github.com/yoseflab/Compass.git --upgrade
The download failed with the following error:
error: RPC failed; result=35, HTTP code = 0
Fix: git config --global http.postBuffer 5242880000
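After a successful install, running Compass fails with ModuleNotFoundError: No module named 'cplex', so CPLEX has to be installed.
The CPLEX installation described in the official tutorial is fairly involved.
The first time, I simply ran
pip3.9 install cplex
which appeared to succeed, but a later step failed with CplexSolverError: CPLEX Error 1016: Community Edition. Problem size limits exceeded. Purchase at http://ibm.biz/error1016.
Apparently pip installs the Community Edition, whereas Compass needs the academic (full) edition: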
Compass requires the full edition of CPLEX, which is free for academic use.
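I looked into it but could not tell how long IBM's academic verification would take, so I bought a copy on Taobao.
My Python is 3.9, but CPLEX 20.1 only has a build for Python 3.8, so a separate virtual environment is needed (CPLEX must be installed on a system that already has Java).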
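Before launching a long run, it can help to confirm that the active environment matches these requirements. The sketch below is only a sanity check under my own assumptions: the helper name `check_compass_environment` is made up, and it cannot tell the Community Edition from the full edition of CPLEX, since both install a module named `cplex`.

```python
import importlib.util
import sys
from typing import Dict


def check_compass_environment() -> Dict[str, object]:
    """Report the interpreter version and whether key modules can be found."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # CPLEX 20.1 is described above as matching Python 3.8.
        "python_is_3_8": sys.version_info[:2] == (3, 8),
        # find_spec returns None when a top-level module is not installed.
        "cplex_importable": importlib.util.find_spec("cplex") is not None,
        "compass_importable": importlib.util.find_spec("compass") is not None,
    }


report = check_compass_environment()
for key, value in report.items():
    print(key, value)
```

Run it with the interpreter of the virtual environment (not the system Python), otherwise the report describes the wrong installation.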
- Tutorial
Broadly speaking, Compass takes in a gene expression matrix scaled for library depth (e.g., CPM), and outputs a penalty reaction matrix, whereby higher scores correspond to a reaction being less likely.
The input file is the expression matrix after scaling for library depth.
The input gene expression matrix can be either a tab-delimited text file (tsv) or a matrix market format (mtx) containing gene expression estimates (CPM, TPM, or similar scaled units) with one row per gene, one column per sample.
Tab-delimited files need row and column labels corresponding to genes and sample names. Market matrix formats need a separate tab delimited file of gene names and optionally a tab delimited file of cell names.
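As a rough illustration of the tab-delimited layout (one row per gene, one column per sample, values scaled to CPM), here is a standard-library sketch. The helpers `counts_to_cpm` and `write_tsv`, the gene and cell names, and the counts are invented for the example. Leaving the top-left header cell empty is my assumption about what a pandas-style reader accepts, and every sample is assumed to have a nonzero total count.

```python
import csv
import io
from typing import Dict, List, TextIO


def counts_to_cpm(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample (column) so that its values sum to one million."""
    n_samples = len(next(iter(counts.values())))
    # Library size per sample; assumed nonzero for every sample.
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] / totals[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }


def write_tsv(matrix: Dict[str, List[float]], samples: List[str],
              handle: TextIO) -> None:
    """Write a genes x samples table with row and column labels."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + samples)      # header: empty corner, then sample names
    for gene, row in matrix.items():
        writer.writerow([gene] + row)    # one row per gene


# Hypothetical raw counts for two genes in two cells.
raw = {"HK2": [1, 3], "PKM": [3, 1]}
cpm = counts_to_cpm(raw)
buffer = io.StringIO()
write_tsv(cpm, ["cell_1", "cell_2"], buffer)
print(buffer.getvalue())
```

Replacing the `StringIO` buffer with `open("expression.tsv", "w", newline="")` would write the table to disk instead.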
As mentioned above, Compass is very time-consuming, so micropooling or metacells can be used to reduce the number of profiles.
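To get a feel for how much micropooling saves, here is a back-of-the-envelope estimate. It rests on loud assumptions: the "about 30 minutes per cell" figure quoted earlier applies equally to a micropool, processes scale perfectly, and the number of micropools is roughly the cell count divided by the microcluster size. The function name `estimate_compass_hours` is made up.

```python
import math
from typing import Optional


def estimate_compass_hours(n_cells: int,
                           num_processes: int,
                           minutes_per_cell: float = 30.0,
                           microcluster_size: Optional[int] = None) -> float:
    """Rough wall-clock estimate in hours, assuming perfect parallel scaling."""
    if microcluster_size is None:
        n_units = n_cells                                 # one problem per cell
    else:
        n_units = math.ceil(n_cells / microcluster_size)  # one per micropool
    return n_units * minutes_per_cell / num_processes / 60.0


# Hypothetical dataset: 1000 cells on 8 processes.
per_cell = estimate_compass_hours(1000, 8)
pooled = estimate_compass_hours(1000, 8, microcluster_size=10)
print(per_cell, pooled)
```

Under these assumptions the estimate drops by about the microcluster size; real runs will differ, since Compass reuses cached results and parallel scaling is rarely perfect.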
GitHub - tanaylab/metacells: metacells - Single-cell RNA Sequencing Analysis (the Python-based version)
the metacell approach groups together profiles of the "same" biological state into groups of cells of the "same" biological state, with the minimal number of profiles needed for computing robust statistics (in particular, mean gene expression). Each such group is a single "metacell".
By summing profiles of cells of the "same" state together, each metacell greatly reduces the sampling variance, and provides a more robust estimation of the transcription state. Note a metacell is not a cell type (multiple metacells may belong to the same "type", or even have the "same" state, if the data sufficiently over-samples this state). Also, a metacell is not a parametric model of the cell state. It is merely a more robust description of some cell state.
metacells Vignette — metacells 0.8.0 documentation
Having gone through the metacells tutorial, I think it is better suited as a step before large-scale data analysis.
Compass demo tutorials
Compass Micropooled Analysis — Compass 2021 documentation (yoseflab.github.io)
Compass/Demo.ipynb at docs · YosefLab/Compass · GitHub
Depending on the research question, it could make sense to micropool discrete phenotypes separately. This will result in micropools made of only WT or KO cells, for example, but may conceal some of the overlapping cellular programs between the two.
Debugging notes
Error during micropooling:
TypeError("cannot pickle 'SwigPyObject' object")
Lowering the number of processes (--num-processes) fixed it; this is a multiprocessing issue.
Error during micropooling:
File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/compass/compass/microclustering.py", line 181, in pool_matrix_cols
return data.append(groups).T.groupby("compass_microcluster").mean().T.rename(mapper=lambda x: 'cluster_'+str(int(x)), axis=1)
File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'
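DataFrame.append was removed in pandas 2.0. Open the file named in the traceback (microclustering.py) and change
return data.append(groups).T.groupby("compass_microcluster").mean().T.rename(mapper=lambda x: 'cluster_'+str(int(x)), axis=1)
to
return data._append(groups).T.groupby("compass_microcluster").mean().T.rename(mapper=lambda x: 'cluster_'+str(int(x)), axis=1)
Note that _append is a private pandas method, so this is a stopgap rather than a long-term fix.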
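An alternative to the private `_append` call is the public `pd.concat`, which is the replacement pandas recommends for the removed `DataFrame.append`. The sketch below reproduces the pooling expression on toy data. The shape of `groups` (a single row named `compass_microcluster` holding each cell's cluster id) is my inference from the traceback line, not something confirmed from the Compass source, and all values are invented.

```python
import pandas as pd

# Toy genes x cells matrix (hypothetical values).
data = pd.DataFrame(
    {"c1": [1.0, 2.0], "c2": [3.0, 4.0], "c3": [5.0, 6.0]},
    index=["g1", "g2"],
)
# Assumed shape of `groups`: one row with the cluster id of each cell.
groups = pd.DataFrame(
    [[0, 0, 1]], index=["compass_microcluster"], columns=["c1", "c2", "c3"]
)

# Same chain as in microclustering.py, with pd.concat instead of append:
# stack the cluster row under the matrix, transpose so cells are rows,
# average the cells of each cluster, then transpose back and rename columns.
pooled = (
    pd.concat([data, groups])
    .T.groupby("compass_microcluster")
    .mean()
    .T.rename(mapper=lambda x: "cluster_" + str(int(x)), axis=1)
)
print(pooled)
```

If you apply this inside `microclustering.py`, make sure `pandas` is imported as `pd` in that file; pinning pandas below 2.0 in the environment is another way around the error.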
- Note: the FutureWarning messages from sklearn are not errors and can be ignored.