A few notes on the CCLE database

Posted on January 11, 2016

The paper that introduced CCLE: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/

Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines.

Three types of data were collected (mutations, copy number, and expression):

The mutational status of >1,600 genes was determined by targeted massively parallel sequencing, followed by removal of variants likely to be germline events.

Moreover, 392 recurrent mutations affecting 33 known cancer genes were assessed by mass spectrometric genotyping [13].

DNA copy number was measured using high-density single nucleotide polymorphism arrays (Affymetrix SNP 6.0; Supplementary Methods).

Finally, mRNA expression levels were obtained for each of the lines using Affymetrix U133 plus 2.0 arrays.

These data were also used to confirm cell line identities.
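The quote does not say how identities were confirmed. A common generic technique is SNP fingerprinting: compare genotype calls at the same SNPs in two samples and treat very high concordance as "same cell line". The sketch below is only an illustration of that idea, not the CCLE procedure; the helper `genotype_concordance`, the 0/1/2 genotype coding, the toy genotypes and the 0.9 cutoff are all assumptions.

```r
# Hypothetical helper: fraction of SNPs with identical genotype calls,
# skipping positions where either sample has a missing call (NA).
genotype_concordance <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  ok <- !is.na(g1) & !is.na(g2)
  if (!any(ok)) return(NA_real_)
  mean(g1[ok] == g2[ok])
}

# Toy genotypes (0 = AA, 1 = AB, 2 = BB) at 10 SNPs; made-up data.
reference  <- c(0, 1, 2, 2, 1, 0, 0, 1, 2, 1)
same_line  <- c(0, 1, 2, 2, 1, 0, NA, 1, 2, 1)  # one no-call
other_line <- c(2, 1, 0, 2, 0, 0, 1, 1, 0, 2)

genotype_concordance(reference, same_line)   # 1
genotype_concordance(reference, other_line)  # 0.4

# 0.9 is an assumed cutoff for "same identity", chosen only for illustration
is_match <- genotype_concordance(reference, same_line) >= 0.9
```

Missing calls are dropped instead of counted as mismatches, so a few no-calls do not make a true match look like a different cell line.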

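The expression data are by far the most commonly used, because they are the simplest to work with; most bioinformatics analysts only ever use this part!

The mutation data, on the other hand, do not come from high-throughput sequencing in the usual sense (they come from targeted sequencing of about 1,600 genes plus mass-spectrometric genotyping), and many people have never even heard of SNP 6.0 array data.

The paper's supplementary material describes each cell line in detail.

The CCLE data can be downloaded from the Broad Institute and are also deposited in GEO; I prefer the GEO version: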

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36139

This SuperSeries is composed of the following SubSeries:

GSE36133 Expression data from the Cancer Cell Line Encyclopedia (CCLE)

GSE36138 SNP array data from the Cancer Cell Line Encyclopedia (CCLE)

The metadata of the GSE36133 series describe the cancer from which each cell line was derived.

Some people like to call this metadata "clinical data".

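library(GEOquery)
options(stringsAsFactors = FALSE)

# Download the series; getGEO() returns a list of ExpressionSet objects
ccleFromGEO <- getGEO("GSE36133")

# Sample annotation (phenotype) table
annotBlock1 <- pData(phenoData(ccleFromGEO[[1]]))
dim(annotBlock1)
# [1] 917  38

# Expression matrix
exprSet <- exprs(ccleFromGEO[[1]])
dim(exprSet)
# [1] 18926   917

## The expression matrix contains 18,926 genes: rows are Entrez gene IDs and the 917 columns are the cell lines.

keyColumns <- c("title", "source_name_ch1", "characteristics_ch1",
                "characteristics_ch1.1", "characteristics_ch1.2")
allAnnot <- annotBlock1[, keyColumns]

## These columns hold the most important metadata: the vendor or institution each cell line came from, its tissue, its cancer classification, and so on.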
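GEO "characteristics" fields are conventionally stored as `key: value` strings, so they usually need to be stripped of their key before they can be used as a grouping. The sketch below shows one way to do that and then average each gene's expression per tissue. It runs on made-up toy stand-ins for `allAnnot` and `exprSet`; the helper `strip_key`, the `primary site` key, and the sample IDs are assumptions, so check the real column contents before reusing it. Following the description above, rows are assumed to be Entrez IDs (1956 and 2064 are used purely as example IDs).

```r
# Hypothetical helper: drop everything up to and including the first ": "
strip_key <- function(x) sub("^[^:]*:\\s*", "", x)

# Toy stand-ins for allAnnot and exprSet (made-up values, not CCLE data)
toyAnnot <- data.frame(
  title = c("LINE_A", "LINE_B", "LINE_C", "LINE_D"),
  characteristics_ch1 = c("primary site: lung", "primary site: lung",
                          "primary site: breast", "primary site: breast"),
  row.names = c("S1", "S2", "S3", "S4")
)
toyExpr <- matrix(c(5, 7, 2, 4,
                    1, 1, 9, 11),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("1956", "2064"),
                                  c("S1", "S2", "S3", "S4")))

# Annotation rows must line up with expression columns before grouping
stopifnot(identical(rownames(toyAnnot), colnames(toyExpr)))

site <- strip_key(toyAnnot$characteristics_ch1)

# Mean expression of each gene within each primary site (genes x sites)
bySite <- t(apply(toyExpr, 1, function(v) tapply(v, site, mean)))
bySite
#      breast lung
# 1956      3    6
# 2064     10    1
```

The explicit `identical()` check matters on real data: if the annotation table and the expression matrix were ever ordered differently, every per-tissue summary would silently be wrong.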

Cell Line Gene Sets (summaries of 1,035 cell lines)

1035 sets of genes with high or low expression in each cell line relative to other cell lines from the CCLE Cell Line Gene Expression Profiles dataset.

http://amp.pharm.mssm.edu/Harmonizome/dataset/CCLE+Cell+Line+Gene+Expression+Profiles
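The idea of "high or low expression relative to other cell lines" can be illustrated with per-gene z-scores: standardize each gene across all cell lines, then for each cell line collect the genes above a positive cutoff (high) or below a negative one (low). This is a simplified stand-in, not Harmonizome's documented procedure; the helper `high_low_sets`, the cutoff of 1, and the toy matrix are assumptions.

```r
# Rows = genes, columns = cell lines (toy values, not CCLE data)
expr <- matrix(c(1, 2, 3, 10,
                 5, 5, 0, 5,
                 8, 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("line1", "line2", "line3", "line4")))

# Standardize each gene across cell lines: (x - row mean) / row sd
z <- t(scale(t(expr)))

# Hypothetical helper: per cell line, genes whose z-score passes the cutoff
high_low_sets <- function(z, cutoff = 1) {
  lapply(setNames(colnames(z), colnames(z)), function(cl) list(
    high = rownames(z)[z[, cl] >  cutoff],
    low  = rownames(z)[z[, cl] < -cutoff]
  ))
}

sets <- high_low_sets(z)
sets$line4$high  # "geneA"
sets$line3$low   # "geneB"
```

With only four cell lines a z-score can never exceed 1.5 in absolute value, which is why the toy cutoff is so low; on the real 917-column matrix you would likely choose a stricter one.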

Some articles about the CCLE database:

http://cancerres.aacrjournals.org/content/73/8_Supplement/2409.short

http://cancerres.aacrjournals.org/content/74/22/6390.short

https://clincancerres.aacrjournals.org/content/19/19_Supplement/IA2.abstract

http://onlinelibrary.wiley.com/doi/10.1002/cncy.21471/pdf reviews several similar database resources

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0088557 explains the high/low expression concept

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7060697 drug-related: Anticancer drug sensitivity analysis: An integrated approach applied to Erlotinib sensitivity prediction in the CCLE database

http://biorxiv.org/content/biorxiv/early/2015/10/02/028159.full.pdf compares CCLE and TCGA data
