BrainCellR
provides researchers with a powerful and user-friendly package for efficient cell type classfication and nomination of single-cell transcriptomic data in the brain.
BrainCellR
process comprises the following steps:
In this tutorial, we will be using data from the mouse M1 cortex, as presented in Bakken TE et al.’s 2021 study, ‘Comparative Cellular Analysis of Motor Cortex in Human, Marmoset and Mouse’, published in Nature.
To expedite the tutorial, we have extracted only the first 1000 cells.
The input data should consist of single-cell transcriptome data, arranged with genes in rows and cells in columns. The clustering method recommends that the input data be log-transformed. Using a sparse matrix can help speed up the process.
library(scrattch.hicat)
library(BrainCellR)
##
library(Matrix)
library(matrixStats)
data = read.csv("G:/PFC/package_example/data.csv",row.names=1)
data = data[,1:1000]
print(data[1:5,1:5])
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT
## 0610009D07Rik 1
## 0610009O20Rik 0
## 0610010F05Rik 2
## 0610010K14Rik 0
## 0610030E20Rik 0
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA
## 0610009D07Rik 4
## 0610009O20Rik 0
## 0610010F05Rik 5
## 0610010K14Rik 0
## 0610030E20Rik 0
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC
## 0610009D07Rik 3
## 0610009O20Rik 0
## 0610010F05Rik 2
## 0610010K14Rik 0
## 0610030E20Rik 0
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT
## 0610009D07Rik 0
## 0610009O20Rik 0
## 0610010F05Rik 1
## 0610010K14Rik 0
## 0610030E20Rik 0
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA
## 0610009D07Rik 1
## 0610009O20Rik 1
## 0610010F05Rik 0
## 0610010K14Rik 0
## 0610030E20Rik 0
print(dim(data))
## [1] 13888 1000
data = as.matrix(data)
norm.dat <- cpm(data)
norm.dat <- log2(norm.dat+1)
norm.dat <- Matrix(norm.dat, sparse = TRUE)
The parameters of de_param() function ban be seen from scrattch.hicat package(https://github.com/AllenInstitute/scrattch.hicat/tree/master).
padj.th: P-value after fdr adjustment .
q1.th: Proportion threshold of cells in current cell type.
q2.th: Proportion threshold of cells in other cell type.
de.score: Threshold for gene selection. min.cells: The minimum number of cells in each cluster.
set.seed(3456)
de.param <- de_param(padj.th=0.01,
q1.th = 0.4,
q.diff.th = 0.7,
de.score.th = 100,
min.cells=20)
file_path_root = "G:/PFC/package_example/"
dir.create(paste0(file_path_root,"run_cluster_m"))
result_1 <- RunConsensus1(norm.dat,
de.param = de.param,
output_dir = paste0(file_path_root,"run_cluster_m"),
mc.cores = 1)
group_meta = data.frame(names(result_1$cl),result_1$cl)
colnames(group_meta) = c("cell","group")
Print clustering result
print(head(group_meta))
## cell
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA
## mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG
## group
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT 1
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA 1
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC 10
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT 1
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA 1
## mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG 10
print(dim(group_meta))
## [1] 1000 2
There should be reference data available for classification, which includes a gene expression matrix and metadata containing information on the major classes and subclasses of cell types.
method: Method to run classification, can be chosen from Seurat or SingleR.
ref_data = read.csv("G:/PFC/package_example/data.csv",row.names=1)
ref_meta = read.csv("G:/PFC/package_example/mouse_meta.csv",row.names=1)
new_data = data.frame(data)
anno_meta = RunSubclassClassify(new_data, group_meta, ref_data, ref_meta, method="Seurat")
Print classification result
print(head(anno_meta))
## cell
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA
## mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG
## class_label subclass_label
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT GABAergic Sst Chodl
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA GABAergic Sst Chodl
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC GABAergic Sst Chodl
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT GABAergic Sst Chodl
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA GABAergic Sst Chodl
## mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG GABAergic Sst Chodl
## cluster_label
## mouse_pBICCNsMMrMOpRAiF007d190314_CAGGTATAGGTAGGCT 1
## mouse_pBICCNsMMrMOpRMiM008d190320_CCGCAAGAGAACAGGA 1
## mouse_pBICCNsMMrMOpRMiM008d190320_AACCAACAGTCGTTAC 10
## mouse_pBICCNsMMrMOpRMiM008d190320_ATGGATCGTTTCCATT 1
## mouse_pBICCNsMMrMOpRPiF008d190314_GTTCCGTGTCCAATCA 1
## mouse_pBICCNsMMrMOpRPiM008d190320_CATTCTAAGCTGACAG 10
The data will be processed into pseudo cells and the ROC method will be run to find differentially expressed (DE) genes. These DE genes are calculated between clusters that originate from the same subclass.
cell_number: Number of cells to be intergated within pseudo cells.
new_data = new_data[,rownames(anno_meta)]
result = RunFindDEGene(new_data, anno_meta,cell_number=10)
all_marker_list = result[[1]]
pse_meta = result[[2]]
pse_data = result[[3]]
Print marker result
print(head(pse_meta))
## cell cluster_label subclass_label class
## 1 1 1 Sst Chodl GABAergic
## 2 2 1 Sst Chodl GABAergic
## 3 3 1 Sst Chodl GABAergic
## 4 4 1 Sst Chodl GABAergic
## 5 5 1 Sst Chodl GABAergic
## 6 6 1 Sst Chodl GABAergic
Select the top three marker genes ranked by specific score.
The annotation table contains seven columns, which include class, subclass, and cluster information from previous steps. From this, select the top three markers for this cluster. The ‘cluster_new’ column contains the name obtained from our package.
cell_type_anno <- GetCellTypeName(pse_data, pse_meta, all_marker_list)
## [1] 8
## [1] 1
## [1] 2
## [1] 3
print(cell_type_anno)
## class subclass cluster gene1 gene2 gene3
## 2 GABAergic Sst Chodl 1 Trpm3 Slc4a4 Adamtsl1
## 3 GABAergic Sst Chodl 10 Hhip Nxph2 Prkg1
## 4 GABAergic Vip 13 Mybpc1 Col15a1 Pde3a
## 5 GABAergic Vip 15 Bcar3 Ptprk Bmper
## 6 GABAergic Vip 16 Car4 Cdh6 Asic2
## 7 GABAergic Vip 17 <NA> <NA> <NA>
## 8 GABAergic Vip 14 Col15a1 Kirrel3 <NA>
## 9 GABAergic Vip 12 Ror1 Col14a1 Chn2
## 10 GABAergic Pvalb 11 <NA> <NA> <NA>
## cluster_new
## 2 GABAergic_Sst Chodl_Trpm3
## 3 GABAergic_Sst Chodl_Hhip
## 4 GABAergic_Vip_Mybpc1
## 5 GABAergic_Vip_Bcar3
## 6 GABAergic_Vip_Car4
## 7 GABAergic_Vip_NA
## 8 GABAergic_Vip_Col15a1
## 9 GABAergic_Vip_Ror1
## 10 GABAergic_Pvalb_NA
We found that two clusters could not be annotated by our pipeline. One is Cluster 11, which belongs to the Pvalb subclass. Since it only contains one cluster in the Pvalb subclass, it cannot be calculated by our pipeline, as our pipeline calculates differentially expressed (DE) genes between clusters that belong to the same subclass. The other, Cluster 17, could not find markers using ROC methods. Therefore, we used other DE gene detection methods to find markers, and we will annotate the method used to find these markers.
anno_new <- GetCellTYpeNameExtend(cell_type_anno, pse_data, pse_meta)
## Calculating cluster 13
## Calculating cluster 15
## Calculating cluster 16
## Calculating cluster 17
## Calculating cluster 14
## Calculating cluster 12
## [1] 6
## [1] 1
## Calculating cluster 11
## [1] 0
## [1] 1
## Calculating cluster 11
## [1] 0
## [1] 1
## Calculating cluster 11
## [1] 0
## [1] 1
print(anno_new)
## class subclass cluster gene1 gene2 gene3
## 2 GABAergic Sst Chodl 1 Trpm3 Slc4a4 Adamtsl1
## 3 GABAergic Sst Chodl 10 Hhip Nxph2 Prkg1
## 4 GABAergic Vip 13 Mybpc1 Col15a1 Pde3a
## 5 GABAergic Vip 15 Bcar3 Ptprk Bmper
## 6 GABAergic Vip 16 Car4 Cdh6 Asic2
## 7 GABAergic Vip 17 Scml4 Frmd4b Anks1b
## 8 GABAergic Vip 14 Col15a1 Kirrel3 <NA>
## 9 GABAergic Vip 12 Ror1 Col14a1 Chn2
## 10 GABAergic Pvalb 11 <NA> <NA> <NA>
## cluster_new quality
## 2 GABAergic_Sst Chodl_Trpm3 ROC
## 3 GABAergic_Sst Chodl_Hhip ROC
## 4 GABAergic_Vip_Mybpc1 ROC
## 5 GABAergic_Vip_Bcar3 ROC
## 6 GABAergic_Vip_Car4 ROC
## 7 GABAergic_Vip_Scml4 wilcox
## 8 GABAergic_Vip_Col15a1 ROC
## 9 GABAergic_Vip_Ror1 ROC
## 10 GABAergic_Pvalb_NA ROC