spaTrack adapt to scRNA-seq data¶

Besides spatial transcriptomics data, spaTrack can also apply trajectory inference to scRNA-seq data by considering only UMAP coordinates.This notebook demonstrates how spaTrack performs trajectory inference on scRNA-seq data using human hematopoiesis cells (HSPC) as an example. The example data can be donwloaded from Baidu Netdisk and Google Drive.

[1]:

import warnings

warnings.filterwarnings("ignore")

[2]:

import spaTrack as spt
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

sc.settings.verbosity = 0
plt.rcParams['figure.dpi'] = 200 #分辨率

1. Import the HSC data¶

Convert tsv files to scanpy adata. When run spaTrack on scRNA-seq data,we recommended that cell type information can placed in ‘adata.obs[’cluster’]’, and UMAP coordinates can be placed in ‘X_UMAP’ or ‘X_umap’.

[3]:

adata = sc.read("../../../../data/HSPC.Sc-RNA.data/HSPC.sc.expression.tsv")
anno = pd.read_table("../../../../data/HSPC.Sc-RNA.data/UMAP_coor.annot.tsv",)

adata.obs.index = anno["Cell_id"].values
adata.obs["cluster"] = anno["Annot"].values
adata.obsm["X_UMAP"] = anno[['UMAP_1','UMAP_2']].values

adata.var_names_make_unique()

Perform basic preprocessing and then visualize the data.

[4]:

sc.pp.filter_genes(adata,min_cells=30)
sc.pp.normalize_total(adata,target_sum=1e4)
sc.pp.log1p(adata)
fig, axs = plt.subplots(figsize=(6, 6))
ax = sc.pl.embedding(adata,  basis='X_UMAP',show=False,color='cluster',ax=axs,frameon=False,title=' ',palette='tab20b',size=150, legend_fontsize=20)

../_images/notebooks_05.scRNAseq_data_of_HSC_8_0.png

2. Choose start cells¶

Based on prior biological knowledge, HSC(hematopoietic stem cells) have the ability to self-renew and differentiate into various blood cell types, making them essential for the production and replenishment of blood cells in the body. We set HSC as the starting cluster in this data.

[5]:

start_cells=spt.set_start_cells(adata,select_way='cell_type',cell_type='HSC')
plt.scatter(adata.obsm['X_UMAP'][:,0],adata.obsm['X_UMAP'][:,1],c='#D3D3D3',s=20)
plt.scatter(adata.obsm['X_UMAP'][start_cells][:,0],adata.obsm['X_UMAP'][start_cells][:,1],c='orange',s=25)

[5]:

<matplotlib.collections.PathCollection at 0x7fa1b9deca60>

../_images/notebooks_05.scRNAseq_data_of_HSC_11_1.png

3. Calculate cell transition probability¶

Calculate cell transition probability matrix between cells.

[6]:

adata.obsp['trans']=spt.get_ot_matrix(adata,data_type='single-cell')

X_pca is not in adata.obsm, automatically do PCA first.
alpha1(gene expression): 0.5   alpha2(spatial information): 0.5

4. Caculate cell pseudotime¶

Infer the cell pseudotime based on the OT matrix and start cells.

[7]:

adata.obs['ptime']=spt.get_ptime(adata,start_cells)

5. Calculate vector field velocity.¶

[8]:

adata.uns['E_grid'],adata.uns['V_grid']=spt.get_velocity(adata,basis='UMAP',n_neigh_pos=100,n_neigh_gene=0)

The velocity of cells store in 'velocity_UMAP'.

Display the UMAP of cells, starting cells, the cell pseudotime and the inferred cell trajectory respectively.

6. Plot results¶

Display the UMAP plots of cells, starting cells, cell pseudotime, and the inferred cell trajectory.

[9]:

fig, axs = plt.subplots(ncols=2, nrows=1, figsize=(10, 5))
sc.pl.embedding(adata, basis='X_UMAP', color='cluster', size=20, legend_loc='on data', ax=axs[0], legend_fontoutline=3, show=False, s=100)

start_points = sc.pl.embedding(adata, basis='X_UMAP',ax=axs[1], show=False, title='start cells')
points = adata.obsm['X_UMAP'][adata.obs['cluster'] == 'HSC']
start_points.scatter(points[:, 0], points[:, 1], s=15, color='orange')

[9]:

<matplotlib.collections.PathCollection at 0x7fa1b9cdfa90>

../_images/notebooks_05.scRNAseq_data_of_HSC_23_1.png

[10]:

fig, axs = plt.subplots(ncols=2, nrows=1, figsize=(10, 4))
sc.pl.embedding(adata, basis='X_UMAP', color='ptime', show=False, ax=axs[0], color_map='Reds', title='ptime')

vf_velocity = sc.pl.embedding(adata, basis='X_UMAP', show=False, ax=axs[1],color='cluster',title='streamplot',  legend_loc=None, alpha=0.4, size=120)
vf_velocity.streamplot(adata.uns['E_grid'][0], adata.uns['E_grid'][1], adata.uns['V_grid'][0], adata.uns['V_grid'][1], color='black', linewidth=1.5,density=1.8,arrowsize=1.2,)

[10]:

<matplotlib.streamplot.StreamplotSet at 0x7fa1b6001640>

../_images/notebooks_05.scRNAseq_data_of_HSC_24_1.png

7. Downstream analysis¶

7.1 Least Action Path (LAP)¶

[11]:

VecFld=spt.VectorField(adata,basis='UMAP')

Select the start point and the end point of the least action path.

[12]:

fig, ax = plt.subplots(figsize=(5, 5))

LAP_start_point=[-20,10]
LAP_end_point=[-30,-10]
LAP_start_cell=spt.nearest_neighbors(LAP_start_point,adata.obsm['X_UMAP'])[0][0]
LAP_end_cell=spt.nearest_neighbors(LAP_end_point,adata.obsm['X_UMAP'])[0][0]

plt.scatter(*adata.obsm["X_UMAP"].T,c='#D3D3D3',s=3)
plt.scatter(*adata[LAP_start_cell].obsm['X_UMAP'].T,s=20)
plt.scatter(*adata[LAP_end_cell].obsm['X_UMAP'].T,s=20)

[12]:

<matplotlib.collections.PathCollection at 0x7fa1b8da6760>

../_images/notebooks_05.scRNAseq_data_of_HSC_29_1.png

Calculate the least action path between the given starting cells and end cells.

[13]:

lap=spt.least_action(adata,
    init_cells=adata.obs_names[LAP_start_cell],
    target_cells=adata.obs_names[LAP_end_cell],
    vecfld=VecFld,
    basis='UMAP',
    adj_key='X_UMAP_distances',
    EM_steps=10,
    n_points=20
)

Plot the least action Path.

[14]:

LAP_ptime,LAP_nbrs=spt.lap.map_cell_to_LAP(adata,basis='UMAP')
sub_adata=adata[LAP_nbrs,:]
sub_adata.obs['ptime']=LAP_ptime
sub_adata=sub_adata[np.argsort(sub_adata.obs["ptime"].values), :].copy()

fig, ax = plt.subplots(figsize=(8,8))
plt.axis('off')
ax = sc.pl.embedding(adata,  basis='X_UMAP',color='cluster',show=False,ax=ax,frameon=False,legend_loc=None,alpha=0.8,size=150)
ax.streamplot(adata.uns['E_grid'][0], adata.uns['E_grid'][1], adata.uns['V_grid'][0], adata.uns['V_grid'][1],density=1.2,color='black',linewidth=2.5,arrowsize=1.5,minlength=0.1,maxlength=0.8)
ax = spt.plot_least_action_path(adata,basis='UMAP',ax=ax,point_size=120,linewidth=5)
sc.pl.embedding(sub_adata, basis='X_UMAP',ax=ax, color="ptime", cmap="Purples",frameon=True,size=150,title=' ')

../_images/notebooks_05.scRNAseq_data_of_HSC_33_0.png

7.2 Pseudotime-dependent genes on LAP¶

Select the targeted cell type for least action path (LAP) analysis and filter out genes with highly variable and expression below the minimum proportion.

[15]:

sub_adata_path=sub_adata[sub_adata.obs['cluster'].isin(['HSC','BMEP','MKP'])]
sub_adata_path=spt.filter_gene(sub_adata_path,min_exp_prop=0.1,hvg_gene=5000)

clusters ordered by ptime:  ['HSC', 'BMEP', 'MKP']
Cell number     537
Gene number     2210

Next, we use a Generalized Additive Model to fit the gene expression and corresponding cell pseudotime. Pseudotime-dependent genes are then filtered based on the degree of model fit and the false discovery rate (FDR). To visualize pseudotime-dependent genes across the trajectory, we partition the cells ordered by pseudotime into windows and rank genes based on their maximum expression within each window.

[16]:

df_res=spt.ptime_gene_GAM(sub_adata_path,core_number=5)
df_sig_res = df_res.loc[(df_res['model_fit']>0.4) & (df_res['fdr']<0.05)]

Genes number fitted by GAM model:   2210

[17]:

df_sig_res

[17]:

	gene	pvalue	model_fit	pattern	fdr
ACTN1	ACTN1	1.110223e-16	0.490395	increase	1.044082e-15
AIF1	AIF1	1.110223e-16	0.421778	decrease	1.044082e-15
AVP	AVP	1.110223e-16	0.780329	decrease	1.044082e-15
BIRC5	BIRC5	1.110223e-16	0.525313	increase	1.044082e-15
CCNB1	CCNB1	1.110223e-16	0.512947	increase	1.044082e-15
...	...	...	...	...	...
TPT1	TPT1	1.110223e-16	0.605880	decrease	1.044082e-15
TYMS	TYMS	1.110223e-16	0.536141	increase	1.044082e-15
UBE2C	UBE2C	1.110223e-16	0.583134	increase	1.044082e-15
VIM	VIM	1.110223e-16	0.405040	decrease	1.044082e-15
ZFAS1	ZFAS1	1.110223e-16	0.602351	decrease	1.044082e-15

93 rows × 5 columns

Order genes and plot the heatmap.

[18]:

sort_exp_sig = spt.order_trajectory_genes(sub_adata,df_sig_res,cell_number=30)
spt.plot_trajectory_gene_heatmap(sort_exp_sig,smooth_length=200,gene_label_size=15,cmap_name='twilight_shifted')

Finally selected 93 genes.

../_images/notebooks_05.scRNAseq_data_of_HSC_41_1.png

Show the relationship between genes and ptime.

[19]:

spt.plot_trajectory_gene(sub_adata_path,gene_name='VIM',show_cell_type=False)

[19]:

<Axes: title={'center': 'VIM'}, xlabel='ptime', ylabel='expression'>

../_images/notebooks_05.scRNAseq_data_of_HSC_43_1.png