Question

How can I align single-cell RNA-seq (scRNA-seq) data using STAR or another aligner?

4.3 years ago

dhkwnr95 ▴ 20

Hi, I am having trouble aligning GEO data.

How can I align FASTQ data from GEO to a reference genome with a gene annotation (GTF) file? For example, for SRR11804718 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11804718), the authors used dropTag (from dropEst) and produced read 1 and read 2 following the Seq-Well protocol. Read 1 therefore contains the barcode + UMI (21 bases) and read 2 contains 50 bases of mRNA sequence, as shown on the SRR11804718 page above.

The problem is that when I ran the STAR aligner, it raised an error saying that read 1 (21 bases) is too short compared with read 2 (50 bases). I think STAR treats read 1 and read 2 as paired-end data (as in bulk RNA-seq). How should I supply these reads correctly? Read 1 contains only the barcode and UMI, and in my opinion this information is essential when aligning scRNA-seq FASTQ data to a reference genome.

For SRR11804718, the authors used dropTag (from dropEst) to convert BCL to FASTQ, and they followed the Seq-Well protocol. Please help: how can I correctly produce a BAM file by aligning scRNA-seq data with STAR?

Best regards

RNA-Seq scRNA-Seq single cell seq-well dropTag • 4.0k views

ADD COMMENT • link updated 4.3 years ago by Rob 6.9k • written 4.3 years ago by dhkwnr95 ▴ 20

score 2 · Answer 1 · 2020-09-08

4.3 years ago

Rob 6.9k

STAR has a "STARsolo" mode designed specifically for tagged-end single-cell data. Check the STAR manual and search for "STARsolo". The tool is well documented.
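As a rough illustration of what a STARsolo run for this kind of layout might look like, the sketch below assembles a STAR command line in Python. The flags are standard STAR/STARsolo options, but the values are assumptions taken from the read structure described in this thread (12-bp cell barcode followed by an 8-bp UMI, no barcode whitelist), the file and directory names are hypothetical, and the FASTQ files are assumed to be uncompressed. It also assumes that the 21st base of read 1 sits at the 3' end and can be ignored, which this thread does not confirm. Check every option against the manual for the STAR version you are using.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                           cb_len=12, umi_len=8, threads=4):
    """Return a STAR command (as an argument list) for a barcode+UMI read layout.

    ASSUMPTION: the cell barcode starts at base 1 and the UMI follows it
    directly; any extra trailing base in the barcode read is ignored.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first and the barcode read second.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBstart", "1",
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(cb_len + 1),
        "--soloUMIlen", str(umi_len),
        # No whitelist of known barcodes is supplied.
        "--soloCBwhitelist", "None",
        # 0 turns off the barcode-read length check, since read 1 is
        # 21 bases here rather than cb_len + umi_len = 20.
        "--soloBarcodeReadLength", "0",
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


# Hypothetical paths, for illustration only.
cmd = build_starsolo_command("star_index", "SRR11804718_2.fastq",
                             "SRR11804718_1.fastq")
print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to review the options first; the same list can be passed to `subprocess.run` once the index and FASTQ paths exist.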

ADD COMMENT • link 4.3 years ago by Rob 6.9k

Thank you for your advice. May I ask two more questions?

1. In the STAR manual (https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/lecture_notes/STARmanual.pdf), the section "14.25 STARsolo (single cell RNA-seq) parameters" says 'one cell barcode and one UMI barcode in read2, e.g. Drop-seq and 10X Chromium'. I'm therefore afraid that STARsolo cannot be applied to my data, which uses the Seq-Well protocol (not the Drop-seq protocol).

2. For SRR11804718 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11804718), the article (https://www.nature.com/articles/s41591-020-0944-y) says: 'The read structure was paired-end with read 1 beginning from a custom read 1 primer containing a 12-bp cell barcode and an 8-bp UMI, and with read 2 containing 50 bp of mRNA sequence'. However, in the SRA Run Browser (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11804718), read 1 has a length of 21 (not 12 + 8 = 20). What is going on with this data?
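To make the 12 + 8 = 20 versus 21 discrepancy concrete, here is a small Python sketch that tallies read lengths in a FASTQ stream and splits a barcode read into its parts. It assumes the cell barcode comes first, the UMI follows directly, and any extra base sits at the 3' end; the actual position of the extra base is not established in this thread. The example record is made up and is not taken from SRR11804718.

```python
from collections import Counter
import io


def read_length_counts(fastq_handle):
    """Count sequence lengths in a FASTQ stream with 4 lines per record."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


def split_read1(seq, cb_len=12, umi_len=8):
    """Split a barcode read into (cell barcode, UMI, leftover bases).

    ASSUMPTION: barcode first, UMI next, extra bases (if any) at the 3' end.
    """
    return seq[:cb_len], seq[cb_len:cb_len + umi_len], seq[cb_len + umi_len:]


# Made-up 21-base record, for illustration only.
example_seq = "ACGTACGTACGTTTGGCCAAT"
example_fastq = io.StringIO(
    "@example_read\n" + example_seq + "\n+\n" + "I" * len(example_seq) + "\n"
)

print(read_length_counts(example_fastq))
print(split_read1(example_seq))
```

Running `read_length_counts` on the real read 1 file would show whether every read is 21 bases long or only some of them, which is useful to know before choosing barcode and UMI coordinates.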

ADD REPLY • link 4.3 years ago by dhkwnr95 ▴ 20

One extra base added to a read might be there purely to allow the FASTQ generator to assess phasing. Are you sure that your data is so different from Drop-seq? What aligner does the lab that developed Seq-Well use?

ADD REPLY • link 4.3 years ago by swbarnes2 14k

I just want to explore that GEO data (SRR11804718); I'm not a researcher in that lab. They stated briefly that they used the Seq-Well protocol and the STAR aligner. Could you explain 'assess phasing' in more detail? Where is the extra base normally located in read 1 (barcode + UMI): at the start of the sequence, at the end, or between the cell barcode and the UMI? I have no idea about this extra base; I had never heard of it before.

ADD REPLY • link 4.3 years ago by dhkwnr95 ▴ 20

Page 24 of this document (http://shaleklab.com/wp-content/uploads/2019/07/SeqWell-S3-Protocol.pdf) says: 'Read 1 can sometimes be 21 base pairs; this depends on the company and bead lot you are ordering from. Please consult with your bead provider to determine which read length to use.' Therefore, I think I need to e-mail the article's authors.

ADD REPLY • link 4.3 years ago by dhkwnr95 ▴ 20