Hi all,
I'm now 6 months into NGS and the analysis of sequencing data. I have been working on RNA-Seq data and have recently started to venture into CAGE-Seq data.
I wanted to ask how CAGE-Seq data should actually be mapped. We did paired-end sequencing for the CAGE libraries and got the fastq files. After cleaning, I have clean read files for read 1 and read 2, but the two files are different sizes. When I ran them through STAR, it reported that mapping could not be done because it reached the end of one read file before the other.
Is this normal for CAGE-Seq data? Or should we map read 1 only, since we are only interested in the TSS, i.e. the reads sequenced from the 5' end?
I am a bit confused about how to process CAGE data here.
Please give some guidance & advice. Thank you very much.
Can you elaborate on the "cleaning" part?
And do you mean different read lengths or different number of reads in R1 vs R2?
Cleaning is where I trimmed 4 base pairs off the reads; these correspond to the index of the sample each read belongs to.
Yes, I get a different number of reads for R1 and R2.
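A quick way to confirm the mismatch is to count the records in each file. This is only a minimal sketch, assuming standard 4-line FASTQ records (plain or gzipped); `count_fastq_records` is a made-up helper name, not part of any tool mentioned in this thread.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 suggests a truncated file.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    import sys

    # Usage: python count_reads.py R1_clean.fastq R2_clean.fastq
    for fastq_path in sys.argv[1:]:
        print(fastq_path, count_fastq_records(fastq_path))
```

If the two counts differ, the files are no longer properly paired and a paired-end mapper will most likely refuse them.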
Please post the names and versions of the programs you used, and also the exact commands. You should clean and map R1 and R2 as paired files, i.e., simultaneously and keeping proper pair information.
Here's the read processing before mapping:
read_skipper.pl R1_step1.fq CAC
fastx_trimmer -f 4 -i R1_step1.fq -o R1_trimmed.fq -Q33
perl ../IndexQuality_CAGE_20.pl R1_trimmed.fq R1_trimmed.fq I.fq R1_20.fq R1_20.2.fq I_20.fq
qcleaner_renew_v3.1.pl --i ./R1_step1_skip.fq --o R1_clean.fastq --log qclog.txt
qcleaner_renew_v3.1.pl --i ./Undetermined_S0_L001_R2_001.fastq --o R2_clean.fastq --log qclog.txt
fastx processes each file on its own and does not preserve pairing. Use Trimmomatic or BBDuk to trim adapters and low-quality bases.
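If re-running the cleaning with a pair-aware tool is not an option, files that have already lost pairing can in principle be re-synchronised by read name. Below is a rough sketch of that idea, not a replacement for the tools above. It assumes 4-line FASTQ records, read names that match between R1 and R2 up to the first whitespace (ignoring a trailing `/1` or `/2`), and that R2 fits in memory. The function names are hypothetical.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an unexpected blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def read_id(header):
    """Read name without '@', the comment field, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write only reads present in both inputs, in R1 order; return pair count."""
    # Assumption: all of R2 fits in memory as a dict keyed by read name.
    with open(r2_in) as fh:
        r2_records = {read_id(rec[0]): rec for rec in read_fastq(fh)}
    kept = 0
    with open(r1_in) as fh1, open(r1_out, "w") as out1, open(r2_out, "w") as out2:
        for rec in read_fastq(fh1):
            mate = r2_records.get(read_id(rec[0]))
            if mate is None:
                continue  # orphan read: its mate was dropped during cleaning
            out1.write("\n".join(rec) + "\n")
            out2.write("\n".join(mate) + "\n")
            kept += 1
    return kept
```

Calling something like `repair_pairs("R1_clean.fastq", "R2_clean.fastq", "R1_paired.fastq", "R2_paired.fastq")` (the output names are placeholders) should leave two files with the same number of reads in the same order, which is what paired-end mappers expect.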
Thank you for your suggestion. I will try it out and see if it works.