9.7 years ago
flavobacteria
I have an RNA sample that was paired-end sequenced. What should I do with these two separate files? Do I simply merge them before analysis, or do something else? Thank you!
You should not merge the two FASTQ files. Instead, provide both of them to the aligner at the same time. For example, the TopHat manual clearly mentions:
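Why not simply concatenate them? In paired-end FASTQ files the mates are matched by position: the Nth record in R1 and the Nth record in R2 are the two ends of the same fragment, and aligners running in paired mode rely on that order. The following is a minimal Python sketch, not code from TopHat or any aligner, that walks the two files in lockstep and checks that the mate names agree. It assumes standard four-line FASTQ records and read names that are either identical in both files or end in /1 and /2; the function names are hypothetical.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()  # '+' separator line, not needed
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, qual


def mate_id(header: str) -> str:
    """Fragment name: drop '@', any comment after whitespace, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def iter_pairs(r1: TextIO, r2: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Walk R1 and R2 in lockstep; the Nth records of both files form one fragment."""
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if mate_id(rec1[0]) != mate_id(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        yield rec1, rec2


if __name__ == "__main__":
    # Two tiny in-memory files standing in for sample_R1.fastq / sample_R2.fastq.
    r1 = io.StringIO("@frag1/1\nACGT\n+\nIIII\n@frag2/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@frag1/2\nTTAA\n+\nIIII\n@frag2/2\nCCAA\n+\nIIII\n")
    for mate1, mate2 in iter_pairs(r1, r2):
        print(mate_id(mate1[0]), mate1[1], mate2[1])
```

A concatenated file would put all R1 reads before all R2 reads, so this positional pairing, and with it the insert-size information the aligner uses, is lost.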
Thank you, Ashutosh, I see. Because my samples are from bacteria, I actually use Bowtie and then HTSeq. Do you think these two tools will handle the "additional unpaired reads" the same way TopHat does? How do I pass the two files to Bowtie and HTSeq?
Thank you!
If I understand you correctly, you have a pair of FASTQ files (two files) plus a file that contains unpaired, or orphan, reads. I don't think you can use Bowtie to align all of these reads together, though I may be wrong. You can always align them separately (the paired-end FASTQ files in one run, the orphan single reads in another) and merge the two BAM files. The tricky part is that HTSeq takes into account whether the data are paired-end. For paired-end data, if both reads of a pair align to the same exon, they contribute only a single count for that gene. I am not sure how HTSeq will treat a merged BAM file (I would have to go through the source code), because the SAM flag of a mapped paired-end read whose mate doesn't map is different from that of a mapped single-end read. Instead of merging the BAM files, you can calculate the counts separately for the two BAM files and then merge the counts. Frankly speaking, I have never tried quantifying expression this way.
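Both points above can be illustrated with a small Python sketch. `classify_flag` decodes the three SAM FLAG bits that distinguish a mapped single-end read from a paired read whose mate is unmapped (0x1, 0x4 and 0x8, as defined in the SAM specification). `read_counts` and `merge_counts` then sum two count tables gene by gene, which is the "merge the counts instead of the BAM files" route. The table parser assumes htseq-count-style output (gene ID, a tab, an integer count, with summary rows starting with `__`); all function names are hypothetical and this is not HTSeq's own code.

```python
from collections import Counter
from typing import Dict, Iterable

# SAM FLAG bits (from the FLAG field definition in the SAM specification)
PAIRED = 0x1         # template has multiple segments, i.e. paired-end
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the next segment (the mate) is unmapped


def classify_flag(flag: int) -> str:
    """Describe a SAM record by the bits relevant to single- vs paired-end counting."""
    if flag & UNMAPPED:
        return "unmapped"
    if not flag & PAIRED:
        return "single-end, mapped"
    if flag & MATE_UNMAPPED:
        return "paired, mate unmapped"
    return "paired, both mapped"


def read_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'gene<TAB>count' lines, skipping htseq-count's '__' summary rows."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, value = line.split("\t")
        if gene.startswith("__"):
            continue  # e.g. __no_feature, __ambiguous
        counts[gene] = int(value)
    return counts


def merge_counts(*tables: Dict[str, int]) -> Dict[str, int]:
    """Sum per-gene counts from several tables (e.g. paired run + orphan run)."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # adds values; keeps genes whose count is zero
    return dict(total)


if __name__ == "__main__":
    print(classify_flag(73))  # flag of a paired read whose mate did not map
    print(classify_flag(0))   # flag of a mapped single-end read
    paired = read_counts(["geneA\t10\n", "geneB\t3\n", "__no_feature\t5\n"])
    orphans = read_counts(["geneA\t2\n", "geneC\t1\n"])
    print(merge_counts(paired, orphans))
```

Summing is only reasonable if the orphan reads truly lost their mates (for example, during trimming), so that no fragment is counted in both tables.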
Thank you, Brian. I understand this is just a general question (I am kind of new to this field, but learning).
I want to look at differences in gene regulation between the two defined conditions (I guess this is differential expression of the transcriptome). Certainly, we want to see roughly which kinds of genes are actively expressed or repressed (and, hopefully, by how much).
Do you mean we only need to use one of the two files (R1 or R2) for quantification and analysis?
Thanks again, Brian.
Yes, just map the reads simultaneously with an aligner that can handle paired reads, which most can. Then you can calculate expression from the resulting SAM file. Or, for convenience, BBMap can map paired reads and directly write RPKM values to a file, skipping the intermediate steps.
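For reference, RPKM (or FPKM, when fragments rather than individual reads are counted) normalizes a gene's count by the gene's length in kilobases and by sequencing depth in millions of mapped reads or fragments: count × 10^9 / (length_bp × total_mapped). The Python sketch below only illustrates that formula and the idea of counting each paired fragment once; it is not BBMap's implementation. It assumes you have already turned the SAM file into (read name, gene) assignments and have gene lengths in base pairs. The function names are hypothetical, and a pair whose mates land on two different genes is simply counted once for each gene here, whereas real counters such as htseq-count apply their own rules to such cases.

```python
from typing import Dict, Iterable, Tuple


def count_fragments(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count fragments per gene from (read name, gene) pairs of mapped reads.

    Both mates of a pair share the same QNAME in a SAM file, so a pair whose
    two reads hit the same gene is counted once. A pair split across two genes
    is counted once for each gene here (a simplification).
    """
    seen = set()
    counts: Dict[str, int] = {}
    for qname, gene in assignments:
        if (qname, gene) in seen:
            continue  # the other mate of this fragment was already counted
        seen.add((qname, gene))
        counts[gene] = counts.get(gene, 0) + 1
    return counts


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int], total_mapped: int) -> Dict[str, float]:
    """count * 1e9 / (gene length in bp * total mapped reads or fragments)."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {gene: n * 1e9 / (lengths_bp[gene] * total_mapped) for gene, n in counts.items()}


if __name__ == "__main__":
    reads = [("frag1", "geneA"), ("frag1", "geneA"), ("frag2", "geneA"), ("frag3", "geneB")]
    frags = count_fragments(reads)
    print(frags)
    print(rpkm(frags, {"geneA": 2000, "geneB": 500}, total_mapped=3))
```

Dividing by length lets you compare genes within one sample, and dividing by depth lets you compare the same gene across samples of different sizes; for a formal test of differences between the two conditions, raw counts are usually what downstream tools expect.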