There is no special reason beyond this: I want to treat paired-end files (one sample, two xxx.fastq.gz files) as single-end for processing.
- Does that mean double the sequence coverage?
- For the alignment step with STAR, should I merge the two files before mapping (and can I merge them simply with `cat samplex_R1.fastq.gz samplex_R2.fastq.gz`), or should I drop one of the two files? Or should I map each file as single-end first (I am worried about this approach, since the contents of the two files differ) and then merge the BAM files?
- I am confused about why `wc -l samplex_R1.fastq.gz` gives a different result from `wc -l samplex_R2.fastq.gz`, since the files are a pair.
I would be very grateful for any ideas!
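On the `wc -l` point: running `wc -l` directly on a `.fastq.gz` file counts newline bytes in the compressed stream, which says nothing about the number of reads, so the two numbers need not agree. A fairer comparison decompresses first, as in the minimal sketch below. It assumes `gzip` is available and that every record spans exactly four lines; `count_reads` is a helper name made up for this illustration, and the file names are the placeholders from the question.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes standard 4-line FASTQ records (header, sequence, '+', qualities).
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Compare the two mates only if both placeholder files are present.
if [ -f samplex_R1.fastq.gz ] && [ -f samplex_R2.fastq.gz ]; then
    echo "R1 reads: $(count_reads samplex_R1.fastq.gz)"
    echo "R2 reads: $(count_reads samplex_R2.fastq.gz)"
fi
```

For an intact, untrimmed pair the two counts would be expected to match; a mismatch may point to truncated files or to mates that were filtered independently.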
Treat each file of the pair as a single-end FASTQ, and after quantification take the average of the TPM/RPKM/FPKM values from the two mates.
Cheers !!
While you can do that, that doesn't mean that one should do that (one shouldn't).
Useful view, thanks!
My data consist of (m+n) samples. The m samples are only a small part of the total, but for them we received paired-end RNA-Seq data from the company. For the next n samples we received single-end data, because the experimentalists changed their plan for some reason. On our side, we plan to process all of the samples as single-end. I do not fully understand what it means to treat the paired-end RNA-Seq data of the m samples as single-end, but I guess it should start at the mapping step, so in detail the question is what to pass to STAR.
In that case they want you to only use the R1 files.
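A minimal sketch of what an R1-only alignment could look like. The function `star_r1_cmd` is a made-up helper that only prints the command so it can be inspected before running; the index path `/path/to/star_index`, the thread count, and the output prefix are placeholders, not values from this thread.

```shell
# Hypothetical helper: print (not run) a STAR command that aligns
# only the forward reads of a sample as single-end data.
# /path/to/star_index and the thread count are placeholders.
star_r1_cmd() {
    sample=$1
    echo STAR --runThreadN 4 \
        --genomeDir /path/to/star_index \
        --readFilesIn "${sample}_R1.fastq.gz" \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "${sample}_"
}

# Show the command for the placeholder sample name from the question.
star_r1_cmd samplex
```

Passing a single file to `--readFilesIn` is what makes STAR treat the sample as single-end; the R2 file is simply left out, so the m paired-end samples are handled the same way as the n single-end ones.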
Thanks! That sounds reasonable, but I'm not sure whether it is right. Have you processed data this way?
First of all, if you have paired-end data, why do you want to process it as single-end? Paired-end data gives you better mappability and alignment, because you have two mates to support the mapping. I provided that solution for cases where the pairs were not sequenced correctly or where you have a mix of paired-end and single-end data to work with. The solution I suggested is not a regular analysis step.
Hope this helps.
Cheers !!!
I see, thank you. I think my reason is that I have a mix of paired-end and single-end data to work with. Best.
No, absolutely don't. This may sound logical but cases where one mate is mapped and the other isn't would greatly affect the result as it would effectively divide the counts of the first mate by two (because one adds + 0 from mate two). If you want to use single-end reads, take the forward reads of each sample and proceed with standard tools such as `salmon` for quantification or `star` for alignment. Don't do any custom/untested procedures. That only creates bias.
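To make the divide-by-two effect concrete, here is a small illustration with made-up numbers for a single gene: the forward mates contribute 100 counts, while the reverse mates contribute none because they did not map. The variable names and values are hypothetical.

```shell
# Made-up counts for one gene in one sample (illustration only).
r1_count=100   # forward mates assigned to the gene
r2_count=0     # reverse mates: none mapped in this scenario

# Averaging the two mates as if they were independent measurements.
averaged=$(( (r1_count + r2_count) / 2 ))

echo "R1 only:  $r1_count"
echo "averaged: $averaged"
```

With these inputs the averaged value is 50, half of what the forward reads alone support, and that halving would only hit genes where the mates disagree, which is the bias described above.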