Hey, I was wondering if someone could help me see what I am getting wrong here?
I have paired-end RNA-seq data as BAM files but do not have the FASTQ files. My goal is to align the reads to two new references. Can someone help?
I used this code:
samtools flagstat file.bam
I received this...
63178554 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
59041463 + 0 mapped (93.45% : N/A)
63178554 + 0 paired in sequencing
31589277 + 0 read1
31589277 + 0 read2
50981107 + 0 properly paired (80.69% : N/A)
56645781 + 0 with itself and mate mapped
2395682 + 0 singletons (3.79% : N/A)
389730 + 0 with mate mapped to a different chr
261086 + 0 with mate mapped to a different chr (mapQ>=5)
I then went to sort...
# sort paired read alignment .bam file (sort by name -n)
samtools sort -n -o SAMPLE_sorted.bam SAMPLE.bam
# save fastq reads in separate R1 and R2 files
samtools fastq -@ 8 -n \
-1 SAMPLE_R1.fastq.gz \
-2 SAMPLE_R2.fastq.gz \
-0 /dev/null -s /dev/null \
SAMPLE_sorted.bam
Output: [M::bam2fq_mainloop] processed 63178554 reads
I have also tried this
samtools bam2fq SAMPLE.bam > SAMPLE.fastq
Both report that they processed the total number of reads, but when I run samtools flagstat on the output, everything is 0, and samtools view says it is a truncated file. This happens for all of them.
I am not sure why. Can someone help?
Also, I noticed that the read1 and read2 counts are the same, but this is expected because I have paired-end reads, correct?
ERROR MESSAGES:
output- [M::bam2fq_mainloop] discarded 0 singletons
For both R1 and R2 files:
samtools flagstat file_R1.fastq.gz
[W::sam_read1] Parse error at line 2
[bam_flagstat_core] Truncated file? Continue anyway.
0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Why are you running flagstat on a fastq? What do you think that accomplishes?
I didn't know. I was just trying things to figure out why it kept truncating. I am teaching myself as I go along and trying to understand, which is why I am here. Do you have any advice?
Thanks.
samtools flagstat and samtools view read SAM, BAM, and CRAM files, not FASTQ files. FASTQ files are very simple text files; you can look at the first few lines with
gunzip -c file_R1.fastq.gz | head
and count the number of lines with
gunzip -c file_R1.fastq.gz | wc -l
There should be an equal number of lines in the R1 and R2 files.
EDIT: Including a link to very brief descriptions of typical file formats.
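To make the line-count comparison less manual, here is a minimal sketch. It assumes standard four-line FASTQ records (no wrapped sequence lines) and gzip-compressed files; the function names count_fastq_reads and check_pair_counts are made up for illustration.

```shell
# Count reads in a gzip-compressed FASTQ file.
# Assumes each record occupies exactly 4 lines.
count_fastq_reads() {
    fq_lines=$(gunzip -c "$1" | wc -l)
    echo $((fq_lines / 4))
}

# Print both read counts and succeed only if they are equal.
check_pair_counts() {
    r1_count=$(count_fastq_reads "$1")
    r2_count=$(count_fastq_reads "$2")
    echo "R1=$r1_count R2=$r2_count"
    [ "$r1_count" -eq "$r2_count" ]
}
```

You would call it as check_pair_counts SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz. Going by the flagstat output in the question (31589277 read1, 31589277 read2, no secondary or supplementary alignments), each file should come out at 31589277 reads if the conversion worked.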
Thank you! I didn't look at them. I was trying to use flagstat to look at them, which was clearly stupid on my end. Going to try this command. I was confused about why they were truncated.
So I can just use bam2fq? I don't have to separate R1 and R2 first and then merge the files to get the new .fq file?
There are very very few applications where you want to merge R1 and R2 in any way. You almost certainly want them separate, but check the user guide for whatever software you are using on the fastqs.
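If you do keep R1 and R2 separate, paired-end aligners generally expect the two files to list mates in the same order. The following is a rough sketch of how to check that, not something any particular tool requires you to run. It assumes four-line gzipped FASTQ records and that read names differ between mates at most by a trailing /1 or /2; fastq_names and check_mate_order are hypothetical helper names.

```shell
# Print one read name per record from a gzipped FASTQ, dropping the
# leading "@", any comment after whitespace, and a trailing /1 or /2.
fastq_names() {
    gunzip -c "$1" | awk 'NR % 4 == 1 {
        sub(/^@/, "")
        sub(/[ \t].*/, "")
        sub(/\/[12]$/, "")
        print
    }'
}

# Succeed only if both files contain the same read names in the same order.
# Writes the name lists to a temporary directory and removes it afterwards.
check_mate_order() {
    tmpdir=$(mktemp -d) || return 2
    fastq_names "$1" > "$tmpdir/r1.names"
    fastq_names "$2" > "$tmpdir/r2.names"
    if cmp -s "$tmpdir/r1.names" "$tmpdir/r2.names"; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, check_mate_order SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz exits with 0 when the names line up. For tens of millions of reads the temporary name lists can take a few hundred megabytes, so make sure the temp directory has room.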
It's not clear to me that you need help. I don't see any evidence that your BAM-to-FASTQ steps didn't do exactly what they should have done. Did you try looking at the first few lines of your fastqs? Again, what were you trying to accomplish by running flagstat on a fastq?
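If eyeballing the first few lines is not convincing enough, a basic structural check can be scripted. This is only a sketch under the assumption of four-line gzipped FASTQ records; validate_fastq_head is a made-up name, and the default of 1000 records is arbitrary.

```shell
# Check the first N records (default 1000) of a gzipped FASTQ:
# header starts with "@", separator starts with "+", and the quality
# string is as long as the sequence. Exit status is 0 when all pass.
validate_fastq_head() {
    n_records=${2:-1000}
    gunzip -c "$1" | head -n $((n_records * 4)) | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print "line " NR ": header does not start with @"; bad = 1
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": separator does not start with +"; bad = 1
        }
        NR % 4 == 0 && length($0) != seqlen {
            print "line " NR ": quality length differs from sequence length"; bad = 1
        }
        END {
            if (NR == 0 || NR % 4 != 0) {
                print "empty input or incomplete final record"; bad = 1
            }
            exit bad + 0
        }
    '
}
```

Running validate_fastq_head SAMPLE_R1.fastq.gz prints nothing and exits with 0 when the sampled records look like FASTQ. It only inspects the start of the file, so it complements the full line count rather than replacing it.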