Converting BAM to FASTQ to realign to a new reference genome
2.2 years ago
kcarey • 0

Hey, I was wondering if someone can help me see what I am getting wrong here?

I have paired-end RNA-seq data as BAM files but do not have the original FASTQ files. My goal is to realign the reads to two new references. Can someone help?

I used this code:

samtools flagstat file.bam 
I received this:
63178554 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
59041463 + 0 mapped (93.45% : N/A)
63178554 + 0 paired in sequencing
31589277 + 0 read1
31589277 + 0 read2
50981107 + 0 properly paired (80.69% : N/A)
56645781 + 0 with itself and mate mapped
2395682 + 0 singletons (3.79% : N/A)
389730 + 0 with mate mapped to a different chr
261086 + 0 with mate mapped to a different chr (mapQ>=5)

I then sorted by read name and converted to FASTQ:

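# Sort the paired-read alignments by read name (-n) so mates are adjacent
samtools sort -n -o SAMPLE_sorted.bam SAMPLE.bam
# Write mates to separate R1 and R2 files; discard reads with both or neither
# read1/read2 flag set (-0) and singletons (-s); -n keeps read names without /1 and /2
samtools fastq -@ 8 \
    -1 SAMPLE_R1.fastq.gz \
    -2 SAMPLE_R2.fastq.gz \
    -0 /dev/null -s /dev/null -n \
    SAMPLE_sorted.bam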

Output: [M::bam2fq_mainloop] processed 63178554 reads

I have also tried this:

samtools bam2fq SAMPLE.bam > SAMPLE.fastq

Both report that they processed the total number of reads, but when I run samtools flagstat on the output, every count is 0, and samtools view says the file is truncated. This happens for all of the output files.

I am not sure why. Can someone help?

Also, I noticed that the read1 and read2 counts are the same. That is expected because I have paired-end reads, correct?

ERROR MESSAGES:

Output: [M::bam2fq_mainloop] discarded 0 singletons

For both R1 and R2 files:

samtools flagstat file_R1.fastq.gz 

[W::sam_read1] Parse error at line 2
[bam_flagstat_core] Truncated file? Continue anyway.

0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
bam Samtools STAR fastq RNA-Seq • 2.0k views

Why are you running flagstat on a fastq? What do you think that accomplishes?


I didn't know. I was just trying things to figure out why it kept reporting a truncated file. I am teaching myself as I go along and trying to understand, which is why I am here. Do you have any advice?

Thanks.


samtools works on SAM and BAM files, not on FASTQ files, which is why flagstat reports a parse error and zero reads. FASTQ files are very simple text files with four lines per read. You can inspect one with gunzip -c file_R1.fastq.gz | head and count its lines with gunzip -c file_R1.fastq.gz | wc -l. The R1 and R2 files should have the same number of lines.

EDIT: Including a link to very brief descriptions of typical file formats.
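
The line-count check described above can be wrapped in a small shell sketch. The function names `count_fastq_reads` and `same_read_count` are made up for illustration, and the sketch assumes gzip-compressed FASTQ with exactly four lines per read, which is what samtools fastq normally writes.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ.
# Assumes four lines per record (header, sequence, "+", qualities).
count_fastq_reads() {
    lines=$(gzip -dc "$1" | wc -l | tr -d '[:space:]')
    echo $((lines / 4))
}

# Hypothetical helper: print both counts and succeed only if they match.
same_read_count() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

For example, `same_read_count SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz` prints both counts and returns a non-zero status if they differ. If no reads were filtered during conversion, each count would be expected to match the read1 and read2 numbers that flagstat reported for the original BAM.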


Thank you! I didn't look at them directly. I was trying to use flagstat to look at them, which was clearly a mistake on my end. I am going to try this command. I was confused about why they were reported as truncated.

So I can just use bam2fq? I don't have to separate R1 and R2 first and then merge the files to get the new .fq file?


There are very few applications where you want to merge R1 and R2 in any way. You almost certainly want them separate, but check the user guide for whatever software you will run on the FASTQ files.
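
When R1 and R2 are kept as separate files, paired-end aligners generally expect the mates to appear in the same order in both. A rough way to check this is to compare the read names line by line. This is only a sketch: `fastq_read_names` and `mates_in_same_order` are hypothetical helper names, and it assumes gzipped four-line FASTQ. It strips a trailing /1 or /2 in case the names carry one.

```shell
# Hypothetical helper: print one read name per record, dropping the leading @,
# anything after the first whitespace, and a trailing /1 or /2 if present.
fastq_read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 {
        name = substr($1, 2)
        sub(/\/[12]$/, "", name)
        print name
    }'
}

# Hypothetical helper: succeed only if both files list the same names in the same order.
# Writes the names to temporary files, which can be large for tens of millions of reads.
mates_in_same_order() {
    names1=$(mktemp) && names2=$(mktemp) || return 2
    fastq_read_names "$1" > "$names1"
    fastq_read_names "$2" > "$names2"
    cmp -s "$names1" "$names2"
    status=$?
    rm -f "$names1" "$names2"
    return "$status"
}
```

Running `mates_in_same_order SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz` returns 0 when the names line up and non-zero otherwise.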


It's not clear to me that you need help. I don't see any evidence that your BAM-to-FASTQ steps didn't do exactly what they should have done. Did you try looking at the first few lines of your FASTQ files? Again, what were you trying to accomplish by running flagstat on a FASTQ?
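
Looking at the first few lines can also be done as a quick automated check. The sketch below tests only the first record of a gzipped file for the usual four-line FASTQ layout. The function name `check_first_fastq_record` is hypothetical, and this is a rough sanity check rather than a full validator.

```shell
# Hypothetical helper: check that the first record of a gzipped FASTQ is well formed.
check_first_fastq_record() {
    gzip -dc "$1" 2>/dev/null | head -n 4 | awk '
        NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }  # header line must start with @
        NR == 2 { seqlen = length($0) }                  # remember the sequence length
        NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }  # separator line must start with +
        NR == 4 && length($0) != seqlen { bad = 1 }      # one quality character per base
        END { if (NR < 4 || bad) exit 1 }
    '
}
```

`check_first_fastq_record SAMPLE_R1.fastq.gz && echo "looks like FASTQ"` prints the message only if the first record passes all four checks.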
