9.6 years ago
jyu429
Hi,
I want to modify the sequence names in my bam file. The reads are supposed to be paired-end, but the names don't have /1 and /2 suffixes, so I can't use software like bedtools bam2fastq. I'd like to add /1 to the end of the name if the flag in the second column is 99 or 83, and /2 if it's 163 or 147. For instance,
HSQ1008:141:D0CC8ACXX:3:2202:1520:59984 163 chr14 105899906 60 101M = 105900110 305 CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8 RG:Z:NA12877 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101
should become
HSQ1008:141:D0CC8ACXX:3:2202:1520:59984/2 163 chr14 105899906 60 101M = 105900110 305 CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8 RG:Z:NA12877 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101
How should I go about this? The bam file is also missing header information. Thanks!
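A minimal Python sketch of the renaming asked for above, assuming the records can be dumped as SAM text (for example with `samtools view input.bam`). Instead of matching the four exact values 99, 83, 163 and 147, it tests the first-in-pair (0x40) and second-in-pair (0x80) flag bits, which are set in those four flags and also in paired reads that are not properly paired. The script name `add_mate_suffix.py` and its function names are hypothetical.

```python
import sys
from typing import Iterable, List, TextIO

FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment of the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment of the template


def add_mate_suffix(line: str) -> str:
    """Append /1 or /2 to the QNAME of one SAM line, based on its FLAG."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & FIRST_IN_PAIR:
        fields[0] += "/1"
    elif flag & SECOND_IN_PAIR:
        fields[0] += "/2"
    # Reads with neither bit set (unpaired) keep their original name.
    return "\t".join(fields) + "\n"


def rewrite_stream(src: Iterable[str], dst: TextIO) -> None:
    """Rewrite every SAM line from src and write it to dst."""
    for line in src:
        dst.write(add_mate_suffix(line))


def main(argv: List[str]) -> None:
    """Usage: add_mate_suffix.py IN.sam  (use - to read SAM text from stdin)"""
    if len(argv) != 1:
        print(main.__doc__, file=sys.stderr)
        return
    if argv[0] == "-":
        rewrite_stream(sys.stdin, sys.stdout)
    else:
        with open(argv[0]) as src:
            rewrite_stream(src, sys.stdout)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `samtools view input.bam | python add_mate_suffix.py - > renamed.sam`. The output has no header lines, so a proper header would still be needed before converting it back to BAM.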
I think you should be able to use Picard Sam2Fastq on this bam file and still get a pair of fastq files as output. You don't need to add /1 or /2 in the bam file. The /1 and /2 suffixes are trimmed off the paired-end read names before they are written to the bam file. However, this information can be deduced from the SAM bitwise flag (second column). As you mentioned, 99 and 83 represent /1 reads (and 163 and 147 represent /2 reads). Sam2Fastq uses the same logic to assign the _1 and _2 tags to the reads. You will have to make sure that you create a proper header for the bam file in case it is missing one. Read about the SAM format here: https://samtools.github.io/hts-specs/SAMv1.pdf
You already asked a very similar question: Substitute first column based on second column
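The flag reasoning in the answer can be checked bit by bit. This small Python sketch decodes a SAM FLAG into the bits defined in the SAM specification (the short labels are informal, not official identifiers). It shows that 99 and 83 both carry the first-in-pair bit 0x40, while 163 and 147 carry the second-in-pair bit 0x80:

```python
from typing import List

# SAM FLAG bits with informal short labels.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for value in (99, 83, 163, 147):
    print(value, decode_flag(value))
# 99 ['paired', 'proper_pair', 'mate_reverse', 'first_in_pair']
# 83 ['paired', 'proper_pair', 'reverse', 'first_in_pair']
# 163 ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
# 147 ['paired', 'proper_pair', 'reverse', 'second_in_pair']
```

In other words, 99 = 64 + 32 + 2 + 1 and 163 = 128 + 32 + 2 + 1; only the 64/128 bit decides /1 versus /2.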
Why would you need to do this? What's your final aim?
I'm just trying to convert a bam to a fastq file, but the paired reads share the same name instead of ending in /1 and /2. I tried Hydra bamtofastq and it seems to work, so my question is resolved. Thanks!
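Since the final aim was a bam-to-fastq conversion, here is a minimal Python sketch of what such a converter has to take care of. It is not how Hydra, Picard or bedtools are implemented. It assumes SAM text input (e.g. from `samtools view`) in which SEQ and QUAL are present (not `*`). It assigns mates from the 0x40/0x80 bits and skips secondary and supplementary records (0x100, 0x800). It reverse-complements reads flagged 0x10, because the bam stores those as aligned rather than as sequenced. It also buffers reads by name so both output files list the pairs in the same order. The function names and the demo read `r1` are hypothetical.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple

REVERSE = 0x10               # SEQ is stored reverse-complemented
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fastq_record(fields: List[str], mate: int) -> str:
    """Build one FASTQ record named name/1 or name/2 from SAM fields."""
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & REVERSE:
        # Undo the aligner's reverse complement to recover the original read.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}/{mate}\n{seq}\n+\n{qual}\n"


def split_pairs(sam_lines: Iterable[str], out1: TextIO, out2: TextIO) -> int:
    """Write mate 1 to out1 and mate 2 to out2 in matching order.

    Returns the number of complete pairs written; reads whose mate never
    appears stay in the buffer and are dropped.
    """
    pending: Dict[str, Tuple[int, str]] = {}  # name -> (mate, record)
    pairs = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank or header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & NOT_PRIMARY:
            continue  # keep exactly one record per sequenced read
        if flag & FIRST_IN_PAIR:
            mate = 1
        elif flag & SECOND_IN_PAIR:
            mate = 2
        else:
            continue  # unpaired read: ignored in this sketch
        name = fields[0]
        record = fastq_record(fields, mate)
        if name in pending and pending[name][0] != mate:
            other = pending.pop(name)[1]
            out1.write(record if mate == 1 else other)
            out2.write(other if mate == 1 else record)
            pairs += 1
        else:
            pending[name] = (mate, record)
    return pairs


# Demonstration on one hypothetical pair with flags 99 and 147.
demo_sam = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGG\tABCD\n",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGC\tEFGH\n",
]
fq1, fq2 = io.StringIO(), io.StringIO()
print(split_pairs(demo_sam, fq1, fq2))  # 1
print(fq2.getvalue(), end="")  # @r1/2, GCAA, +, HGFE on four lines
```

The name buffer holds every read whose mate has not been seen yet, so on a large coordinate-sorted file it can grow considerably. Name-sorting first (e.g. `samtools sort -n`) keeps mates adjacent and the buffer small.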
Hello jyu429!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=58344
This is typically not recommended as it runs the risk of annoying people in both communities.
Ah, sorry. I wasn't sure how to delete it, but it won't happen again.