How to remove Read Name (1st column) extension from a BAM file?
7.7 years ago
IrK ▴ 100

Hello everyone,

I have paired-end data. After alignment, I noticed that R1 and R2 have different read names in the BAM file, for example:

SRR1032070.122660125.1
SRR1032070.122660125.2

So read R1 has the extension .1 and read R2 has the extension .2. This causes a problem when I try to convert the BAM to a BED file with bedtools bamtobed -bedpe -i. The only solution I can think of is to remove these extensions. Could anyone please advise on a tool? I would rather not convert the data back to SAM, as it is extremely large.

Thank you

bam paired-end • 4.3k views

I assume you got this data from SRA? You should have used the fastq-dump option -F|--origfmt ("Defline contains only original sequence name") to avoid getting this kind of read name.

As for adding /1 and /2 to read names, you could use reformat.sh from the BBMap suite with the addslash=t or addcolon=t options.


Oh, OK, so you mean that I should use the -F option when converting SRA to FASTQ with fastq-dump?

Thank you

7.7 years ago
samtools view -h in.bam | sed '/^[^@]/s/^\(.*\)\.[12]\t/\1\t/' | samtools view -Sb -o out.bam -
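A variant of the same idea, sketched with awk so that only the read name (column 1) is edited. This is an illustration rather than the answerer's method: the function name strip_mate_suffix and the sample records are made up, and it assumes the suffix is always exactly .1 or .2 at the end of the read name. Restricting the edit to column 1 means other fields, such as a reference name that happens to end in .1, are left alone.

```shell
# Strip a trailing ".1" or ".2" from the read name (column 1) of SAM records.
# Header lines (starting with "@") are passed through unchanged.
# strip_mate_suffix is a hypothetical helper name, not part of samtools.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         !/^@/ { sub(/\.[12]$/, "", $1) }
         { print }'
}

# Quick check on two made-up records (no BAM file needed):
printf 'SRR1032070.122660125.1\t99\tchr1\t100\n'  | strip_mate_suffix
printf 'SRR1032070.122660125.2\t147\tchr1\t300\n' | strip_mate_suffix

# Possible usage on a real file (in.bam and out.bam are placeholder names):
# samtools view -h in.bam | strip_mate_suffix | samtools view -b -o out.bam -
```

Like the sed pipeline, this streams SAM text between two samtools processes, so no intermediate SAM file is written to disk.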

Thank you so much, Pierre.

I changed [12] in the sed expression to [.1.2], and it works. The final command that works for me is:

samtools view -h in.bam| sed '/^[^@]/s/^\(.*\)\.[.1.2]\t/\1\t/' | samtools view -Sb -o out.bam

May I ask, for future reference, how paired-end reads come to be annotated with the extensions .1 and .2? What software does this?
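On the [12] versus [.1.2] change: a small sketch of how sed reads the two bracket expressions, using made-up helper names (strip_12, strip_dot12) and a made-up second read name. A bracket expression matches a single character from its set, so [.1.2] means "a dot, a 1, or a 2" and is a superset of [12], not a way of writing .1 or .2.

```shell
# [.1.2] is a bracket expression: it matches ONE character that is ".", "1" or "2".
# strip_12 and strip_dot12 are hypothetical helper names used only for this demo.
strip_12()    { sed 's/\.[12]$//'; }     # class from the original answer
strip_dot12() { sed 's/\.[.1.2]$//'; }   # class from the modified command

echo 'SRR1032070.122660125.1' | strip_12      # ".1" suffix removed
echo 'SRR1032070.122660125.1' | strip_dot12   # ".1" suffix removed, same result
echo 'made_up_name..'         | strip_12      # unchanged: "." is not in [12]
echo 'made_up_name..'         | strip_dot12   # trailing ".." removed
```

For read names that end in .1 or .2, both classes behave identically.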
