Add Cigar string and Template Length to Read Name
3.0 years ago
Leendert ▴ 40

Hi all,

I need to convert a BAM file to FASTQ format, but I don't want to lose the CIGAR and TLEN information.

My idea is to edit each read name in the BAM file by appending both the CIGAR string and TLEN to the current read name, so that this information carries over into the FASTQ file. I can then parse the read names later, splitting on the separator to extract both values.
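To make the naming scheme concrete, here is a minimal Python sketch of the round trip. The function names and the ":" separator are assumptions chosen for illustration. Because Illumina-style read names often contain ":" themselves, the parser splits from the right, which works as long as the CIGAR string and TLEN never contain the separator.

```python
SEP = ":"  # assumed separator; any character absent from CIGAR/TLEN would do


def tag_read_name(name, cigar, tlen, sep=SEP):
    """Append the CIGAR string and TLEN to a read name."""
    return f"{name}{sep}{cigar}{sep}{tlen}"


def split_read_name(tagged, sep=SEP):
    """Recover (original name, CIGAR, TLEN) from a tagged read name.

    Splits from the right so separators inside the original name are kept.
    """
    name, cigar, tlen = tagged.rsplit(sep, 2)
    return name, cigar, int(tlen)
```

For example, `split_read_name(tag_read_name("read1", "76M", -250))` gives back the original name, the CIGAR string, and TLEN as an integer.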

I couldn't find any software that does this. My first thought was to use pysam, but that seems like the long way round.

I'm thinking something like samtools | awk | sed might be quicker, but I'm still getting to know these tools in bash.

Can anyone suggest tools that might help with this, or point me in the right direction for the bash or pysam commands?
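One possible route, sketched here under assumptions rather than as a tested pipeline, is to stream SAM text through a small Python filter instead of awk or sed. The filter relies only on the SAM column layout (QNAME is column 1, CIGAR column 6, TLEN column 9); the function names and the ":" separator are hypothetical choices.

```python
def tag_sam_line(line, sep=":"):
    """Return a SAM line with CIGAR and TLEN appended to the read name."""
    if line.startswith("@"):
        # Header lines pass through unchanged.
        return line
    fields = line.rstrip("\n").split("\t")
    # SAM columns are 1-based: QNAME=1, CIGAR=6, TLEN=9.
    fields[0] = f"{fields[0]}{sep}{fields[5]}{sep}{fields[8]}"
    return "\t".join(fields) + "\n"


def tag_sam_stream(src, dst, sep=":"):
    """Copy SAM text from src to dst, tagging every alignment line."""
    for line in src:
        dst.write(tag_sam_line(line, sep))
```

Saved in a script (say, a hypothetical `tag_names.py`) that calls `tag_sam_stream(sys.stdin, sys.stdout)`, this could sit in a pipe such as `samtools view -h in.bam | python tag_names.py | samtools sort -n - | samtools fastq - > out.fastq`. One caveat: the two mates of a pair generally have different CIGAR strings and opposite-sign TLEN values, so their tagged names will no longer match, which may matter for downstream tools that pair reads by name.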

pysam bam sed awk python
3.0 years ago

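You can do this with samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

The expression below appends the CIGAR string and the inferred insert size (TLEN) to each read name; the output is then sorted by name and converted to FASTQ: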

java -jar dist/samjdk.jar -e 'record.setReadName(record.getReadName()+":"+record.getCigarString()+":"+record.getInferredInsertSize()); return record;' in.bam | samtools sort -n - | samtools fastq -