Question

Extract specific bases from BAM/SAM files

0


3.1 years ago

Ankit ▴ 500

Hi everyone,

I want to extract the bases at specific positions (given as a list of positions) from the reads in my BAM/SAM file.

If any specific tools or packages exist, please let me know.

I followed the long and useful thread here, but I did not get the specific base.

I would appreciate any help.

Thank you

BAM SAM Bases Extract • 1.0k views

ADD COMMENT • link 3.1 years ago by Ankit ▴ 500

score 1 · Answer 1 · 2021-10-26

1


3.1 years ago

Pierre Lindenbaum 164k

I wrote http://lindenb.github.io/jvarkit/Sam2Tsv.html
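To illustrate the underlying idea of reporting the read base aligned to a given reference position, here is a minimal standard-library sketch. It is not jvarkit code and does not claim to reproduce sam2tsv's behaviour; the function names `base_at` and `base_from_sam_line` are hypothetical, and it assumes plain-text SAM records with a 1-based POS column and a standard CIGAR string.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(pos, cigar, seq, ref_pos):
    """Return the read base aligned to 1-based reference position ref_pos.

    pos is the 1-based leftmost mapping position (SAM column 4).
    Returns None if the read does not cover ref_pos or has a
    deletion/skip there. (Hypothetical helper, for illustration.)
    """
    ref = pos   # next reference position to be consumed
    read = 0    # next 0-based index into seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes read and reference
            if ref <= ref_pos < ref + length:
                return seq[read + (ref_pos - ref)]
            ref += length
            read += length
        elif op in "IS":           # consumes read only
            read += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return None
            ref += length
        # H and P consume neither read nor reference
    return None

def base_from_sam_line(line, ref_pos):
    """Return (read_name, base) for one tab-separated SAM record."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], base_at(int(fields[3]), fields[5], fields[9], ref_pos)
```

Because SAM stores SEQ in reference orientation, the same walk applies to reverse-strand reads. Records would typically come from `samtools view`, filtered to the chromosome of interest before calling the helper.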

ADD COMMENT • link 3.1 years ago by Pierre Lindenbaum 164k

0

Thank you for the tool.

My sequence reads are from an amplicon.

It gives an error.

With the full genome hg38.fa:

Sequence dictionaries are not the same size (455, 25)

With only chr14.fa (I thought the amplicon could be the cause of the error):

Sequence dictionaries are not the same size (1, 25)

What could be the issue?

I created the dictionary file with Picard:

java -jar picard-2.26.2/picard.jar CreateSequenceDictionary R=hg38.fa O=hg38.dict

or

java -jar picard-2.26.2/picard.jar CreateSequenceDictionary R=chr14.fa O=chr14.dict

My command:

java -jar dist/sam2tsv.jar -R chr14.fa ./../S49.sort.bam

I would appreciate any suggestions.

Thanks

ADD REPLY • link 3.1 years ago by Ankit ▴ 500

0

The dictionary in the BAM file is not the same as the one in hg38.fa: the reference you supplied is not the one that was used to map the reads. See the @SQ lines in the output of samtools view -H S49.sort.bam
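As a rough illustration of that check, the sketch below compares the @SQ lines of a BAM header with those of a Picard .dict file, both passed in as text. The function names and the report keys are hypothetical; the header text is assumed to come from `samtools view -H` and the dictionary text from the file written by CreateSequenceDictionary.

```python
def parse_sq(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header or .dict file."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        entries.append((tags["SN"], int(tags["LN"])))
    return entries

def compare_dicts(bam_header, ref_dict):
    """Summarise how the reference dictionary differs from the BAM header."""
    bam = parse_sq(bam_header)
    ref = parse_sq(ref_dict)
    bam_map, ref_map = dict(bam), dict(ref)
    return {
        "sizes": (len(ref), len(bam)),  # (reference, BAM)
        "only_in_bam": [n for n, _ in bam if n not in ref_map],
        "only_in_ref": [n for n, _ in ref if n not in bam_map],
        "length_mismatch": [n for n, length in bam
                            if n in ref_map and ref_map[n] != length],
    }
```

If every list in the result is empty and the two sizes agree, the sequence names and lengths match. The sketch does not compare sequence order, which may also matter to some tools.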

ADD REPLY • link 3.1 years ago by Pierre Lindenbaum 164k

0

Thank you for pointing it out.

I have two limitations:

1. My data are bisulfite reads from an amplicon. I used individual chromosome files (chr1.fa, chr2.fa, and so on) to create the bisulfite reference genome with Bismark, and then aligned the data. How do I create a dictionary file for this? I tried

java -jar picard-2.26.2/picard.jar CreateSequenceDictionary R=*.fa O=hg38.dict

but it gives an error, and I think that is not the correct syntax anyway. I have not found a solution.

2. sam2tsv -R

java -jar dist/sam2tsv.jar -R hg38.fa ./../S49.sort.bam

The command needs hg38.fa, but my bisulfite genome reference folder has two FASTA files: a) genome_mfa.CT_conversion.fa and b) genome_mfa.GA_conversion.fa

I do not know how to deal with the bisulfite reference, or which one to supply to the -R option.

Thanks

ADD REPLY • link 3.1 years ago by Ankit ▴ 500
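One possible way to approach the first limitation above is to merge the per-chromosome FASTA files into a single file, in the order of the @SQ lines of the BAM header, and then build one dictionary from it. The sketch below is only that, a sketch under stated assumptions: that each file is named after its sequence (chr1.fa holds the record chr1, and so on), that these are the same unconverted sequences Bismark was given, and that the function names and the output file name are hypothetical. Whether the merged file is the appropriate reference for Bismark alignments should be verified against the BAM header.

```python
import os

def sq_names(header_text):
    """Return the sequence names (SN) from the @SQ lines, in header order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names

def merge_fasta(names, fasta_dir, out_path, suffix=".fa"):
    """Concatenate <fasta_dir>/<name><suffix> for each name, in the given order.

    Assumes one FASTA file per sequence, named after the sequence
    (an assumption about the folder layout, not something checked here).
    """
    with open(out_path, "w") as out:
        for name in names:
            path = os.path.join(fasta_dir, name + suffix)
            with open(path) as src:
                for line in src:
                    # Make sure every record ends with a newline so the
                    # next header does not get glued to the sequence.
                    out.write(line if line.endswith("\n") else line + "\n")
    return out_path
```

The merged file (for example a hypothetical merged.fa) could then be passed to the same Picard command used earlier in the thread, with R=merged.fa and O=merged.dict, and supplied to sam2tsv with -R.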