Question

generation of sequences (from bam) starting at a specific position

0

8.0 years ago

curiousbiologist ▴ 40

How can I generate sequences from a BAM/SAM file that start at a specific position, removing everything before that position? Thank you!

alignment sequence • 2.0k views

ADD COMMENT • link 8.0 years ago by curiousbiologist ▴ 40

1

To me, it's unclear what you are asking for. Is this the same as alignment containing sequences from position a to b? What do you want to obtain? "Sequences"? Is that a read/fasta/fastq/reference/variant...?

ADD REPLY • link 8.0 years ago by WouterDeCoster 47k

0

I would like to obtain a "cropped" BAM (all my reads aligned and starting at a defined position).

ADD REPLY • link 8.0 years ago by curiousbiologist ▴ 40

1

One of the options in this thread should do this: How to get the consensus sequence from a BAM alignment

ADD REPLY • link 8.0 years ago by GenoMax 147k

0

I would like to work with all the selected reads afterwards, so I'm not sure a consensus is a good approach.

ADD REPLY • link 8.0 years ago by curiousbiologist ▴ 40

score 0 · Answer 1 · 2016-12-12

0

8.0 years ago

curiousbiologist ▴ 40

I'm deeply sorry if my question was obscure: what I want to do is to get, from an alignment, reads with no bases before position 30 of the reference sequence. Is it possible to crop a BAM file?

ADD COMMENT • link 8.0 years ago by curiousbiologist ▴ 40

0

You could do something like the following (adjust the name of the "chromosome" to match your alignment file, and replace N with the end coordinate of the region):

samtools view file_sorted.bam "chr:30-N" | awk -F "\t" '{print "@"$1"\n"$10"\n+\n"$11}' > reads_before_30.fq

ADD REPLY • link 8.0 years ago by GenoMax 147k

0

Thank you for your answer! However, I got whole reads that overlap the given interval, not reads cropped to this interval.

ADD REPLY • link 8.0 years ago by curiousbiologist ▴ 40

0

I am not aware of a tool that will do that automatically for you. You will need to use a custom script to do something that specific.

ADD REPLY • link 8.0 years ago by GenoMax 147k

0

Thank you for your answer, it means a lot to me to get a response. My programming skills are not great; do you know of a script that I could use as a starting point for my custom script?

ADD REPLY • link 7.9 years ago by curiousbiologist ▴ 40

0

Could you use an IGV screenshot and a graphics program (e.g. MS Paint) to clarify what you are aiming for?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

curiousbiologist wants individual reads chopped so that they start and end at specific positions, i.e. nothing should extend to the left or right of an interval a <--> b

ADD REPLY • link 7.9 years ago by GenoMax 147k

0

Okay, that makes me wonder why the OP would want that, but fine. This is not a straightforward task: it requires modifying the CIGAR, sequence, qualities, start position, ...
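
To illustrate what modifying the CIGAR, sequence, qualities and start position involves, here is a minimal sketch in plain Python that clips one SAM record to a reference window. The function name clip_record and its parameters are hypothetical, not part of samtools or any library. It assumes POS is 1-based, SEQ and QUAL are present (not "*"), and that soft-clipped bases, hard clips and padding can simply be dropped rather than re-encoded.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_record(pos, cigar, seq, qual, win_start, win_end):
    """Clip one alignment to the reference window [win_start, win_end].

    Coordinates are 1-based and inclusive, as in SAM. Returns a tuple
    (new_pos, new_cigar, new_seq, new_qual), or None when no aligned
    base of the read falls inside the window.
    """
    ref = pos        # reference position of the next reference-consuming base
    q = 0            # index into seq/qual of the next query-consuming base
    kept = []        # (length, op, seq_piece, qual_piece) for retained operations
    new_pos = None   # reference position of the first retained aligned base

    for n, op in CIGAR_RE.findall(cigar):
        length = int(n)
        if op in "M=X":  # consumes both reference and query
            lo, hi = max(ref, win_start), min(ref + length - 1, win_end)
            if lo <= hi:
                if new_pos is None:
                    new_pos = lo
                a, b = q + lo - ref, q + hi - ref + 1
                kept.append((hi - lo + 1, op, seq[a:b], qual[a:b]))
            ref += length
            q += length
        elif op in "DN":  # consumes reference only
            lo, hi = max(ref, win_start), min(ref + length - 1, win_end)
            if new_pos is not None and lo <= hi:
                kept.append((hi - lo + 1, op, "", ""))
            ref += length
        elif op == "I":  # consumes query only; keep it only inside the window
            if new_pos is not None and ref <= win_end:
                kept.append((length, op, seq[q:q + length], qual[q:q + length]))
            q += length
        elif op == "S":  # soft-clipped bases are discarded
            q += length
        # H and P consume neither sequence and are dropped

    # A clipped alignment should not end on a deletion, skip or insertion.
    while kept and kept[-1][1] not in "M=X":
        kept.pop()
    if not kept:
        return None

    new_cigar = "".join(f"{length}{op}" for length, op, _, _ in kept)
    new_seq = "".join(piece for _, _, piece, _ in kept)
    new_qual = "".join(piece for _, _, _, piece in kept)
    return new_pos, new_cigar, new_seq, new_qual
```

For example, clip_record(25, "10M", "ACGTACGTAC", "IIIIIIIIII", 30, 100) drops the five bases aligned to positions 25 to 29. Reading and writing the BAM itself would still need samtools view or a library such as pysam, and fields such as mate position and template length are not touched by this sketch.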

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Yes, you got it, genomax2. I want to have several (switchable) windows of reads from different samples so that I can compare them using different score calculations: statistics, entropy (Shannon entropy score). If the read pieces are not all the same size, my results will be misrepresented.
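
As a rough sketch of the kind of score mentioned here, per-position Shannon entropy over a window could be computed as follows. It assumes each read has already been reduced to a string of the same length, one character per reference position, with '-' marking positions the read does not cover. The function name column_entropy is hypothetical, and ignoring gaps in the frequency counts is a choice made for this example, not something stated in the thread.

```python
import math
from collections import Counter


def column_entropy(rows, gap="-"):
    """Return the Shannon entropy (in bits) of each column of equal-length rows.

    Gap characters are ignored; a column containing only gaps gets 0.0.
    """
    entropies = []
    for column in zip(*rows):
        counts = Counter(base for base in column if base != gap)
        total = sum(counts.values())
        h = 0.0
        for count in counts.values():
            p = count / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


# Four reads over a two-base window: the first column is invariant,
# the second is split evenly between C and G.
rows = ["AC", "AG", "AC", "AG"]
scores = column_entropy(rows)
```

Because every row has the same width, scores from different samples can be compared position by position.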

ADD REPLY • link 7.9 years ago by curiousbiologist ▴ 40

score 0 · Answer 2 · 2016-12-14

Would it be possible to solve this problem with an awk script? I was thinking of doing an alignment, converting it to FASTA (with a gap or 'x' for non-aligned bases), and then trimming with awk or fastx_trimmer. Maybe there is something easier? How do I get a gap or 'x' for the non-aligned bases before and after each read?
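
One possible way to get gap characters before and after each read, sketched in plain Python rather than awk: walk the CIGAR string and write each aligned base into a fixed-width row covering the window. The function name window_string is hypothetical. The sketch assumes 1-based SAM coordinates and skips insertions and soft clips so that every row keeps one column per reference position.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def window_string(pos, cigar, seq, win_start, win_end, gap="-"):
    """Render one read as a fixed-width string over [win_start, win_end].

    Positions not covered by an aligned base (outside the read, or inside
    a deletion or skipped region) are filled with the gap character.
    """
    row = [gap] * (win_end - win_start + 1)
    ref = pos   # reference position of the next reference-consuming base
    q = 0       # index into seq of the next query-consuming base
    for n, op in CIGAR_RE.findall(cigar):
        length = int(n)
        if op in "M=X":  # aligned bases: copy those that fall in the window
            for i in range(length):
                if win_start <= ref + i <= win_end:
                    row[ref + i - win_start] = seq[q + i]
            ref += length
            q += length
        elif op in "DN":  # deleted or skipped reference: leave gaps
            ref += length
        elif op in "IS":  # inserted or soft-clipped bases: not shown
            q += length
        # H and P consume neither sequence
    return "".join(row)


# A 4-base read starting at position 32, shown over the window 30-37.
padded = window_string(32, "4M", "ACGT", 30, 37)
```

The first, fourth, sixth and tenth columns of samtools view output supply the read name, pos, cigar and seq. Every read then yields a string of identical length, which avoids a separate trimming step with fastx_trimmer.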