How to extract just the reads ID, start position and lenght from the alliigned file?
0
0
Entering edit mode
6.3 years ago
vellryba • 0

Hello,

I would like to get a text file with just the Read name, start position and the length using my reference file and sam/bam file.

Like this:

 Read Name  Start Position  Length
M03972:51:000000000-BJVL8:1:1113:16722:4595 1   66

M03972:51:000000000-BJVL8:1:1114:23117:18542    1   98

Can someone please help with the command for this?

Thank you very much!

next-gen sequencing • 1.7k views
ADD COMMENT
2
Entering edit mode

Hi vellryba, welcome to Biostars.

This is the structure of a SAM file:

$ samtools view test.bam | head -n 1
SRR1610193.15889013 16  chr1    10000   8   4S44M   *   0   0   AACCATAACCCTAACCCTAACCCTAACACTAACCCTAACCCTAACCCT    DEHF>FHF@ED?C:HFFCE:HCFFAACFFHEADGGDFDHFDDDAD@@@    NM:i:1  MD:Z:23C20  AS:i:39 XS:i:36 XA:Z:chr18,-63692,5S33M1D10M,1;chr22,+44626520,48M,3;chr6,-1114755,13S35M,0;chr6,-147864,4M1D38M1D6M,3;chr11,-80886866,4M1I43M,3;

...and these are the specs of SAM files. Given you read that, I am sure you can find a solution, using Unix tools like cut or awk. In my experience, you learn much more by solving these basic problems yourself, rather than getting a ready-to-use script. If you have problems, feel free to ask.

ADD REPLY
0
Entering edit mode

Hi, I have a problem working out where the alignment starts. I would really appreciate some help.

ADD REPLY
0
Entering edit mode

Table on page 5 (in the SAM spec file that @ATPoint linked above) has that information.

ADD REPLY
0
Entering edit mode

I agree with ATPoint. Spoon feeding will only harm you

ADD REPLY
0
Entering edit mode

I dont normally work with NGS, I just need to to do this one thing. I am extremely short on time. This might loose me days since I am an imposed deadline. Normally, I would agree but this is not my line of work and I am under big pressure to get the preliminary results quickly.

ADD REPLY
0
Entering edit mode

time pressure is not a good excuse to not learn some basics about the task you have been asked to do. In this case the answer is simple and will definitely not take you days or even an hour.

ADD REPLY
0
Entering edit mode

Well...It has been over two hours since I tried to do this.....This question isnt the first thing I did...

I am really not after a lecture. I am after help

ADD REPLY
1
Entering edit mode

In fact, everybody is trying to help you in some way or the other. What everybody wants to know your genuine efforts to find the solution.

ADD REPLY
0
Entering edit mode

I dont understand how you get a starting position from a sam file. I know in this example: M03972:51:000000000-BJVL8:1:1101:8544:11793 99 gi|11111113|ref|TL| 1058 42 127M = 1129 127

the starting position is 482. How do you get to that? I can the manual, but I thought left most position would be the starting position?

ADD REPLY
0
Entering edit mode

Show us what you have tried and we will help you fix your code.

Here are some additional hints: awk can split a SAM record using \t separator. You are then looking for three specific fields. I am not sure what "length" you are referring to. Is it the length of the read or length of the alignment?

ADD REPLY
0
Entering edit mode

Its the starting position really. So I have this:

M03972:51:000000000-BJVL8:1:1101:8544:11793 99  gi|11111113|ref|TL| 1058    42  127M    =   1129    127

I know how to extract M03972:51:000000000-BJVL8:1:1101:8544:11793 and 99. What I dont know is where the alignment starts. Where is the start position?

ADD REPLY
0
Entering edit mode

I answered that in my comment above. That table describes mandatory fields in a SAM file.

Table on page 5 (in the SAM spec file that @ATPoint linked above) has that information.

ADD REPLY
0
Entering edit mode

Hi, thank you (couldnt answer till now).

So from the example I gave, the starting position should be 482. How do you arrive to that number? Can you explain this to me? This is what I am struggling with.

ADD REPLY
0
Entering edit mode

I know how to extract M03972:51:000000000-BJVL8:1:1101:8544:11793 and 99. What I dont know is where the alignment starts.

https://samtools.github.io/hts-specs/SAMv1.pdf section " The alignment section: mandatory field"

(...) 4 POS Int 1-based leftmost mapping POSition (...)

ADD REPLY

Login before adding your answer.

Traffic: 1527 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6