6.3 years ago
vellryba
Hello,
I would like to get a text file with just the Read name, start position and the length using my reference file and sam/bam file.
Like this:
Read Name                                        Start Position    Length
M03972:51:000000000-BJVL8:1:1113:16722:4595      1                 66
M03972:51:000000000-BJVL8:1:1114:23117:18542     1                 98
Can someone please help with the command for this?
Thank you very much!
Hi vellryba, welcome to Biostars.
This is the structure of a SAM file:
...and these are the specs of SAM files. Once you have read them, I am sure you can find a solution using Unix tools like cut or awk. In my experience, you learn much more by solving these basic problems yourself than by getting a ready-to-use script. If you have problems, feel free to ask.
Hi, I have a problem working out where the alignment starts. I would really appreciate some help.
Table on page 5 (in the SAM spec file that @ATPoint linked above) has that information.
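For readers who want to see how that table maps onto a record, here is a minimal Python sketch. The field names are the 11 mandatory columns from the SAM spec; the helper name `label_fields` is made up for illustration, and the record is the one quoted later in this thread, which shows only the first nine columns.

```python
# The 11 mandatory SAM fields, in column order (SAM spec, mandatory fields table).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def label_fields(sam_line):
    """Pair each tab-separated column of one alignment line with its field name.

    Hypothetical helper for illustration; optional tags after column 11 are ignored.
    """
    columns = sam_line.rstrip("\n").split("\t")
    return dict(zip(SAM_FIELDS, columns))


# Record quoted in this thread. SEQ and QUAL were not shown there,
# so they are absent from the result.
record = "\t".join([
    "M03972:51:000000000-BJVL8:1:1101:8544:11793", "99", "gi|11111113|ref|TL|",
    "1058", "42", "127M", "=", "1129", "127",
])

fields = label_fields(record)
print(fields["QNAME"], fields["POS"], fields["CIGAR"])
```

POS (column 4) is defined in the spec as the 1-based leftmost mapping position of the read, and CIGAR (column 6) describes how the read aligns from that position onward.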
I agree with ATPoint. Spoon-feeding will only harm you.
I don't normally work with NGS; I just need to do this one thing. I am extremely short on time. This might lose me days, since I am on an imposed deadline. Normally I would agree, but this is not my line of work and I am under big pressure to get the preliminary results quickly.
Time pressure is not a good excuse for not learning some basics about the task you have been asked to do. In this case the answer is simple and will definitely not take you days, or even an hour.
Well... it has been over two hours since I started trying to do this. This question isn't the first thing I did.
I am really not after a lecture. I am after help.
In fact, everybody is trying to help you in one way or another. What everybody wants to see is your genuine effort to find the solution.
I don't understand how you get a starting position from a SAM file. I know that in this example:
M03972:51:000000000-BJVL8:1:1101:8544:11793 99 gi|11111113|ref|TL| 1058 42 127M = 1129 127
the starting position is 482. How do you get to that? I have read the manual, but I thought the leftmost position would be the starting position?
Show us what you have tried and we will help you fix your code.
Here are some additional hints:
awk can split a SAM record using the \t (tab) separator. You are then looking for three specific fields. I am not sure what "length" you are referring to. Is it the length of the read or the length of the alignment?
It's the starting position, really. So I have this:
I know how to extract M03972:51:000000000-BJVL8:1:1101:8544:11793 and 99. What I don't know is where the alignment starts. Where is the start position?
I answered that in my comment above. That table describes the mandatory fields in a SAM file.
Hi, thank you (couldn't answer till now).
So from the example I gave, the starting position should be 482. How do you arrive at that number? Can you explain this to me? This is what I am struggling with.
https://samtools.github.io/hts-specs/SAMv1.pdf, section "The alignment section: mandatory fields"
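Putting the hints in this thread together, here is a rough Python sketch of the extraction that was originally asked for. It is not a definitive answer: the function names `aligned_length` and `summarize` are made up for illustration, and "length" is assumed to mean the number of reference bases the alignment covers (derived from the CIGAR string), since the thread never settled whether read length or alignment length was wanted. The start position is taken directly from POS (column 4).

```python
import re

# One CIGAR operation: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference (per the SAM spec CIGAR table).
REF_CONSUMING = set("MDN=X")


def aligned_length(cigar):
    """Number of reference bases covered by the alignment, from the CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def summarize(sam_lines):
    """Yield (read name, start position, aligned length) for each mapped record."""
    for line in sam_lines:
        if line.startswith("@"):          # header line, not an alignment
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:                 # too short to hold QNAME..CIGAR
            continue
        qname, flag, pos, cigar = cols[0], int(cols[1]), int(cols[3]), cols[5]
        if flag & 4 or cigar == "*":      # unmapped read, or no CIGAR available
            continue
        yield qname, pos, aligned_length(cigar)


# Demo on a header line plus the record quoted in this thread.
example = [
    "@HD\tVN:1.6\n",
    "M03972:51:000000000-BJVL8:1:1101:8544:11793\t99\tgi|11111113|ref|TL|"
    "\t1058\t42\t127M\t=\t1129\t127\n",
]

print("Read Name\tStart Position\tLength")
for name, start, length in summarize(example):
    print(f"{name}\t{start}\t{length}")
```

To run this on a real file, pass an open file handle to `summarize` instead of the `example` list (for instance `open("alignments.sam")`, where the file name is only a placeholder). A BAM file would first need converting to SAM text, which `samtools view` does. If the read length is wanted instead, the length of SEQ (column 10) could be used in place of the CIGAR-based value.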