8.0 years ago
SaltedPork
I have used Bedtools to produce histogram data with:
$ genomeCoverageBed -ibam example.bam -g genome.txt > coverage.txt
I now have the frequency data for my histogram, and I want to go back and find the read IDs. I've looked through the Bedtools options and there doesn't seem to be one for this. Any ideas?
What exactly do you mean by "find the read IDs"? For a specific interval? If so, you could use samtools view with a BED file (Samtools And Region List). If you truly just need read IDs, then let us know and an addition to the samtools solution can be suggested.
I have a list of reads which have high frequency/occurrences. At the moment, all I can see is which chromosome they are from. I would be choosing the areas of high coverage and want to find the IDs of the reads from those regions. Thanks for the response.
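One way to turn "areas of high coverage" into a BED file that samtools can use: a minimal sketch, assuming per-interval coverage in bedGraph form (chrom, start, end, depth), which genomeCoverageBed writes with its -bg option. The function name high_coverage_bed, the threshold of 100, and the file names are placeholders, not anything from the posts above.

```shell
# Keep bedGraph intervals whose depth (column 4) is at least min_depth
# and print them as 3-column BED (chrom, start, end).
high_coverage_bed() {
    min_depth=$1; shift
    awk -F '\t' -v OFS='\t' -v min="$min_depth" \
        '$4 + 0 >= min + 0 { print $1, $2, $3 }' "$@"
}

# Example usage (file names and threshold are placeholders):
# genomeCoverageBed -ibam example.bam -bg > coverage.bedgraph
# high_coverage_bed 100 coverage.bedgraph > test.bed
```

Neighbouring intervals with different depths come out as separate lines; they could be joined afterwards with bedtools merge if single contiguous regions are preferred.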
In case you want the full FASTQ records from the regions you have defined in test.bed, you could do
samtools view your_aligned_sorted.bam -L test.bed | awk -F "\t" '{print "@"$1"\n"$10"\n+\n"$11}' > your_fastq
If you only need the FASTQ IDs, then
samtools view SRR1972739_sorted.bam -L test.bed | awk -F "\t" '{print "@"$1}' > fastq_ID
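With paired-end data both mates share one read name, and a read with more than one alignment in the selected regions is printed once per alignment, so the ID list can contain repeats. A small sketch for collapsing them while keeping first-seen order; unique_read_ids is a hypothetical helper name, and it expects SAM text on standard input or in a file.

```shell
# Print each read name (SAM column 1) once, in order of first appearance.
# Header lines starting with "@" are skipped in case "samtools view -h" was used.
unique_read_ids() {
    awk -F '\t' '!/^@/ && !seen[$1]++ { print $1 }' "$@"
}

# Example usage (file names are placeholders):
# samtools view your_aligned_sorted.bam -L test.bed | unique_read_ids > read_IDs.txt
```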
Thanks, I now have the list of IDs that I want. Are the IDs in the new file in the same order as the regions in my BED file? Just checking, because I would like to cut/paste these IDs into my file with the frequencies.
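They will be in the same order as the alignments in your sorted BAM file (i.e. sorted by coordinate), not in the order of the intervals in your BED file. Note that there is one line per alignment rather than one per BED interval, so the list will not line up row-for-row with your frequencies.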
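If the goal is to know which IDs belong to which region, one option is to query the regions one at a time and label each ID with its region. This is only a sketch under some assumptions: the BAM is coordinate-sorted and indexed (samtools index), samtools is on the PATH, and the helper names bed_to_regions and ids_per_region are made up for illustration. BED starts are 0-based while samtools region strings are 1-based, hence the +1.

```shell
# Convert BED intervals (0-based, half-open) into samtools region strings
# (1-based, inclusive), e.g. "chr1<TAB>99<TAB>200" becomes "chr1:100-200".
bed_to_regions() {
    awk -F '\t' '!/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$@"
}

# Print "region<TAB>read ID" for every alignment overlapping each BED interval.
# Requires an indexed BAM; regions are processed in BED file order.
ids_per_region() {
    bam=$1
    bed=$2
    bed_to_regions "$bed" | while read -r region; do
        samtools view "$bam" "$region" < /dev/null |
            awk -F '\t' -v r="$region" '{ print r "\t" $1 }'
    done
}

# Example usage (file names are placeholders):
# ids_per_region your_aligned_sorted.bam test.bed > ids_by_region.txt
```

The output follows the order of the BED file, with one line per alignment, so it can be grouped or joined by the region column rather than pasted side by side.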
Great, Many thanks for your help!