How to pipe awk of bed file into samtools to extract fasta sequences?
1
0
Entering edit mode
3.2 years ago
mrj ▴ 180

I have a bed file (seq.bed) that contains "queryID queryStart queryEnd". Following is the example (the content of seq.bed file).

SRR5892231.6 28 178

SRR5892231.7 4 307

SRR5892231.7 16 307

SRR5892231.9 216 408

I would like to extract out sequences that have at least 100bp in their upstream regions. if I use the following awk,

awk 'OFS="\t" {print $1, $2-100, $2}' seq.bed

how can i use pipe to send the output to samtools for "samtools faidx seq.fna queryID:queryStar-queryEnd"?

Could you please help me figure out how to pass $1, $2-100, and $2 for samtools faidx seq.fna queryID:queryStar-queryEnd?

bedtools pipe samtools bash • 1.4k views
ADD COMMENT
2
Entering edit mode
3.2 years ago
alex.zaccaron ▴ 470

Piping to bedtools would be more straightforward:

awk 'OFS="\t" {if($2<100) print $1,"0",$2; else print $1, $2-100, $2}'  seq.bed | \
bedtools getfasta -fi seq.fasta -bed stdin

But if samtools is required, I would use a for loop:

awk '{if($2<=100) print $1":1-"$2; else print $1":"$2-100"-"$2}'  seq.bed  | while read LOCUS; do
   samtools faidx seq.fasta $LOCUS
done
ADD COMMENT
0
Entering edit mode

Alex, this is great. Thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 1629 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6