6.1 years ago
eDNAuRNA
Hi everyone,
I have a bed file (tab separated columns) with hundreds of genomic coordinates as follows.
chr1 88833393 88834022 EXr19 1 +
chr1 22531002 22531628 EXr20 1 +
chr1 10355070 10355696 EXr21 1 +
I am trying to query a gemini database by using a genomic region based query as follows.
gemini query --header --show-samples --region 1:88833393-88834022 -q "select * from variants" gemini.db >> output.tsv
Is there a way to automatically generate one such query for each genomic coordinate in the bed file? Urgent help would be appreciated.
Thanks
Hi Fin,
thanks a bunch. This worked like a charm. I need one quick modification: the query shouldn't include the "chr" prefix from the bed file. The code you shared keeps "chr" in the region, and the query won't work that way. Can you please suggest how to drop "chr" from the output? Right now the following query is being generated.
Secondly, can you please explain how the code you suggested actually works? If you don't have time, please point me to a tutorial. Thirdly, what does --dry-run do, and what will happen if I remove it?
Thanks again, I am very close to solving a problem I have been facing for two months.
Cheers,
Hello,
a good introduction to parallel is here on Biostars :)
What my code does is run the command between the quotation marks once for each line in regions.bed. With --colsep "\t" we also tell parallel that each line holds multiple arguments delimited by a tab. That lets us use the placeholders {n} in the command, where n is the column number.
With --dry-run we force parallel not to execute the command and to just print the command it would run instead. This is good for checking that all of our input parameters are parsed correctly. To finally execute the commands, we need to remove the option.
To get rid of the chr we can use sed and pipe the result to parallel:
fin swimmer
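The command that followed this explanation did not survive in the thread, so here is a rough sketch of the idea rather than the original code. It strips the leading "chr" with sed and then prints one gemini command per bed line, which is roughly what --dry-run would show. To keep it runnable without GNU parallel installed, awk stands in for parallel here; the temporary bed file and the function name build_queries are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: the three regions from the question, tab separated.
bed=$(mktemp)
printf 'chr1\t88833393\t88834022\tEXr19\t1\t+\n'  > "$bed"
printf 'chr1\t22531002\t22531628\tEXr20\t1\t+\n' >> "$bed"
printf 'chr1\t10355070\t10355696\tEXr21\t1\t+\n' >> "$bed"

# build_queries (hypothetical helper): remove the leading "chr" from each line,
# then print one gemini command per line using columns 1-3.
# Nothing is executed; the commands are only printed, like a dry run.
build_queries() {
  sed 's/^chr//' "$1" |
    awk -F'\t' '{
      printf "gemini query --header --show-samples --region %s:%s-%s -q \"select * from variants\" gemini.db\n", $1, $2, $3
    }'
}

build_queries "$bed"

# Assumed parallel equivalent (untested here, built from the options named above):
#   sed 's/^chr//' regions.bed | parallel --colsep '\t' --dry-run \
#     'gemini query --header --show-samples --region {1}:{2}-{3} -q "select * from variants" gemini.db'
```

Once the printed commands look right, they could be run by piping the output to sh, or, in the parallel variant, by removing --dry-run.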
Hi Fin,
You are amazing. It's working perfectly :)
Thanks a bunch. Have a great weekend.
Best,