Dear Biostars,
Does anyone know where I can download mm9 piRNAs in BED or GTF format?
Thanks in advance
1) Download the compressed file from http://pirnabank.ibab.ac.in/Mouse.tar.gz . It is mm9 based.
2) Uncompress the file using tar -xzvf Mouse.tar.gz. It will take a few minutes, and a new directory named Mouse will be created. Each piRNA has its own FASTA file with a header storing the alignment information, for example >mmu_piR_037869|gb|DQ726753|Mus_musculus:1:93235274:93235303:Minus. You now need to parse this information.
3) Create a shell script Piwi_BED.sh outside the Mouse directory and paste the code below into it.
for file in Mouse/mmu_piR_*
do
grep ">" "$file" | awk -F: -v OFS="\t" '{print $2,$3,$4,$1,$5}'
done
4) Run the script: sh Piwi_BED.sh > mm9_piwi.bed
5) The columns are chr, start, end, name and strand. You still need to sort the file and convert the strand values from "Plus" to "+" and from "Minus" to "-".
Very useful information, thanks! So, for completeness, you could do it like this:
for file in Mouse/mmu_piR_*; do grep ">" $file | awk -v OFS="\t" -F: '{print "chr"$2,$3,$4,$1,"1",$5}'; done | sed -e 's/Plus/+/g' -e 's/Minus/-/g' |awk 'NF > 0'| sort -k1,1 -k2,2n > mm8_piRNA.bed
Note that the coordinates in the archive appear to be from NCBIM36 (mm8) and not from mm9 (at least for the few that I checked individually).
I happened to need these coordinates in GRCm38 (aka mm10) coordinates today. As karlos.klammer mentioned, the coordinates in the txt files are mm8, so I just converted them to the current coordinate system with Ensembl chromosome names and made a GTF out of it. Should someone need something like this in the future (I have no idea why piRNAbank makes it such a pain to just get coordinates, they obviously have them in a database), you can just download the GTF here and save yourself the hassle.
I'm trying to create the same piRNA GTF file, but for human piRNAs (from piRNAbank).
I used the same command that karlos.klammer suggested here and did the liftOver at UCSC from hg18 to GRCh38. So I currently have a BED file that looks like this:
$ head hglft_piRNA.bed
chr1 14630 14657 >hsa_piR_013426|gb|DQ588205|Homo sapiens 1 +
chr1 18536 18563 >hsa_piR_005239|gb|DQ577218|Homo sapiens 1 +
chr1 26806 26836 >hsa_piR_016792|gb|DQ593109|Homo sapiens 1 -
chr1 32134 32160 >hsa_piR_019669|gb|DQ596983|Homo sapiens 1 -
...and now I'm stuck...
I need to end up with a file that looks exactly like the GTF you made (below), but how do I do this? I'm unfortunately not a top-notch bioinformatician, so help is much appreciated :)
$ head piRNAs.GRCm38.gtf
2 piRNAbank exon 92542278 92542304 . + . gene_id "mmu_piR_000001"; transcript_id "AB250975";
2 piRNAbank exon 92543694 92543719 . + . gene_id "mmu_piR_000002"; transcript_id "AB250977";
2 piRNAbank exon 92546494 92546519 . + . gene_id "mmu_piR_000003"; transcript_id "AB250979";
Hi, I have a similar problem and am missing one small step.
I needed exactly the same thing as hesco and basically did the same.
I converted to a BED file with karlos klammer's script and got this:
$ head hsa_piRNA-GRCh38.bed
1 14630 14657 >hsa_piR_013426|gb|DQ588205|Homo sapiens 1 +
1 18536 18563 >hsa_piR_005239|gb|DQ577218|Homo sapiens 1 +
1 26806 26836 >hsa_piR_016792|gb|DQ593109|Homo sapiens 1 -
1 32134 32160 >hsa_piR_019669|gb|DQ596983|Homo sapiens 1 -
1 39680 39710 >hsa_piR_014636|gb|DQ590030|Homo sapiens 1 -
I guess that because there was a space in the FASTA header, there are now two columns (Homo and sapiens) instead of one.
So when I now try to make a GTF using Devon Ryon's script, I get the following:
$ head hsa_piRNA-try.gtf
1 piRNAbank exon 14630 14657 . 1 . gene_id "hsa_piR_013426"; transcript_id "DQ588205";
1 piRNAbank exon 18536 18563 . 1 . gene_id "hsa_piR_005239"; transcript_id "DQ577218";
1 piRNAbank exon 26806 26836 . 1 . gene_id "hsa_piR_016792"; transcript_id "DQ593109";
1 piRNAbank exon 32134 32160 . 1 . gene_id "hsa_piR_019669"; transcript_id "DQ596983";
..........
So instead of the +/- I get the 1. I thought I could correct this by replacing 9 with 10 in the command to pick up the next 'column', but this doesn't work and instead gives the weird output below:
$ cat hsa_piRNA-GRCh38.bed | tr ">\|" "\t" | awk '{printf("%s\tpiRNAbank\texon\t%s\t%s\t.\t%cas\t.\tgene_id \"%s\"; transcript_id \"%s\";\n", $1, $2, $3, $10, $4, $6)}' > hsa_piRNA-GRCh38.gtf
$ head hsa_piRNA-GRCh38-v3.gtf
1 .iRNAbangene_id "hsa_piR_013426"; transcript_id "DQ588205";
1 .iRNAbangene_id "hsa_piR_005239"; transcript_id "DQ577218";
1 .iRNAbangene_id "hsa_piR_016792"; transcript_id "DQ593109";
1 .iRNAbangene_id "hsa_piR_019669"; transcript_id "DQ596983";
1 .iRNAbangene_id "hsa_piR_014636"; transcript_id "DQ590030";
Maybe someone with more bioinfo knowledge has a quick idea how to fix this? Thank you so much!
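One possible way around the shifted strand column, offered as a sketch rather than a tested fix for this exact file: split the BED on tabs only, so that "Homo sapiens" stays inside the name field and the strand is always column 6, then split the name on "|" to recover the two IDs. The function name bed_to_gtf is made up for illustration. Stripping carriage returns is an assumption (the overwritten-looking output above could come from Windows line endings, but that is only a guess). The "chr" prefix is dropped to match the Ensembl-style names in the GTF shown earlier, and the start coordinate is copied unchanged as in the attempt above; whether it needs +1 depends on whether the archive coordinates are 0- or 1-based, which this thread does not settle.

```shell
#!/bin/sh
# bed_to_gtf: hypothetical helper. Reads a 6-column tab-separated BED on stdin
# and writes GTF lines in the layout of the piRNAs.GRCm38.gtf sample in this thread.
bed_to_gtf() {
    # Assumption: remove carriage returns in case the input has Windows line endings.
    tr -d '\r' |
    awk -F'\t' -v OFS='\t' '
        NF >= 6 {
            name = $4
            sub(/^>/, "", name)       # remove the FASTA ">" from the name
            split(name, id, "|")      # id[1] = piRNA name, id[3] = GenBank accession
            chrom = $1
            sub(/^chr/, "", chrom)    # Ensembl-style chromosome names without "chr"
            attr = "gene_id \"" id[1] "\"; transcript_id \"" id[3] "\";"
            print chrom, "piRNAbank", "exon", $2, $3, ".", $6, ".", attr
        }'
}

# Usage (file names taken from the post above):
# bed_to_gtf < hsa_piRNA-GRCh38.bed > hsa_piRNA-GRCh38.gtf
```

Because awk is told to split on tabs only (-F'\t'), the space in "Homo sapiens" no longer creates an extra column, so there is no need to guess between $9 and $10.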
Can you give more detail? I now happen to need these coordinates in GRCm37 (mm9).
You can just use liftOver from UCSC to convert mm8 to mm9 coordinates.
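A minimal sketch of that liftOver step, assuming the UCSC liftOver command-line tool is installed and the mm8-to-mm9 chain file has already been downloaded from the UCSC download server (mm8ToMm9.over.chain.gz is the name I would expect for that chain, but check the download page). The function name lift_bed and the output file names are made up for illustration.

```shell
#!/bin/sh
# lift_bed: hypothetical wrapper around UCSC liftOver.
# Arguments: input BED, chain file, output BED.
# Records that cannot be converted are written to <output BED>.unmapped
lift_bed() {
    in_bed=$1
    chain=$2
    out_bed=$3
    # liftOver usage: liftOver oldFile map.chain newFile unMapped
    liftOver "$in_bed" "$chain" "$out_bed" "$out_bed.unmapped"
}

# Usage, with the BED file produced earlier in this thread:
# lift_bed mm8_piRNA.bed mm8ToMm9.over.chain.gz mm9_piRNA.bed
# sort -k1,1 -k2,2n mm9_piRNA.bed > mm9_piRNA.sorted.bed
```

The UCSC chain files use UCSC-style chromosome names ("chr1"), so the input BED should keep the "chr" prefix. It is worth checking the .unmapped file afterwards, since some piRNAs may not convert.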
http://pirnabank.ibab.ac.in/ seems to be down. Is there any other database link?
Thanks
I figured out how to download it: go to NCBI Nucleotide and search for piRNA and your species.
As most of the answers say, piRNAs are usually downloaded from piRNAbank. All the coordinates are quite old: hg18 for human and mm8 for mouse.
http://pirnabank.ibab.ac.in/request.html
However, if you BLAST these sequences, you will find that the coordinates do not match perfectly.
Take a human piRNA as an example:
If you run a BLAST search at UCSC, you will find:
Everyone should think about how to deal with this problem.