Where can I download piRNAs (Piwi-interacting RNAs) of mouse (mm9)?
11.1 years ago
biorepine ★ 1.5k

Dear Biostars

Does anyone know where I can download mm9 piRNAs in BED or GTF format?

Thanks in advance.


Can you give more detail? I now happen to need these coordinates in NCBI37 (mm9).


You can just use liftOver from UCSC to convert mm8 to mm9 coordinates.
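
For reference, a minimal sketch of what that call could look like. It assumes the liftOver binary is on your PATH and that you have already downloaded the mm8-to-mm9 chain file from the UCSC download server (mm8ToMm9.over.chain.gz is the file name I would expect, but treat it as an assumption). The function name lift_bed and all file names are placeholders.

```shell
#!/bin/sh
# Sketch: lift a BED file from mm8 to mm9 with UCSC liftOver.
# Assumed: liftOver is on PATH and the chain file was downloaded beforehand.
lift_bed() {
    in_bed=$1      # input BED in mm8 coordinates
    chain=$2       # chain file, e.g. mm8ToMm9.over.chain.gz (assumed name)
    out_bed=$3     # records that were lifted to mm9
    unmapped=$4    # records liftOver could not place
    # liftOver argument order: oldFile map.chain newFile unMapped
    liftOver "$in_bed" "$chain" "$out_bed" "$unmapped"
}

# Example call (file names are placeholders):
# lift_bed mm8_piRNA.bed mm8ToMm9.over.chain.gz mm9_piRNA.bed unmapped.bed
```

It is worth looking at the unmapped file afterwards, since any piRNA that falls in a region without a chain alignment is dropped from the output.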


http://pirnabank.ibab.ac.in/ seems to be down. Is there any other database link?

Thanks


I figured out how to download it: go to NCBI Nucleotide and search for piRNA plus your species.


As most of the answers say, piRNAs are usually downloaded from piRNABank. All the coordinates are quite old: hg18 for human and mm8 for mouse.

http://pirnabank.ibab.ac.in/request.html

However, if you align these sequences to the genome, you will find that the listed coordinates are not always the only match.

Take a human piRNA as an example:

>hsa_piR_000011|gb|DQ569929|Homo sapiens:16:34608961:34608986:Plus
AAACUGACCAGAUGAAUGAGAAACCC

If you run it through BLAT at UCSC, you will find:

   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details YourSeq           26     1    26    26 100.0%    16   +   34608961  34608986     26
browser details YourSeq           25     1    26    26 100.0%     9   +   31898448  31898965    518
browser details YourSeq           24     1    26    26  96.2%    19   +   62716276  62716301     26

Everyone should think about how to deal with this problem.

11.0 years ago

1) Download the compressed file from http://pirnabank.ibab.ac.in/Mouse.tar.gz . It is mm9 based.

2) Uncompress the file using tar -xzvf Mouse.tar.gz. It will take a few minutes, and a new directory named Mouse will be created. Each piRNA has its own FASTA file with a header storing the alignment information, for example >mmu_piR_037869|gb|DQ726753|Mus_musculus:1:93235274:93235303:Minus. You now need to parse this information.

3) Create a shell script Piwi_BED.sh outside the Mouse directory and paste the code below into it.


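for file in Mouse/mmu_piR_*
do
    # Header fields split on ":" are: name, chr, start, end, strand.
    # OFS makes awk emit clean tab-separated columns.
    grep ">" "$file" | awk -F: -v OFS="\t" '{print $2, $3, $4, $1, $5}'
done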

4) Run the code sh Piwi_BED.sh > mm9_piwi.bed

5) The columns are chr, start, end, name and strand. You still need to sort the file and convert the strand values from "Plus" to "+" and from "Minus" to "-".


Very useful information, thanks! So, for completeness, you could do it like this:

for file in Mouse/mmu_piR_*; do
    grep ">" "$file" | awk -v OFS="\t" -F: '{print "chr"$2,$3,$4,$1,"1",$5}'
done |
    sed -e 's/Plus/+/g' -e 's/Minus/-/g' |
    awk 'NF > 0' |
    sort -k1,1 -k2,2n > mm8_piRNA.bed

Note that the coordinates in the archive appear to be from NCBIM36 (mm8) and not from mm9 (at least the few ones that I checked individually).

10.2 years ago

I happened to need these coordinates in GRCm38 (aka mm10) coordinates today. As karlos.klammer mentioned, the coordinates in the txt files are mm8, so I just converted them to the current coordinate system with Ensembl chromosome names and made a GTF out of it. Should someone need something like this in the future (I have no idea why piRNAbank makes it such a pain to just get coordinates, they obviously have them in a database), you can just download the GTF here and save yourself the hassle.


I'm trying to create the same piRNA gtf file BUT for Human piRNAs (from piRNAbank).

I used the same command that karlos.klammer suggested here and did the liftOver at UCSC from hg18 to GRCh38. So I currently have a BED file that looks like this:

$ head hglft_piRNA.bed
chr1    14630   14657   >hsa_piR_013426|gb|DQ588205|Homo        sapiens 1       +
chr1    18536   18563   >hsa_piR_005239|gb|DQ577218|Homo        sapiens 1       +
chr1    26806   26836   >hsa_piR_016792|gb|DQ593109|Homo        sapiens 1       -
chr1    32134   32160   >hsa_piR_019669|gb|DQ596983|Homo        sapiens 1       -

...and now I'm stuck...

I need to end up with a file that looks exactly like the GTF you made (below), but how do I do this? I'm unfortunately not a top-notch bioinformatician, so help is most appreciated :)

$ head piRNAs.GRCm38.gtf

2       piRNAbank       exon    92542278        92542304        .       +       .       gene_id "mmu_piR_000001"; transcript_id "AB250975";
2       piRNAbank       exon    92543694        92543719        .       +       .       gene_id "mmu_piR_000002"; transcript_id "AB250977";
2       piRNAbank       exon    92546494        92546519        .       +       .       gene_id "mmu_piR_000003"; transcript_id "AB250979";

cat hglft_piRNA.bed | tr "\>\|" "\t" | awk '{printf("%s\tpiRNAbank\texon\t%s\t%s\t.\t%s\t.\tgene_id \"%s\"; transcript_id \"%s\";\n", $1, $2, $3, $9, $4, $6)}' > hglft_piRNA.gtf

Worked perfectly, thanks a lot :)


Hi, I have a similar problem and am missing one small step.

I needed exactly the same thing as hesco and basically did the same.

I converted to a BED file with karlos.klammer's script and got this:

$ head hsa_piRNA-GRCh38.bed

1       14630   14657   >hsa_piR_013426|gb|DQ588205|Homo        sapiens 1       +
1       18536   18563   >hsa_piR_005239|gb|DQ577218|Homo        sapiens 1       +
1       26806   26836   >hsa_piR_016792|gb|DQ593109|Homo        sapiens 1       -
1       32134   32160   >hsa_piR_019669|gb|DQ596983|Homo        sapiens 1       -
1       39680   39710   >hsa_piR_014636|gb|DQ590030|Homo        sapiens 1       -

I guess that because there was a space in the FASTA header, "Homo" and "sapiens" now end up in two columns instead of one.

So when I try to make a GTF using Devon Ryon's script, I get the following:

$ head hsa_piRNA-try.gtf

1       piRNAbank       exon    14630   14657   .       1       .       gene_id "hsa_piR_013426"; transcript_id "DQ588205";
1       piRNAbank       exon    18536   18563   .       1       .       gene_id "hsa_piR_005239"; transcript_id "DQ577218";
1       piRNAbank       exon    26806   26836   .       1       .       gene_id "hsa_piR_016792"; transcript_id "DQ593109";
1       piRNAbank       exon    32134   32160   .       1       .       gene_id "hsa_piR_019669"; transcript_id "DQ596983";

..........

So instead of +/- I get 1. I thought I could correct this by running the command below instead, replacing $9 with $10 to get the next column, but this doesn't work and gives the weird output below:

$ cat hsa_piRNA-GRCh38.bed | tr ">\|" "\t" | awk '{printf("%s\tpiRNAbank\texon\t%s\t%s\t.\t%cas\t.\tgene_id \"%s\"; transcript_id \"%s\";\n", $1, $2, $3, $10, $4, $6)}' > hsa_piRNA-GRCh38.gtf

$ head hsa_piRNA-GRCh38-v3.gtf

1       .iRNAbangene_id "hsa_piR_013426"; transcript_id "DQ588205";
1       .iRNAbangene_id "hsa_piR_005239"; transcript_id "DQ577218";
1       .iRNAbangene_id "hsa_piR_016792"; transcript_id "DQ593109";
1       .iRNAbangene_id "hsa_piR_019669"; transcript_id "DQ596983";
1       .iRNAbangene_id "hsa_piR_014636"; transcript_id "DQ590030";

Maybe someone with more bioinfo knowledge has a quick idea how to fix this? Thank you so much!
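
A possible explanation, offered as a guess from the output above rather than a confirmed diagnosis: ".iRNAbangene_id" is what a terminal shows when a carriage return sits in the middle of a line, which would happen if the BED file has Windows line endings and the strand field is really "+" followed by "\r". The sketch below strips carriage returns first and takes the strand from the last field ($NF), so the extra "sapiens" column no longer matters. The function name bed_to_gtf is made up for illustration, and coordinates are passed through unchanged, as in the commands above.

```shell
#!/bin/sh
# Sketch: turn a BED file (chr, start, end, name, score, strand) into GTF lines.
# Assumes the name column looks like ">hsa_piR_013426|gb|DQ588205|Homo sapiens"
# and that the input may contain Windows line endings (an assumption).
bed_to_gtf() {
    # tr -d '\r' removes carriage returns so the strand is a bare + or -.
    tr -d '\r' < "$1" | awk '
        NF >= 6 {
            split($4, id, "|")      # id[1] = ">hsa_piR_...", id[3] = accession
            sub(/^>/, "", id[1])    # drop the FASTA ">" marker
            # The strand is always the last field, however the name was split.
            printf("%s\tpiRNAbank\texon\t%s\t%s\t.\t%s\t.\tgene_id \"%s\"; transcript_id \"%s\";\n",
                   $1, $2, $3, $NF, id[1], id[3])
        }'
}

# Example call (file names taken from the post above):
# bed_to_gtf hsa_piRNA-GRCh38.bed > hsa_piRNA-GRCh38.gtf
```

Running cat -A on the first few lines of the BED file (GNU coreutils) shows "^M" before the end-of-line marker if carriage returns are really present.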
