Entering edit mode
4.1 years ago
bertb
▴
20
Hello,
I am trying to eventually run CollectRnaSeqMetrics using PICARD, with the following command:
find UWN_t2.bam -exec echo java -jar ~/workspace/rnaseq/student_tools/picard.jar CollectRnaSeqMetrics I={} O=picard/{}.RNA_Metrics REF_FLAT=$RNA_HOME/refs/sacCer3.ncbiRefSeq.ref_flat.txt STRAND=SECOND_READ_TRANSCRIPTION_STRAND RIBOSOMAL_INTERVALS=$RNA_HOME/refs/ref_ribosome.interval_list \; | sh
but I one of the lines of the output gave me the following warning:
WARNING CollectRnaSeqMetrics The RIBOSOMAL_INTERVALS file, /home/user/workspace/rnaseq/refs/ref_ribosome.interval_list does not contain intervals
I looked back through some of the files, and noticed that my .dict file looks just like my ref_ribosome.interval_list file:
cat sacCer3.dict
@HD VN:1.5
@SQ SN:chrI LN:230218 M5:6681ac2f62509cfc220d78751b8dc524 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrII LN:813184 M5:97a317c689cbdd7e92a5c159acd290d2 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrIII LN:316620 M5:54f4a74aa6392d9e19b82c38aa8ab345 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrIV LN:1531933 M5:74180788027e20df3de53dcb2367d9e3 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrIX LN:439888 M5:4eae53ae7b2029b7e1075461c3eb9aac UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrV LN:576874 M5:d2787193198c8d260f58f2097f9e1e39 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrVI LN:270161 M5:b7ebc601f9a7df2e1ec5863deeae88a3 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrVII LN:1090940 M5:a308c7ebf0b67c4926bc190dc4ba8ed8 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrVIII LN:562643 M5:f66a4f8eef89fc3c3a393fe0210169f1 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrX LN:745751 M5:6757b8c7d9cca2c56401e2484cf5e2fb UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXI LN:666816 M5:e72df2471be793f8aa06850348a896fa UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXII LN:1078177 M5:77945d734ab92ad527d8920c9d78ac1c UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXIII LN:924431 M5:073f9ff1c599c1a6867de2c7e4355394 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXIV LN:784333 M5:188bca5110182a786cd42686ec6882c6 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXV LN:1091291 M5:7e02090a38f05459102d1a9a83703534 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrXVI LN:948066 M5:232475e9a61a5e07f9cb2df4a2dad757 UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
@SQ SN:chrM LN:85779 M5:71c39cf065b8d574f636b654c274cf1b UR:file:/home/user/workspace/rnaseq/refs/sacCer3.fa
I'm guessing this is not what the .dict file is supposed to look like. The yeast fasta file sacCer3.fa seems fine, and so I'm wondering why when I create the .dict file with:
java -jar ~/workspace/rnaseq/student_tools/picard.jar CreateSequenceDictionary R=sacCer3.fa O=sacCer3.dict
the output is
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** CreateSequenceDictionary -R sacCer3.fa -O sacCer3.dict
**********
11:22:22.142 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/user/workspace/rnaseq/student_tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Oct 14 11:22:22 EDT 2020] CreateSequenceDictionary OUTPUT=sacCer3.dict REFERENCE=sacCer3.fa TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Oct 14 11:22:22 EDT 2020] Executing as user@user-VirtualBox on Linux 5.4.0-48-generic amd64; OpenJDK 64-Bit Server VM 11.0.8+10-post-Ubuntu-0ubuntu120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.15-SNAPSHOT
[Wed Oct 14 11:22:22 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=373030912
... and a strange file is created.
Any help greatly appreciated! Thanks
Note: Running on a VM
Why do you think that? The dict file looks fine to me.
Ok, I didn't realize that was what it was supposed to look like. But is the .interval_list file supposed to look the same?
When I create that file:
I get a file that looks like this:
Is this right? Just wondering where I've gone wrong here
The interval list is supposed to contain all intervals from the source BED file. Do the chromosome names match between the BED and DICT files?
I think it's a problem with my ref_ribosome.gtf file, which I create before converting it to a .bed file. The original yeast reference gtf file (sacCer3.ncbiRefSeq.gtf) looks fine, but when I try to extract the location of the ribosomal sequences:
...the file that gets created looks empty. Is there a problem with this command?
Thanks,
The command is fine, but if it results in an empty output, all that tells me is that there is no
rrna
string in the gtf file. Are you sure rRNA entries definitely contain the stringrrna
? I'm assuming you're looking for rRNA entries - let me know if that is not the case.You were right - looking in the .gtf file, there were no rrna strings. I had to specifically pull out the gene names:
Thanks for your help with this. By the way, if this is not too much of a departure from this thread, what is the most common way to view the CollectRnaSeqMetrics output, for instance, in an Excel file? Thanks