Hi,
I am new to bulk RNA-seq and am trying to convert my fastq files into a count matrix so that I can run a differential expression analysis. I have read that Salmon is recommended for this. I have a few questions:
My data consists of healthy and control patients, so I built a Salmon index from the GENCODE human transcriptome, passing --gencode when creating the index:
salmon index -t gencode.v41.transcripts.fa.gz -i salmon_index --gencode
When running the quant command:
salmon quant -i salmon_index --libType A -1 ${prefix}_R1_001.fastq.gz -2 ${prefix}_R2_001.fastq.gz -o /bulkRNAseq_human/INCPMPM-14146.0/quant_samples/quant/${prefix};
I got a warning for every sample saying: "found no concordant and consistent mappings". However, when I looked at some of the metadata, most samples show above 90% mapped. Is this a problem? Should I change anything?
- If I want to use Salmon on bulk RNA-seq data from mice, but there is only one read file per sample, for example a fastq file that looks like this: FGC2321_s_1_AGCTCGCT-GCAGAATC.fastq, how am I supposed to run the quant command?
Thanks :)
What is the read length of the experiment? Please show the salmon logs. Single-end quantification for the 2nd question is covered in the salmon manual.
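The quant log and library-format summary are written into the directory you passed with -o, so (the path below is just a guess based on your command; replace ${prefix} with an actual sample name) you can pull the log up with something like:

# per-sample quant log; adjust the path to your actual -o directory
cat /bulkRNAseq_human/INCPMPM-14146.0/quant_samples/quant/${prefix}/logs/salmon_quant.log

For single-end data, Salmon takes the reads with -r instead of -1/-2. A minimal sketch, where the index name and output path are placeholders:

# single-end quantification sketch; salmon_index_mouse and quant_mouse/... are placeholders
salmon quant -i salmon_index_mouse --libType A \
    -r FGC2321_s_1_AGCTCGCT-GCAGAATC.fastq \
    -o quant_mouse/FGC2321_s_1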
After running:
gunzip -c H1_S11_L001_R1_001.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'
it printed: 67 9145551 (i.e., all 9,145,551 reads are 67 bp long). About the logs:
Can you provide the contents of the file lib_format_counts.json? Thanks!
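It sits in each sample's quant output directory; assuming the -o path from your quant command (replace ${prefix} with an actual sample name), something like:

# print the library-format summary for one sample; path is a guess based on your -o argument
cat /bulkRNAseq_human/INCPMPM-14146.0/quant_samples/quant/${prefix}/lib_format_counts.json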
So it looks like you have reads mapping, but almost always as orphans. You may want to look at running your data through repair.sh from the BBMap toolkit to "repair" or re-synchronize the files, as it looks like they may somehow have gotten desynchronized.
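If you go that route, a minimal repair.sh sketch would look something like this (input names taken from your quant command; output names are placeholders):

# re-pair R1/R2 by read name and write any orphaned reads to a separate file; output names are placeholders
repair.sh in1=${prefix}_R1_001.fastq.gz in2=${prefix}_R2_001.fastq.gz \
    out1=${prefix}_R1_repaired.fastq.gz out2=${prefix}_R2_repaired.fastq.gz \
    outs=${prefix}_singletons.fastq.gz

Then re-run salmon quant on the repaired pair and check whether the warning goes away.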
That having been said, did you manipulate the data in any way, e.g. trimming, renaming of reads, something like that?
No, this is the raw data I received. I will try to use repair.sh then. Thanks!