Hello Everyone,
I was hoping to see if anyone has insight on how to process/analyze the following type of data.
My lab is developing a new method of RNA in vitro transcription. To validate our approach, we sent reverse-transcribed samples off for Illumina paired-end sequencing.
Our general workflow for this process is as follows:
- RNA synthesis via in vitro transcription (using established methods or our new method)
- 3’ end oligo ligation to the RNA products from step 1
- Primer annealing and reverse transcription
- PCR amplification of the reverse-transcribed product
- Purification
- Sequencing
We are particularly interested in the following:
- Comparing 3’ end homogeneity of our RNA samples.
- Comparing fidelity of our method vs. established methods (that is, the general transcription error rate in control samples versus samples made using our method).
Currently, I have the R1 and R2 fastq.gz files for each sample. Since these are in vitro transcribed products, I also have my own sequence that will need to be used as a reference. So far, I have found several online resources (tutorials/general workflows) for different types of analysis. I have no experience with this type of analysis but would at least like to investigate what needs to be done.
My specific questions are:
- Does anyone have suggested resources for how I should do my data processing (alignment/mapping to my custom reference and such)?
- Any recommendations for software for the alignment/mapping?
- Suggestions for resources on how to do basic analysis of this type of data?
- Any recommendations for software for this type of analysis?
- I am open to any other recommendations people may have.
Some other information:
- We used Genewiz Amplicon-EZ (150-500 bp) service for the actual sequencing.
- Our amplicons are approximately 200 base pairs in length.
- We are also getting sequencing done at one of our core facilities. They will also do an analysis for us, but I would very much like to compare the data between our core facility and what we got from Genewiz.
Best
BD
Honestly, any popular aligner will do: STAR, bowtie2, bwa, etc. Just make a FASTA file containing your sequences of interest, build an index using the alignment software, and run your alignment software on your paired-end FASTQ files using that index. You'll get your "counts" for each product and a BAM file (to assess coverage).
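As a rough illustration of the "make a FASTA, build an index, align" steps, here is a minimal Python sketch that writes a custom reference and assembles the bowtie2 commands. It assumes bowtie2 is the aligner you pick; the helper names, file names, and the example sequence are all made up for illustration. The commands are only built as lists here, not executed.

```python
from pathlib import Path


def write_reference_fasta(path, sequences):
    """Write a {name: sequence} dict as a FASTA file.

    The reads come from cDNA, so the reference should use the DNA
    alphabet (T rather than U).
    """
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")
        lines.append(seq.upper())
    Path(path).write_text("\n".join(lines) + "\n")


def bowtie2_commands(fasta, index_prefix, r1, r2, sam_out):
    """Return (build_cmd, align_cmd) as argument lists for bowtie2.

    bowtie2-build takes the FASTA and an index prefix; bowtie2 takes
    the index (-x), the paired reads (-1/-2) and a SAM output (-S).
    """
    build = ["bowtie2-build", str(fasta), str(index_prefix)]
    align = [
        "bowtie2",
        "-x", str(index_prefix),
        "-1", str(r1),
        "-2", str(r2),
        "-S", str(sam_out),
    ]
    return build, align


if __name__ == "__main__":
    # Hypothetical template name and sequence; replace with your own.
    write_reference_fasta("reference.fa", {"my_template": "GGGAGACCACAACGGTTTCCC"})
    build_cmd, align_cmd = bowtie2_commands(
        "reference.fa", "reference_index",
        "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam",
    )
    print(" ".join(build_cmd))
    print(" ".join(align_cmd))
```

Once bowtie2 is installed, each list can be passed to `subprocess.run(cmd, check=True)`. The SAM output can then be sorted and indexed into a BAM with `samtools sort -o sample.bam sample.sam` followed by `samtools index sample.bam`.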
Before that, you'll want to run a FASTQC quality check and, if adapters are present, trim those off using cutadapt or similar software.
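The QC and trimming step could be sketched the same way. This assumes FastQC and cutadapt; the function name, file names, and adapter sequences below are placeholders, and the real adapter sequences depend on your library prep. Again the commands are only assembled, not run.

```python
def qc_and_trim_commands(r1, r2, adapter_r1, adapter_r2,
                         out_r1, out_r2, qc_dir="fastqc_out"):
    """Return (fastqc_cmd, cutadapt_cmd) as argument lists.

    fastqc writes its reports into qc_dir (the directory must already
    exist). cutadapt removes a 3' adapter from R1 (-a) and from R2 (-A)
    and writes the trimmed pairs to out_r1 (-o) and out_r2 (-p).
    """
    fastqc = ["fastqc", "-o", qc_dir, r1, r2]
    cutadapt = [
        "cutadapt",
        "-a", adapter_r1,
        "-A", adapter_r2,
        "-o", out_r1,
        "-p", out_r2,
        r1, r2,
    ]
    return fastqc, cutadapt


if __name__ == "__main__":
    # Placeholder adapter sequences; substitute the ones from your kit.
    fastqc_cmd, cutadapt_cmd = qc_and_trim_commands(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz",
        "ADAPTER_SEQ_R1", "ADAPTER_SEQ_R2",
        "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
    )
    print(" ".join(fastqc_cmd))
    print(" ".join(cutadapt_cmd))
```

The trimmed files are then what you would feed to the aligner in place of the raw FASTQ files.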
tl;dr you're not doing anything fancy beyond any other standard RNAseq analysis. Reading the manual/tutorials for the set of software (e.g. fastqc+cutadapt+star) that you decide you want to use should get you where you need to be :)
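For the two comparisons in the question (3' end homogeneity and error rate), the arithmetic after alignment is simple tallying. Below is a toy Python sketch of that tallying, not a substitute for a proper variant or pileup tool. It assumes you have already extracted, for each read, where its RNA-derived part ends on the reference and an ungapped placement of the read on the reference; both function names and the input format are invented for illustration, and indels are ignored. Keep in mind that mismatches counted this way also include reverse transcription, PCR, and sequencing errors, so the comparison between control and new-method samples is more meaningful than either absolute number.

```python
from collections import Counter


def three_prime_end_fractions(end_positions):
    """Fraction of reads ending at each reference position.

    A homogeneous sample puts nearly all of its reads on one position.
    """
    counts = Counter(end_positions)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


def mismatch_rate(reference, placed_reads):
    """Mismatches per compared base for ungapped read placements.

    placed_reads is an iterable of (start, read_sequence) with a
    0-based start on the reference. Bases past the reference end
    are skipped.
    """
    mismatches = 0
    compared = 0
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            ref_pos = start + offset
            if ref_pos >= len(reference):
                break
            compared += 1
            if base != reference[ref_pos]:
                mismatches += 1
    return mismatches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up numbers purely to show the call pattern.
    print(three_prime_end_fractions([200, 200, 200, 201]))
    print(mismatch_rate("ACGTACGT", [(0, "ACGT"), (4, "ACGA")]))
```

Running the same two functions on the control and new-method samples gives numbers that can be compared side by side.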