5.2 years ago
Fawzi Yassine
Hi,
I am doing an RNA-seq analysis on data that has two runs (two SRA files) per sample, and each run has two FASTQ files (forward and reverse reads). An example sample is: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=2999520
How do I align such a sample (hisat2 syntax would be appreciated)?
How do I write the phenodata file (phenotype data) for such a sample?
Regards,
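A minimal sketch of the two alignment options for one sample, using Python only to assemble the hisat2 command lines. The index basename and FASTQ file names are hypothetical placeholders. The flags used (-x, -1, -2, -S, -p) are standard hisat2 options, and -1/-2 accept comma-separated file lists, which is one way to merge the runs at the FASTQ level without concatenating files.

```python
import shlex

# Hypothetical file names: each run has its own mate-1/mate-2 FASTQ pair.
RUNS = {
    "RUN1": ("RUN1_1.fastq.gz", "RUN1_2.fastq.gz"),
    "RUN2": ("RUN2_1.fastq.gz", "RUN2_2.fastq.gz"),
}
INDEX = "genome_index"  # hypothetical HISAT2 index basename


def per_run_commands(index, runs, threads=4):
    """One hisat2 call per run, so each run can be checked on its own."""
    cmds = []
    for run_id, (r1, r2) in runs.items():
        cmds.append(["hisat2", "-p", str(threads), "-x", index,
                     "-1", r1, "-2", r2, "-S", f"{run_id}.sam"])
    return cmds


def merged_command(index, runs, sample, threads=4):
    """Single hisat2 call: -1/-2 take comma-separated lists, merging the runs.

    Both lists are built from the same ordered dict, so mate 1 and mate 2
    files stay in the same run order, which hisat2 requires.
    """
    mates1 = ",".join(r1 for r1, _ in runs.values())
    mates2 = ",".join(r2 for _, r2 in runs.values())
    return ["hisat2", "-p", str(threads), "-x", index,
            "-1", mates1, "-2", mates2, "-S", f"{sample}.sam"]


if __name__ == "__main__":
    for cmd in per_run_commands(INDEX, RUNS):
        print(shlex.join(cmd))
    print(shlex.join(merged_command(INDEX, RUNS, "SAMPLE1")))
```

Aligning per run keeps the option of comparing the runs before merging; the comma-list call is the shortcut if the runs are known to be equivalent.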
These seem to be lane replicates; see the related thread "Confused about merging RNA-seq lanes/runs".
Typically I would merge them at the FASTQ level, but as the run dates differ quite a lot, I would process them separately and then check for potential batch effects, e.g. by PCA. If there is no indication of a batch effect, simply merge the BAM files.
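A sketch of the PCA check, plus the simpler Pearson correlation that also comes up in this thread, assuming a genes x libraries count matrix with one column per run (for example, counts obtained separately from each per-run BAM). The data below are simulated stand-ins, not results, and the function names are illustrative.

```python
import numpy as np


def log_cpm(counts):
    """log2(counts-per-million + 1) for a genes x libraries count matrix."""
    lib_sizes = counts.sum(axis=0)
    return np.log2(counts / lib_sizes * 1e6 + 1)


def pca_scores(expr, n_components=2):
    """PCA of the libraries (columns) via SVD of the gene-centred matrix."""
    x = expr.T                     # libraries x genes
    x = x - x.mean(axis=0)         # centre each gene across libraries
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def run_correlation(expr, i, j):
    """Pearson correlation between two libraries (columns i and j)."""
    return np.corrcoef(expr[:, i], expr[:, j])[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in: 2 biological samples x 2 runs, 500 genes.
    labels = ["S1_run1", "S1_run2", "S2_run1", "S2_run2"]
    gene_means = rng.gamma(2.0, 50.0, size=(500, 2))         # one column per sample
    counts = rng.poisson(np.repeat(gene_means, 2, axis=1))  # runs share their sample's means
    expr = log_cpm(counts)
    scores, explained = pca_scores(expr)
    for label, (pc1, pc2) in zip(labels, scores):
        print(f"{label}: PC1={pc1:.2f} PC2={pc2:.2f}")
    print("variance explained by PC1, PC2:", np.round(explained, 3))
    print("r(S1_run1, S1_run2) =", round(run_correlation(expr, 0, 1), 3))
    print("r(S1_run1, S2_run1) =", round(run_correlation(expr, 0, 2), 3))
```

If the two runs of each sample sit next to each other on PC1/PC2 and correlate more strongly with each other than with other samples, there is little sign of a run (batch) effect and merging the BAM files is reasonable. In R, the DESeq2 workflow does the equivalent with vst() and plotPCA().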
Thanks for the reply.
How do I merge the BAM files?
How do I write the phenodata file (phenotype data) for such a sample?
Regards,
samtools merge; please read its manual for the correct syntax. What are phenotype data in that case? Align the runs independently and, if they look OK, merge the BAM files and use whatever phenotype data you would use for a single file.
This seems complicated! Can I just ignore the second run of each sample?
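A sketch of a phenodata file, assuming the HISAT2/StringTie/Ballgown workflow, where phenotype data is a CSV with one row per sample and a first column of sample IDs matching the per-sample output directory names. The sample names and the "condition" column are hypothetical. Since the runs are merged, each biological sample gets exactly one row, just as a single-run sample would.

```python
import csv
import sys

# Hypothetical samples and covariate. Each biological sample appears once,
# even though it was sequenced in two runs: the runs are merged beforehand.
SAMPLES = [
    {"ids": "SAMPLE1", "condition": "control"},
    {"ids": "SAMPLE2", "condition": "treated"},
]


def write_phenodata(samples, handle, columns=("ids", "condition")):
    """Write one CSV row per sample; the first column holds the sample IDs."""
    writer = csv.DictWriter(handle, fieldnames=list(columns))
    writer.writeheader()
    for row in samples:
        writer.writerow(row)


if __name__ == "__main__":
    # Print to stdout; redirect to e.g. phenodata.csv in practice.
    write_phenodata(SAMPLES, sys.stdout)
```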
I have no insight into your data or analysis goals, so I cannot comment on this. Aligning data and doing some basic quality control is not too complicated, but it is an essential step that should be done before every analysis. Try to work it out. If you do not quality-control your data, I do not see how you could confidently stand behind your analysis. Read the DESeq2 workflow on Bioconductor; it covers everything from alignment/quantification to creating a count matrix and PCA. A simple Pearson correlation on the data might suffice as well. Maybe simply merging the data at the FASTQ level is OK, too.
You are right, it's better that I merge both runs of each sample. With samtools merge, can you tell me how to deal with the read group directive for this particular sample? https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=2999520
Thanks,
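A sketch of one common read-group layout, assuming per-run alignment: each run gets its own read group ID, and both runs share the same SM (sample) tag. hisat2's --rg-id and --rg options write the @RG header line and attach an RG:Z: tag to every read. The run IDs, sample name, and index below are hypothetical placeholders, not the actual accessions behind the linked BioSample.

```python
import shlex

SAMPLE = "SAMPLE1"          # hypothetical sample name -> SM tag
RUN_IDS = ["RUN1", "RUN2"]  # hypothetical run IDs -> ID tag (one per run)
INDEX = "genome_index"      # hypothetical HISAT2 index basename


def hisat2_rg_command(index, run_id, sample, platform="ILLUMINA"):
    """Per-run hisat2 call that adds an @RG line and an RG:Z: tag on every read."""
    return ["hisat2", "-x", index,
            "-1", f"{run_id}_1.fastq.gz", "-2", f"{run_id}_2.fastq.gz",
            "--rg-id", run_id,            # read group ID: unique per run
            "--rg", f"SM:{sample}",       # sample: shared by both runs
            "--rg", f"PL:{platform}",
            "-S", f"{run_id}.sam"]


def sort_command(run_id):
    """Coordinate-sort each run before merging."""
    return ["samtools", "sort", "-o", f"{run_id}.sorted.bam", f"{run_id}.sam"]


def merge_command(sample, run_ids):
    """Merge the sorted per-run BAMs; the output file is the first positional argument."""
    return ["samtools", "merge", f"{sample}.bam"] + [f"{r}.sorted.bam" for r in run_ids]


def rg_header_line(run_id, sample, platform="ILLUMINA"):
    """The tab-separated @RG header line each run should end up with."""
    return "\t".join(["@RG", f"ID:{run_id}", f"SM:{sample}", f"PL:{platform}"])


if __name__ == "__main__":
    for run_id in RUN_IDS:
        print(shlex.join(hisat2_rg_command(INDEX, run_id, SAMPLE)))
        print(shlex.join(sort_command(run_id)))
    print(shlex.join(merge_command(SAMPLE, RUN_IDS)))
    for run_id in RUN_IDS:
        print(repr(rg_header_line(run_id, SAMPLE)))
```

With distinct IDs, samtools merge should carry both @RG lines into the merged header, so reads stay traceable to their run while tools that group by SM treat them as one sample.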