I am just getting introduced to bioinformatics, and I am having a hard time. My PI has no background in bioinformatics and she is no help... she just shares "scripts" stored in Word docs where she copies and pastes commands x1M times...
I got confused and wanted to ask about some details here, hoping someone would help me understand.
I have paired-end ChIP-seq and RNA-seq data sets.
For both data sets, my PI's "scripts" jump straight to alignment, and they align R1 and R2 separately. They then use the SAM files from both reads to create a tag directory for the sample. Here is an example to make this clearer:
```shell
## Align reads with STAR
STAR \
  --readFilesCommand zcat \
  --genomeDir /STAR_indexes/mm10/ \
  --runThreadN 24 \
  --readFilesIn Samp1_Rep1_R1_001.fastq.gz \
  --outFileNamePrefix Sample1_Rep1_R1_001_

STAR \
  --readFilesCommand zcat \
  --genomeDir /STAR_indexes/mm10/ \
  --runThreadN 24 \
  --readFilesIn Samp1_Rep1_R2_001.fastq.gz \
  --outFileNamePrefix Sample1_Rep1_R2_001_

makeTagDirectory Sample1_Rep1 Sample1_Rep1_R1_001_Aligned.out.sam Sample1_Rep1_R2_001_Aligned.out.sam
```
Shouldn't both reads be aligned together? Or is this way also fine?
If the reads are aligned separately as shown above, and the two SAM files from R1 and R2 are then used to create the TagDirectory, does that mess up the TagDirectory in any way, compared with aligning the reads in paired-end mode and using that single SAM file to make the TagDirectory?
Or does it make no difference either way when creating the TagDirectory?
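To be concrete about what I mean by "paired-end mode", here is a rough sketch (not taken from my PI's scripts). As far as I understand, STAR accepts both mates after a single `--readFilesIn`, with R1 first. The `Sample1_Rep1_PE_` output prefix and the `run` helper are names I made up for illustration; the helper only prints each command unless `RUN=1` is set.

```shell
#!/bin/sh
set -eu

# Same input files as in the example above.
r1="Samp1_Rep1_R1_001.fastq.gz"
r2="Samp1_Rep1_R2_001.fastq.gz"
# Hypothetical output prefix for the paired-end run.
prefix="Sample1_Rep1_PE_"

# Print a command, and execute it only when RUN=1 (dry run by default).
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Paired-end alignment: both mates are given to one STAR call.
run STAR \
  --readFilesCommand zcat \
  --genomeDir /STAR_indexes/mm10/ \
  --runThreadN 24 \
  --readFilesIn "$r1" "$r2" \
  --outFileNamePrefix "$prefix"

# The tag directory is then built from one SAM file instead of two.
run makeTagDirectory Sample1_Rep1 "${prefix}Aligned.out.sam"
```

Running it as `RUN=1 sh script.sh` would execute the commands rather than just print them.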
Also... in this case my PI skips the trimming... wouldn't "non-trimmed" samples introduce some bias?
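By "trimming" I mean a step like the following before alignment. This is only a sketch using cutadapt in paired-end mode. The adapter sequence is an assumption (the common Illumina TruSeq adapter prefix), since I do not know which kit was used, and the quality and length cutoffs, the output file names, and the `run` helper are placeholders I chose for illustration.

```shell
#!/bin/sh
set -eu

# Same input files as in the alignment example.
r1="Samp1_Rep1_R1_001.fastq.gz"
r2="Samp1_Rep1_R2_001.fastq.gz"
# Hypothetical names for the trimmed output files.
out1="Samp1_Rep1_R1_001.trimmed.fastq.gz"
out2="Samp1_Rep1_R2_001.trimmed.fastq.gz"
# Assumed adapter: Illumina TruSeq prefix; check the library prep before using it.
adapter="AGATCGGAAGAGC"

# Print a command, and execute it only when RUN=1 (dry run by default).
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# -a / -A: 3' adapter for R1 / R2; -q: quality cutoff; -m: minimum read length.
# -o / -p: trimmed output for R1 / R2, written together so the pairs stay in sync.
run cutadapt \
  -a "$adapter" -A "$adapter" \
  -q 20 -m 20 \
  -o "$out1" -p "$out2" \
  "$r1" "$r2"
```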
Sorry if these are stupid Q! Just hard to wrap my head around how this works.
Thanks!
Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (`` `text` `` becomes `text`), or use either the code option in the formatting bar or fenced code blocks for multi-line code. Fenced code blocks are useful for syntax highlighting. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.