4.3 years ago
the_darkside
Hello, I am new to bioinformatics analysis so any simple guidance or explanation would be appreciated:
I have a fastq file downloaded from NCBI: link
The file contains Illumina paired-end whole exome sequencing (WES) data, which I assume includes exomes from both tumor and normal tissue, because the study that produced this data aimed to analyze somatic mutations. Looking at the first 8 lines of the file, how can I tell which reads come from the normal tissue and which come from the tumor tissue?
head -n 8 ERR1527288.fastq
@ERR1527288.1 1 length=200
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCA
+ERR1527288.1 1 length=200
@CCFFDDDHHHDFHIHJAFGHIJA?GHIJ:DGIJJ?FHIJJFHFGGI==CGII8CDGGI=ACEHF?B;;CE.;(55=59ABDD5<9?CB><AAAB9<<ACCCCFFFFFHHHGHJIGIJJJJJJJIGIGIJJIJJJJJJJIJJJJJIGIE;=@3CGEGIGGFEC?@DB@DECCA?B23=A?BB??B2<99?A<?ACCB###
@ERR1527288.2 2 length=200
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATAGGGTTAG
+ERR1527288.2 2 length=200
@@CFFFFFHFHHFGIGGBFCGH@FA?EGHE@GEHIGGEGHGE@GDDHCCCG@GFF>EADE=AEADE=?D;@;?;?5;=A?C3<9(5<9<?B#########?@?DDD=BFA;CCE:AEBGGIIIGHG?E>A1?DDBB?F;D>>FF;B@G;BFFCG).@@GI=CAA>?)(((6;;?5;CB399<AB?BBBDD##########
You don't :) The information about which sample is normal and which is tumor is stored separately, not in the FASTQ file. To a sequencing machine, there is no difference between sequencing normal tissue and sequencing tumor tissue.
This file most likely contains reads from either the tumor sample or the normal sample, not both.
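To see this for yourself, here is a minimal Python sketch that pulls apart the header line of each record. The function name `read_fastq_headers` and the shortened sequence and quality strings are made up for illustration; only the header lines are copied from the output above. It assumes the plain four-lines-per-record layout shown in the question. Note that it walks the file in blocks of four lines instead of looking for a leading `@`, because quality lines can also start with `@`, as both of yours do.

```python
from itertools import islice


def read_fastq_headers(lines):
    """Yield the parsed header fields of each 4-line FASTQ record."""
    it = iter(lines)
    while True:
        # A record is exactly: header, sequence, '+' separator, quality.
        record = list(islice(it, 4))
        if len(record) < 4:
            break
        header = record[0].rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        # e.g. "ERR1527288.1 1 length=200" -> read id plus extra fields
        read_id, *extra = header[1:].split()
        run, _, read_index = read_id.partition(".")
        yield {"run": run, "read_index": read_index, "extra": extra}


# Headers taken from the question; sequences and qualities are shortened
# placeholders (hypothetical), kept only so each record has four lines.
sample = [
    "@ERR1527288.1 1 length=200",
    "TAGGGTTAGG",
    "+ERR1527288.1 1 length=200",
    "@CCFFDDDHH",
    "@ERR1527288.2 2 length=200",
    "CTAACCCTAA",
    "+ERR1527288.2 2 length=200",
    "@@CFFFFFHF",
]

headers = list(read_fastq_headers(sample))
for h in headers:
    print(h["run"], h["read_index"], h["extra"])
```

The only fields you get back are the run accession, the read number, and the read length. Nothing in them says tumor or normal, so that label has to be looked up in the sample metadata recorded for the run accession (here `ERR1527288`), not in the reads themselves.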