I've got these links:
data/NA18603/sequence_read/ERR000103.filt.fastq.gz
data/NA18603/sequence_read/ERR000103_1.filt.fastq.gz
data/NA18603/sequence_read/ERR000103_2.filt.fastq.gz
data/NA18542/sequence_read/ERR000104.filt.fastq.gz
data/NA18542/sequence_read/ERR000104_1.filt.fastq.gz
data/NA18542/sequence_read/ERR000104_2.filt.fastq.gz
data/NA18582/sequence_read/ERR000105.filt.fastq.gz
data/NA18582/sequence_read/ERR000105_1.filt.fastq.gz
data/NA18582/sequence_read/ERR000105_2.filt.fastq.gz
data/NA18592/sequence_read/ERR000106.filt.fastq.gz
data/NA18592/sequence_read/ERR000106_1.filt.fastq.gz
data/NA18592/sequence_read/ERR000106_2.filt.fastq.gz
data/NA18605/sequence_read/ERR000107.filt.fastq.gz
data/NA18605/sequence_read/ERR000107_1.filt.fastq.gz
data/NA18605/sequence_read/ERR000107_2.filt.fastq.gz
data/NA18592/sequence_read/ERR000108.filt.fastq.gz
data/NA18592/sequence_read/ERR000108_1.filt.fastq.gz
data/NA18592/sequence_read/ERR000108_2.filt.fastq.gz
data/NA12234/sequence_read/ERR000130.filt.fastq.gz
data/NA12234/sequence_read/ERR000130_1.filt.fastq.g
Now I'm trying to figure out:
1) Are they all from the same human?
2) Why does the NA18* number change?
3) Why are there 3 versions of each ERR000* file? (I thought paired reads came as 2 matched files.)
The sequence alignments I've run before were grouped by chromosome. So do these simply have all 23 chromosomes in each file?
Yeah, should be. I've never checked the alignment files from 1000 Genomes, but generally, alignment output files cover all aligned regions. For 1000 Genomes, that means all autosomes, the sex chromosomes, the mitochondrial chromosome, and the non-chromosomal supercontigs. The 1000 Genomes Project's description of the reference sequence used for alignment is here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/README.human_g1k_v37.fasta.txt
To be clear, that's all aligned regions in a single alignment output file.
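
On questions 1 to 3, the paths themselves carry most of the answer, and a short script can make the grouping explicit. This is only a sketch under assumptions that are not stated in the thread: that the directory name (NA18603, NA18592, ...) is the sample ID of one individual, that the ERR number is a run accession, and that for a paired-end run `_1` and `_2` hold the two mates while the file with no suffix holds reads left without a mate after filtering. The names `PATTERN` and `group_by_sample_and_run` are made up for illustration, and only a few of the listed paths are included.

```python
import re
from collections import defaultdict

# A subset of the paths listed in the question.
PATHS = [
    "data/NA18603/sequence_read/ERR000103.filt.fastq.gz",
    "data/NA18603/sequence_read/ERR000103_1.filt.fastq.gz",
    "data/NA18603/sequence_read/ERR000103_2.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000106.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000106_1.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000106_2.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000108.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000108_1.filt.fastq.gz",
    "data/NA18592/sequence_read/ERR000108_2.filt.fastq.gz",
]

# Assumed layout: data/<sample>/sequence_read/<run>[_1|_2].filt.fastq.gz
PATTERN = re.compile(
    r"data/(?P<sample>[^/]+)/sequence_read/"
    r"(?P<run>[A-Z]{3}\d+)(?:_(?P<mate>[12]))?\.filt\.fastq\.gz$"
)


def group_by_sample_and_run(paths):
    """Return {sample: {run: [file roles]}} for paths matching PATTERN."""
    grouped = defaultdict(lambda: defaultdict(list))
    for path in paths:
        match = PATTERN.match(path)
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        # No _1/_2 suffix is treated as the unpaired file (an assumption).
        role = {"1": "mate1", "2": "mate2"}.get(match.group("mate"), "unpaired")
        grouped[match.group("sample")][match.group("run")].append(role)
    return grouped


grouped = group_by_sample_and_run(PATHS)
for sample, runs in sorted(grouped.items()):
    for run, roles in sorted(runs.items()):
        print(sample, run, roles)
```

If those assumptions hold, different NA numbers mean different individuals, one individual can have several runs (NA18592 has both ERR000106 and ERR000108 here), and each run contributes one unpaired file plus the two mate files.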