I am looking for novel sequence insertions identified in the 1000 Genomes Project, and I found 3 files in this directory:
ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_novel_sequence_insertion_pilot2.gvf
ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_mobile_element_insertion_pilot1.gvf
ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_mobile_element_insertion_pilot2.gvf
It seems that for non-mobile-element insertions, there are only about 400 novel sequence insertions. Is there any other place where I can find more?
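For anyone who wants to check the count themselves, here is a minimal sketch that tallies the features in a downloaded GVF file by type. It assumes the file follows the usual GFF3-style layout (nine tab-separated columns, with the feature type in column 3 and `#` lines as pragmas or comments). The function name `count_feature_types` is my own and not part of any library.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally GVF feature lines by their type column (column 3).

    Assumes GFF3-style records: nine tab-separated columns,
    with pragma/comment lines starting with '#'.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#'/'##' header or pragma lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore anything that is not a full nine-column record.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts
```

Passing an open file handle for a local copy of one of the files above (for example `count_feature_types(open(path))`, where `path` is wherever you saved it) returns a `Counter` keyed by whatever type terms the file actually uses in column 3.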
EDIT: For mobile elements, Casey Bergman's answer seems to be the best resource out there. Still, of the 7830 entries in the table, only 3089 have a sequence given for the prediction; the rest are blank.
I believe the SV people have a consensus about how to define "novel". Every paper I have read on "novel" sequences/insertions defines "novel" in essentially the same way.
Mobile element insertions are not novel.
Neither are segmental duplications or CNVs, for that matter. Virtually all new sequence comes from pre-existing sequences in the genome. I think "novel" here is shorthand for "not in the reference genome".