It has been a while. I grew tired of programming at work and then coming home and programming some more. I've been working on my genealogy for a while now, and I'd like to bounce some ideas off you.
When I had my DNA tested by Ancestry.com and then extracted the results as a text file, I found it contained some documentation followed by the data. The end of the documentation and the start of the data are shown below:
#Genetic data is provided below as five TAB delimited columns. Each line
#corresponds to a SNP. Column one provides the SNP identifier (rsID where
#possible). Columns two and three contain the chromosome and basepair position
#of the SNP using human reference build 37.1 coordinates. Columns four and five
#contain the two alleles observed at this SNP (genotype). The genotype is reported
#on the forward (+) strand with respect to the human reference.
rsid chromosome position allele1 allele2
rs4477212 1 82154 T T
rs3131972 1 752721 A G
rs12562034 1 768448 G G
...
rs6517463 21 39752673 0 0
On average each generation has a single mutation across the 23 pairs. Most of the mutations will happen in the non-coding regions, so they will have no impact on the SNPs mentioned above. Most mutations will be transitions, between A and G or C and T, rather than transversions. Deletions and insertions are very rare. The allele position is based upon alignment against the model, so deletions are indicated by a 0. I don't know how insertions are handled, maybe by a repetition of rsID and position.
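To play with the file layout described in the header, here is a minimal Python sketch that reads the five columns into records. The names `Snp`, `parse_snps`, and `SAMPLE` are my own, not anything Ancestry provides, and I split on any whitespace rather than strictly on TABs so that the excerpt as pasted here parses too.

```python
from typing import Iterable, Iterator, NamedTuple


class Snp(NamedTuple):
    """One data line of the export (hypothetical record type)."""
    rsid: str
    chromosome: int
    position: int
    allele1: str
    allele2: str


def parse_snps(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield one Snp per data line, skipping '#' comments and the column header."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue
        rsid, chrom, pos, a1, a2 = line.split()
        yield Snp(rsid, int(chrom), int(pos), a1, a2)


# The data lines quoted in this post, used as a tiny test input.
SAMPLE = """\
#on the forward (+) strand with respect to the human reference.
rsid chromosome position allele1 allele2
rs4477212 1 82154 T T
rs3131972 1 752721 A G
rs12562034 1 768448 G G
rs6517463 21 39752673 0 0
"""

snps = list(parse_snps(SAMPLE.splitlines()))
zero_sites = [s.rsid for s in snps if s.allele1 == "0" or s.allele2 == "0"]
print(len(snps), "sites read;", "sites reported as 0:", zero_sites)
```

For the real file one would pass an open file handle instead of `SAMPLE.splitlines()`.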
There were a bit over 700,000 lines for 22 chromosomes. Chromosomes 23, 24, and 25 represent the X-unique, Y-unique, and shared X/Y alleles, though not necessarily in that order. That's about 30,000 identified bases per strand. Ancestry selected the sites tested because they represent common variations within the human population. 4 to the 30,000th power seems large enough that no two individuals should ever have identical chromosomes, but that doesn't really make sense given the way we come by them.
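Just to put numbers on that, here is a small Python sketch of the arithmetic. It takes my round figures (700,000 lines, 22 chromosomes, 30,000 sites, 4 possible bases) at face value, and it counts digits with a logarithm instead of printing the whole number.

```python
import math

LINES = 700_000   # "a bit over 700,000", rounded down
CHROMOSOMES = 22
SITES = 30_000    # rounded sites per chromosome
BASES = 4         # A, C, G, T

# Average number of tested sites per chromosome.
sites_per_chromosome = round(LINES / CHROMOSOMES)

# Number of decimal digits in 4 ** 30000, via log10.
digits = math.floor(SITES * math.log10(BASES)) + 1

print(sites_per_chromosome, "sites per chromosome on average")
print("4 ** 30000 has", digits, "decimal digits")
```

That is a number with roughly eighteen thousand digits, which is why the count of combinations by itself says so little about how alike relatives really are.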
Except in very rare cases, one shares 23 chromosomes with one's father and 23 with one's mother. Even if both parents had completely unique chromosomes, I would share 50% with each. My sister also shares 50% of her chromosomes with each parent, but not the same ones as I do. At the very least, I get my Y from my father's father, whereas my sister gets one of my father's mother's X chromosomes. All sisters will share the same X chromosome from their paternal grandmother.
Most of the time siblings will share about 12 chromosomes with each other. The number can go up or down following a normal distribution. Cousins will share, again following a normal distribution, around 5 chromosomes, and second cousins will share around 2 chromosomes. It is highly unlikely there will be any point mutations within the documented sites within 4 generations.
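The sharing model above can be tried out with a quick simulation. This is only a sketch under loud assumptions: chromosomes are passed down whole (no recombination), the sex chromosomes are treated like any other pair, every founder's chromosomes are unique, and spouses are always unrelated. All of the function names are mine.

```python
import random
from itertools import count

PAIRS = 23
_labels = count()  # source of unique labels for founder chromosomes


def founder():
    """A person whose 46 chromosomes are all unique (assumption)."""
    return [(next(_labels), next(_labels)) for _ in range(PAIRS)]


def child(father, mother, rng):
    """Each parent hands down one whole chromosome from every pair."""
    return [(rng.choice(f), rng.choice(m)) for f, m in zip(father, mother)]


def shared(a, b):
    """Count chromosomes two people have in common, pair by pair."""
    return sum(len(set(x) & set(y)) for x, y in zip(a, b))


def descend(person, generations, rng):
    """Follow one line of descent, marrying an unrelated founder each time."""
    for _ in range(generations):
        person = child(person, founder(), rng)
    return person


def mean_shared(generations, trials=2000, seed=1):
    """Average shared count: 0 = siblings, 1 = first cousins, 2 = second cousins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        dad, mom = founder(), founder()
        sib1, sib2 = child(dad, mom, rng), child(dad, mom, rng)
        total += shared(descend(sib1, generations, rng),
                        descend(sib2, generations, rng))
    return total / trials


for label, gens in [("siblings", 0), ("first cousins", 1), ("second cousins", 2)]:
    print(label, round(mean_shared(gens), 2))
```

Under these assumptions the expected counts work out to 23 for full siblings (11.5 through each parent), 5.75 for first cousins, and about 1.4 for second cousins, so the figure of about 12 for siblings corresponds to the chromosomes shared through one parent.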
Without knowing Ancestry's algorithm, I am assuming that a confidence level of 95% means one unmodified chromosome, Moderate means one mutation, Low means two mutations, and Very Low means three mutations.
I have 16 unique great-great-grandparents. I only have 30 unique 3rd great-grandparents. It's likely I don't share any DNA with some of them, at least through them. Who knows, I may share 2 chromosomes with a few of them. There's no way I can share DNA through all of my 4th great-grandparents because I only have 46 chromosomes to share amongst them.
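The counting behind that argument is simple enough to sketch in Python. The function name is mine, and the sketch counts slots in the tree, so it ignores the fact that some of my slots are filled by the same person twice.

```python
CHROMOSOMES = 46


def great_grandparent_slots(k):
    """Slots in the tree for k-th great-grandparents (k=1 is plain great-grandparents)."""
    return 2 ** (k + 2)


for k in range(1, 5):
    slots = great_grandparent_slots(k)
    verdict = "fits within" if slots <= CHROMOSOMES else "exceeds"
    print(f"k={k}: {slots} slots, {verdict} {CHROMOSOMES} chromosomes")
```

With whole chromosomes as the unit, the 32 slots for 3rd great-grandparents still fit within 46, but the 64 slots for 4th great-grandparents do not.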
I happen to have access to my sister's, my two parents', and my own DNA tests. Just from the DNA hints on ancestry.com I can say that when a parent's confidence level for the relationship to an individual is higher than mine, and both are no more than 95%, then a point mutation at a tested site has happened. When several cousins share the same confidence level for the very same portion of the tree, no mutation has happened and the same chromosome is being shared. When the confidence drops, the mutation happened in the unshared region of the tree.
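Since I have the raw files for both parents and both children, one way to check the mutation idea directly, rather than through the confidence levels, is to look for tested sites where a child has no allele in common with a parent. This is a sketch of my own and not Ancestry's method; the function name is hypothetical, the child genotypes below are made up for illustration, and any site reported as 0 is skipped.

```python
def inconsistent_sites(parent, child):
    """Return rsIDs where the child shares no allele with the parent.

    Both arguments map rsID -> (allele1, allele2).
    """
    flagged = []
    for rsid, child_alleles in child.items():
        parent_alleles = parent.get(rsid)
        if parent_alleles is None:
            continue  # site not present in the parent's file
        if "0" in parent_alleles or "0" in child_alleles:
            continue  # skip sites reported as 0
        if not set(parent_alleles) & set(child_alleles):
            flagged.append(rsid)
    return flagged


# Parent genotypes are the lines quoted earlier in this post;
# the child genotypes are invented purely to exercise the function.
parent = {
    "rs4477212": ("T", "T"),
    "rs3131972": ("A", "G"),
    "rs12562034": ("G", "G"),
    "rs6517463": ("0", "0"),
}
child = {
    "rs4477212": ("T", "T"),
    "rs3131972": ("G", "G"),
    "rs12562034": ("A", "A"),
    "rs6517463": ("0", "0"),
}

print(inconsistent_sites(parent, child))
```

A flagged site could be a real mutation, but it could just as easily be a testing error, so a handful of hits would need a second look before drawing any conclusion.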