Wednesday, January 28, 2009

Genetics

Since a number of people ask me what I do every day (zero is a number, look it up), I figured I'd give anyone interested a little summary. It's kind of funny how much my field has changed since I started, considering I just got my first real job in it, but I guess that's science for you.

The goal of my field is to figure out what genes contribute to disease risk. After Gregor Mendel, tired of being teased about the lack of a "y" at the end of his first name, joined a man convent and channeled his sexual frustration into breeding pea plants, people figured out that genes change the way people and peas look, act, and get diseases. Years later Watson, Crick, and Rosalind Franklin (the undercredited woman who actually did the work upon which the boys based their theory) figured out what DNA looked like, and the hunt for actual disease genes was on. In the early goings, some egghead figured out that if you stain chromosomes with certain dyes they turn all stripey.

Then some different egghead figured out that people with certain diseases have different stripey patterns than people without those diseases. Being eggheads, they correctly assumed that the genes that caused those diseases lie somewhere within the stripe that is different. The problem with this is that very few diseases are caused by genetic changes large enough to show up in the stripey pattern. Furthermore, each stripe contains hundreds and hundreds of genes, thanks to a miracle of packaging that allows "miles" of DNA to fit into those tiny chromosomes.

So if you want to nail down the culprit gene, you need better resolution. Luckily, someone started noticing these "short tandem repeat" sequences spaced throughout the genome. Basically, once in a blue moon, the enzymes that copy DNA before cell division "skip" like a party house CD, causing a sequence that should have been T&A to be copied as T&A&T&A&T&A&T&A. Since this is exceedingly rare, we can assume you got yours from your parents. Side note: we share some of these repeats with chimps; why would a non-functional marker appear in exactly the same spot in humans and chimps if they didn't get it from a common ancestor? But that's another blog post.

These repeats don't do anything and generally don't even lie within genes (if they happened in the middle of a gene, they would screw up that gene and it would not work), but they are a great way to figure out whether you got a chunk of DNA from your father or your mother and whether you share that chunk with your siblings. If you share a chunk of DNA with your siblings and/or parents, and you also share a trait with that relative, we hypothesize that some gene within that chunk may affect the trait.

But again, the problem is resolution. We get two copies of each chromosome, one from mom and one from dad, so you might think the best we could do is say "the disease gene is on the dad chromosome." But luckily for us, there's a genetic phenomenon known as crossover. Basically, when a mommy chromosome and a daddy chromosome love each other very much, they lie down next to each other in the nucleus during prophase I of meiosis and trade parts (usually each gets back the stretch of chromosome corresponding to the one it donated, but occasionally there's a mix-up and big problems can result). Crossovers happen, on average, about 34 times each time a parent makes an egg or sperm (more often in moms than in dads), so every chromosome you pass on is a patchwork of the copies you got from your own mom and dad.
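To make the marker-plus-crossover idea concrete, here is a toy Python sketch. Everything in it is hypothetical: a made-up chromosome with five STR markers, each recorded as a repeat count, and a single crossover placed by hand. It builds the chromosome a parent passes on (switching copies at the crossover), then uses the repeat counts to work out which grandparent each chunk came from.

```python
from typing import List

# Hypothetical toy data: STR repeat counts at five marker positions on one
# chromosome. A value of 7 means the short unit (say "TA") repeats 7 times.
GRANDPA_COPY = [7, 12, 5, 9, 14]  # the parent's copy from their own dad
GRANDMA_COPY = [4, 10, 8, 6, 11]  # the parent's copy from their own mom


def make_gamete(copy_a: List[int], copy_b: List[int],
                crossover_points: List[int]) -> List[int]:
    """Build the chromosome a parent passes to a child.

    Copying starts on copy_a and switches to the other copy at each
    crossover point (given as the index of the first marker after it).
    """
    switch_at = set(crossover_points)
    current, other = copy_a, copy_b
    passed_on = []
    for i in range(len(copy_a)):
        if i in switch_at:
            current, other = other, current
        passed_on.append(current[i])
    return passed_on


def trace_origin(child_copy: List[int], copy_a: List[int],
                 copy_b: List[int]) -> List[str]:
    """Use repeat counts to label which grandparent each marker came from."""
    labels = []
    for allele, a, b in zip(child_copy, copy_a, copy_b):
        if allele == a and allele != b:
            labels.append("grandpa")
        elif allele == b and allele != a:
            labels.append("grandma")
        else:
            # Both copies share this repeat count, so the marker is uninformative.
            labels.append("unknown")
    return labels


# One crossover between the third and fourth markers.
child = make_gamete(GRANDPA_COPY, GRANDMA_COPY, crossover_points=[3])
print(child)  # [7, 12, 5, 6, 11]
print(trace_origin(child, GRANDPA_COPY, GRANDMA_COPY))
# ['grandpa', 'grandpa', 'grandpa', 'grandma', 'grandma']
```

Real markers are often less tidy than this: when both copies carry the same repeat count, that spot tells you nothing, which is why the sketch labels it "unknown" instead of guessing.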
So when you get a sample of a whole bunch of families and look, in aggregate, at the sections of DNA shared by relatives with similar traits, you can considerably narrow down the region containing the gene behind the shared trait. This is called "linkage analysis," and we've found a fair number of disease genes this way.
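Linkage results are usually summarized as a LOD score (log of the odds that a marker and a trait travel together versus being unlinked). Below is a minimal Python sketch of the simplest textbook case, assuming "phase-known" meioses where you can simply count how many times the marker and the trait were split up by a crossover (recombinants) versus kept together. The function names are made up for illustration; real linkage software also has to handle unknown phase, missing relatives, and incomplete penetrance.

```python
import math


def log10_likelihood(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 of theta^R * (1 - theta)^N for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    total = 0.0
    if recombinants:
        if theta == 0.0:
            return -math.inf  # a recombinant is impossible under complete linkage
        total += recombinants * math.log10(theta)
    if nonrecombinants:
        total += nonrecombinants * math.log10(1.0 - theta)
    return total


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD: linkage at recombination fraction theta vs. no linkage (0.5)."""
    n = recombinants + nonrecombinants
    return log10_likelihood(recombinants, nonrecombinants, theta) - n * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int):
    """Return (theta_hat, LOD at theta_hat), with theta_hat = R / (R + N) capped at 0.5."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


# Ten informative meioses, only one of which separated marker and trait.
print(max_lod(1, 9))  # theta_hat is 0.1 and the LOD is about 1.6
```

By the usual convention, a LOD of 3 or more (odds of about 1000 to 1 in favor of linkage) counts as solid evidence, so ten clean meioses with zero recombinants (a LOD of about 3.01) would just clear the bar. Because LOD scores from independent families add up, pooling lots of families is what makes the approach work.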

But it hasn't worked out nearly as well as we hoped, for several reasons. The primary reason is that most diseases aren't caused by a single gene. The ones we're most concerned with (hypertension, diabetes, cancer) are influenced by many, many genes acting in concert to shape your overall risk of disease. No single gene has a large enough effect to detect easily. Another problem, again, is resolution. Although crossovers happen fairly frequently, even with a big enough sample of families the best we can do is narrow things down to a region of a chromosome containing twenty to several hundred genes. Knowing that one of a hundred genes probably contributes to disease isn't all that helpful.
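Why does a small effect make a gene so hard to find? One way to see it is with the generic textbook sample-size formula for comparing two group averages. This Python sketch is a stand-in, not the method any particular genetics study uses: it assumes a simple two-sided z-test, 80% power, and an effect measured in standard deviations of the trait. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate people needed in each of two groups (e.g. carriers vs.
    non-carriers) to detect a difference in average trait value of
    `effect_size` standard deviations, using a two-sided z-test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.5, 0.25, 0.1):
    print(d, samples_per_group(d))
```

This works out to roughly 63, 252, and 1,570 people per group: halving the effect roughly quadruples the sample you need. Passing a much smaller `alpha`, as genome-wide scans do to account for testing lots of markers at once, pushes those numbers up further.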

Instead of genotyping relatively few markers that give us information about genetic similarity over a relatively wide region of a chromosome, people started realizing that there were single base changes (these are called single nucleotide polymorphisms, or SNPs, and happen when, like, a T becomes an A) basically everywhere, and if we genotyped a boatload of these, we could narrow down disease regions better. The problem was that this was expensive, on the order of 2-3 dollars per SNP back in the day. Multiply that by the half million or so SNPs you need across several hundred people and pretty soon you're talking about real money. But much like iPhones going from 500 to 200 dollars, we can now get 1.2 million SNPs typed on a person for about 400 bucks. There are now many studies with several thousand participants and a million SNPs each, which should be plenty to find the major genetic contributors to a given disease.
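For anyone who wants the "real money" math spelled out, here is a quick Python back-of-the-envelope. It assumes "several hundred people" means 300 and uses the low end of the old price ($2 per SNP); both are my picks for illustration, not figures from an actual study.

```python
# Assumed inputs based on the rough figures above:
# "several hundred people" is pinned at 300, and the old price at $2 per SNP.
PEOPLE = 300
OLD_SNPS_PER_PERSON = 500_000
OLD_DOLLARS_PER_SNP = 2.0
NEW_SNPS_PER_PERSON = 1_200_000
NEW_DOLLARS_PER_PERSON = 400.0


def old_study_cost(people: int) -> float:
    """Per-SNP pricing: every SNP on every person costs money."""
    return people * OLD_SNPS_PER_PERSON * OLD_DOLLARS_PER_SNP


def new_study_cost(people: int) -> float:
    """Array pricing: a flat fee per person covers all of their SNPs."""
    return people * NEW_DOLLARS_PER_PERSON


print(f"old: ${old_study_cost(PEOPLE):,.0f}")  # old: $300,000,000
print(f"new: ${new_study_cost(PEOPLE):,.0f}")  # new: $120,000
print(f"new price per SNP: ${NEW_DOLLARS_PER_PERSON / NEW_SNPS_PER_PERSON:.5f}")  # $0.00033
```

Same study, more than twice the SNPs per person, and the bill drops from hundreds of millions to about the price of a nice house.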

But guess what: again, it hasn't worked out that way. We've found some disease genes, to be sure, but not nearly the number we know must exist. One explanation is that common diseases are caused by tons of different rare mutations that all have the same end result. Researchers are working on sequencing people's entire genomes to see if that's the case. Recall how the Human Genome Project was supposed to take 20 or so years and cost billions to sequence a whole genome? Well, now we can do it in a few days at a cost of 300 grand. Pretty nuts. An alternate explanation for our lack of success in finding the reasons that both you and your mom are fat is that genetics is way more complicated than we thought. It may be that DNA sequence differences are not the primary cause of differences in disease risk. But that's a whole nother issue. So now you know all you need to know about genetics. No stealing my job.
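On the "tons of different rare mutations" idea: if every sick person carries a different mutation, no single mutation ever looks interesting, but the gene they all land in might. One common way to look for that pattern is to pool rare variants by gene (often called a collapsing or burden approach). This Python sketch shows only the counting step, using made-up gene and variant names and assuming each case carries exactly one rare variant; a real burden test compares these counts against healthy controls with a proper statistic.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy data: (gene, variant) pairs seen in six unrelated cases,
# assuming each case carries exactly one rare variant. Every variant is
# different, but four of them land in the same (made-up) gene.
CASE_VARIANTS: List[Tuple[str, str]] = [
    ("GENE_A", "var1"), ("GENE_A", "var2"), ("GENE_A", "var3"),
    ("GENE_A", "var4"), ("GENE_B", "var5"), ("GENE_C", "var6"),
]


def cases_per_variant(pairs: List[Tuple[str, str]]) -> Counter:
    """How many cases carry each individual variant."""
    return Counter(variant for _, variant in pairs)


def cases_per_gene(pairs: List[Tuple[str, str]]) -> Counter:
    """Collapse variants by gene: how many cases carry any variant in that gene."""
    return Counter(gene for gene, _ in pairs)


print(cases_per_variant(CASE_VARIANTS).most_common(1))  # [('var1', 1)]
print(cases_per_gene(CASE_VARIANTS).most_common(1))     # [('GENE_A', 4)]
```

Looked at one variant at a time, nothing stands out; collapsed by gene, GENE_A shows up in four of six cases.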
