We improved the genome map for cattle

The author is an associate professor of dairy cattle genetics at Penn State University.

Chad Dechow

The first maps drawn by the early European explorers were rough, and every world map improved with each expedition. The same applies to studying DNA.

The Council on Dairy Cattle Breeding (CDCB) introduced subtle changes to its genomic evaluation system this past December. One change was looking at more DNA . . . from about 60,000 (or 60K) DNA markers to nearly 80K. Those DNA markers (or SNPs) are now mapped to a new bovine reference genome.

A reference genome is basically a map of a species’ DNA sequence that geneticists use to evaluate and share genomic information. The new bovine reference genome is referred to as ARS-UCD. This acronym indicates the organizations that were involved with generating the new assembly . . . scientists from USDA’s Agricultural Research Service (ARS) and from the University of California-Davis (UCD).

Why the new look?

You might be wondering why we needed a new reference assembly since the cattle genome was sequenced some time ago. In fact, the first genome assembly was released in 2004.

To say that any species’ genome is “assembled” is somewhat relative, however. The first draft of the human genome sequence was announced in 2000 and then “completed” in 2003. And yet, 15 years after it was “completed,” we still have fairly large sequence chunks of unknown or “unplaced” location.

Assembling a genome presents many challenges. We don’t have the capability of taking an animal’s DNA and reading it from one end to the other as one might read a book. Rather, we take many copies of the genome, break it into shorter segments, at random, and sequence each piece separately. Since we use multiple copies of the genome, each spot will be sequenced many times, but in different segments. In most instances, the sequence of one piece will overlap with the sequence of another piece, which allows us to fit all the pieces together like a giant puzzle.
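The puzzle-fitting idea can be sketched in a few lines of Python. This is only a toy illustration, not the method used to build any real assembly: the function names `overlap` and `greedy_assemble`, the 16-letter "genome," and the three reads are all made up for the example, and real assemblers must cope with sequencing errors, both DNA strands, and billions of letters.

```python
def overlap(left, right, min_len=3):
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two pieces that share the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; the remaining pieces stay apart
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up pieces, given out of order, that overlap by four letters each.
pieces = ["CTGAAGTC", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(pieces))
```

With pieces this clean, the three reads collapse into a single sequence. If the overlapping letters also appeared elsewhere in the genome, the same approach could join the wrong pieces.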

Works well, but

This strategy works well for much of the genome, but there are some challenges. Scientists do their best to make sure each section of DNA gets sequenced many times, but some sections may get missed by chance.

A much larger problem is that most genomes, including those of humans and cattle, contain repetitive elements. Approximately half of our genome is made up of DNA sequences that are repeated thousands of times. These repetitive elements are important, although we aren’t entirely sure why and have a lot to learn about their function.

These repetitive sequences may serve as spacers between genes and, I suspect, influence the spatial orientation of a gene within the nucleus of a cell. Both of those factors would influence when a gene is turned on and expressed, or left in a more dormant state. Some repetitive elements are active during embryonic development.

For the most part, repetitive elements are located in the same general location from one individual to the next, but there are exceptions. Those differences surely contribute to phenotypic variation, but much more research is needed to understand how.

Repeats are generally spread throughout the genome and can range from a few nucleotides (or DNA letters) to thousands of nucleotides long. One such example is the long-interspersed element BovB, which is reported to comprise around 20 percent of the cattle genome. Each BovB section is about 3,200 nucleotides long, and there are likely more than 150,000 BovBs scattered throughout the cattle genome.
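As a rough check on those BovB figures, the arithmetic can be worked out directly. The sketch below assumes every copy is the full 3,200 nucleotides and uses the 2.74 billion nucleotide length of the new assembly as the genome size. Both are simplifications, and 150,000 is a lower bound on the copy count, so the result is a ballpark figure only.

```python
bovb_length = 3_200            # nucleotides per BovB element (from the article)
bovb_copies = 150_000          # "more than 150,000", so treat as a lower bound
genome_length = 2_740_000_000  # length of the new assembly, used as the genome size

bovb_total = bovb_length * bovb_copies
bovb_share = bovb_total / genome_length

print(f"BovB total: {bovb_total:,} nucleotides")
print(f"Share of genome: {bovb_share:.1%}")
```

That works out to 480 million nucleotides, or a little under 18 percent of the assembly, which is in line with the reported figure of around 20 percent.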

Assembling the puzzle

Imagine trying to put a puzzle together correctly with a bunch of nearly identical pieces and some pieces that are very likely missing. That has been the challenge facing those trying to assemble genomes! Newer technologies have helped to overcome this challenge by facilitating the sequencing of longer pieces, and those technologies were implemented to derive the newest reference assembly.

The previous assembly that has been in use since 2009 was UMD 3.1. It was developed at the University of Maryland. That version had around 72,000 “gaps” in the assembly compared with 393 in the new assembly. The new assembly is about 2.74 billion nucleotides long . . . 67 million longer than the previous assembly.
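To put those numbers in perspective, here is a small back-of-the-envelope calculation. It uses only the rounded figures quoted above, so the results are approximate.

```python
old_gaps = 72_000           # approximate gaps in UMD 3.1
new_gaps = 393              # gaps in ARS-UCD
new_length = 2_740_000_000  # nucleotides in the new assembly
added = 67_000_000          # nucleotides gained over the previous assembly

old_length = new_length - added
gap_reduction = 1 - new_gaps / old_gaps
fold_fewer = old_gaps / new_gaps
growth = added / old_length

print(f"Previous assembly length: about {old_length:,} nucleotides")
print(f"Gaps removed: {gap_reduction:.1%} (about {fold_fewer:.0f} times fewer)")
print(f"Growth in assembled sequence: {growth:.1%}")
```

In other words, the assembly grew by only a few percent in length, while the number of gaps dropped by more than 99 percent.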

Both the ARS-UCD and UMD assemblies relied on the same animal — a Hereford cow named Dominette. Dominette was chosen because she was highly inbred. Using an inbred animal simplifies the assembly process because there are fewer differences in the genetic material inherited from sire and dam.

There is variation between the genomes of Hereford cattle and our dairy breeds. Zoetis recently developed a Holstein-specific genome assembly. Whether a breed-specific genome map will enhance the accuracy of genomic predictions remains to be seen, and it will be an interesting question for Zoetis scientists to explore.

More accurate predictions

While the development of a new genome reference is interesting, it may not be apparent how this will help enhance the accuracy of our genomic predictions. There are at least two areas where it may help.

Improved imputations. CDCB is now using 80K genotypes in its genomic evaluations. However, most farmers test their cows with a less costly, and lower-density, chip that tests between 20K and 45K SNPs. That means that we must “impute” half or more of the SNPs used to determine a cow’s genomic predicted transmitting ability (PTA). In genomic testing lingo, impute is a concise way of saying “make a highly educated guess.”

If a cow’s sire and maternal grandsire have been genotyped with a higher-density chip, we can impute with greater than 99 percent accuracy in most instances. Part of the imputation process is determined by the distance between two SNPs along a chromosome. Our ability to impute will be more precise if we know with greater certainty how much distance exists between two SNPs.
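To see why distance matters, consider how often two SNPs on the same chromosome are inherited together. The sketch below is an illustration only and is not the procedure CDCB uses. It applies Haldane's mapping function, a standard textbook formula, and it assumes a recombination rate of about 1 centimorgan per million nucleotides, which is a common rule of thumb rather than a figure from this article. The function name `recombination_prob` is made up for the example.

```python
import math


def recombination_prob(distance_bp, cm_per_mb=1.0):
    """Chance that two SNPs are split by recombination in one generation.

    Uses Haldane's mapping function, r = 0.5 * (1 - exp(-2d)), with d in
    Morgans. The default of 1 cM per Mb is an assumed rule-of-thumb rate.
    """
    morgans = (distance_bp / 1_000_000) * cm_per_mb / 100
    return 0.5 * (1 - math.exp(-2 * morgans))


for distance in (50_000, 500_000, 5_000_000):
    r = recombination_prob(distance)
    print(f"{distance:>9,} nucleotides apart: inherited together {1 - r:.2%} of the time")
```

The closer two SNPs sit, the more reliably a known SNP predicts its missing neighbor. If the map misplaces a SNP, the assumed distance is wrong and so is the confidence attached to the guess.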

Locate recessives. We will also be able to identify recessive haplotypes affecting fertility more accurately with an improved genome assembly. Detecting haplotypes depends on having an accurate genome map. We may be able to detect existing recessive haplotypes more readily and, once found, find the exact mutation more rapidly because of our new genome assembly.

Tests by USDA and CDCB scientists suggest that moving the evaluation from the 60K genotypes mapped to the old assembly to the 80K genotypes mapped to the new assembly will improve reliability by a percentage point or two. Published reliability may not be higher for Holsteins because validation studies suggest our previous estimates were slightly inflated, but our true reliability will be higher.

More to learn

The one lesson that has stood out to me more than any other during the genomic era is that, when it comes to our genome, we never know as much as we think we do. This new reference assembly is one small step on our path to discovering the secrets of the cattle genome and will help to increase the accuracy of genomic predictions along the way.