UC San Diego Jacobs School of Engineering blog: Bernhard Palsson

Wednesday, October 7, 2020

Digitizing the genome

by Cam Lamoureux, UC San Diego bioengineering PhD candidate

The genome has historically been known as life’s instruction manual. Indeed, the genome sequence of any organism contains all of the information needed to specify its form and function, from the simplest single-celled bacterium to complex organisms such as humans. But with rapidly developing sequencing technology, the genome is taking the stage as a new type of hard drive, nature’s way of storing information.

Understanding exactly how the genome represents an organism’s information remains a challenge for scientists. Any given DNA base (A, T, C or G) in the genome sequence can be involved in multiple different functions. As part of a gene, for example, a DNA base codes for a particular building block, known as an amino acid, of the protein that the gene specifies. That amino acid, in turn, may be part of a particular shape in the final protein. The DNA base may also be part of a sequence on the opposite side of the DNA double helix that is involved in controlling another gene’s activity. With so many different functions, information encoded by the genome sequence is convoluted and overlapping, yet it is critical to understanding an organism’s behavior.

Our work in bioengineering professor Bernhard Palsson’s Systems Biology Research Group at UC San Diego addresses this challenge. We introduce a completely new way of representing this information. For every DNA base, we can answer a simple yes/no question about every type of information the sequence can encode: does this DNA base encode that information? Borrowing from computer science, we realized that the answer to this question can be thought of as a “bit,” a binary digit. Treating each answer as a bit, we can scan across the entire genome of any organism, ask this question at every position, and tabulate the answer as 1 for “yes” and 0 for “no.”

With this approach, we can construct a clean, quantitative record of the bits of information that an entire genome encodes. We call this method of genome annotation the “Bitome.”
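To make the yes/no tabulation concrete, here is a minimal Python sketch, not the published Bitome code. It assumes annotations arrive as intervals per feature type; the function name `build_bit_matrix`, the feature names, and the coordinates are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: feature name -> list of (start, end) intervals,
# 0-based and end-exclusive. Real annotation files use other conventions.
Intervals = Dict[str, List[Tuple[int, int]]]


def build_bit_matrix(genome_length: int, features: Intervals):
    """Return (feature_names, matrix) where matrix[i][pos] is 1 if the base
    at `pos` is covered by feature i, and 0 otherwise."""
    names = sorted(features)
    matrix = []
    for name in names:
        row = [0] * genome_length
        for start, end in features[name]:
            # Clip intervals to the genome so bad coordinates cannot crash.
            for pos in range(max(start, 0), min(end, genome_length)):
                row[pos] = 1
        matrix.append(row)
    return names, matrix


# Toy 10-base "genome" with made-up annotations.
toy_features = {
    "gene": [(2, 8)],
    "promoter": [(0, 3)],
    "stop_codon": [(5, 8)],
}
names, bits = build_bit_matrix(10, toy_features)
for name, row in zip(names, bits):
    print(f"{name:>10}: {''.join(map(str, row))}")
```

Each row answers one question ("is this base part of a gene?") for every position, and each column collects all the answers for a single base. A real genome has millions of positions, so a production version would likely use a sparse or packed array rather than nested lists.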

We envision that the Bitome will serve as a key foundational tool for genome engineering, with applications in the sustainable production of industrial and medical compounds. For example, bioprocess engineers who reprogram bacterial genomes to sustainably produce chemical compounds can use our method to quickly assess which parts of an organism’s genome sequence are important for their application, and which are less important. They can make predictions about how proposed changes to the genome sequence will affect the organism.

While the Bitome’s capability mirrors traditional genome browsers, our approach provides far more utility and flexibility. Because we have digitized genome information, we can perform computations on those bits of information.

As a test case, we studied the E. coli genome and showed that DNA bases that contain fewer bits of information are more likely to be mutated during adaptive evolution. Because this observation is based on information that can be encoded by any genome sequence—not just E. coli—it could be used to predict genes that are more likely to mutate in cancerous tissues, for example.
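As a rough illustration of the kind of computation this enables, the sketch below counts how many bits each base carries and compares the average for two sets of positions. It is only a sketch of the idea, not the analysis from the paper: the bit matrix and the "mutated" positions are invented to exercise the code and say nothing about real mutation data.

```python
def bits_per_position(bit_matrix):
    """Sum each column: how many features cover each base."""
    return [sum(column) for column in zip(*bit_matrix)]


def mean_density(density, positions):
    """Average bit count over the given positions (0.0 if none are given)."""
    positions = list(positions)
    if not positions:
        return 0.0
    return sum(density[p] for p in positions) / len(positions)


# Toy matrix: 3 hypothetical features (rows) over 10 bases (columns).
toy_bits = [
    [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
]
density = bits_per_position(toy_bits)

# Invented positions, standing in for sites observed to mutate.
toy_mutated = {3, 9}
toy_unmutated = [p for p in range(len(density)) if p not in toy_mutated]

print("bits per base:      ", density)
print("mean at mutated:    ", mean_density(density, toy_mutated))
print("mean at non-mutated:", mean_density(density, toy_unmutated))
```

With real data, a comparison like this would need a proper statistical test and a much larger set of observed mutations before drawing any conclusion.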

The Bitome’s digitized representation facilitates prediction with machine learning. In part of our study, we applied machine learning to pinpoint the use of a particular stop codon as a predictor of mutability. This result is significant because it provides a deeper understanding of how genes mutated during adaptive evolution, a key tool for genome engineering. We also used machine learning to predict gene essentiality directly from the genome, another key capability for engineering genomes.

We are excited by the potential future applications of the Bitome as a way of analyzing genome sequences. This concept is inherently extensible to any organism’s genome and should prove useful both for deeply understanding the information encoded in a genome and for predicting behavior based on that information. With this work, we hope to further bridge the gap between genome sequence information and the complex, critical functions that it encodes.

Publication: Lamoureux, C. et al. (2020) The Bitome: digitized genomic features reveal fundamental genome organization. Nucleic Acids Res. https://doi.org/10.1093/nar/gkaa774

Thursday, March 5, 2020

Metabolic and genetic basis for auxotrophies in Gram-negative species

By Yara Seif

While some bacteria survive independently, others reduce their metabolic expenditures by utilizing the nutrients available to them in their environment. These bacteria adopt the concept of simple living, or “less is more,” meaning one can survive on minimal requirements (we could definitely learn from them). Auxotrophies, a.k.a. nutritional dependencies, are a characteristic of host adaptation. They are hard to characterize experimentally because there are so many candidate nutrients to test, and because they differ from one strain to another.

In a study published Mar. 5 in PNAS, we develop a computational workflow that uses both flux balance analysis and comparative genomics to predict nutrient requirements de novo and from sequences alone.

In our workflow, we compare the gene content across several strains of bacteria and build metabolic networks tailored to each genetic background. Next, we simulate growth on a minimal medium, and when growth cannot be achieved, we run our algorithm, called AuxoFind, to search for possible nutrients that would restore growth in silico.
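The search step can be sketched in a few lines of Python. This is not AuxoFind itself, and it deliberately swaps flux balance analysis for a much simpler reachability check: a strain "grows" if every biomass precursor can be produced from the medium by repeatedly firing reactions whose substrates are available. The function names, the toy network, and the metabolite lists are all hypothetical.

```python
from itertools import combinations


def reachable(medium, reactions):
    """Expand the set of available metabolites until no reaction adds more."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def can_grow(medium, reactions, biomass):
    """Growth proxy: every biomass precursor is reachable from the medium."""
    return biomass <= reachable(medium, reactions)


def find_supplements(medium, reactions, biomass, candidates, max_size=2):
    """Return the smallest nutrient sets that restore growth.

    [set()] means the strain already grows; [] means nothing up to
    max_size nutrients rescues it.
    """
    for size in range(0, max_size + 1):
        hits = [
            set(combo)
            for combo in combinations(sorted(candidates), size)
            if can_grow(set(medium) | set(combo), reactions, biomass)
        ]
        if hits:
            return hits
    return []


# Toy strain: it can make alanine, but has no route to homocysteine,
# so methionine is out of reach on the minimal medium.
toy_reactions = [
    ({"glucose"}, {"pyruvate"}),
    ({"pyruvate", "ammonium"}, {"alanine"}),
    ({"homocysteine"}, {"methionine"}),
]
toy_biomass = {"alanine", "methionine"}
toy_medium = {"glucose", "ammonium"}
toy_candidates = {"methionine", "homocysteine", "thiamine"}

print(can_grow(toy_medium, toy_reactions, toy_biomass))
print(find_supplements(toy_medium, toy_reactions, toy_biomass, toy_candidates))
```

In this toy case either methionine or its precursor restores growth, while thiamine does not. A constraint-based version would replace `can_grow` with a flux balance simulation that checks whether the biomass flux can exceed a threshold.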

Metabolic networks were tailored to the gene content of different bacteria and nutrient dependencies were predicted and validated experimentally. Image courtesy of Systems Biology Research Group

We find that when the same gene is missing, the nutrient requirements differ across species, because they have different metabolic networks and combinations of alternative pathways. We also observe that gene absences arise from a wide range of genetic modifications, from small, simple mutations (like single nucleotide polymorphisms) to large, complex genetic changes (whole genome rearrangements and multi-gene deletions).

The significance of this work is as follows:

Patients with certain diseases (such as Crohn’s disease or cystic fibrosis) tend to be chronically infected with bacteria. Over time, these bugs become more vicious because they slowly adapt to the in vivo environment. Understanding how these adaptations occur is a first step towards devising therapeutic solutions.

###

Yara Seif is a UC San Diego bioengineering Ph.D. student. As a member of Bernhard Palsson's Systems Biology Research Group, she studies the metabolism of bacterial strains as well as the evolution of metabolic traits across strains, especially in relation to their lifestyle. Her research so far has included multi-strain genomic and metabolic analysis of Gram-negative strains using a combination of constraint-based metabolic modeling, comparative genomics and machine learning.