Comp Chem, Adverse Drugs, and Dyes

Research can go in many unexpected directions, and my experiences at NC State are no exception. When I first joined the NC State chemistry department I knew two things: the first was that I wanted to continue working in computational chemistry, and the second was that my research would focus on drug discovery. Everything else was completely unknown to me.

The research lab that matched my desire to work in computational chemistry and drug discovery was a new group led by Denis Fourches. Denis joined NC State in January 2015 as a Chancellor’s Faculty Excellence cluster hire with the chemistry department and bioinformatics research center. He specialized in a trending sub-field of chemistry called cheminformatics. Cheminformatics uses statistical tools (like hierarchical clustering and machine learning) to analyze large databases of chemicals for new applications. Many of the tools used in this field are the same ones that Netflix and Amazon use to suggest movies or purchases. Specifically, Denis wished to use these models to help drug developers design new, more potent, and more selective compounds.

When I joined Denis’ group, my first research project was conducting a literature search for compounds to treat prostate cancer. Our goal was to build a machine learning model to predict drug potency for treating prostate cancer. But we soon realized another research team from Canada was working on this same problem, and one conference call later we were kindly informed not to continue our efforts. These are the risks of joining a new research lab. (One year later this same team would publish an article detailing how they had identified over 100 new Class I and II clinical trial compounds; J. Chem. Inf. Model. 2017, DOI: 10.1021/acs.jcim.7b00137. We were lucky to have invested so little time into this project.)

Denis sent me back to the literature, this time searching for ‘adverse drug reactions’ caused by drug interactions with immune signaling proteins known as human leukocyte antigens (HLA). HLA proteins are found on the surface of cells and present an antigen (a short peptide built from degraded protein) to T-cells in the body. If the antigen presented was built from degraded, healthy protein, then the immune system stays turned off. But if the antigen is foreign (e.g. built from bacteria or a virus), then the system is turned on and the infected cell is removed. Sometimes a drug can bind to HLA proteins and be mistaken for a pathogenic antigen, and the immune system is turned on by mistake.

That literature search would lead to my first publications with the Fourches Lab. In the first, we conducted a proof-of-concept study demonstrating that cheminformatic tools could reliably identify HLA binders (J. Cheminf. 2017, DOI: 10.1186/s13321-017-0202-6). This initial study gave us key insights into the complexity of HLA and drug interactions, which we were able to incorporate into a second study screening the DrugBank database to identify new, unsuspected HLA binding drugs (J. Cheminf. 2018, DOI: 10.1186/s13321-018-0257-z). DrugBank is an online database that houses over 7,000 approved, experimental, and illicit drugs. Ultimately, our screening platform identified 22 potential HLA binders.

Our model scores the favorability of placing a drug inside the HLA binding pocket. If a drug has a favorable score, then it is likely to bind with HLA and may cause an adverse drug reaction.
In our second study we used a three-tiered system to identify 22 DrugBank compounds from a database of 7,000. This ensemble approach reduced the possibility of identifying false positives (compounds that don’t really bind HLA).

Perhaps the most unpredictable research projects come from collaborators. The most influential collaboration I have worked on is with NC State’s College of Textiles. The College of Textiles houses more than 98,000 chemical dyes created by Max Weaver (during his time with Eastman Chemical). Our job is to help store the dyes digitally and to determine new chemical applications for these compounds. Typically, this is done using chemical similarity searches.

Chemical similarity is a cheminformatics principle which holds that molecules with shared structural features and properties will have shared activity. The simplest example of this is the rule from general chemistry, ‘like dissolves like.’ A chemical similarity search can be performed in many ways. One approach we use is to convert a compound’s structure into a numerical string of 0’s and 1’s (known as a bit string). This bit string is then used to map structural features (e.g. benzene rings, alcohol groups, etc.) between two compounds, and the shared features are counted. The shared features give us a numerical score for determining similarity.
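To make the bit string idea concrete, here is a minimal sketch in Python. The post does not say which similarity score we used, so the Tanimoto coefficient (shared "on" bits divided by bits that are "on" in either compound) is an assumption, chosen because it is a common choice for fingerprints. The six-feature list and the three compounds are hypothetical toy examples, not real dye fingerprints.

```python
# Hypothetical feature list: each position in the bit string stands for one feature.
FEATURES = [
    "benzene ring", "alcohol group", "amine group",
    "carbonyl group", "halogen", "nitro group",
]


def to_bit_string(present):
    """Encode a set of structural features as a string of 0's and 1's."""
    return "".join("1" if feature in present else "0" for feature in FEATURES)


def tanimoto(bits_a, bits_b):
    """Tanimoto score: shared 'on' bits / bits that are 'on' in either string."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    shared = sum(a == "1" and b == "1" for a, b in zip(bits_a, bits_b))
    either = sum(a == "1" or b == "1" for a, b in zip(bits_a, bits_b))
    return shared / either if either else 0.0


# Toy compounds (assumed for illustration only).
compound_a = to_bit_string({"benzene ring", "alcohol group"})
compound_b = to_bit_string({"benzene ring", "alcohol group", "halogen"})
compound_c = to_bit_string({"amine group", "carbonyl group"})

print(compound_a, compound_b, tanimoto(compound_a, compound_b))
print(compound_a, compound_c, tanimoto(compound_a, compound_c))
```

Compounds A and B share two of the three features present in either one, so they score well above A and C, which share none. Real fingerprints work the same way but use hundreds or thousands of bits.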

There are over 98,000 compounds in the Max Weaver Dye Library and they are stored inside glass vials with hand-drawn labels. One of the biggest challenges is converting these labels into digital files for cheminformatics analysis. To date, only 2,000 dyes have been digitized.

Our work with the Max Weaver Dye Library resulted in another College of Textiles collaboration with Harold Freeman and Tova Williams. Tova was a PhD student in NC State’s fiber and polymer science program (she recently completed her defense!), and her research focused on designing alternative hair dyes that are safer and more affordable for customers. She soon noticed that, even though hair dyes are a multibillion-dollar industry, there was no publicly available database of current and former dyes.

Tova began combing through the literature and created the Hair Dye Substance Database (HDSD). The HDSD contains 313 hair dyes that are categorized as temporary, semi-permanent, or permanent (ACS Sust. Chem. Eng. 2018, DOI: 10.1021/acssuschemeng.7b03795). These categories are assigned according to a dye’s chemical interactions with hair and the number of hair washings a color will last (before returning to the original hair color). Next, we assisted Tova in characterizing the HDSD by molecular properties (like weight and hydrogen bond count) and performed a hierarchical clustering analysis. Hierarchical clustering is another technique for visualizing and comparing chemical similarity. Dye scientists can use these clustering results to design alternative hair dyes based on shared chemical properties.
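For readers curious about what hierarchical clustering does under the hood, here is a rough sketch in plain Python. It is not the analysis from the paper: the four dyes and their property values are made up, and the choices of Euclidean distance and average linkage are assumptions for illustration. The idea is to start with every dye in its own cluster and repeatedly merge the two closest clusters.

```python
from itertools import combinations
from math import dist

# Hypothetical dyes: (molecular weight, hydrogen bond count). Values are invented.
dyes = {
    "dye_1": (110.0, 2),
    "dye_2": (120.0, 2),
    "dye_3": (350.0, 6),
    "dye_4": (365.0, 7),
}


def standardize(rows):
    """Rescale each column to zero mean and unit variance so weight does not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # merge the closest pair into one cluster
    return clusters


names = list(dyes)
points = standardize(list(dyes.values()))
clusters = agglomerate(points, n_clusters=2)
for cluster in clusters:
    print([names[i] for i in cluster])
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, is what produces the tree-like dendrogram used to visualize the results.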

One of the most difficult figures I have helped to build. The innermost ring is known as a circular dendrogram and was created from our hierarchical clustering analysis. We identified 9 clusters of dyes based on their chemical properties. The next ring (known as a sunburst visualization) is color-coded by dye type (precursor or pigment), the following ring is color-coded by hair dye type (temporary, semi-permanent, and permanent), and the last ring is color-coded by dye color. This figure required a combination of R coding and Photoshop to build.

My research has developed quite a bit since joining the Fourches Lab, and I no longer limit myself to working in drug discovery. But one thing remains the same: I still don’t know what directions future projects will go or what unexpected collaborations the future holds. The allure of research is the unknown.

* * *

Tuesday Photo Challenge: This week I thought I would showcase some of the colorful figures I have helped design during my time at NC State.

