Dr. Jozef Madzo, Biomedical Science Interpreter at The Wistar Institute

October 9, 2024

In July 2024, Jozef Madzo, Ph.D., joined Wistar’s faculty as an assistant professor and new director of the Bioinformatics Core Facility, which provides investigators with critical data analysis support. We sat down with him to learn more about how bioinformatics plays a role in modern biomedical research.

At the highest level, what is bioinformatics, to you?

Bioinformatics, at its core, is the intersection between biology and computational science — that is, data analysis techniques that we develop using programming languages. The field exists to answer biological questions that are basically impossible to approach without computer assistance. Mostly, that means what people call Big Data: enormous datasets with multiple variables.

Bioinformaticians are similar to computational biologists, but where that field has specialized to split work between the lab and the keyboard, bioinformaticians are very much coders: we’re writing scripts to analyze big datasets for a variety of research areas. Technically, there is a Madzo lab at Wistar, but it’s all computer-based — I don’t use pipettes or live cells. For me, the science is the coding and the analysis.

I typically write in a programming language called R because it has well-integrated data visualization elements, and in my role of supporting different scientists’ need for data analysis, it’s important that I export critical findings to easy-to-read graphics.

Why has bioinformatics become so important to biomedical research?

The datasets are too big. To find anything significant or interpret any data properly, you need to run it through code. There have always been big datasets, but the technology that researchers use has improved — and for us, improved technology basically means even larger datasets. Advanced sequencing methods are better because they “see” more of what’s going on in cells’ genetics, but “seeing more” is another way of saying “producing more data.” Somebody has to sort through it.

Developing the code to analyze these datasets properly is a specialization in its own right. Students enrolled in Ph.D. programs right now learn a certain amount of coding through their training by necessity, but generally speaking, having a biology background does not guarantee that you’ll know how to interpret the results of your own experiment; that’s how complex biological datasets have become. Mainly we want to make sure that the best statistical methods are used so that the significance of results is clear and undistorted (distortion can be a pitfall of advanced statistics, especially with big data).

Large, complex datasets have greater statistical power, which means that we can have greater confidence in our results. We also have more opportunity for discovery, but the era of Big Biological Data does create a need for labor specialization between scientists. On the one hand, you have PIs in their labs running their experiments, and then on the other, you have people like me and Dr. Andrew Kossenkov in Wistar’s Bioinformatics Facility running the analysis on the data that the experiments produce.
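
As a rough illustration of the point about statistical power, here is a minimal Python sketch; the interview itself includes no code. It assumes a two-sided, two-sample z-test with unit variance and a hypothetical effect size of 0.2. The function name `z_test_power` and every parameter value are illustrative assumptions, not methods used by Wistar's Bioinformatics Facility.

```python
from statistics import NormalDist


def z_test_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes both groups have unit variance and the same size.
    effect_size is the true difference in means, in standard deviations.
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same small effect, increasingly large (hypothetical) sample sizes.
    for n in (10, 100, 1000):
        print(f"n per group = {n:4d}  power = {z_test_power(0.2, n):.3f}")
```

With the effect size held fixed, the computed power rises as the number of samples per group grows, which is one simple sense in which larger datasets support more confident conclusions.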

How did you become interested in bioinformatics?

I didn’t start to specialize until I was a postdoctoral fellow. I’m from Slovakia, so I got my Ph.D. in Prague and had planned to complete a postdoc fellowship in the U.S. before returning — but that’s when I met my wife, so I stayed here.

I was working in a lab at the University of Chicago where we started to deal with a lot of epigenetic data, which involves very large quantities of information. We didn’t have much bioinformatics support, so I tried to work through it on my own, occasionally checking my work with UChicago’s bioinformatics core specialist.

My methods for that analysis eventually wound up in a paper we published, and I stuck with the coding from there because I found that I enjoyed it. I finished my postdoc and found out that there was a bioinformatics master’s program at Temple, so I worked in a cancer research lab at the Fels Institute while finishing up my specialization.

What role do you see for artificial intelligence in bioinformatics?

I think we’ll continue to use and improve on machine learning, which is a subset of artificial intelligence. Whenever I hear “AI,” I tend to think of sentient robots and science fiction, but that’s distinct from the classical machine learning methods we use.

A lot of what gets called AI now refers to neural networks or deep learning technology, which is really good at identifying undefined variables — for example, training a program to figure out whether an image contains a cat.

Sometimes, we do use neural networks — especially when there’s an image dimension to something, like in spatial transcriptomics, where subsets of data are associated with a physical region of a cell or tissue — but mostly, our data are already defined because we know what we’re measuring in advance. Classical machine learning allows us to optimize algorithms that can sort through the big data and test for patterns or associations with confidence.

Personally, I find repetitive elements very interesting. They used to be thought of as “junk DNA”: noncoding regions of genetic material that don’t produce proteins. The thinking went that those areas didn’t really do anything, so many scientists ignored them. But it turns out that, in the aggregate, patterns in repetitive elements can be very useful in analyzing cancer biology because they can predict, for example, how well people will respond to certain treatments or even, potentially, the existence of cancer itself. Without good code and computing power, those discoveries are almost impossible to make.

What do you think the future looks like for bioinformatics?

I think it will keep improving; that’s what computers do. And as sequencing technologies get better and cheaper, that gives everybody more data, which means we’ll continue to have even more opportunities for discoveries. To make the most of these improvements as a field, we need to keep pushing toward providing our methods and code with every publication so that our colleagues can properly reproduce our results.

Ultimately, a big part of the appeal for me with bioinformatics is the opportunity for collaboration. Almost everyone requires our support, which means that I can work to solve a wide variety of important problems.

I get excited at the opportunity to work with really smart people; when you have that chance as a scientist, you’re pushed to do your best work. And Wistar — well, Wistar has dozens of geniuses. I’m excited about the work we’re going to do together.