The latest news from The Hyve on Open Source solutions for bioinformatics

Hyver Stories: Early Stage Researcher Jeremy

July 17, 2020 | By Jeremy Georges-Filteau

Meet Early Stage Researcher Jeremy Georges-Filteau (30).

He joined The Hyve in July 2018 for a three-year industrial PhD. When Hyvers and other researchers can’t use data from real patients, they use simulated patient data. Unfortunately, these simulations are below par. Jeremy’s PhD project aims to create good-quality artificial data that truly resembles data from real patients.

 

Can you tell me a bit about your background?

I started a bachelor's in electrical engineering because I wanted to do robotics and things like that, but it turned out the programme was not oriented towards that. So I quit and worked for two years. I then learned about bioinformatics. The focus on both technology and nature really got me hooked. I did my bachelor's in Bioinformatics at the Université Laval in Quebec (Canada) and then started a master's in Machine Learning at McGill in Montreal.

 

How did you get to know The Hyve?

During my bachelor's, I did a one-year exchange in Strasbourg, France. On a backpacking trip across Europe in the months before that exchange, I met a Dutch guy and ended up marrying him. For that reason, I also wanted to spend a few months in the Netherlands for my internship in Bioinformatics. I applied to a bunch of places, but The Hyve was my first choice and I got it.

Towards the end of my master's, I started looking for a PhD. I was then living in Montreal with my (now ex-)husband. He wanted to move back to his friends and family in the Netherlands. I was lucky that The Hyve emailed me about their PhD project at that time. I applied right away.

 

How does an industrial PhD work?

Basically, all my work is done at The Hyve. I’m not in one of the company teams, but now that my project is in a practical phase, I share the outcomes of my research and discuss with the teams how to apply the insights in their daily work. Occasionally, I meet up with my supervisor at the Radboud University in Nijmegen, but my regular discussions are with my in-company supervisor Elisa.

The PhD project I’m currently working on is different from the one I applied for, though. I tried to work on that one for a while, but found the topic had already been researched too much. I didn’t feel I could develop anything that would be publishable. At least not on my own, without being part of a full academic research group.

The initial project gave me an idea for another one, though. For that project I would need a lot of patient data, but because of privacy regulations, medical data is very hard to come by. For The Hyve, this means they often use synthetic data when testing software for clients, since they can’t access the real patient data.

There are currently a few ways to create synthetic patient data, but none of them produces realistic data. For some applications that’s fine; for others it’s practically unworkable. So for my project I decided to develop an algorithm that creates good-quality synthetic patient data.

You may have seen deepfake images and videos. They are made with machine learning algorithms (specifically Generative Adversarial Networks) that create fake videos or fake images of people. I thought: Why not use that approach to create realistic synthetic medical data? I started looking into that and found that virtually nobody had researched it yet for observational health data. So it seemed the perfect PhD project and I decided to go for it.

 

Jeremy and other researchers in the AiPBAND project 

 

How would such an algorithm lead to fake patient data?

You would train the algorithm on real patient data from Electronic Health Records (EHR) so that it learns what the data looks like. The trained model can then produce any amount of synthetic data: fake patients, in this case, that look just like real ones. This approach would eliminate the privacy issues that The Hyve and other companies now have to work around, because the generated records do not belong to actual patients. Besides, you can share the synthetic data as much as you want, which enables close collaboration with other researchers and even opens up research to the many people who currently don't have the resources to access the data. Freeing the massive amounts of collected observational health data could have a big impact on precision medicine, machine learning and even business.
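For readers who like to see the idea in code, here is a minimal sketch of the "train on real records, then sample synthetic ones" workflow described above. It is not Jeremy's algorithm and it is not a Generative Adversarial Network: as a deliberately simple stand-in, it fits an independent normal distribution to each column. The toy records, the column meanings and the `fit` and `sample` helpers are all hypothetical and only there for illustration.

```python
import random
import statistics

# Hypothetical toy "real" records: (age in years, systolic blood pressure in mmHg).
# These values are made up for illustration; they are not patient data.
REAL_PATIENTS = [
    (34, 118.0), (51, 131.0), (67, 142.0), (45, 125.0),
    (72, 150.0), (29, 112.0), (58, 137.0), (63, 139.0),
]


def fit(records):
    """Learn a (mean, standard deviation) pair for each column of the records."""
    columns = list(zip(*records))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample(model, n, seed=0):
    """Draw n synthetic records, one independent normal draw per column."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in model) for _ in range(n)]


# "Training" happens once on the real data ...
model = fit(REAL_PATIENTS)
# ... after which any number of synthetic records can be generated.
synthetic_patients = sample(model, 1000)
```

Because each column is sampled independently, this stand-in loses the relationships between variables (for example, blood pressure rising with age). Capturing those relationships is what a generative model such as a GAN would be expected to learn. The sketch also makes no privacy guarantee of its own.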

 

You’re two years into your PhD. What are your plans for the next and final year?

I’ve almost finished writing a review on my research topic. Now it’s time to start looking into applications. How can I put the theory into practice and make it useful – for The Hyve in particular?

One obvious first application is to make good-quality testing data. The constructed data we now use to test software is incomplete. As a result, some errors may go unnoticed until the product is delivered.

Another application would involve the OHDSI community. This community has developed a common data model, but they’re still working with separate health data sets. It would be really interesting to train the algorithm on these independent data sets and create one big dataset that represents all the others, but can be freely distributed because it does not contain actual patient data. The OHDSI community often organizes hackathons, for example to develop software. An open dataset with synthetic patient data would be perfect for such events.

Another option would be to feed the algorithm with the initial status of a patient and simulate disease progression. Pharmaceutical companies could then use this “digital twin” to calculate the effect of medication and possible side effects. Besides, you can repeat this for a large number of patients.

 

What do you like about working at The Hyve?

I like the environment. It’s really a positive place. Everybody is very friendly. I wish I had more time to interact with my colleagues. In the beginning, I was so stressed about my project that I was a bit in my head and then corona happened. Now that I’m looking for practical applications, I’m sure I’ll get a chance to interact more.

 

What do you do when you want to take your mind off your PhD research?

I got a game console recently. When you’re doing a PhD, it’s on your mind practically 24 hours a day. I needed a way to relax and get my head out of the PhD project. So I’ve been gaming a lot recently. Besides that, I like to do manual work. I like to build stuff.

 

* Early Stage Researcher (ESR) in the AiPBAND project, a European Training Network (ETN) funded by the Marie Skłodowska-Curie Actions (MSCA)