Rhodium catalysts have been enantioselectively converting alkenes with hydrogen (hydrogenation) for half a century. Many articles have been published about these types of catalysts and the underlying reaction mechanism, so one would expect a sophisticated understanding of these catalytic reactions. But there is still no simple way to quickly select the right ligands for your homogeneous catalyst when switching substrates. Adarsh Kalikadien, Evgeny Pidko and colleagues from TU Delft and Janssen Pharmaceutica wanted to see if they could develop a predictive model for this using machine learning, but the project turned out differently than expected.
“The idea wasn’t complicated,” says Kalikadien, a PhD student in the Pidko group. “We created a simple model of the reaction using the well-known rhodium catalyst. The goal was to create statistical models to predict which catalysts and ligands you could use, so that you needed less trial and error.” The team applied different machine learning models to a set of computational data and high-throughput experiments that Janssen had carried out.
Random
The team compared, among other things, the performance of these models. “We calculated all sorts of properties based on quantum chemistry (the most intensive and expensive calculations) and 2D cheminformatics as well as 2D representations,” says Kalikadien. These properties are different representations of the catalyst as seen by the model. As a control, the team also added a random set containing 34 random numbers between -100 and 100. “The strange thing is that all the simpler models, including the random model, performed the same as the expensive version; it turned out to be completely useless.”
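The idea of a random control can be sketched in a few lines of Python. This is only an illustration and not the team's code: the catalyst names, labels and toy descriptors are invented, the helper names `random_descriptors` and `loo_1nn_accuracy` are hypothetical, and a leave-one-out nearest-neighbour classifier stands in for whatever models the study actually used. Only the "34 random numbers between -100 and 100" comes from the article.

```python
import math
import random


def random_descriptors(catalyst_ids, n_features=34, low=-100.0, high=100.0, seed=0):
    """Give every catalyst a vector of uniform random numbers (no chemistry in it)."""
    rng = random.Random(seed)
    return {cid: [rng.uniform(low, high) for _ in range(n_features)]
            for cid in catalyst_ids}


def loo_1nn_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (a stand-in model)."""
    ids = list(features)
    hits = 0
    for held_out in ids:
        # Predict the held-out catalyst from its closest neighbour in descriptor space.
        nearest = min((other for other in ids if other != held_out),
                      key=lambda other: math.dist(features[held_out], features[other]))
        hits += labels[nearest] == labels[held_out]
    return hits / len(ids)


# Hypothetical toy data: four made-up catalysts with made-up outcome labels.
labels = {"cat_A": "R", "cat_B": "R", "cat_C": "S", "cat_D": "S"}
informative = {"cat_A": [0.1, 1.0], "cat_B": [0.2, 0.9],
               "cat_C": [0.9, 0.1], "cat_D": [1.0, 0.2]}
baseline = random_descriptors(labels)

print("toy descriptors:", loo_1nn_accuracy(informative, labels))
print("random baseline:", loo_1nn_accuracy(baseline, labels))
```

If a model scores about as well on the random vectors as on the carefully computed ones, that suggests it is not learning from the representation itself, which is the kind of warning sign the team describes.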
Something that wasn’t reflected in the paper, but did affect the project, was a simple mix-up. “On the computer, you can draw the 3D structure of the catalyst you’ve tested under certain conditions. You can then do DFT calculations on that and extract the properties,” says Kalikadien. “We used the CAS numbers of the compounds for this. But what we didn’t realize was that the CAS numbers and the drawings on the vials in the lab didn’t match our 3D structures.”
“We spent months discussing the properties with the team and making improvements, and in the end we got really good calculations at a high computational level,” the PhD student continues. “But during one meeting it turned out that the computational structures did not point to the right identifiers for the experimental data! So we had to go through all these structures one by one to see where things went wrong. And when we processed the right molecules and created a new statistical model, we were surprised to find that we got almost the same results. So one conclusion was: for this out-of-domain modeling approach, it doesn’t matter what you put into it.” It was an indication that the model didn’t learn much from the specific representation. “Looking back, we can laugh about it, but during the project it took a toll on my mental health,” he says with a laugh.
Evaluate
This was intended to be a simple project, but it didn’t go as planned. “I found a lot of the results a little disappointing,” Kalikadien admits. Still, the research, and above all the data it generated, was valuable, particularly in light of the rise of machine learning. “That’s why we made everything open source. Not only can you view all the data, but we also provide the code, including packages and manuals, so that anyone who wants to do the same kind of research can do it.”
So they published one of the largest datasets for a particular type of hydrogenation reaction. “Publishing was still a challenge. It was a very in-depth study of how machine learning works in chemistry, and not all the conclusions were positive. This led to a high-profile journal rejecting the paper because they felt ‘it didn’t belong here.’ Fortunately, Chemical Science was more open to it, so we could publish our data, code and even interactive figures there.”
Meaningful
What now? “Our representation wasn’t as meaningful as we had hoped, so we’re now looking for a representation of the catalyst that’s maybe a little less simplistic, but still as simple as possible,” says Kalikadien. “You also want to keep the costs from getting too high, so we’re trying to incorporate more information about the reaction mechanism into the model without making it too broad. So, a dynamic version of the representation.”
Kalikadien, A.V. et al. (2024) Chem. Sci., DOI: 10.1039/D4SC03647F