Researchers from the National Research University Higher School of Economics (HSE) have developed a machine learning model that predicts the risk of complications in patients who have had a myocardial infarction. It is the first such model to take genetic data into account, which allows a more accurate assessment of the risk of long-term complications. The study was published in the journal Frontiers in Medicine.

Ischemic heart disease (IHD) is a condition in which the heart does not receive enough blood and oxygen because the coronary arteries are narrowed or blocked. It is usually caused by plaques of fat and cholesterol that form on the vessel walls. IHD may manifest as angina pectoris (chest pain), myocardial infarction (heart attack), or other complications.
According to the WHO, ischemic heart disease is the most common cause of death worldwide, accounting for 13% of all deaths. It is therefore important to choose treatment carefully and to reduce the risk of complications and recurrences. HSE researchers have built a model that predicts the likelihood of complications after a myocardial infarction.
The scientists analyzed data from patients who were admitted with a myocardial infarction to the Surgut District Center for Diagnostics and Cardiovascular Surgery between 2015 and 2024. On admission to the emergency department, physician-researchers explained the terms of the study and obtained the patients' consent to participate. Cardiologists then assessed the condition of the coronary arteries supplying the heart and, based on that assessment, performed procedures to restore blood flow: balloon angioplasty with stenting, or coronary artery bypass grafting. Patients received drug therapy with RAAS blockers, beta-blockers, statins, and dual antiplatelet therapy. The data were recorded in the hospital's medical records. Standard clinical parameters were measured for every patient: blood pressure, body mass index, cholesterol, and glucose levels.
At the laboratory stage, physician-researchers extracted DNA from the leukocyte layer of the collected blood samples and froze it at −80 °C for later genetic testing. Each patient's genotype was determined for a specific genetic variation (polymorphism) in the VEGFR-2 gene. VEGFR-2 is part of the body's signaling system that controls the growth of new blood vessels. There are three possible genotypes (C/C, C/T, and T/T), depending on whether the nucleotide at this position in the gene is cytosine (C) or thymine (T). The marker has long been known, but this is the first time its influence on the prognosis of complications after myocardial infarction has been studied.
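To make the genotype usable by a model, it has to be turned into a feature. The article does not say how the authors encoded it, so the following is only a minimal sketch under assumptions: it offers a count of T alleles (0, 1, or 2) and a carrier flag for "at least one T allele". The function names are hypothetical.

```python
# Hypothetical encoding of the VEGFR-2 genotype as model features.
# The three genotype labels come from the article; the encoding is an assumption.
VALID_GENOTYPES = {"C/C", "C/T", "T/T"}

def t_allele_count(genotype):
    """Return how many of the two alleles are T (0, 1, or 2)."""
    if genotype not in VALID_GENOTYPES:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype.split("/").count("T")

def carries_t_allele(genotype):
    """Return True if at least one T allele is present."""
    return t_allele_count(genotype) > 0

for g in ("C/C", "C/T", "T/T"):
    print(g, t_allele_count(g), carries_t_allele(g))
```

Tree-based libraries such as CatBoost can also take the raw genotype label as a categorical feature, so an explicit encoding like this is one option rather than a requirement.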
The authors considered the impact of 39 factors on the predicted risk of cardiac death, recurrent acute coronary syndrome, stroke, and the need for repeat revascularization, a procedure that restores blood flow in the arteries. To select an effective model, the researchers trained and tested several machine learning algorithms: gradient boosting (CatBoost and LightGBM), random forest, logistic regression, and an AutoML approach.
The best performance was shown by the CatBoost model, a gradient boosting algorithm optimized for categorical data, that is, values that denote categories or groups rather than numbers. It makes predictions by sequentially building and training "weak" decision trees, where each new tree corrects the mistakes of the previous ones. When building trees, the algorithm splits the data into two parts: the model is trained on one part, and the errors are calculated on the other. This reduces overfitting, where the model simply memorizes the correct answers, and helps it find general patterns that carry over to unfamiliar cases.
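The idea of each tree correcting its predecessors' mistakes can be illustrated with a toy example. The sketch below is not CatBoost and not the authors' code: it is a bare-bones gradient boosting loop in plain Python, assuming squared-error loss, a single numeric feature, and one-split trees (stumps). The data and all function names are made up for illustration, and the sketch omits both categorical-feature handling and the train-on-one-part, measure-errors-on-the-other scheme described above.

```python
# Toy gradient boosting with one-split "stumps" (illustrative, not CatBoost).

def fit_stump(xs, residuals):
    """Find the threshold split that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_boosting(xs, ys, n_trees=20, learning_rate=0.3):
    """Start from the mean, then let each stump fit the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # current mistakes
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data, not from the study.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

for n in (0, 1, 5, 20):
    print(n, "trees -> training MSE", round(mse(fit_boosting(xs, ys, n_trees=n), xs, ys), 4))
```

The training error shrinks as trees are added, which is exactly why a real library also needs safeguards against overfitting.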
The influence of each feature on the model's accuracy was evaluated using sequential feature addition, a method that tests the contribution of each feature as it is added. The scientists selected the nine most significant factors: sex; body mass index (BMI); the Charlson comorbidity index, which accounts for serious concomitant diseases; the condition of the lateral wall of the left ventricle; the extent of damage to the left main coronary artery; the number of affected arteries; the VEGFR-2 gene variant; the choice between percutaneous coronary intervention and coronary artery bypass grafting; and statin dosage.
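Sequential feature addition can be sketched as a greedy loop: at each step, try adding every remaining feature, keep the one that improves the validation score most, and stop when nothing helps. The example below is only an illustration under stated assumptions. The rows are synthetic, the column names are hypothetical, and a 1-nearest-neighbor rule stands in for the real model; the article does not describe the authors' exact procedure.

```python
# Greedy forward feature selection on synthetic data (NOT study data).
# Features are scaled to 0..1; label 1 means "complication".
FEATURES = ["statin_dose", "t_allele", "noise"]
TRAIN = [
    ((0.10, 0, 0.5), 1),
    ((0.45, 1, 0.1), 1),
    ((0.55, 0, 0.9), 0),
    ((0.90, 1, 0.4), 0),
]
VALID = [
    ((0.15, 1, 0.80), 1),
    ((0.48, 1, 0.30), 1),
    ((0.52, 0, 0.20), 0),
    ((0.85, 0, 0.60), 0),
    ((0.56, 1, 0.42), 1),
    ((0.44, 0, 0.12), 0),
]

def validation_accuracy(selected):
    """Score a feature subset with a 1-nearest-neighbor rule on held-out rows."""
    cols = [FEATURES.index(name) for name in selected]

    def distance(a, b):
        return sum((a[c] - b[c]) ** 2 for c in cols)

    hits = 0
    for x, y in VALID:
        _, nearest_label = min(TRAIN, key=lambda row: distance(row[0], x))
        hits += nearest_label == y
    return hits / len(VALID)

def forward_selection(candidates, score_fn, min_gain=1e-9):
    """Add one feature at a time while the score keeps improving."""
    selected, history = [], []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda item: item[0])
        if top_score - best_score < min_gain:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
        history.append((top_feature, top_score))
    return selected, history

selected, history = forward_selection(FEATURES, validation_accuracy)
for name, score in history:
    print(f"added {name}: validation accuracy {score:.2f}")
```

On this toy data the loop picks `statin_dose` first, then `t_allele`, and stops before adding `noise`. Scoring on held-out rows rather than the training rows is what keeps uninformative features from being selected.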
The results showed that the dose of statins (medications used to lower blood cholesterol) was the most important factor affecting the risk of complications. High doses of statins reduce this risk, especially in patients with an adverse genotype. The VEGFR-2 polymorphism, specifically the presence of the T allele, was found to be the fourth most important factor.

"Previously, genetic factors were not used in machine learning models, mainly because sequencing, or even genotyping of individual nucleotides, is not performed in hospitals. However, in addition to standard indicators, we had access to data on a polymorphism in the VEGFR-2 gene. Thanks to this, we were able to compare this indicator with others and found that the risk allele of the VEGFR-2 variant is among the five most important factors for predicting long-term outcomes in patients with myocardial infarction," explains Maria Poptsova, one of the authors of the article and Head of the International Laboratory of Bioinformatics at HSE.
The researchers emphasize that analyzing genetic data helps create more accurate and personalized models for predicting the risk of cardiovascular complications in patients after myocardial infarction.

"Cardiovascular diseases require resources for diagnosis, treatment, rehabilitation, and prevention, and therefore place a heavy burden on the healthcare system. Introducing such models into clinical practice will make it possible to reduce mortality and the frequency of recurrent infarctions, optimize treatment, and reduce the burden on physicians," comments Alexander Kirdeev, one of the authors of the article and a research intern at the International Laboratory of Bioinformatics.
The study was conducted as part of the HSE project "Mirror Laboratories".