
Exploring Research Visibility of the COST Members: A Bibliometric Analysis of Topics

Virtual Mobility Grant
Applicant name:
Piotr Wójcik
Start date:
9.10.2023
End date:
30.10.2023
Applicant institution:
Uniwersytet Warszawski
Purpose of the grant:
In a team of researchers from five countries (Poland, Germany, Romania, the Netherlands and Italy), we measured the output and content of research published by members of the FinAI COST Action 19130. In particular, we carried out this analysis with respect to specific topics that relate to important deliverables of the Action, i.e. “risk of using digital assets” (see deliverable 10), “stress tests for AI evaluation” (deliverable 11) and “finance failed trials” (deliverable 12).

We analyzed all publications of the Action participants, scraped from Google Scholar at the end of March 2023. Of the 216 Action participants at that time, 51 did not have a Google Scholar profile, so the final sample covered the publications of 165 unique researchers. We applied two additional filtering rules. First, as the Action focuses on applications of artificial intelligence in finance, including the risks related to digital assets, we limited the sample to publications from 2010 onwards (2009 being the year bitcoin was born). Second, as Google Scholar profiles may include not only research but also teaching materials (e.g. handouts for students), we kept only records with a non-empty journal name; on Google Scholar profiles this field also contains entries such as “working paper” or “proceedings”.

To classify articles into topics we applied the state-of-the-art BERTopic algorithm, a topic-modelling technique built on document embeddings produced by a pre-trained transformer language model. In a nutshell, embeddings are contextual representations of text: each document is mapped to a vector in a common space, which makes it possible to compare documents semantically. Two of the most popular families of transformer models are Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT). As its name suggests, BERTopic uses a variant of the latter, namely Sentence Bidirectional Encoder Representations from Transformers (Sentence-BERT). Because the analyzed documents were written in different languages, we applied a multilingual variant of BERTopic. The parameters of the BERTopic algorithm were optimized using topic coherence measures, namely UCI, UMass and UCI-NPMI. Cosine similarity, which takes values between -1 and 1, was used to identify the topic embeddings most similar to the phrases “risk of using digital assets”, “stress tests for AI evaluation” and “failed trials”.

In the first part of the analysis we focused on article titles: our database contained 8,066 unique records representing scientific publications from 2010 onwards with non-empty titles. We then extended the analysis to article descriptions (abstracts), covering 5,485 unique records representing scientific publications from the period 2010-2023 with a non-empty description. The results offer a comprehensive view of the portfolio of articles and working papers produced by the Action's members and of its topic structure, which can benefit FinAI-related institutions across the European Union.
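The filtering rules described above (publications from 2010 onwards, non-empty journal field, unique records) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual pipeline: the `Record` fields, the `filter_records` helper and the title-based de-duplication (one plausible way to obtain “unique records” when several Action members list the same paper) are hypothetical, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One scraped Google Scholar entry (hypothetical field names)."""
    author_id: str
    title: str
    journal: str
    year: Optional[int]


def filter_records(records: List[Record], min_year: int = 2010) -> List[Record]:
    """Apply the year and journal filters, then keep one record per title."""
    seen_titles = set()
    kept = []
    for r in records:
        if r.year is None or r.year < min_year:
            continue  # undated or published before 2010
        if not r.journal.strip():
            continue  # empty journal field: likely teaching material
        key = " ".join(r.title.lower().split())  # normalise case and whitespace
        if not key or key in seen_titles:
            continue  # empty title, or a duplicate listed by another co-author
        seen_titles.add(key)
        kept.append(r)
    return kept


# Made-up sample: a duplicate, a handout without a journal, and a pre-2010 paper.
records = [
    Record("a1", "Deep learning for credit scoring", "Working paper", 2021),
    Record("a2", "Deep Learning for  Credit Scoring", "Working paper", 2021),
    Record("a1", "Lecture handout: linear regression", "", 2019),
    Record("a3", "Early neural networks in finance", "Journal A", 2005),
]
print(len(filter_records(records)))  # prints 1
```

Normalising case and whitespace before comparing titles is a deliberately simple choice; real scraped data may need fuzzier matching (e.g. for punctuation or translated titles).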
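The coherence measures used to tune BERTopic score a topic by how often its top words co-occur. The sketch below is a simplified illustration, not the Action's implementation: it counts co-occurrence at the document level within the analysed corpus (the original UCI measure uses sliding windows over a reference corpus), and the helper names and toy documents are hypothetical. UMass averages log((D(w_i, w_j) + 1) / D(w_j)) over pairs where w_j ranks above w_i; UCI averages pointwise mutual information (PMI); NPMI divides each PMI by -log P(w_i, w_j), so scores fall roughly between -1 and 1. Full implementations exist, e.g. gensim's `CoherenceModel` with `coherence='u_mass'`, `'c_uci'` or `'c_npmi'`.

```python
import math
from itertools import combinations


def doc_counts(docs):
    """Document frequencies of single words and of unordered word pairs."""
    single, pair = {}, {}
    for doc in docs:
        words = sorted(set(doc))
        for w in words:
            single[w] = single.get(w, 0) + 1
        for a, b in combinations(words, 2):  # a < b, matching co_count keys
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def co_count(pair, a, b):
    """Number of documents containing both a and b."""
    return pair.get(tuple(sorted((a, b))), 0)


def umass(top_words, docs):
    """Mean of log((D(wi, wj) + 1) / D(wj)) over pairs with wj ranked above wi."""
    single, pair = doc_counts(docs)
    scores = [
        math.log((co_count(pair, top_words[i], top_words[j]) + 1) / single[top_words[j]])
        for i in range(1, len(top_words))
        for j in range(i)
    ]
    return sum(scores) / len(scores)


def uci_and_npmi(top_words, docs, eps=1e-12):
    """Mean PMI (UCI-style) and mean normalised PMI over all pairs of top words."""
    single, pair = doc_counts(docs)
    n = len(docs)
    pmis, npmis = [], []
    for wi, wj in combinations(top_words, 2):
        p_ij = co_count(pair, wi, wj) / n
        pmi = math.log((p_ij + eps) / ((single[wi] / n) * (single[wj] / n)))
        pmis.append(pmi)
        npmis.append(pmi / -math.log(p_ij + eps))
    return sum(pmis) / len(pmis), sum(npmis) / len(npmis)


# Toy tokenised abstracts (made up).
docs = [
    ["bitcoin", "risk", "volatility"],
    ["bitcoin", "risk", "crypto"],
    ["credit", "scoring", "risk"],
    ["bitcoin", "crypto", "volatility"],
]
coherent = ["bitcoin", "crypto", "volatility"]
mixed = ["bitcoin", "scoring", "volatility"]
print(umass(coherent, docs), umass(mixed, docs))
print(uci_and_npmi(coherent, docs), uci_and_npmi(mixed, docs))
```

In a parameter search, one option is to compute such scores for the topics produced by each candidate BERTopic configuration and keep the configuration with the highest average coherence.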
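The cosine-similarity step, matching a phrase such as “risk of using digital assets” to the closest topics, can be illustrated with a minimal sketch. The topic and query vectors here are made-up three-dimensional stand-ins for the multilingual Sentence-BERT embeddings, and `cosine` and `rank_topics` are hypothetical helpers; BERTopic itself offers a `find_topics` method for this kind of search-term lookup.

```python
import math


def cosine(u, v):
    """Cosine similarity in [-1, 1]: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_topics(query_emb, topic_embs, top_n=3):
    """Return (topic_id, similarity) pairs sorted from most to least similar."""
    scored = [(tid, cosine(query_emb, emb)) for tid, emb in topic_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy embeddings; real Sentence-BERT vectors have hundreds of dimensions.
topic_embs = {
    0: [0.9, 0.1, 0.0],   # e.g. a cryptocurrency-risk topic
    1: [0.0, 1.0, 0.2],   # e.g. a credit-scoring topic
    2: [-0.8, 0.0, 0.6],  # an unrelated topic
}
query = [1.0, 0.2, 0.0]  # stand-in embedding of "risk of using digital assets"
print(rank_topics(query, topic_embs))  # topic 0 comes first
```

Because cosine similarity ignores vector length, it compares only the direction of embeddings, which is why it is a common choice for semantic matching.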
Action participants will gain valuable insights into how established COST teams engaged in collaborative research can enhance their future publication output, how well the most important Action topics were covered in their research, and which other emerging topics could lead to proposals for future joint projects. Last but not least, a comprehensive analysis of the Action's scientific output so far should provide useful input for the Action's final deliverable, “an edited volume containing scientific achievements of the Action”.