Category

Research

Organization

DiCEM programme, UPF Barcelona (2025)

Decidim Topic Modeller

Guided modelling and exploratory visualization of the most common topics inside proposals of any Decidim instance.

This notebook parses and cleans proposal text and metadata to optimize them for semantic analysis. It generates embeddings using the sentence-transformers library and open multilingual language models. It then applies a customizable BERTopic pipeline for modelling and visualizing topics, including their hierarchical relationships and changes over time.
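As a rough illustration of the cleaning step, the sketch below merges a proposal's title and body into one string ready for embedding. It assumes that proposal bodies may contain HTML markup, entities and URLs; the function names `clean_proposal` and `prepare_corpus` and the `min_chars` threshold are hypothetical and are not taken from the notebook.

```python
import html
import re

# Patterns for markup, links and runs of whitespace (assumed noise sources).
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")


def clean_proposal(title: str, body: str) -> str:
    """Merge title and body into a single plain-text string."""
    text = f"{title}. {body}"
    text = TAG_RE.sub(" ", text)   # drop HTML tags first, so URLs end at whitespace
    text = html.unescape(text)     # decode entities such as &nbsp; or &amp;
    text = URL_RE.sub(" ", text)   # remove bare links
    return SPACE_RE.sub(" ", text).strip()


def prepare_corpus(rows, min_chars: int = 20):
    """Clean (title, body) pairs and skip texts too short to embed meaningfully."""
    cleaned = (clean_proposal(title, body) for title, body in rows)
    return [text for text in cleaned if len(text) >= min_chars]
```

Keeping the title and body together gives the embedding model more context per proposal; whether that helps depends on how informative the titles are in a given Decidim instance.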

An open Python notebook for processing and visualizing common topics in proposals, using the open data of any public Decidim instance.

The text and metadata of the 31K proposals were processed in Python: cleaned and prepared for semantic analysis, then converted to text embeddings and passed through the BERTopic pipeline of dimensionality reduction (UMAP), clustering (HDBSCAN) and topic extraction (c-TF-IDF over a bag-of-words representation of each cluster). This approach, like the overall vision of the BERTopic library, recognizes that identifying topics in their context is subjective and therefore aims to leave the user more control when parameterizing the topic model and interpreting its results through visualization.
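The pipeline described above can be sketched roughly as follows. This is a minimal outline, not the notebook's actual code: the `TopicConfig` dataclass, the `fit_topics` function and every default parameter value are assumptions chosen for illustration, and only the embedding model name comes from the project's tool list. The heavy libraries are imported inside the function so that the configuration can be inspected without loading them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TopicConfig:
    """User-tunable parameters (illustrative defaults, not the project's values)."""
    embedding_model: str = "projecte-aina/ST-NLI-ca_paraphrase-multilingual-mpnet-base"
    n_neighbors: int = 15         # UMAP: size of the local neighbourhood
    n_components: int = 5         # UMAP: dimensions kept before clustering
    min_cluster_size: int = 30    # HDBSCAN: smallest group treated as a topic
    ngram_range: tuple = (1, 2)   # bag-of-words terms passed to c-TF-IDF
    random_state: int = 42        # makes the UMAP projection reproducible


def fit_topics(docs: Sequence[str], cfg: Optional[TopicConfig] = None):
    """Embed documents, then reduce, cluster and label them with BERTopic."""
    # Local imports: these packages are large and slow to load.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    cfg = cfg or TopicConfig()
    docs = list(docs)

    # 1. Text embeddings from a multilingual sentence-transformers model.
    embedder = SentenceTransformer(cfg.embedding_model)
    embeddings = embedder.encode(docs, show_progress_bar=False)

    # 2. Dimensionality reduction, 3. clustering, 4. term counting per cluster.
    umap_model = UMAP(n_neighbors=cfg.n_neighbors, n_components=cfg.n_components,
                      min_dist=0.0, metric="cosine", random_state=cfg.random_state)
    hdbscan_model = HDBSCAN(min_cluster_size=cfg.min_cluster_size, metric="euclidean",
                            cluster_selection_method="eom", prediction_data=True)
    vectorizer_model = CountVectorizer(ngram_range=cfg.ngram_range)

    topic_model = BERTopic(embedding_model=embedder, umap_model=umap_model,
                           hdbscan_model=hdbscan_model,
                           vectorizer_model=vectorizer_model)
    topics, _ = topic_model.fit_transform(docs, embeddings)
    return topic_model, topics
```

With a fitted model, `topic_model.visualize_hierarchy()` and `topic_model.topics_over_time(docs, timestamps)` give the hierarchical and temporal views mentioned earlier. Grouping the parameters in one small config object is one way to keep them visible and easy to adjust between runs.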

Tools:

Python: Sentence-Transformers, BERTopic, UMAP, HDBSCAN, Plotly, NLTK, Pandas, Cosmograph Plugin.

Other open source tools: Cosmograph Web App, OpenRefine.

Models: projecte-aina/ST-NLI-ca_paraphrase-multilingual-mpnet-base, published by the BSC-CNS Barcelona Supercomputing Center, for embeddings generation.

Perplexity AI and Anthropic Claude Sonnet for coding assistance.

Credits:

Created by: Diego Arredondo Ortiz

Prototyped within the course Data Analysis and Information Visualization Concerning Global Issues, part of the Digital Culture and Emerging Media (DiCEM) programme at Universitat Pompeu Fabra. Barcelona, 2025.
