Guided modelling and exploratory visualization of the most common topics inside proposals of any Decidim instance.
This notebook parses and cleans proposal text and metadata to optimize them for semantic analysis. It generates embeddings using the sentence-transformers library and open multilingual language models. It then applies a customizable BERTopic pipeline for modelling and visualizing topics, including their hierarchical relationships and changes over time.
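As a rough illustration of that workflow, here is a minimal sketch, not the notebook's actual code. The `clean_text` helper, its cleaning rules (HTML stripping and whitespace collapsing), the `model_topics` function, the `nr_bins` value, and the choice of `paraphrase-multilingual-MiniLM-L12-v2` as the multilingual model are all assumptions made for the example.

```python
import html
import re

# Assumption: proposal bodies may contain HTML markup and entities.
_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities and collapse whitespace."""
    no_tags = _TAG_RE.sub(" ", raw)
    decoded = html.unescape(no_tags)
    return _SPACE_RE.sub(" ", decoded).strip()


def model_topics(docs, timestamps,
                 model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Embed cleaned proposals and fit a BERTopic model (hypothetical helper)."""
    # Heavy dependencies are imported lazily so clean_text works without them.
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    # Pre-compute embeddings once so they can be reused across model runs.
    encoder = SentenceTransformer(model_name)
    embeddings = encoder.encode(docs, show_progress_bar=True)

    topic_model = BERTopic(embedding_model=encoder)
    topics, _ = topic_model.fit_transform(docs, embeddings)

    # Hierarchical relationships between topics and their evolution over time.
    hierarchy = topic_model.hierarchical_topics(docs)
    over_time = topic_model.topics_over_time(docs, timestamps, nr_bins=20)
    return topic_model, topics, hierarchy, over_time
```

The returned tables can then be passed to BERTopic's plotting methods, for example `topic_model.visualize_hierarchy(hierarchical_topics=hierarchy)` and `topic_model.visualize_topics_over_time(over_time)`.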
Open Python notebook for processing and visualizing common topics in proposals, using the open data of any open Decidim instance.
The text and other metadata from the 31K proposals were cleaned and prepared for semantic analysis in Python, before computing text embeddings and applying the BERTopic pipeline of dimensionality reduction (UMAP), clustering (HDBSCAN) and topic extraction (c-TF-IDF + bag of words from clusters). This approach, and the overall vision of the BERTopic library, recognizes the subjectivity involved in identifying topics in their context and, as such, aims to give the user more control when parameterizing topic modelling and interpreting its results through visualization.
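To make the topic extraction step more concrete, the sketch below computes class-based TF-IDF scores by hand: all documents in a cluster are pooled into one bag of words, and each term is weighted by its within-cluster frequency times log(1 + A / f), where A is the average number of words per cluster and f is the term's frequency across all clusters. It is a simplified, standard-library illustration rather than BERTopic's own implementation. The function names, the whitespace tokenization and the toy proposals are assumptions.

```python
import math
from collections import Counter


def ctfidf(docs, labels):
    """Return {cluster: {term: score}} using a class-based TF-IDF weighting."""
    # Bag of words per cluster: documents sharing a label are pooled together.
    bags = {}
    for doc, label in zip(docs, labels):
        bags.setdefault(label, Counter()).update(doc.lower().split())

    # Term frequency across all clusters and average cluster size in words.
    total = Counter()
    for bag in bags.values():
        total.update(bag)
    avg_words = sum(total.values()) / len(bags)

    scores = {}
    for label, bag in bags.items():
        n_words = sum(bag.values())
        scores[label] = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in bag.items()
        }
    return scores


def top_terms(scores, label, n=3):
    """Return the n highest-scoring terms of one cluster."""
    ranked = sorted(scores[label].items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:n]]


# Toy proposals with cluster labels, as a clustering step might assign them.
docs = [
    "more bike lanes in the city",
    "safe bike lanes for the kids",
    "plant trees in the park",
    "more trees for the city park",
]
labels = [0, 0, 1, 1]
scores = ctfidf(docs, labels)
print(top_terms(scores, 0, n=2), top_terms(scores, 1, n=2))
```

Terms that are frequent in one cluster but rare elsewhere score highest, which is why a word such as "the", shared by both clusters, ranks below "bike" or "trees". In practice, stop-word removal and n-gram settings would be handled by the vectorizer passed to BERTopic.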