Tools:
Data cleaning, language processing and selection: Python, Jupyter Notebook, Sentence-Transformers, UMAP, scikit-learn, NLTK, Plotly, Seaborn, Matplotlib
Language model: intfloat/multilingual-e5-large-instruct
Other open tools: OpenRefine, LibreOffice, AntConc, Gephi