This is the official agenda of the TaCoS. Arrival and check-in is on the ###DATE at the Youth Hostel Saarbrücken from ###TIME onwards. Please tell the hostel staff that you are part of the conference and keep your ID ready. The TaCoS team will be there in person from ###TIME - ###TIME to take care of the registration. If you miss this time window, please talk to us at the university the following day.
|8:30 - 9:30||Lanyards and Reception Seminar Room 23.21.00.44
Register with us and get your lanyards if you did not check in at the hostel.
|9:30 - 10:00||Opening Plenary Lecture Hall 3F
Greeting and general information
|10:00 - 11:00||Keynote talk by Dr. Yulia Zinova Lecture Hall 3F
▼ Word Embeddings and Morphology
In this talk I will present some of the ongoing research on morphology with static embeddings. I will talk about what kind of information gets implicitly learned, which limitations the models face, and how linguistic knowledge can be used to boost the performance of language models, especially for languages with fewer resources and/or rich morphology.
|11:00-11:30||Coffee Break Room 23.21.00.44|
|11:30-12:00||Student talk Seminar Room 23.21.00.46
▼ Throwing Shaders at Language Models - Evaluating Creative Code Generation - Jan Kels
We introduce the Shadertoys dataset, containing 27,508 shader programs gathered from Shadertoy.com and annotated with metadata. To evaluate how language models handle code generation, we publish a benchmark suite for creative programming tasks. The first task serves as a proof of concept and evaluates models on their ability to generate the return statement of a function. Language models reached about 0.28, code-specific models reached 0.37, and fine-tuned models managed 0.60 on this task's metric. The benchmark and associated dataset are available on Hugging Face.
Student talk Seminar Room 23.21.00.048
▼ An Exploration of Translanguaging on Social Media by Hispanic International Students in the United States - Cristina Reguera-Gòmez
International students from Spanish-speaking countries at American universities often communicate in Spanish or English depending on the environment. However, they also interact in a way that can be described as the combination of both languages within the same discourse. This phenomenon is known as translanguaging, and it is a common linguistic practice all around the world (de la Luz Reyes, 2012). It can be said that it involves the creation of new ways of communicating. Nevertheless, translanguaging is also often perceived as a sign of not being skilled in either of the two languages (Wei, 2021). This social conception may cause students to refrain from translanguaging with certain audiences and to reserve it for more casual conversations with fellow bilingual international students. The purpose of this research is to examine the translanguaging practices of Hispanic international students at American universities on social networks, with a focus on how they translanguage and what social functions their translanguaging practices have. To answer these questions, this study examines the posts and messages of five participants through critical and multilingual discourse analysis. The results show that participants translanguage at the sentential and suprasentential level, and that the type of translanguaging and its frequency are closely linked to the audience and the social network used. In other words, participants translanguage more frequently and in more diverse ways when communicating with fellow Spanish-English bilingual international students, while expressing different identities. The present study suggests that translanguaging is more complex than the mixture of two languages and that its presence is highly reliant on the audience.
|12:10-12:40||Student talk Seminar Room 23.21.00.46
▼ Machine Transliteration between two Persian Dialects - The Case of Farsi and Tajiki - Rayyan Merchant
Despite speaking mutually intelligible varieties of the same language, speakers of Tajik Persian, which is written in a modified Cyrillic alphabet, cannot read Persian texts written in the Perso-Arabic script. Due to the overwhelming similarity between the dialects, transliteration may be more appropriate than translation. Previous work created a statistical model, but lacked parallel corpora with which to judge the model (Davis, 2012). We aim to demonstrate that transliteration provides a way to “translate” between the two dialects, utilizing a neural approach to grapheme-to-phoneme (G2P) conversion. Our work focuses on one direction: from the Perso-Arabic script to Cyrillic (henceforth referred to as Farsi and Tajiki). Because Tajiki is a low-resource language, datasets for it are sparse, and parallel Farsi-Tajiki datasets even more so. Facing this lack of data, our data-acquisition strategy consisted of manually collecting blog posts written in both scripts. We then utilized GaChalign, an implementation of the Gale-Church sentence aligner (Gale & Church, 1993), to align the texts (Lilling & Francis, 2014). Our efforts created the first sentence-aligned, digraphic Persian corpus, containing around 2,300 sentences and 66,000 words. Our preliminary investigations delivered promising results. Our current model correctly transliterates 39.2% of words from our test set of 3,507 words. When predictions that are one or two edits away from the correct form are also counted, the accuracy increases to 66.7% and 82.2%, respectively. We evaluate our model using the BLEU score metric (Papineni et al., 2002), with the results presented in Figure 1. Our progress shows that a neural network-based G2P model is a viable method of transliteration between both dialects. We envision that the final form of the tool will be a freely available browser extension.
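The edit-distance-tolerant accuracy mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain Levenshtein distance over characters (the abstract does not say which edit operations were counted), and the word pairs are hypothetical toy data, not items from the corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def accuracy_within(predictions, references, max_edits=0):
    """Share of predictions at most `max_edits` edits away from the reference."""
    hits = sum(levenshtein(p, r) <= max_edits
               for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical toy data (assumption): reference Tajiki forms and model outputs.
references = ["китоб", "мактаб", "дониш", "забон"]
predictions = ["китоб", "мактав", "данеш", "забон"]

for k in (0, 1, 2):
    print(f"accuracy within {k} edits:", accuracy_within(predictions, references, k))
```

Raising `max_edits` can only keep the score the same or increase it, which is why the reported figures grow from exact match to the one-edit and two-edit tolerances.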
Student talk Seminar Room 23.21.00.048
▼ One Parent one language, one child two Grammars - Sumrah Arshad
This study investigates the acquisition of negation and negative concord (NC) in bilingual children acquiring Dutch and Italian simultaneously. Zeijlstra (2004, et seq.) hypothesises that formal features of negation are not present in the language input and that all the features of negation are semantic. Only the input can guide children to assume formal features of negation. Zeijlstra's hypothesis seems straightforward for Dutch, which allows only one negative element per clause and is considered a double negation language. Children acquiring Dutch receive abundant input for acquiring the adult-like expression of negation. The hypothesis also seems straightforward for children acquiring only Italian, which allows more than one negative element per clause and is called a negative concord language. Children acquiring Italian also receive explicit input for the acquisition of negation in Italian. With respect to the expression of negation, Dutch and Italian belong to two distant groups. In this study we investigate data of young children acquiring Dutch and Italian simultaneously. Hypothesis: Bilingual children acquiring a double negation language (Dutch) and an NC language (Italian) will have mixed grammars. Naturalistic speech of children acquiring Italian and Dutch simultaneously was retrieved from CHILDES. Negative sentences of children and parents were searched per month and for each language. After cleaning, the Italian sentences were divided into i) sentences containing the negative marker non as the only negative element and ii) sentences with negative concord. The Dutch negative sentences mainly contain the negative marker niet. In order to estimate the effect of the independent variables on the dependent variable, a GLMM (Baayen, 2008) with a Poisson error structure and the log link function (McCullagh & Nelder, 1989) was used. The full-null model comparison was significant. The effect of the individual fixed effects was tested using the drop1 function (Dobson, 2002).
Results. Dutch: Parents' input for niet (LRT: χ²(1) = 6.38, p < 0.001) and age (LRT: χ²(4) = 27.59, p < 0.0001) both show a significant positive effect on bilingual children's acquisition of negation. No sentence containing more than one negative element was found in children's or parents' sentences. Post hoc multiple comparisons show that children of different age groups did not differ in the estimated average use of niet, suggesting that bilingual children acquire negation in Dutch very early, similar to their L1 peers. Italian, non: Parents' input containing only non has a significant positive effect on children's use of non as the only negative marker (LRT: χ²(1) = 22.21, p < 0.0001). Age was also significant. Italian, NC: Parents' input for NC did not show a significant effect on children's acquisition of NC, but age was significant (LRT: χ²(1) = 6.51, p = 0.01). No sentence was found containing any of the Dutch negative elements. The first NC was found in the 33rd month, whereas the first instance of NC in L1 Italian peers was found at the age of 23 months. Conclusion: We conclude that bilingual children acquiring Dutch and Italian simultaneously as their L1s do not mix the grammars of their two languages. Their acquisition of negation in Dutch and Italian is similar to that of their respective peers acquiring only one of the languages.
|12:40 - 13:40||Lunch University Mensa|
|13:40-14:10||Student talk Seminar Room 23.21.00.46
▼ BART meets Macbeth - Summarization of Shakespeare's plays with BART based models - Rishu Kumar, Katja Konermann, Jingyan Chen
With the rise in multilingual models and their increasing capacity in multiple downstream tasks, this study explores a recent state-of-the-art model for the downstream task of summarization. We explore different summarization techniques, namely abstractive and dialogue summarization. Dialogues pose a special challenge to the task of summarization as their format differs quite a bit from other types of text. Especially in plays, important actions and crucial pieces of information are often only stated indirectly, or not in the dialogue itself but in short stage directions. Additionally, summaries of dialogues require a large amount of rewording and rephrasing. For example, instances of first-person pronouns usually have to be replaced by third-person pronouns. For this study, we worked with Shakespeare plays in their original English version and German translations of them. This process included creating our own dataset by manually mapping chunks of the plays to short summaries. The antiquated language in these plays presents a further challenge, mainly because it has to be translated into present-day English for the summaries. Our approach tests the language-agnostic vocabulary space of BART-based multilingual models by providing input in Old English, New English and German. We further fine-tune our models for dialogue summarization with the SAMSum dataset in English and its machine-translated equivalent in German, as well as our own hand-crafted datasets. In this talk, we will walk you through our data collection and dataset creation, and discuss preprocessing steps as well as the training and evaluation of the different models. Finally, we will talk about extensions and further research in this area.
Student talk Seminar Room 23.21.00.048
▼ Lexicon-based data synthesis for Swiss German NLP - Barabara Kovačić
People communicate and present their opinions in their native languages, often with regional expressions including dialects. Dialectal words and phrases are found especially on social media. With half of the human population having a social media account, the demand for NLP tools that can handle dialectal expressions has never been greater. But often, there is a lack of resources for dialectal language. One possible solution to data scarcity is to synthetically increase the available data. However, low-resource languages are a challenging application for data synthesis, as the existing methods are less beneficial when applied to out-of-domain data. Therefore, many pretrained models cannot be used effectively. This raises the question of how data synthesis can be used effectively to improve language models that handle data including dialectal expressions. German has the reputation of being a pluricentric language, as it is the national language of six countries, Switzerland amongst them. Swiss German alone can be divided into six to eight different dialects. Therefore, there have been numerous annotation efforts that pay specific attention to Swiss German, such as ArchiMob, SwissDial and NOAH. When language models pretrained on a Standard German dataset are used for Swiss German, the problems mentioned above occur even though there are only minor differences in the grammar. As the major differences between these two varieties are on the phoneme level, it can be beneficial to enhance a Standard German dataset with Swiss German words so that the language model is more robust to their occurrences when working on a Swiss German dataset. Therefore, this bachelor thesis aims to augment the datasets in such a way that they can be used with existing, pretrained language models. In doing so, the focus will be on the task of part-of-speech tagging.
|14:20-14:50||Student talk Seminar Room 23.21.00.046
▼ Coronaleugnerchats auf Telegram - Ein distinktiver Schreibstil? (Covid Denier Chats on Telegram - A Distinctive Writing Style?) - Rebekka Borges
The Covid-19 pandemic gave rise to huge Telegram groups in which Covid skeptics exchange views on protests and conspiracy myths. This communication takes place in a very characteristic jargon. Several classification experiments are intended to show whether one can even speak of a distinctive Covid-denier style, which the group members adopt, for example, for identity-building purposes. To this end, Covid-denier chats were first classified as such, in contrast to Telegram news texts, with an accuracy of 99% using a BERT implementation. Because the decision process of neural networks is a black box, it is not apparent which criteria drive the classification, that is, which features prove relevant during training and to what degree. Features can go beyond keywords and be very abstract and far from obvious, such as word sequences or word length. A style in the sense of variationist linguistics comprises not only lexical markers but also more abstract patterns such as those possibly relevant here. The experiments are intended to show whether a neural network with a simpler, keyword-oriented architecture yields worse results, both in general and when the occurrences of various keywords are manipulated, so that the relevance of abstract features such as those of a style can be inferred. The different corpus versions for the experiments (each containing roughly 60,000 samples with a maximum length of 50 words) are divided into ten subsets for cross-validation, and these are split into training and test sets at a ratio of 90:10. BERT performs at ~99% accuracy, and the best CNN architecture, with two convolutional layers, reaches at most 91%. For the experiments that followed the comparison of group chats versus news texts, keywords that occurred particularly often in the correctly classified samples were masked.
In addition, Telegram messages on the topic of Covid written by Covid deniers were compared against those of the Federal Ministry of Health (Bundesministerium für Gesundheit). The effect of the communication setting was also examined. From the fact that the CNN performs worse than BERT, it can be inferred that there are more relevant features than keywords alone; the same follows from the CNN's performance when distinguishing messages on the same topic. The difference between group and channel messages is minimal and not significant. When emojis and punctuation marks were also vectorized, performance improved. Many results thus indicate that a shared style going beyond keywords can be assumed.
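The keyword-masking step in this abstract (masking the keywords that occur most often in correctly classified samples) might look roughly like the sketch below. The function names, the `[MASK]` token, the whitespace tokenization and the English toy samples are all assumptions for illustration; the abstract does not describe the actual implementation.

```python
from collections import Counter


def top_keywords(samples, k):
    """Return the k words that occur in the largest number of samples."""
    # Count each word once per sample (document frequency).
    counts = Counter(word for s in samples for word in set(s.lower().split()))
    return {word for word, _ in counts.most_common(k)}


def mask_keywords(sample, keywords, mask_token="[MASK]"):
    """Replace every keyword occurrence in a sample with a mask token."""
    return " ".join(mask_token if w.lower() in keywords else w
                    for w in sample.split())


# Hypothetical correctly classified samples (toy data, not from the corpus).
correctly_classified = [
    "wake up sheeple",
    "sheeple never wake",
    "the sheeple sleep",
]

keywords = top_keywords(correctly_classified, k=2)
print(mask_keywords("Wake up now", keywords))
```

Retraining or re-evaluating a classifier on the masked samples then shows how much of its accuracy depended on those keywords. A real pipeline would probably also exclude stop words before ranking.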
Student talk Seminar Room 23.21.00.048
▼ Language Revitalization - A case for Idu Mishmi - Akhilesh Kakolu Ramarao
Arunachal Pradesh is one of the linguistically richest and most diverse regions in all of Asia. In part due to this high diversity, a Hindi-based creole has been rapidly sweeping the state in recent years. The focus of this talk is the ongoing language revitalization effort for Idu Mishmi, a threatened language of Arunachal Pradesh. I will share my experience of working with the Idu Mishmi community to develop technologies (such as a dictionary application and e-reader applications) and the lessons learned in the field.
|14:50 - 15:30||Coffee Break Seminar Room 23.21.00.48|
|15:30 - 16:30||Keynote talk by Dr. Kilian Evang Lecture Hall 3F
▼ Semantic Roles for Semantic Parsing - Lessons Learned and Future Directions
In this talk I report on efforts to annotate a parallel Role and Reference Grammar treebank (RRGparbank) with semantic roles for verbs. I discuss the annotation scheme developed, the annotation process, and semantic parsing results. Finally, I highlight some problems with existing annotation schemes based on VerbNet, PropBank, FrameNet, and VerbAtlas, and sketch a new scheme that solves some of these problems.
|9:45 - 10:00||Day 2 Overview Lecture Hall 3F
Information for the day
|10:00 - 11:00||Keynote talk by Dr. Nurul Lubis Lecture Hall 3F
▼ Dialogue Evaluation via Offline Reinforcement Learning and Emotion Prediction
Task-oriented dialogue systems aim to fulfill user goals, such as booking hotels or searching for restaurants, through natural language interactions. They are ideally evaluated through interaction with human users. However, this is not feasible at every iteration of the development phase due to time and financial constraints. Therefore, researchers resort to static evaluation on dialogue corpora. Although static evaluations are more practical and easily reproducible, they do not fully reflect the real performance of dialogue systems. Can we devise an evaluation that keeps the best of both worlds? In this talk I explore the use of offline reinforcement learning and emotion prediction for dialogue evaluation that is practical, reliable, and strongly correlated with human judgements.
|11:00-11:30||Coffee Break 23.21.00.44|
|11:30-12:00||Student talk Seminar Room 23.21.00.46
▼ Evaluation of Russian Noun Word Embeddings For Cases and a Number - Anastasia Yablokova
Russian is characterized by a rich inflectional morphology. Russian nouns in particular illustrate this variety by changing their word form to indicate grammatical case and number. Each case form of a noun, in singular and plural, is represented by its own 300-dimensional real-valued vector that encodes word meaning, so that words that are close in the embedding space should also be close in meaning. Other languages that have different noun forms for number and/or case may exhibit interesting features in word embeddings. For example, Shafaei-Bajestan et al. (2022) examined the semantic properties of English nominal pluralization in word embeddings and found that shift vectors for words that belong to different semantic groups are substantially different. This research encouraged us to look closer at Russian noun embeddings and find out whether nouns that belong to the same group show any regularities across case forms and number. To do that, a dataset of 1700 nouns was divided into groups on the basis of semantic similarity and then extended with 12 columns that correspond to the six Russian noun cases in two numbers (singular and plural). After that, the fastText library is applied to get word vectors for each of the noun case forms. Then the difference vectors between the base form (nominative case) and the other cases are calculated for each noun in the dataset. The average vectors are calculated for each group of words for every case form. The resulting vectors are added to the base word form and then compared to the initial fastText vector. We assume that words that belong to the same group may have the same average vector for every case form. The results will help us to understand word embeddings better and thus improve word representations.
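The shift-vector procedure in this abstract can be sketched as follows. This is an illustration under assumptions: the vectors are hypothetical 3-dimensional toy values rather than 300-dimensional fastText vectors, the noun names and case labels are placeholders, and cosine similarity is assumed as the comparison measure, which the abstract does not specify.

```python
import math


def subtract(u, v):
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_shift(group, case, base="nom_sg"):
    """Average difference vector (case form minus base form) over a group."""
    shifts = [subtract(forms[case], forms[base]) for forms in group.values()]
    return mean_vector(shifts)


# Hypothetical semantic group with toy vectors (assumption, not real data).
group = {
    "noun_a": {"nom_sg": [1.0, 0.0, 0.0], "gen_sg": [1.0, 1.0, 0.0]},
    "noun_b": {"nom_sg": [0.0, 1.0, 0.0], "gen_sg": [0.0, 1.0, 1.0]},
}

shift = average_shift(group, "gen_sg")
for noun, forms in group.items():
    predicted = add(forms["nom_sg"], shift)  # base form plus group shift
    print(noun, round(cosine(predicted, forms["gen_sg"]), 3))
```

A similarity close to 1 for most nouns in a group would support the assumption that the group shares one average shift vector per case form.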
Student talk Seminar Room 23.21.00.048
▼ Exploring Song Topics Across Different Countries - A Latent Dirichlet Allocation Approach - Nursulu Sagimbayeva
Empirically, it is well known that many songs produced by humanity are devoted to love and feelings. However, it is interesting to explore what other topics are common in songs. A step further is to look at the song topic distribution across different countries. Could it be that, for example, the topic 'money' is more widespread in America than in other countries, while the topic 'politics' prevails in Russian charts? In this project, we discover and compare the topics in popular songs of 20 different countries using LDA (Latent Dirichlet Allocation) topic modeling. Additionally, we analyze the most common languages of the songs popular in a given region. To gather data, we used Spotify's weekly charts at Kworb.net, since they capture historic data and not only songs popular on a given day or week. We scraped the lyrics of the top 200 songs for each country from Genius.com. Then, we translated all the lyrics into English and preprocessed them. In the end, we performed LDA using the gensim library and visualized the results with the pyLDAvis tool. Our results suggest that there are some topics popular in all the researched countries: different shades of love (for example, romantic, unhappy, sensual), and the so-called 'thug life' topic that consists of curse words, mentions of money, drugs, and so on. However, the significance of each topic and its exact content vary from country to country. Relevance: The results of topic modeling on song lyrics can be used in cultural and comparative studies, comparative analysis of countries, historical analysis of trends over time, and marketing, but also just to give an insight into a certain audience's preferences.
|12:10-12:40||Student talk Seminar Room 23.21.00.46
▼ Microsyntactic Unit Analysis using Word Embedding Models - Experiments on Slavic Languages - Iuliia Zaitova, Irina Stenger, Tania Avgustinova
Microsyntactic units have been defined as language-specific transitional entities between lexicon and grammar, whose idiomatic properties are closely tied to syntax. While these units are abundant and diverse, they are typically described based on individual constructions, making their comprehensive understanding difficult. This study proposes a novel approach to detect microsyntactic units using Word Embedding Models (WEMs) trained on six Slavic languages, namely Belarusian, Ukrainian, Russian, Bulgarian, Czech, and Polish, and evaluates how well these models capture the nuances of syntactic compositionality. To address this challenge, we apply two different WEMs that previously proved effective at idiomaticity detection, namely Word2Vec CBOW and Context2Vec, as well as three adaptations of Word2Vec for syntactic tasks, namely Word2Vec CWINDOW, Word2Vec Structured, and Node2Vec. The training data is sourced primarily from the Leipzig Corpora Collection and the Russian National Corpus. To evaluate the models, we develop a cross-lingual inventory of microsyntactic units using the lists of microsyntactic units available at the Russian National Corpus. We extracted the 50 most frequent microsyntactic units from each category (prepositions, adverbials and predicatives, parenthetical expressions, conjunctions, and particles), resulting in parallel sets of 227 microsyntactic units with their context sentences, one for each of the six Slavic languages under analysis. Our results demonstrate the effectiveness of WEMs in capturing microsyntactic units and identifying their compositionality. We find that simple Word2Vec embedding models adapted for syntactic tasks perform best, even when compared to neural-based DSMs. We show that the behavior of WEMs is consistent across all six Slavic languages under analysis, validating our proposed approach as applicable and effective for identifying microsyntactic units.
Our findings contribute to the theory of microsyntax by providing insights into the detection of microsyntactic units and their crosslinguistic properties. Our approach has practical applications in natural language processing, machine translation, and computational linguistics, where the identification of microsyntactic units can improve the accuracy of tasks such as syntactic parsing and named entity recognition.
Student talk Seminar Room 23.21.00.048
▼ How do You measure Style (And Much More) - Mikhail Sonkin
If you were to give ten English scholars two English texts and ask which one was written by Jane Austen and which by Charlotte Brontë, they would most probably have no difficulty in answering correctly. However, if you were to ask them to explain the motivation behind their answer, their responses would most definitely differ from each other. This little thought experiment raises the question: how do you automate that task? What exactly do you give to a computer to make it understand that two texts are written by different authors? Moreover, how do you make your algorithm not depend on a particular language? This is the problem of automatic authorship attribution. To resolve this, many scholars have tried to involve statistical methods. Only one rather simple method, however, seemed to stick: Burrows's Delta. Invented by John Burrows in 2002, the Delta Analysis has proven over time to be a robust instrument for authorship attribution. To identify whether a document was written by Author A or Author B, you would need to collect a corpus of written texts and see which author's “style” the document is most similar to. Of course, the scope of stylometry goes beyond authorship: many methods have been derived to compare different stylistic qualities of two authors. In that context, we will discuss one instrument in particular: the Zeta Analysis, which compares two corpora by extracting their keywords. In this talk, we will look “under the hood” of the two methods and try to understand how exactly they succeed in their tasks and what their advantages and drawbacks are. We will also discuss several cases in which these instruments serve a different function, such as assessing the quality of a parody, finding differences in characters' speech in dramatic works based on gender and family relation, and detecting the translator's influence on a literary text. Come join and learn about how Digital Humanities deals with the intricacies of stylometry!
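As a rough illustration of how a Delta-style comparison works, the sketch below computes relative frequencies of the most frequent words, standardizes them as z-scores across the candidate texts, and takes the mean absolute z-score difference to the disputed text. It is a simplified sketch, not the talk's implementation: whitespace tokenization, the tiny vocabulary size, the population standard deviation and the toy texts are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(candidates, disputed, n_words=5):
    """Return {author: delta}; a smaller delta means a more similar style."""
    all_tokens = " ".join(candidates.values()).lower().split()
    vocab = [w for w, _ in Counter(all_tokens).most_common(n_words)]

    profiles = {a: relative_freqs(t, vocab) for a, t in candidates.items()}
    columns = list(zip(*profiles.values()))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 when a word has zero variance, to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]

    def z_scores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    z_disputed = z_scores(relative_freqs(disputed, vocab))
    return {
        author: mean(abs(x - y) for x, y in zip(z_scores(p), z_disputed))
        for author, p in profiles.items()
    }


# Hypothetical toy texts (assumption): real use needs far larger corpora.
candidates = {
    "A": "the cat and the dog and the bird",
    "B": "a cat or a dog or a bird",
}
disputed = "the fox and the hen and the owl"

deltas = burrows_delta(candidates, disputed)
print(min(deltas, key=deltas.get), deltas)
```

Because the measure relies only on the frequencies of very common words, it needs no language-specific resources, which fits the language independence the abstract asks for.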
|12:40 - 13:30||Lunch University Mensa|
|13:30-15:30||Career Networking Meet-up Seminar Room 23.21.00.44
An opportunity to discover future career paths and learn from people in industry and academia. Sponsors from industry and academics from HHU will be there to tell you about their careers in computational linguistics and answer questions about yours in an informal atmosphere. (There will be coffee.)
|15:30 - 16:30||Keynote talk by Apl.Prof Wiebke Petersen Lecture Hall 3F
▼ On representation techniques in Panini's grammar of Sanskrit - Solving an ancient problem
Panini's grammar of Sanskrit is one of the oldest recorded grammars (~350 BC) and has earned universal admiration among linguists: 'The descriptive grammar of Sanskrit, which Pānini brought to its perfection, is one of the greatest monuments of human intelligence and an indispensable model for the description of languages' (Bloomfield 1929). Being a grammar designed for an oral tradition, it uses representation techniques that aim at compactness, e.g., a semi-formalized meta-language and an intricate system of conventions governing rule applications. In the talk I will introduce some of these techniques and focus on Panini's representation of phonological classes as intervals of a list. Early commentators already asked whether this list is optimal with respect to length. I will show how Formal Concept Analysis can answer this question and why this analysis technique is worth knowing.
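The idea of naming a phonological class as an interval of a list can be illustrated with a small sketch. It is a simplified toy that covers only the first four rows of the sound list (the full list has fourteen) and uses a plain transliteration; the data layout and the function name `sound_class` are assumptions for illustration, not part of the talk.

```python
# Each row lists some sounds followed by a marker symbol that closes the row.
# Only the first four rows are modelled here (a deliberate simplification).
ROWS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
]


def sound_class(first_sound, marker):
    """Return all sounds from `first_sound` up to the row closed by `marker`."""
    # Flatten the rows into one list of (symbol, is_marker) pairs.
    flat = []
    for sounds, row_marker in ROWS:
        flat += [(s, False) for s in sounds] + [(row_marker, True)]

    start = next(i for i, (s, is_m) in enumerate(flat)
                 if s == first_sound and not is_m)
    end = next(i for i, (s, is_m) in enumerate(flat)
               if s == marker and is_m and i > start)
    # The class is the interval between start and end, skipping the markers.
    return [s for s, is_m in flat[start:end] if not is_m]


print(sound_class("a", "c"))  # interval from "a" to the marker "c"
print(sound_class("i", "k"))  # interval from "i" to the marker "k"
```

A two-symbol name thus stands for a whole class of sounds, which is the kind of compactness the abstract describes. Whether the order of the list is optimal is the question the talk addresses with Formal Concept Analysis.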
|16:30 - Open||Boardgames|
|9:45-10:00||Day 3 Overview Seminar Room 23.21.00.046, Seminar Room 23.21.00.48
Information for the day
|10:00-11:00||Workshop Seminar Room 23.21.00.46
▼ From Zero to Terminal Hero - Akhilesh Kakolu Ramarao
This workshop will provide an introduction to the Linux terminal and the Vim editor. You will learn how to navigate and manipulate files and directories using the terminal interface, as well as execute basic system operations such as managing processes and installing software. The second half of the workshop will focus on an introduction to the Vim editor, a powerful and customizable text editor widely used by programmers. You will learn how to navigate, edit, and save files in Vim, along with other commonly used commands. By the end of the workshop, you will have gained a solid foundation in using both the terminal and Vim, and be able to use them confidently in your daily work. Prerequisites: Windows users need to have WSL2 installed on their laptop before the session (guide: https://docs.slam.phil.hhu.de/#/wsl ). There are no prerequisites for macOS or Linux users.
|10:00-10:30||Talk Seminar Room 23.21.00.48
▼ SEMSAI: Self-Referential Multi-Scale Modelling and Simulation of Severe Infectious Diseases - Annegret Janzso
As seen, for example, during the Covid-19 pandemic, models tend to overshoot when predicting the number of infections, which decreases the acceptance of simulation-based predictions. This also causes the population to have less trust in the information itself, as well as in those who convey it. To tackle this, the SEMSAI project aims to improve predictions so that they better mirror reality, by also portraying changes in population behavior in response to given predictions. This reflexive modeling process is used to regain trust in conveyed information and to give better information on what the best course of action is in a given situation. The talk covers the influence of (scientific) communication on human behavior and introduces agent-based cognitive social simulations as a tool with interesting applications within linguistics and computational linguistics that is not yet widely used in those fields.
|11:10-11:30||Coffee Break Seminar Room 23.21.00.44|
|11:30-12:40||Workshop Seminar Room 23.21.00.046
▼ One word, a thousand pictures. Text to image Generation with Stable Diffusion - Adrienne Wright
Diffusion models are capable of generating images from text prompts. How do these models encode linguistic information and represent it pictorially? The first half of this workshop will present the basic architecture behind these models, followed by a hands-on second half, where participants will be able to use a GPU in a custom Colab notebook to transform text into images with Stable Diffusion. We will dive into the computational and linguistic theory behind prompt engineering and parameter adjustment and try it out for ourselves. This workshop draws on my work in two LMU master seminars: 'Computational Creativity' at the Centre for Information and Speech Processing, in which I designed a Twitter bot that responded to tweets of dreams with pictorial dream sequences (https://github.com/gitovska/hallie-sue-nation), and 'Creating Art(efacts): Computer-based Image Generation and Editing', to be taken this semester at the Computer Vision and Learning Group, where Stable Diffusion was developed.
Workshop Seminar Room 23.21.00.048
▼ Hey Mycroft, let's play a game! - Developing skills for an open source voice assistant - Mikhail Sonkin, Katja Konermann
Voice assistants are all around us. For some of them, such as Alexa by Amazon, it is possible to develop your own applications. However, voice assistants made by large companies often remain a black box, as most of their code base is proprietary. Additionally, privacy concerns might make some people reluctant to use these assistants or develop skills for them. In this workshop, we will take a look at an alternative: the open source voice assistant Mycroft by Mycroft AI. What are its upsides compared to its market-dominating competitors? Where does it fall short? Primarily, we will focus on guiding you through the development of a Mycroft Skill, explaining key components crucial for the design of a successful user interaction: launching a skill with an intent, responding to an utterance, asking follow-up questions, extracting relevant information from the user's utterance, finding information on the Internet, storing data, handling errors, and designing fall-back answers. By the end, we will have our own module fully integrated into Mycroft's skill set. Join us to play around with Mycroft!
|12:40 - 13:30||Lunch Seminar Room 23.21.00.044|
|14:30-15:00||Snacks Haus der Universität|
|15:00-16:00||Keynote talk by Univ. Prof. Dr. Kevin Tang Haus der Universität
▼ The use of computational models to determine acoustic and syntactic variations in Parkinson's Disease patients
Speech can be used as a non-invasive biomarker to capture fine changes in speech patterns in normal populations and individuals diagnosed with neuromotor disorders, such as Parkinson's Disease (PD). In this talk, I will demonstrate how computational models that are linguistically-informed can quantify acoustic and syntactic variations in PD patients. Paper resources: 1) From sonority hierarchy to posterior probability as a measure of lenition: The case of Spanish stops 2) Quantitative Acoustic versus Deep Learning Metrics of Lenition 3) Lenition measures: Neural networks’ posterior probability vs. acoustic cues 4) Measuring Gradient Effects of Alcohol on Speech with Neural Networks’ Posterior Probability of Phonological Features 5) Language production in Parkinson's Disease Poster: 1) UF undergraduate research symposium poster presentation
|16:00 - Open||Closing plenary Haus der Universität
Closing talk from the organisation team, thanks and shoutouts
|10:00 - 13:00||Brunch ALEX Restaurant
It's a tradition to go to a group brunch with everyone on the last day. This is not mandatory, but is a nice way to end the conference. Participation is only for people who have signed up for reservation.