In this thesis, we report on our experiments in multilingual discourse connective (DC) identification and show that language-specific BERT models appear to be sufficient even with little task-specific training data, requiring no additional handcrafted features to achieve strong results. However, some languages are under-resourced and lack large corpora annotated with discourse connectives. To address this, we developed a methodology to induce large synthetic discourse-annotated corpora from a word-aligned parallel corpus. We evaluated our models on three languages, English, Turkish, and Mandarin Chinese, and applied our induction methodology to the English-Turkish and English-Chinese language pairs. All our models were evaluated in the context of the recent DISRPT 2021 Task 2 shared task. Results show that the F-measures achieved by our approach (93.12%, 94.42%, and 87.47% for English, Turkish, and Chinese, respectively) are near or at the state of the art for all three languages, while remaining simple and requiring no handcrafted features.
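To make the induction step concrete, the sketch below illustrates one plausible way to project connective annotations across word alignments; it is a minimal illustration, not the thesis implementation, and the function name, the (source index, target index) alignment format, the BIO label scheme, and the toy sentences are all assumptions introduced here for exposition.

```python
# Hypothetical sketch of annotation projection over a word-aligned parallel
# corpus: discourse connective labels on source tokens are carried across
# word alignments to induce synthetic annotations on the target side.

def project_connectives(src_tokens, tgt_tokens, alignments, src_dc_indices):
    """Project discourse-connective labels from source to target tokens.

    alignments: iterable of (src_idx, tgt_idx) word-alignment pairs, as
        produced by an aligner such as fast_align or awesome-align.
    src_dc_indices: set of source token positions annotated as connectives.
    Returns a BIO-style label list for the target sentence (assumed scheme).
    """
    labels = ["O"] * len(tgt_tokens)
    aligned = sorted(t for s, t in alignments if s in src_dc_indices)
    for pos, tgt_idx in enumerate(aligned):
        # The first aligned target token opens a connective span;
        # contiguous aligned tokens continue it.
        if pos > 0 and aligned[pos - 1] == tgt_idx - 1:
            labels[tgt_idx] = "I-CONN"
        else:
            labels[tgt_idx] = "B-CONN"
    return labels


# Toy example: English "however" aligned to Turkish "ancak".
src = ["however", ",", "it", "rained"]
tgt = ["ancak", "yağmur", "yağdı"]
print(project_connectives(src, tgt, [(0, 0), (3, 2)], {0}))
# -> ['B-CONN', 'O', 'O']
```

Under this kind of projection, every target sentence aligned to an annotated source sentence yields a synthetic training instance, which is how a large annotated corpus in one language can bootstrap training data for an under-resourced one.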