
Comparison of Sequence-to-Sequence and Retrieval Approaches on the Code Summarization and Code Generation Tasks


Chausseau, Nicolas (2021) Comparison of Sequence-to-Sequence and Retrieval Approaches on the Code Summarization and Code Generation Tasks. Masters thesis, Concordia University.

Text (application/pdf)
Chausseau_MCompSc_S2021.pdf - Accepted Version
Available under License Spectrum Terms of Access.
2MB

Abstract

In this study, we evaluate and compare state-of-the-art models on the code generation and code summarization tasks (English-to-code and code-to-English). We compare the performance of neural seq2seq BiLSTM [Yin et al. 2018] and attentional-GRU architectures [LeClair et al. 2019], along with that of a semantic code search model reproduced from [Sachdev et al. 2018]. We compare these three models' BLEU scores (1) on their original study datasets and (2) on additional benchmark datasets [Yin et al. 2018, Sennrich et al. 2018, LeClair et al. 2019], each time for both translation directions (English-to-code and code-to-English). Surprisingly, we observe that semantic code search performs best overall, surpassing the seq2seq models on 5 of the 8 task-dataset combinations. We find that the seq2seq BiLSTM always outperforms the attentional-GRU, including on the relatively large (2M pairs) Javadoc-based dataset from the original attentional-GRU study, setting a new high score on that dataset, higher than those of four previously published studies.
However, we also observe that model scores remain low on several datasets. Some test-set questions are harder to answer due to a lack of relevant examples in the training set. We introduce a new procedure for estimating the degree of novelty and difficulty of any given test-set question: we use the BLEU score of the highest-scoring training-set entry as a reference point for model scores on that question, a procedure which we call BLEU Optimal Search, or BOS. The BOS score (i) provides an information-retrieval ceiling for model scores on each test-set question, (ii) can help shed light on the seq2seq models' capacity to generalize to novel, unseen questions on any dataset, and (iii) helps to identify dataset artifacts, by inspecting the rare model answers that score above it. We observe that the BOS score is not reliably surpassed by the seq2seq models, except in the presence of dataset artifacts (such as when the first words of the question contain the answer), and call for further empirical investigation.
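The BOS procedure described above can be sketched in a few lines: for a given test-set question, score every training-set answer against the test reference with BLEU and keep the maximum. This is a minimal illustration only, assuming NLTK's sentence-level BLEU with smoothing; the toy tokenized strings stand in for the thesis datasets and are not from the study.

```python
# Minimal sketch of BLEU Optimal Search (BOS): the BLEU score of the
# best-matching training-set entry serves as an information-retrieval
# ceiling for model scores on a given test-set question.
# Assumption: NLTK's sentence_bleu with smoothing is a reasonable proxy
# for the BLEU variant used in the thesis.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bos_score(test_reference, train_answers):
    """Return the highest BLEU score any training-set answer achieves
    against the test-set reference (the BOS ceiling for this question)."""
    smooth = SmoothingFunction().method1  # avoid zero scores on short sentences
    return max(
        sentence_bleu([test_reference], candidate, smoothing_function=smooth)
        for candidate in train_answers
    )

# Toy example: one test-set reference and three training-set answers (tokenized).
test_ref = "return the maximum value in the list".split()
train = [
    "return the minimum value in the list".split(),
    "sort the list in place".split(),
    "return the maximum value in the list".split(),  # exact duplicate
]
print(bos_score(test_ref, train))  # exact match present, so the ceiling is 1.0
```

A model answer scoring above this ceiling on many questions would suggest a dataset artifact rather than genuine generalization, which is how the abstract uses the BOS score diagnostically.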

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Chausseau, Nicolas
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 30 March 2021
Thesis Supervisor(s): Rigby, Peter C.
ID Code: 988393
Deposited By: Nicolas Chausseau Gaboriault
Deposited On: 29 Jun 2021 23:20
Last Modified: 29 Jun 2021 23:20
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
