Published in: Automated Software Engineering 1/2024

01.05.2024

Distilled GPT for source code summarization

Authors: Chia-Yi Su, Collin McMillan


Abstract

A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, yet they form the backbone of developer documentation. A short description such as “changes all visible polygons to the color blue” can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350 million parameters) to run on a single 16 GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.
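The data-collection side of the distillation process the abstract describes can be sketched as follows: the teacher model (here GPT-3.5) is queried for summaries of source code, and the resulting (code, summary) pairs become the student model's fine-tuning set. This is a minimal illustrative sketch, not the authors' pipeline; `teacher_summarize` is a hypothetical stand-in for a real API call, and the toy teacher below exists only so the code runs without network access.

```python
import json


def build_distillation_pairs(functions, teacher_summarize):
    """Pair each source function with the teacher model's summary.

    `teacher_summarize` stands in for a call to the teacher LLM
    (e.g. an HTTP request to a hosted API); here it is any
    callable mapping a code string to a summary string.
    """
    pairs = []
    for src in functions:
        summary = teacher_summarize(src)
        pairs.append({"code": src, "summary": summary})
    return pairs


def write_jsonl(pairs, path):
    """Serialize the pairs as JSON Lines, one training example per line."""
    with open(path, "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")


# Toy teacher for illustration only: echoes the function's first line.
toy_teacher = lambda code: f"summary of {code.splitlines()[0]}"
pairs = build_distillation_pairs(
    ["int add(int a, int b) { return a + b; }"], toy_teacher
)
```

The student model is then fine-tuned on this corpus with an ordinary sequence-to-sequence objective, so it imitates the teacher's summaries without the code ever leaving the organization at inference time.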


Metadata
Title
Distilled GPT for source code summarization
Authors
Chia-Yi Su
Collin McMillan
Publication date
01.05.2024
Publisher
Springer US
Published in
Automated Software Engineering / Issue 1/2024
Print ISSN: 0928-8910
Electronic ISSN: 1573-7535
DOI
https://doi.org/10.1007/s10515-024-00421-4
