[Paper Summary] "Aroma: Code Recommendation via Structural Code Search"

 


Abstract of this paper in 2 lines:

The paper proposes Aroma, a tool and technique for code recommendation via structural code search, which indexes a large code corpus and recommends relevant code snippets based on partial code input. [1]

Aroma retrieves and recommends relevant code snippets efficiently by using a matrix of sparse feature vectors and cosine similarity calculations. [2] [3]
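To make the idea of comparing sparse feature vectors concrete, here is a minimal sketch of cosine similarity over vectors stored as dictionaries. The feature names (`call:decodeStream`, `var:options`, and so on) and the dictionary representation are assumptions made for illustration; the summary does not specify how Aroma lays out its matrix.

```python
import math

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    # Iterate over the smaller vector so the cost depends only on non-zero entries.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors: 1.0 marks a structural feature that is present.
query = {"if": 1.0, "call:decodeStream": 1.0, "var:options": 1.0}
method_a = {"if": 1.0, "call:decodeStream": 1.0, "var:options": 1.0, "return": 1.0}
method_b = {"for": 1.0, "call:print": 1.0}

score_a = cosine_similarity(query, method_a)  # shares all three query features
score_b = cosine_similarity(query, method_b)  # shares none, so the score is 0.0
```

Because only non-zero entries are stored and visited, the cost of each comparison grows with the number of features actually present, which is what makes sparse representations attractive for a large corpus.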


Contributions of this paper:

Aroma is a tool and technique for code recommendation via structural code search, which indexes a large code corpus and recommends relevant code snippets based on partial code input. It clusters and intersects the results of the search to recommend a small set of succinct code snippets that contain the query snippet and appear in several methods in the corpus.

Aroma can be used to extend partially written code snippets, discover common extensions used by other programmers, cross-check against similar code written by others, and add extra code to fix common mistakes and errors. It has been implemented for four different languages and evaluated on 2000 randomly selected queries and 64 queries derived from code snippets obtained from Stack Overflow. The results indicate that Aroma is capable of efficiently retrieving and recommending relevant code snippets. [1]


Related papers:

1. Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval
Masudur Rahman and 8 others • 2018, arXiv: Software Engineering

2. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Hamel Husain and 4 others • 2019, arXiv: Learning
492 citations

3. Siamese: scalable and incremental code clone search via multiple code representations
Chaiyong Ragkhitwetsagul and 1 other • 2019, Empirical Software Engineering
51 citations

4. Building Bing Developer Assistant MSR-TR-2015-36
Yi Wei and 3 others • 2015
5 citations

5. Sameness: an experiment in code search
Lee Martie and 1 other • 2015, Mining Software Repositories
10 citations


Practical implications of this paper:

Aroma provides a tool and technique for code recommendation that can be used by programmers to search for similar code snippets and extend their own code.

It allows programmers to complete partially written code snippets, discover common extensions used by other programmers, cross-check against similar code written by others, and add extra code to fix common mistakes and errors.

Aroma has been implemented for four different languages and has been evaluated on a large code corpus, showing its capability to efficiently retrieve and recommend relevant code snippets.

The tool can be integrated into IDEs through the developed IDE plugin, making it convenient for programmers to use during their coding process.

The study conducted with 12 programmers using Aroma showed positive feedback, indicating that it is a useful tool for completing programming tasks and identifying common patterns in unfamiliar libraries.


Introduction of this paper:


Programmers often encounter situations where they need to search for similar code snippets to extend their own code or learn common usages.

Existing techniques like code-to-code search tools and pattern-based code completion tools have limitations in retrieving relevant and concise code snippets.

Aroma is proposed as a tool and technique for code recommendation via structural code search.

It indexes a large code corpus, takes a partial code snippet as input, and searches for method bodies containing the partial code snippet.

Aroma then clusters and intersects the search results to recommend a small set of succinct code snippets that both contain the query snippet and appear in multiple methods in the corpus.

The paper highlights the advantages of Aroma, including its ability to generate idiomatic recommendations, its flexibility to retrieve new and interesting code snippets, and its efficiency in real-time usage. [1] [2]


Literature survey of this paper:


Code-to-code search tools like FaCoY and Krugle retrieve relevant code snippets from a corpus based on a code query, but they do not provide concise recommendations from the search results.

Aroma's pruning-based search was compared with two conventional code search techniques, one based on featurization and one based on TF-IDF, and it outperformed both.

Clone detectors like SourcererCC detect syntactically identical or highly similar code, but they are not suitable for code recommendation as they focus on finding highly similar code rather than code containing the query code snippet.

Other techniques like pattern mining and code completion, API documentation tools, and clone detection techniques have been explored, but they do not support code-to-code search and recommendation like Aroma does. [1]

Aroma's approach of structural code search and recommendation fills the gap in existing techniques and provides a valuable tool for programmers to find and extend similar code snippets. [2]


Methods used in this paper:


Aroma indexes a large code corpus, including thousands of open-source projects, to create a searchable database of code snippets.

Aroma takes a partial code snippet as input and searches the corpus for method bodies containing the partial code snippet.

Aroma clusters and intersects the search results to recommend a small set of succinct code snippets that both contain the query snippet and appear in multiple methods in the corpus.

The evaluation of Aroma involved creating 2000 randomly selected queries from the corpus and 64 queries derived from code snippets obtained from Stack Overflow.

Aroma was implemented for four different languages, and an IDE plugin was developed for Aroma.

A study was conducted where 12 programmers were asked to complete programming tasks using Aroma, and their feedback was collected. [1]

A comparison was made between Aroma and pattern-oriented code completion tools, specifically GraPacc, to evaluate Aroma's code recommendation capabilities. [2]

Note: The methods mentioned in the paper primarily focus on the implementation and evaluation of the Aroma tool for code recommendation via structural code search.



Data used in this paper:


Aroma indexes a large code corpus, including thousands of open-source projects, to create a searchable database of code snippets.

The evaluation of Aroma involved creating 2000 randomly selected queries from the corpus and 64 queries derived from code snippets obtained from Stack Overflow.

A study was conducted where 12 programmers were asked to complete programming tasks using Aroma, and their feedback was collected.

A comparison was made between Aroma and pattern-oriented code completion tools, specifically GraPacc, to evaluate Aroma's code recommendation capabilities. [1]

To substantiate the claim that new code often resembles existing code, an experiment was conducted on a large codebase in the Hack language. The corpus used for evaluation had over 37 million unique features. [2]

Note: The data used in this paper primarily consists of a large code corpus, including open-source projects, and queries derived from the corpus and Stack Overflow. Additionally, a study with programmers and a comparison with another code completion tool were conducted.


Results of the paper:


Aroma, a tool and technique for code recommendation via structural code search, was proposed and implemented for four different languages.

Aroma was evaluated on 2000 randomly selected queries from a code corpus and 64 queries derived from code snippets obtained from Stack Overflow.

The evaluation results indicated that Aroma is capable of efficiently retrieving and recommending relevant code snippets.

A comparison was made between Aroma and pattern-oriented code completion tools, and Aroma showed higher recall rates and the ability to create a precise ranked list of search results. [1]

A study was conducted where 12 programmers used Aroma to complete programming tasks, and their feedback was collected. [2]

Note: The available sources do not give the specific quantitative results of the study with programmers; they report only that the participants' feedback was positive. [2]


Algorithm of Aroma:


Aroma follows a multi-phase approach to generate code recommendations.

Featurization: Aroma parses the body of each method in the code corpus, creates parse trees, and extracts structural features from each parse tree.
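As a rough illustration of featurization, the sketch below walks a parse tree and emits node and parent-child features. It is an assumption-heavy simplification: it uses Python's built-in `ast` module rather than Aroma's own parse trees, and the labels (`#VAR`, `node:`, `edge:`) and the helper names `_label` and `featurize` are hypothetical.

```python
import ast

def _label(node: ast.AST) -> str:
    """Map an AST node to a feature label, hiding variable names."""
    if isinstance(node, ast.Name):
        return "#VAR"                      # abstract away identifier names
    if isinstance(node, ast.Attribute):
        return f"attr.{node.attr}"         # keep method and field names
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return f"call:{node.func.id}"      # keep names of directly called functions
    return type(node).__name__

def featurize(source: str) -> set[str]:
    """Extract a set of structural features from a code snippet."""
    features: set[str] = set()
    for parent in ast.walk(ast.parse(source)):
        if isinstance(parent, ast.expr_context):
            continue                       # skip Load/Store markers
        features.add(f"node:{_label(parent)}")
        for child in ast.iter_child_nodes(parent):
            if not isinstance(child, ast.expr_context):
                features.add(f"edge:{_label(parent)}>{_label(child)}")
    return features

# Two snippets that differ only in variable names yield the same features.
a = featurize("f = open(path)\ndata = f.read()")
b = featurize("g = open(name)\ntext = g.read()")
c = featurize("for x in xs:\n    print(x)")
```

Abstracting variable names while keeping called function and attribute names lets structurally identical code match even when programmers choose different identifiers.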

Light-weight Search: Aroma takes a query code snippet, extracts custom features from the query and each method in the corpus, and computes the degree of overlap between the query and each method body. It outputs a list of the top few methods with the most overlap.
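The overlap computation in this phase can be sketched with an inverted index that counts shared features per method. This is a plain-Python stand-in chosen for readability, not the paper's implementation; the corpus contents, feature names, and the functions `build_index` and `lightweight_search` are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(corpus: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each feature to the ids of the methods that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for method_id, features in corpus.items():
        for feature in features:
            index[feature].add(method_id)
    return index

def lightweight_search(query: set[str], index: dict[str, set[str]], top_k: int = 2):
    """Return the top_k (method_id, overlap) pairs, highest overlap first."""
    overlap: Counter = Counter()
    for feature in query:
        for method_id in index.get(feature, ()):
            overlap[method_id] += 1        # one point per shared feature
    # Sort by overlap (descending), then by id so ties are deterministic.
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical corpus: method id -> set of structural features.
corpus = {
    "decode_bitmap": {"if", "call:decodeStream", "var:options", "return"},
    "read_file": {"call:open", "call:read", "return"},
    "scale_bitmap": {"if", "var:options", "call:createScaledBitmap"},
}
index = build_index(corpus)
query = {"if", "var:options", "call:decodeStream"}
results = lightweight_search(query, index)
# results == [("decode_bitmap", 3), ("scale_bitmap", 2)]
```

Methods that share no feature with the query never enter the counter, so the work done per query depends on how many methods mention the query's features rather than on the size of the whole corpus.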

Prune and Rerank: Aroma reranks the list of method bodies retrieved from the previous phase using a more precise, but expensive algorithm for computing similarity.
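One way to picture this phase is a greedy pruning step that keeps only the statements that raise a method's similarity to the query, followed by a rerank on the pruned score. The sketch below is a simplification under stated assumptions: Jaccard similarity stands in for the paper's similarity measure, statements come with precomputed feature sets, and all names and data are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prune(statements: list[tuple[str, set[str]]], query: set[str]):
    """Greedily keep statements while doing so raises similarity to the query."""
    kept: list[int] = []
    kept_features: set[str] = set()
    best = 0.0
    remaining = list(range(len(statements)))
    while remaining:
        # Score each candidate; ties go to the earliest statement (largest -i).
        score, neg_i = max(
            (jaccard(kept_features | statements[i][1], query), -i) for i in remaining
        )
        if score <= best:
            break                          # no statement improves the score
        i = -neg_i
        kept.append(i)
        kept_features |= statements[i][1]
        best = score
        remaining.remove(i)
    return [statements[i][0] for i in sorted(kept)], best

# Hypothetical candidates from the light-weight search, in their initial order.
candidates = {
    "show_options": [
        ("opts = Options()", {"var:options", "new:Options"}),
        ("print(opts)", {"call:print", "var:options"}),
    ],
    "decode": [
        ("opts = Options()", {"var:options", "new:Options"}),
        ("log('start')", {"call:log"}),
        ("bmp = decodeStream(s, opts)", {"call:decodeStream", "var:options"}),
        ("return bmp", {"return"}),
    ],
}
query = {"var:options", "call:decodeStream"}

pruned = {name: prune(stmts, query) for name, stmts in candidates.items()}
reranked = sorted(pruned, key=lambda name: -pruned[name][1])
# "decode" now ranks first: its pruned snippet matches the query exactly.
```

Pruning before scoring means a long method is not penalized for unrelated statements, which is why the reranked order can differ from the light-weight order.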

Cluster and Intersect: Aroma clusters the reranked list of code snippets based on the similarity of method bodies. It then intersects the snippets in each cluster to generate a succinct and diverse set of code recommendations.
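The clustering and intersection step can be sketched as follows. This is an illustrative approximation only: snippets are modelled as sets of normalized statements, clusters are formed greedily against each cluster's first member with an assumed Jaccard threshold of 0.5, and the function names and sample data are hypothetical rather than taken from the paper.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(snippets: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for name, lines in snippets.items():   # dict order is taken as rank order
        for members in clusters:
            if jaccard(lines, snippets[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])        # no similar cluster, start a new one
    return clusters

def intersect(members: list[str], snippets: dict[str, set[str]]) -> set[str]:
    """Statements shared by every snippet in the cluster."""
    common = set(snippets[members[0]])
    for name in members[1:]:
        common &= snippets[name]
    return common

# Hypothetical reranked snippets, each a set of normalized statements.
snippets = {
    "m1": {"opts = Options()", "opts.inSampleSize = 2",
           "bmp = decodeStream(s, opts)", "log(bmp)"},
    "m2": {"opts = Options()", "opts.inSampleSize = 2",
           "bmp = decodeStream(s, opts)", "return bmp"},
    "m3": {"f = open(p)", "data = f.read()", "f.close()"},
    "m4": {"f = open(p)", "data = f.read()", "return data"},
}

clusters = cluster(snippets)
# Only clusters backed by several methods produce a recommendation.
recommendations = [intersect(c, snippets) for c in clusters if len(c) >= 2]
```

Intersecting within a cluster drops statements that are specific to one method, so each recommendation keeps only the code that several methods have in common.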

Note: The algorithm of Aroma involves four phases: featurization, light-weight search, prune and rerank, and cluster and intersect. [1]
