A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG) and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, which is the underlying graph model of graph databases such as Neo4j, JanusGraph and OrientDB, where data is stored in the nodes and edges as key-value pairs. Consequently, code property graphs can be stored in graph databases and queried using graph query languages.
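As a rough illustration of this definition, the following Python sketch stores a tiny graph whose nodes and edges carry key-value pairs and whose edges are labelled with the representation they come from. The `PropertyGraph` class, the node identifiers, the property names and the small code snippet are all assumptions made for this example; they do not follow the Joern specification or the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy property graph: nodes and edges both carry key-value pairs."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (src, dst, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def out(self, node_id, label):
        """Targets of edges leaving node_id whose 'label' property matches."""
        return [d for s, d, p in self.edges
                if s == node_id and p.get("label") == label]


def build_example():
    # Hypothetical snippet:  x = source();  if (x < MAX) sink(x);
    g = PropertyGraph()
    g.add_node("s1", type="STATEMENT", code="x = source()")
    g.add_node("p1", type="PREDICATE", code="x < MAX")
    g.add_node("s2", type="STATEMENT", code="sink(x)")
    g.add_node("c1", type="CALL", code="source()")

    # AST edge: the call is a syntactic child of the assignment.
    g.add_edge("s1", "c1", label="AST")
    # CFG edges: order of execution.
    g.add_edge("s1", "p1", label="CFG")
    g.add_edge("p1", "s2", label="CFG", condition="true")
    # PDG edges: data dependence on x and control dependence on the predicate.
    g.add_edge("s1", "p1", label="PDG", kind="data", var="x")
    g.add_edge("s1", "s2", label="PDG", kind="data", var="x")
    g.add_edge("p1", "s2", label="PDG", kind="control")
    return g


g = build_example()
```

All three edge types meet at the same statement and predicate nodes, so a single traversal can mix them. For instance, `g.out("s1", "PDG")` lists the nodes that depend on the first statement, which is the kind of lookup a graph query language would express declaratively.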
The code property graph of the function is obtained by merging its abstract syntax tree, control-flow graph, and program dependence graph at statements and predicates as seen in the following figure:
Implementations
Joern CPG. The original code property graph was implemented for C/C++ in 2013 at the University of Göttingen as part of the open-source code analysis tool Joern.[14] This original version has been discontinued and superseded by the open-source Joern Project,[15] which provides a formal code property graph specification[16] applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).
Plume CPG. Developed at Stellenbosch University in 2020 and sponsored by Amazon Science, the open-source Plume[17] project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.
Fraunhofer AISEC CPG. The Fraunhofer Institute for Applied and Integrated Security provides open-source code property graph generators for C/C++, Java, Go, and Python,[18] albeit without a formal schema specification. It also provides the Cloud Property Graph,[19] an extension of the code property graph concept that models details of cloud deployments.
Galois’ CPG for LLVM. Galois Inc. provides a code property graph based on the LLVM compiler.[20] The graph represents code at different stages of the compilation and a mapping between these representations. It follows a custom schema that is defined in its documentation.
Machine learning on code property graphs
Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, graph neural networks (GNN) have been employed to derive vulnerability detectors.[21][22][23][24][25][26][27]
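To give a sense of how a graph neural network consumes such a graph, the sketch below runs one round of neighbour aggregation over a three-node graph and then sums the node vectors into a graph-level vector. It is a simplified illustration only: it has no learned weights, the function names and the one-hot node features are assumptions made for this example, and it does not reproduce the model of any of the cited works.

```python
def message_passing_round(features, edges):
    """One round of mean aggregation over incoming edges.

    features: dict mapping node id -> list of floats
    edges:    list of directed (src, dst) pairs
    Each node's new vector is the mean of its own vector and the vectors
    of all nodes with an edge into it.
    """
    incoming = {node: [node] for node in features}  # start with a self-loop
    for src, dst in edges:
        incoming[dst].append(src)

    updated = {}
    for node, sources in incoming.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[s][i] for s in sources) / len(sources)
            for i in range(dim)
        ]
    return updated


def readout(features):
    """Sum all node vectors into one graph-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(dim)]


# Hypothetical input: one-hot node types [is_statement, is_predicate].
node_features = {"s1": [1.0, 0.0], "p1": [0.0, 1.0], "s2": [1.0, 0.0]}
graph_edges = [("s1", "p1"), ("p1", "s2"), ("s1", "s2")]

updated = message_passing_round(node_features, graph_edges)
graph_vector = readout(updated)
```

In a trained detector the aggregation would be followed by learned transformations, and the graph-level vector would feed a classifier that labels the function as vulnerable or not.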
Wi, Seongil; Woo, Sijae; Whang, Joyce Jiyoung; Son, Sooel (25 April 2022). "HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs". Proceedings of the ACM Web Conference 2022. pp. 755–766. doi:10.1145/3485447.3512235. ISBN 978-1-4503-9096-5. S2CID 248367462.
Bowman, Benjamin; Huang, H. Howie (September 2020). "VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets". 2020 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 53–69. doi:10.1109/EuroSP48549.2020.00012. ISBN 978-1-7281-5087-1. S2CID 226268429.
Du, Xiaoning; Chen, Bihuan; Li, Yuekang; Guo, Jianmin; Zhou, Yaqin; Liu, Yang; Jiang, Yu (May 2019). "LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics". 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). pp. 60–71. arXiv:1901.11479. doi:10.1109/ICSE.2019.00024. ISBN 978-1-7281-0869-8. S2CID 59523689.
Haojie, Zhang; Yujun, Li; Yiwei, Liu; Nanxin, Zhou (December 2021). "Vulmg: A Static Detection Solution for Source Code Vulnerabilities Based on Code Property Graph and Graph Attention Network". 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). pp. 250–255. doi:10.1109/ICCWAMTIP53232.2021.9674145. ISBN 978-1-6654-1364-0. S2CID 246039350.
Zheng, Weining; Jiang, Yuan; Su, Xiaohong (October 2021). "Vu1SPG: Vulnerability detection based on slice property graph representation learning". 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). pp. 457–467. doi:10.1109/ISSRE52982.2021.00054. ISBN 978-1-6654-2587-2. S2CID 246751595.
Chakraborty, Saikat; Krishna, Rahul; Ding, Yangruibo; Ray, Baishakhi (2021). "Deep Learning based Vulnerability Detection: Are We There Yet". IEEE Transactions on Software Engineering. 48 (9): 3280–3296. arXiv:2009.07235. doi:10.1109/TSE.2021.3087402. S2CID 221703797.
Zhou, Li; Huang, Minhuan; Li, Yujun; Nie, Yuanping; Li, Jin; Liu, Yiwei (October 2021). "GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network". 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). pp. 381–388. arXiv:2202.02501. doi:10.1109/DSC53577.2021.00060. ISBN 978-1-6654-1815-7. S2CID 246634824.
Ganz, Tom; Härterich, Martin; Warnecke, Alexander; Rieck, Konrad (15 November 2021). "Explaining Graph Neural Networks for Vulnerability Discovery". Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security. pp. 145–156. doi:10.1145/3474369.3486866. ISBN 978-1-4503-8657-9. S2CID 240001850.
Duan, Xu; Wu, Jingzheng; Ji, Shouling; Rui, Zhiqing; Luo, Tianyue; Yang, Mutian; Wu, Yanjun (August 2019). "VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities". Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. pp. 4665–4671. doi:10.24963/ijcai.2019/648. ISBN 978-0-9992411-4-1. S2CID 199466292.