PyPI - mcpp - Versions diffs - 1.0.0__tar.gz - Mend

mcpp 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

mcpp-1.0.0/LICENSE +21 -0
mcpp-1.0.0/PKG-INFO +152 -0
mcpp-1.0.0/README.md +111 -0
mcpp-1.0.0/pyproject.toml +34 -0
mcpp-1.0.0/requirements.txt +6 -0
mcpp-1.0.0/setup.cfg +4 -0
mcpp-1.0.0/src/mcpp/__init__.py +11 -0
mcpp-1.0.0/src/mcpp/__main__.py +75 -0
mcpp-1.0.0/src/mcpp/assets/__init__.py +0 -0
mcpp-1.0.0/src/mcpp/assets/config.yaml +21 -0
mcpp-1.0.0/src/mcpp/complexity.py +92 -0
mcpp-1.0.0/src/mcpp/config.py +27 -0
mcpp-1.0.0/src/mcpp/parse.py +81 -0
mcpp-1.0.0/src/mcpp/queries.py +70 -0
mcpp-1.0.0/src/mcpp/vulnerability.py +272 -0
mcpp-1.0.0/src/mcpp.egg-info/PKG-INFO +152 -0
mcpp-1.0.0/src/mcpp.egg-info/SOURCES.txt +19 -0
mcpp-1.0.0/src/mcpp.egg-info/dependency_links.txt +1 -0
mcpp-1.0.0/src/mcpp.egg-info/entry_points.txt +2 -0
mcpp-1.0.0/src/mcpp.egg-info/requires.txt +6 -0
mcpp-1.0.0/src/mcpp.egg-info/top_level.txt +1 -0

mcpp-1.0.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2023 Lukas Pirch
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

mcpp-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,152 @@
+Metadata-Version: 2.1
+Name: mcpp
+Version: 1.0.0
+Summary: McCabe++ (mcpp): cyclomatic complexity and other vulnerability-related code metrics
+Author-email: Lukas Pirch <lukas.pirch@tu-berlin.de>
+License: MIT License
+        Copyright (c) 2023 Lukas Pirch
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+Keywords: vulnerability,code metric,static analysis
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 3
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: hydra-core>=1.3.2
+Requires-Dist: tree-sitter>=0.22.3
+Requires-Dist: tree-sitter-c>=0.21.4
+Requires-Dist: tree-sitter-cpp>=0.22.3
+Requires-Dist: tqdm>=4.66.4
+Requires-Dist: loguru>=0.7.2
+# McCabe++ (mcpp)
+<img src="https://github.com/LPirch/mcpp/blob/master/media/mcpp.jpeg?raw=true" height=400/>
+`mcpp` measures typical code complexity metrics like McCabe's cyclomatic
+complexity.
+The goal of this project is to provide a re-usable script to analyze C/C++
+source code and extract complexity metrics from it. The implemented metrics
+are taken from the [paper](https://xiaoningdu.github.io/assets/pdf/leopard.pdf)
+> LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment through Program Metrics
+This tool is released as part of our research in vulnerability discovery and
+has been used in our paper
+> SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing
+See also the corresponding [repo](https://github.com/wsbrg/crashminer).
+## Complexity Metrics
+| Dimension            | ID | Metric Description             |
+|----------------------|----|--------------------------------|
+| CD1: Function        | C1 | cyclomatic complexity          |
+| CD2: Loop Structures | C2 | number of loops                |
+|                      | C3 | number of nested loops         |
+|                      | C4 | maximum nesting level of loops |
+## Vulnerability Metrics
+| Dimension               | ID  | Metric Description                                                        |
+|-------------------------|-----|---------------------------------------------------------------------------|
+| VD1: Dependency         | V1  | number of parameter variables                                             |
+|                         | V2  | number of variables as parameters for callee function                     |
+| VD2: Pointers           | V3  | number of pointer arithmetic operations                                   |
+|                         | V4  | number of variables involved in pointer arithmetic                        |
+|                         | V5  | maximum number of pointer arithmetic operations a variable is involved in |
+| VD3: Control Structures | V6  | number of nested control structures                                       |
+|                         | V7  | maximum nesting level of control structures                               |
+|                         | V8  | maximum number of control-dependent control structures                    |
+|                         | V9  | maximum number of data-dependent control structures                       |
+|                         | V10 | number of if structures without else                                      |
+|                         | V11 | number of variables involved in control predicates                        |
+## Setup
+Build a docker container which performs the setup automatically or run the
+installation on your local machine:
+```sh
+pip install .
+```
+> Note: It is recommended to install packages in virtual environments.
+## Usage
+### From Python
+Simply import `mcpp` and then use the extract function (or one of its variants).
+```python
+from pathlib import Path
+from mcpp import extract
+input_dir = Path("some/dir")
+in_files = list(input_dir.glob("**/*.c"))
+result = extract(in_files)
+# to extract only a subset of the metrics
+result = extract(in_files, ["V1", "C3"])
+# full list of metrics:
+from mcpp import METRICS
+print(list(METRICS.keys()))
+```
+### CLI
+Configuration parameters can be changed in `config.yaml` or directly on the CLI
+with e.g. `mcpp paths.out_root=some/dir`.
+Using all defaults:
+```sh
+mcpp                # with default params like input directory, see config.yaml
+```
+Changing params from command line:
+```sh
+mcpp in_path=/some/dir/single_source out_path=single_source_metrics.json
+mcpp metrics=\[C1,C2,V4\]
+```
+Or by passing a changed `config.yaml`:
+- `-cp` (config_path) specifies the absolute path to the directory where the config file is located
+- `-cn` (config_name) specifies the name of the config file
+```sh
+mcpp -cp /some/other/dir -cn myconfig.yaml
+```
+Try out the example:
+```sh
+mcpp in_path=examples/data/source paths.out_root=examples/data-out
+cat examples/data-out/complexity.json
+```

mcpp-1.0.0/README.md ADDED

@@ -0,0 +1,111 @@
+# McCabe++ (mcpp)
+<img src="https://github.com/LPirch/mcpp/blob/master/media/mcpp.jpeg?raw=true" height=400/>
+`mcpp` measures typical code complexity metrics like McCabe's cyclomatic
+complexity.
+The goal of this project is to provide a re-usable script to analyze C/C++
+source code and extract complexity metrics from it. The implemented metrics
+are taken from the [paper](https://xiaoningdu.github.io/assets/pdf/leopard.pdf)
+> LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment through Program Metrics
+This tool is released as part of our research in vulnerability discovery and
+has been used in our paper
+> SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing
+See also the corresponding [repo](https://github.com/wsbrg/crashminer).
+## Complexity Metrics
+| Dimension            | ID | Metric Description             |
+|----------------------|----|--------------------------------|
+| CD1: Function        | C1 | cyclomatic complexity          |
+| CD2: Loop Structures | C2 | number of loops                |
+|                      | C3 | number of nested loops         |
+|                      | C4 | maximum nesting level of loops |
+## Vulnerability Metrics
+| Dimension               | ID  | Metric Description                                                        |
+|-------------------------|-----|---------------------------------------------------------------------------|
+| VD1: Dependency         | V1  | number of parameter variables                                             |
+|                         | V2  | number of variables as parameters for callee function                     |
+| VD2: Pointers           | V3  | number of pointer arithmetic operations                                   |
+|                         | V4  | number of variables involved in pointer arithmetic                        |
+|                         | V5  | maximum number of pointer arithmetic operations a variable is involved in |
+| VD3: Control Structures | V6  | number of nested control structures                                       |
+|                         | V7  | maximum nesting level of control structures                               |
+|                         | V8  | maximum number of control-dependent control structures                    |
+|                         | V9  | maximum number of data-dependent control structures                       |
+|                         | V10 | number of if structures without else                                      |
+|                         | V11 | number of variables involved in control predicates                        |
+## Setup
+Build a docker container which performs the setup automatically or run the
+installation on your local machine:
+```sh
+pip install .
+```
+> Note: It is recommended to install packages in virtual environments.
+## Usage
+### From Python
+Simply import `mcpp` and then use the extract function (or one of its variants).
+```python
+from pathlib import Path
+from mcpp import extract
+input_dir = Path("some/dir")
+in_files = list(input_dir.glob("**/*.c"))
+result = extract(in_files)
+# to extract only a subset of the metrics
+result = extract(in_files, ["V1", "C3"])
+# full list of metrics:
+from mcpp import METRICS
+print(list(METRICS.keys()))
+```
+### CLI
+Configuration parameters can be changed in `config.yaml` or directly on the CLI
+with e.g. `mcpp paths.out_root=some/dir`.
+Using all defaults:
+```sh
+mcpp                # with default params like input directory, see config.yaml
+```
+Changing params from command line:
+```sh
+mcpp in_path=/some/dir/single_source out_path=single_source_metrics.json
+mcpp metrics=\[C1,C2,V4\]
+```
+Or by passing a changed `config.yaml`:
+- `-cp` (config_path) specifies the absolute path to the directory where the config file is located
+- `-cn` (config_name) specifies the name of the config file
+```sh
+mcpp -cp /some/other/dir -cn myconfig.yaml
+```
+Try out the example:
+```sh
+mcpp in_path=examples/data/source paths.out_root=examples/data-out
+cat examples/data-out/complexity.json
+```
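The mapping returned by `extract` (and written as JSON by the CLI) is keyed by file path, with one `{metric ID: value}` dictionary per file. A minimal sketch of ranking files by a single metric, assuming that shape. The sample paths and values are invented for illustration, and `rank_by_metric` is a hypothetical helper that is not part of `mcpp`:

```python
from typing import Dict, List, Tuple

# Hypothetical sample shaped like the result of mcpp.extract():
# {file path: {metric ID: value}}. The numbers are made up.
sample_result: Dict[str, Dict[str, int]] = {
    "src/a.c": {"C1": 4, "C2": 1, "V10": 2},
    "src/b.c": {"C1": 9, "C2": 3, "V10": 0},
    "src/c.c": {"C1": 1, "C2": 0, "V10": 1},
}


def rank_by_metric(result: Dict[str, Dict[str, int]], metric: str) -> List[Tuple[str, int]]:
    """Return (path, value) pairs sorted by `metric`, highest value first.

    Files that do not carry the metric are skipped.
    """
    pairs = [(path, values[metric]) for path, values in result.items() if metric in values]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for path, value in rank_by_metric(sample_result, "C1"):
    print(f"{value:3d}  {path}")
```

The same helper should work on a loaded JSON output file, since `json.load` returns the same nested-dictionary shape.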

mcpp-1.0.0/pyproject.toml ADDED

@@ -0,0 +1,34 @@
+[project]
+name = "mcpp"
+version = "1.0.0"
+description = "McCabe++ (mcpp): cyclomatic complexity and other vulnerability-related code metrics"
+readme = "README.md"
+authors = [{name = "Lukas Pirch", email="lukas.pirch@tu-berlin.de"}]
+license = {file = "LICENSE"}
+classifiers = [
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3",
+]
+keywords = ["vulnerability", "code metric", "static analysis"]
+requires-python = ">=3.9"
+dynamic = ["dependencies"]
+[tool.setuptools.dynamic]
+dependencies = {file = ["requirements.txt"]}
+[project.scripts]
+mcpp = "mcpp.__main__:main"
+[tool.setuptools.package-data]
+mcpp = ["assets/*.yaml"]
+[build-system]
+requires = [
+  "setuptools >= 40.9.0",
+]
+build-backend = "setuptools.build_meta"
+[tool.black]
+target-version = ["py311"]
+line-length = 120

mcpp-1.0.0/requirements.txt ADDED

@@ -0,0 +1,6 @@
+hydra-core>=1.3.2
+tree-sitter>=0.22.3
+tree-sitter-c>=0.21.4
+tree-sitter-cpp>=0.22.3
+tqdm>=4.66.4
+loguru>=0.7.2

mcpp-1.0.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

mcpp-1.0.0/src/mcpp/__init__.py ADDED

@@ -0,0 +1,11 @@
+import os
+from importlib import resources
+from mcpp.__main__ import extract, extract_single, METRICS
+with resources.path("mcpp", "__init__.py") as root_path:
+    PKG_ROOT = root_path.parents[1].resolve()
+    REPO_ROOT = PKG_ROOT.parents[0].resolve()
+    os.environ['PKG_ROOT'] = str(PKG_ROOT)
+    os.environ['REPO_ROOT'] = str(REPO_ROOT)

mcpp-1.0.0/src/mcpp/__main__.py ADDED

@@ -0,0 +1,75 @@
+import json
+from pathlib import Path
+from typing import List
+from collections import defaultdict
+from importlib.resources import as_file, files
+import hydra
+from tqdm import tqdm
+from mcpp.config import Config
+from mcpp.parse import Sitter, get_call_names
+from mcpp.complexity import c1, c2, c3_c4
+from mcpp.vulnerability import v1, v2, v3_v4, v5, v6_v7, v8, v9, v10, v11
+with as_file(files("mcpp.assets") / "config.yaml") as p:
+    config_path = str(p.parent)
+    config_name = str(p.name)
+METRICS = {
+    "C1": c1,
+    "C2": c2,
+    "C3": c3_c4,
+    "C4": c3_c4,
+    "V1": v1,
+    "V2": v2,
+    "V3": v3_v4,
+    "V4": v3_v4,
+    "V5": v5,
+    "V6": v6_v7,
+    "V7": v6_v7,
+    "V8": v8,
+    "V9": v9,
+    "V10": v10,
+    "V11": v11
+}
+@hydra.main(
+    version_base=None,
+    config_path=config_path,
+    config_name=config_name)
+def main(cfg: Config):
+    if cfg.in_path.is_dir():
+        in_files = tqdm(list(cfg.in_path.glob("**/source")))
+    else:
+        in_files = [cfg.in_path]
+    results = extract(in_files, cfg.metrics)
+    with open(cfg.out_path, "w") as f:
+        json.dump(results, f, indent=4)
+def extract(in_files: List[Path], metrics: List[str] = list(METRICS.keys())):
+    metrics = [fun for name, fun in METRICS.items() if name in metrics]
+    sitter = Sitter("c", "cpp")
+    results = defaultdict(dict)
+    for path in in_files:
+        res = {}
+        tree, lang = sitter.parse_file(path)
+        root = tree.root_node
+        calls = set(get_call_names(sitter, root, lang))
+        for fun in metrics:
+            res.update(fun(root, sitter, lang, calls))
+        results[str(path)] = res
+    return results
+def extract_single(in_file: Path, metrics: List[str]):
+    return extract([in_file], metrics)
+if __name__ == '__main__':
+    main()
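Several IDs in `METRICS` share one function (`C3`/`C4`, `V3`/`V4`, `V6`/`V7`), so requesting both IDs of a pair puts the same callable into the list built by `extract` twice. The result is unchanged, because the second call only overwrites identical keys, but the work is repeated. A minimal sketch of an order-preserving de-duplication, using stand-in metric functions rather than the real ones; `select_metric_functions` and the `fake_*` names are hypothetical:

```python
from typing import Callable, Dict, List


def fake_c1() -> Dict[str, int]:
    """Stand-in for a metric function that reports one ID."""
    return {"C1": 1}


def fake_c3_c4() -> Dict[str, int]:
    """Stand-in for a metric function that reports two IDs at once."""
    return {"C3": 0, "C4": 0}


# Toy registry mirroring the layout of METRICS: two IDs share one function.
FAKE_METRICS: Dict[str, Callable[[], Dict[str, int]]] = {
    "C1": fake_c1,
    "C3": fake_c3_c4,
    "C4": fake_c3_c4,
}


def select_metric_functions(registry, wanted: List[str]) -> List[Callable]:
    """Return each requested function once, in registry order."""
    selected: List[Callable] = []
    for name, fun in registry.items():
        if name in wanted and fun not in selected:
            selected.append(fun)
    return selected


print(len(select_metric_functions(FAKE_METRICS, ["C1", "C3", "C4"])))
```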

mcpp-1.0.0/src/mcpp/assets/__init__.py ADDED

File without changes

mcpp-1.0.0/src/mcpp/assets/config.yaml ADDED

@@ -0,0 +1,21 @@
+defaults:
+  - /mcpp.config
+  - _self_
+in_path: ${paths.data_root}/CrashMiner/functions
+out_path: ${paths.out_root}/complexity.json
+metrics: [C1, C2, C3, C4, V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11]
+paths:
+  repo_root: ${oc.env:REPO_ROOT}
+  lib_root: ${paths.repo_root}/lib
+  data_root: ./data
+  out_root: ./out-data
+  log_root: ${paths.out_root}/logs
+hydra:
+  run:
+    dir: ${paths.log_root}/mcpp-${now:%Y-%m-%d-%H-%M-%S}

mcpp-1.0.0/src/mcpp/complexity.py ADDED

@@ -0,0 +1,92 @@
+from mcpp.parse import Sitter
+from mcpp.queries import Q_FOR_STMT, Q_DO_STMT, Q_WHILE_STMT, \
+    Q_BINARY_EXPR, Q_CONDITION
+def c1(root, sitter, lang, calls=None):
+    """Cyclomatic complexity (McCabe):
+        number of conditional predicates + number of loop statements + 1
+    """
+    sitter.add_queries({
+        "Q_BINARY_EXPR": Q_BINARY_EXPR,
+        "Q_CONDITION": Q_CONDITION,
+        "Q_FOR_STMT": Q_FOR_STMT,
+        "Q_DO_STMT": Q_DO_STMT,
+        "Q_WHILE_STMT": Q_WHILE_STMT
+    })
+    logical_ops = [
+        "&", "&&",
+        "|", "||"
+    ]
+    complexity = c2(root, sitter, lang, calls)["C2"]
+    conditions = sitter.captures("Q_CONDITION", root, lang)
+    for condition, tag in conditions:
+        if tag == "condition":
+            bin_expr = sitter.captures("Q_BINARY_EXPR", condition, lang)
+            for expr, _ in bin_expr:
+                if len(expr.children) != 3:
+                    continue
+                left, op, right = expr.children
+                if op.text.decode() in logical_ops:
+                    complexity += 1
+    complexity += 1
+    return {
+        "C1": complexity
+    }
+def c2(root, sitter, lang, calls=None):
+    """number of for, while and do-while loops"""
+    sitter.add_queries({
+        "Q_FOR_STMT": Q_FOR_STMT,
+        "Q_DO_STMT": Q_DO_STMT,
+        "Q_WHILE_STMT": Q_WHILE_STMT
+    })
+    complexity = 0
+    # do-while loops are separate do_statement nodes, so they need their own query
+    for query in ("Q_FOR_STMT", "Q_DO_STMT", "Q_WHILE_STMT"):
+        complexity += len(sitter.captures(query, root, lang))
+    return {
+        "C2": complexity
+    }
+def c3_c4(root, sitter, lang, calls=None):
+    """
+    C3: number of nested for, while and do-while loops
+    C4: maximum nesting depth
+    - count all loops that have some loop ancestor
+    - count ancestors that are also loops
+    """
+    sitter.add_queries({
+        "Q_FOR_STMT": Q_FOR_STMT,
+        "Q_DO_STMT": Q_DO_STMT,
+        "Q_WHILE_STMT": Q_WHILE_STMT
+    })
+    c3_val = 0
+    c4_val = 0
+    for query in ("Q_FOR_STMT", "Q_DO_STMT", "Q_WHILE_STMT"):
+        for loop_node, _ in sitter.captures(query, root, lang):
+            nesting_level = _loop_nesting_level(loop_node)
+            if nesting_level > 0:
+                c3_val += 1
+            c4_val = max(c4_val, nesting_level)
+    return {
+        "C3": c3_val,
+        "C4": c4_val
+    }
+def _loop_nesting_level(node):
+    loop_types = [
+        "do_statement",
+        "while_statement",
+        "for_statement"
+    ]
+    parent = node.parent
+    num_loop_ancestors = 0
+    while parent is not None:
+        if parent.type in loop_types:
+            num_loop_ancestors += 1
+        parent = parent.parent
+    return num_loop_ancestors
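`_loop_nesting_level` counts loop ancestors by following `parent` links, and `c3_c4` derives both metrics from that count: a loop with at least one loop ancestor adds to C3, and the largest count becomes C4. A minimal sketch on a hand-built tree, where the `Node` class is a stand-in for a tree-sitter node and carries only `type` and `parent`:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LOOP_TYPES = {"do_statement", "while_statement", "for_statement"}


@dataclass
class Node:
    """Toy replacement for a syntax-tree node (assumption: only these two fields)."""
    type: str
    parent: Optional["Node"] = None


def loop_nesting_level(node: Node) -> int:
    """Count the ancestors of `node` that are loops."""
    level = 0
    parent = node.parent
    while parent is not None:
        if parent.type in LOOP_TYPES:
            level += 1
        parent = parent.parent
    return level


def c3_c4(loops: List[Node]) -> Dict[str, int]:
    """C3: loops that have a loop ancestor; C4: deepest loop nesting."""
    levels = [loop_nesting_level(loop) for loop in loops]
    return {"C3": sum(1 for lvl in levels if lvl > 0), "C4": max(levels, default=0)}


# Shape: for { if { while { do } } } plus one separate top-level for loop.
func = Node("function_definition")
outer_for = Node("for_statement", func)
branch = Node("if_statement", outer_for)
inner_while = Node("while_statement", branch)
inner_do = Node("do_statement", inner_while)
other_for = Node("for_statement", func)

print(c3_c4([outer_for, inner_while, inner_do, other_for]))
```

The `if_statement` in between does not add to the loop nesting level, which is why V6/V7 use a separate helper with a wider set of node types.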

mcpp-1.0.0/src/mcpp/config.py ADDED

@@ -0,0 +1,27 @@
+from typing import List
+from dataclasses import dataclass
+from pathlib import Path
+from hydra.core.config_store import ConfigStore
+@dataclass
+class PathConfig:
+    repo_root: Path
+    lib_root: Path
+    data_root: Path
+    out_root: Path
+    log_root: Path
+@dataclass
+class Config:
+    in_path: Path
+    out_path: Path
+    metrics: List[str]
+    paths: PathConfig
+cs = ConfigStore.instance()
+cs.store(name='mcpp.config', node=Config)

mcpp-1.0.0/src/mcpp/parse.py ADDED

@@ -0,0 +1,81 @@
+from dataclasses import dataclass
+from pathlib import Path
+from importlib.resources import files
+from tree_sitter import Language, Parser
+import tree_sitter_c as ts_c
+import tree_sitter_cpp as ts_cpp
+from mcpp.queries import Q_ERROR_NODE, Q_CALL_NAME, Q_IDENTIFIER
+LANGS = {
+    "c": Language(ts_c.language()),
+    "cpp": Language(ts_cpp.language())
+}
+class Sitter(object):
+    def __init__(self, *languages):
+        # every positional argument is a language key from LANGS, e.g. Sitter("c", "cpp")
+        self.langs = {k:v for k, v in LANGS.items() if k in languages}
+        self.parser = {lang: self._init_parser(lang) for lang in languages}
+        self.queries = {"Q_ERROR_NODE": Q_ERROR_NODE}
+    def _init_parser(self, language: str):
+        parser = Parser()
+        parser.set_language(self.langs[language])
+        return parser
+    def parse_lang(self, source: str, lang: str):
+        return self.parser[lang].parse(bytes(source, "utf-8"))
+    def parse(self, source: str):
+        min_errors = None
+        best_tree = None
+        best_lang = None
+        for lang in self.langs.keys():
+            tree = self.parse_lang(source, lang)
+            num_errors = self._count_error_nodes(tree, lang)
+            if min_errors is None or num_errors < min_errors:
+                best_tree = tree
+                best_lang = lang
+                min_errors = num_errors
+        return best_tree, best_lang
+    def parse_file(self, path: Path):
+        with open(path, "r") as f:
+            return self.parse(f.read())
+    def _count_error_nodes(self, tree, lang):
+        query = self.langs[lang].query(self.queries["Q_ERROR_NODE"])
+        return len(query.captures(tree.root_node))
+    def add_queries(self, queries):
+        self.queries.update(queries)
+    def captures(self, query, node, lang):
+        lang = self.langs[lang]
+        return lang.query(self.queries[query]).captures(node)
+def get_call_names(sitter, root, lang):
+    """ Return all function call names. """
+    call_names = []
+    sitter.add_queries({"Q_CALL_NAME": Q_CALL_NAME})
+    for node, tag in sitter.captures("Q_CALL_NAME", root, lang):
+        if tag == "name":
+            call_names.append(node.text.decode())
+    return call_names
+def get_identifiers(sitter, root, lang, filter=None):
+    """ Return all identifier names, optionally filtered by list of known function names. """
+    identifiers = []
+    sitter.add_queries({"Q_IDENTIFIER": Q_IDENTIFIER})
+    for node, _ in sitter.captures("Q_IDENTIFIER", root, lang):
+        identifier = node.text.decode()
+        if filter is None or identifier not in filter:
+            identifiers.append(identifier)
+    return identifiers
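`Sitter.parse` tries every configured grammar and keeps the tree with the fewest `ERROR` nodes; because the comparison is strict, the first language wins a tie. A minimal sketch of that selection rule, with stand-in "parsers" that only return an error count (no tree-sitter involved). `pick_best_language` and the toy error counters are hypothetical:

```python
from typing import Callable, Dict, Optional, Tuple


def pick_best_language(
    source: str, error_counters: Dict[str, Callable[[str], int]]
) -> Tuple[Optional[str], Optional[int]]:
    """Return the language whose counter reports the fewest errors."""
    best_lang: Optional[str] = None
    min_errors: Optional[int] = None
    for lang, count_errors in error_counters.items():
        num_errors = count_errors(source)
        # strict "<" keeps the earlier language when two counts are equal
        if min_errors is None or num_errors < min_errors:
            best_lang, min_errors = lang, num_errors
    return best_lang, min_errors


# Toy counters: the pretend C grammar chokes on two C++-only keywords,
# the pretend C++ grammar accepts everything.
toy_counters = {
    "c": lambda src: src.count("class") + src.count("template"),
    "cpp": lambda src: 0,
}

print(pick_best_language("class A {};", toy_counters))
```

With this rule, the order of the language dictionary decides which grammar is reported for sources that parse cleanly under both.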

mcpp-1.0.0/src/mcpp/queries.py ADDED

@@ -0,0 +1,70 @@
+Q_ERROR_NODE = """
+(ERROR) @error_node
+"""
+Q_FOR_STMT = """
+(for_statement) @for_stmt
+"""
+Q_DO_STMT = """
+(do_statement) @do_stmt
+"""
+Q_WHILE_STMT = """
+(while_statement) @while_stmt
+"""
+Q_IF_STMT = """
+(if_statement) @if_stmt
+"""
+Q_SWITCH_STMT = """
+(switch_statement) @switch_stmt
+"""
+Q_CONDITION = """
+(_
+    condition: ((_) @condition)
+) @control_stmnt
+"""
+Q_BINARY_EXPR = """
+(binary_expression) @binary_expression
+"""
+Q_CALL_NAME = """
+(call_expression
+    function: ((identifier) @name)
+) @call
+"""
+Q_ARGLIST = """
+(call_expression
+    arguments: ((argument_list) @args)
+) @call
+"""
+Q_IDENTIFIER = """
+(identifier) @variable
+"""
+Q_FUNCTION_PARAMETER = """
+(parameter_declaration) @param
+"""
+Q_POINTER_EXPR = """
+(pointer_expression) @pointer
+"""
+Q_ASSIGNMENT_EXPR = """
+(assignment_expression) @assignment
+"""
+Q_IF_WITHOUT_ELSE = """
+(if_statement
+    condition: ((_) @if)
+    consequence: ((_) @then)
+    !alternative
+) @if_stmt
+"""

mcpp-1.0.0/src/mcpp/vulnerability.py ADDED

@@ -0,0 +1,272 @@
+from collections import Counter
+import threading
+from mcpp.parse import Sitter, get_identifiers
+from mcpp.queries import Q_ARGLIST, Q_IDENTIFIER, Q_FUNCTION_PARAMETER, \
+    Q_POINTER_EXPR, Q_ASSIGNMENT_EXPR, Q_BINARY_EXPR, Q_CALL_NAME, \
+    Q_IF_STMT, Q_SWITCH_STMT, Q_DO_STMT, Q_WHILE_STMT, Q_FOR_STMT, Q_CONDITION, \
+    Q_IF_WITHOUT_ELSE
+def v1(root, sitter, lang, calls=None):
+    """
+    V1: number of parameter variables
+    """
+    sitter.add_queries({
+        "Q_FUNCTION_PARAMETER": Q_FUNCTION_PARAMETER
+    })
+    params = sitter.captures("Q_FUNCTION_PARAMETER", root, lang)
+    return {
+        "V1": len(params)
+    }
+def v2(root, sitter, lang, calls=None):
+    """
+    V2: number of variables as parameters for callee functions
+    """
+    sitter.add_queries({
+        "Q_ARGLIST": Q_ARGLIST
+    })
+    vars_in_calls = []
+    arg_lists = [m for m, tag in sitter.captures("Q_ARGLIST", root, lang) if tag == "args"]
+    for arg_list in arg_lists:
+        variables = get_identifiers(sitter, arg_list, lang, filter=calls)
+        vars_in_calls.extend(variables)
+    return {
+        "V2": len(vars_in_calls)
+    }
+def v3_v4(root, sitter, lang, calls=None):
+    """
+    V3: number of pointer arithmetic operations
+    V4: number of variables involved in pointer arithmetic
+    """
+    sitter.add_queries({
+        "Q_POINTER_EXPR": Q_POINTER_EXPR
+    })
+    arith_ops = [
+        "+", "++", "+=",
+        "-", "--", "-=",
+        "*=",  # * excluded (same as pointer dereference)
+        "/", "/=",
+        "^", "^=",
+        "&=",  # & excluded (same as address-of)
+        "|", "|="
+    ]
+    pointer_arith = []
+    pointer_arith_vars = []
+    for pointer, _ in sitter.captures("Q_POINTER_EXPR", root, lang):
+        if any(arith in pointer.parent.text.decode() for arith in arith_ops):
+            pointer_arith.append(pointer)
+            variables = get_identifiers(sitter, pointer.parent, lang, filter=calls)
+            pointer_arith_vars.extend(variables)
+    return {
+        "V3": len(pointer_arith),
+        "V4": len(pointer_arith_vars)
+    }
+def v5(root, sitter, lang, calls=None):
+    """
+    V5: maximum number of pointer arithmetic operations a variable is involved in
+    """
+    sitter.add_queries({
+        "Q_BINARY_EXPR": Q_BINARY_EXPR,
+        "Q_ASSIGNMENT_EXPR": Q_ASSIGNMENT_EXPR,
+        "Q_CALL_NAME": Q_CALL_NAME
+    })
+    arith_ops = [
+        "+", "++", "+=",
+        "-", "--", "-=",
+        "*", "*=",
+        "/", "/="
+    ]
+    var_count = Counter()
+    candidates = sitter.captures("Q_BINARY_EXPR", root, lang) + sitter.captures("Q_ASSIGNMENT_EXPR", root, lang)
+    for node, _ in candidates:
+        if len(node.children) != 3:
+            continue
+        op_text = node.children[1].text.decode()
+        if any(arith in op_text for arith in arith_ops):
+            variables = get_identifiers(sitter, node, lang, filter=calls)
+            var_count.update(variables)
+    if len(var_count) > 0:
+        max_count = var_count.most_common(1)[0][1]
+    else:
+        max_count = 0
+    return {
+        "V5": max_count
+    }
+def v6_v7(root, sitter, lang, calls=None):
+    """
+    V6: number of nested control structures
+    V7: maximum level of control nesting
+    """
+    queries = {
+        "Q_IF_STMT": Q_IF_STMT,
+        "Q_SWITCH_STMT": Q_SWITCH_STMT,
+        "Q_DO_STMT": Q_DO_STMT,
+        "Q_WHILE_STMT": Q_WHILE_STMT,
+        "Q_FOR_STMT": Q_FOR_STMT
+    }
+    sitter.add_queries(queries)
+    nested_controls = []
+    max_nesting_level = 0
+    for q in queries.keys():
+        for node, _ in sitter.captures(q, root, lang):
+            nesting_level = _control_nesting_level(node)
+            if nesting_level > 0:
+                nested_controls.append(node)
+                max_nesting_level = max(max_nesting_level, nesting_level)
+    return {
+        "V6": len(nested_controls),
+        "V7": max_nesting_level
+    }
+def _control_nesting_level(node):
+    control_types = [
+        "if_statement",
+        "switch_statement",
+        "do_statement",
+        "while_statement",
+        "for_statement"
+    ]
+    parent = node.parent
+    num_control_ancestors = 0
+    while parent is not None:
+        if parent.type in control_types:
+            num_control_ancestors += 1
+        parent = parent.parent
+    return num_control_ancestors
+def v8(root, sitter, lang, calls=None):
+    """
+    V8: maximum number of control-dependent control structures
+    """
+    queries = {
+        "Q_IF_STMT": Q_IF_STMT,
+        "Q_SWITCH_STMT": Q_SWITCH_STMT,
+        "Q_DO_STMT": Q_DO_STMT,
+        "Q_WHILE_STMT": Q_WHILE_STMT,
+        "Q_FOR_STMT": Q_FOR_STMT,
+        "Q_CONDITION": Q_CONDITION
+    }
+    sitter.add_queries(queries)
+    # count dependent controls under another control: key = start_byte of parent in function
+    control_dependent_controls = Counter()
+    threads = []
+    thread_lock = threading.Lock()
+    for q in queries.keys():
+        t = threading.Thread(target=_v8_single_query,
+                             args=(root, sitter, lang, calls, q,
+                                   control_dependent_controls, thread_lock))
+        t.start()
+        threads.append(t)
+    for t in threads:
+        t.join()
+    return {
+        "V8": max([0] + list(control_dependent_controls.values()))
+    }
+def _v8_single_query(root, sitter, lang, calls, query, control_dependent_controls, thread_lock):
+    for node, _ in sitter.captures(query, root, lang):
+        parents = _traverse_parent_controls(node)
+        if len(parents) > 0:
+            with thread_lock:
+                control_dependent_controls[parents[-1].start_byte] += 1
+def _traverse_parent_controls(node):
+    """ Climb up the AST and emit all control nodes. """
+    control_types = [
+        "if_statement",
+        "switch_statement",
+        "do_statement",
+        "while_statement",
+        "for_statement"
+    ]
+    parent_controls = []
+    parent = node.parent
+    while parent is not None:
+        if parent.type in control_types:
+            parent_controls.append(parent)
+        parent = parent.parent
+    return parent_controls
+def v9(root, sitter, lang, calls=None):
+    """
+    V9: maximum number of data-dependent control structures
+    """
+    sitter.add_queries({
+        "Q_CONDITION": Q_CONDITION,
+        "Q_BINARY_EXPR": Q_BINARY_EXPR
+    })
+    logical_ops = [
+        "&", "&&",
+        "|", "||"
+    ]
+    conditions = sitter.captures("Q_CONDITION", root, lang)
+    var_count = Counter()
+    for condition, tag in conditions:
+        if tag == "condition":
+            bin_expr = sitter.captures("Q_BINARY_EXPR", condition, lang)
+            for expr, _ in bin_expr:
+                if len(expr.children) != 3:
+                    continue
+                left, op, right = expr.children
+                if op.text.decode() in logical_ops:
+                    var_count.update(get_identifiers(sitter, expr, lang, filter=calls))
+    return {
+        "V9": max([0] + list(var_count.values()))
+    }
+def v10(root, sitter, lang, calls=None):
+    """
+    V10: number of if statements without else
+    """
+    sitter.add_queries({
+        "Q_IF_WITHOUT_ELSE": Q_IF_WITHOUT_ELSE
+    })
+    if_without_else = sitter.captures("Q_IF_WITHOUT_ELSE", root, lang)
+    return {
+        "V10": len(if_without_else)
+    }
+def v11(root, sitter, lang, calls=None):
+    """
+    V11: number of variables in control structures (in each predicate)
+    """
+    sitter.add_queries({
+        "Q_CONDITION": Q_CONDITION
+    })
+    num_controlled_vars = 0
+    conditions = sitter.captures("Q_CONDITION", root, lang)
+    for condition, _ in conditions:
+        num_controlled_vars += len(get_identifiers(sitter, condition, lang, filter=calls))
+    return {
+        "V11": num_controlled_vars
+    }
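`v5` and `v9` share one idiom: tally how often each identifier occurs across the matching expressions with a `Counter`, then report the largest tally, falling back to 0 when nothing matched. A minimal sketch of that idiom on plain lists of identifier names (the input lists are invented, and `max_variable_involvement` is a hypothetical helper):

```python
from collections import Counter
from typing import Iterable, List


def max_variable_involvement(expressions: Iterable[List[str]]) -> int:
    """Largest number of occurrences of any single identifier.

    Each inner list holds the identifiers found in one matching expression;
    an identifier that appears twice in one expression is counted twice.
    """
    counts: Counter = Counter()
    for identifiers in expressions:
        counts.update(identifiers)
    # default=0 covers the case where no expression matched at all
    return max(counts.values(), default=0)


# "p" occurs in two expressions, every other name in one.
print(max_variable_involvement([["p", "i"], ["p", "n"], ["q"]]))
```

`max(..., default=0)` is equivalent to the `max([0] + list(...))` form used for V8 and V9 as long as the counts are non-negative.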

mcpp-1.0.0/src/mcpp.egg-info/PKG-INFO ADDED

@@ -0,0 +1,152 @@
+Metadata-Version: 2.1
+Name: mcpp
+Version: 1.0.0
+Summary: McCabe++ (mcpp): cyclomatic complexity and other vulnerability-related code metrics
+Author-email: Lukas Pirch <lukas.pirch@tu-berlin.de>
+License: MIT License
+        Copyright (c) 2023 Lukas Pirch
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+Keywords: vulnerability,code metric,static analysis
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 3
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: hydra-core>=1.3.2
+Requires-Dist: tree-sitter>=0.22.3
+Requires-Dist: tree-sitter-c>=0.21.4
+Requires-Dist: tree-sitter-cpp>=0.22.3
+Requires-Dist: tqdm>=4.66.4
+Requires-Dist: loguru>=0.7.2
+# McCabe++ (mcpp)
+<img src="https://github.com/LPirch/mcpp/blob/master/media/mcpp.jpeg?raw=true" height=400/>
+`mcpp` measures typical code complexity metrics like McCabe's cyclomatic
+complexity.
+The goal of this project is to provide a re-usable script to analyze C/C++
+source code and extract complexity metrics from it. The implemented metrics
+are taken from the [paper](https://xiaoningdu.github.io/assets/pdf/leopard.pdf)
+> LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment through Program Metrics
+This tool is released as part of our research in vulnerability discovery and
+has been used in our paper
+> SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing
+See also the corresponding [repo](https://github.com/wsbrg/crashminer).
+## Complexity Metrics
+| Dimension            | ID | Metric Description             |
+|----------------------|----|--------------------------------|
+| CD1: Function        | C1 | cyclomatic complexity          |
+| CD2: Loop Structures | C2 | number of loops                |
+|                      | C3 | number of nested loops         |
+|                      | C4 | maximum nesting level of loops |
+## Vulnerability Metrics
+| Dimension               | ID  | Metric Description                                                        |
+|-------------------------|-----|---------------------------------------------------------------------------|
+| VD1: Dependency         | V1  | number of parameter variables                                             |
+|                         | V2  | number of variables as parameters for callee function                     |
+| VD2: Pointers           | V3  | number of pointer arithmetic                                              |
+|                         | V4  | number of variables involved in pointer arithmetic                        |
+|                         | V5  | maximum number of pointer arithmetic operations a variable is involved in |
+| VD3: Control Structures | V6  | number of nested control structures                                       |
+|                         | V7  | maximum nesting level of control structures                               |
+|                         | V8  | maximum number of control-dependent control structures                    |
+|                         | V9  | maximum number of data-dependent control structures                       |
+|                         | V10 | number of if structures without else                                      |
+|                         | V11 | number of variables involved in control predicates                        |
+## Setup
+Build a docker container which performs the setup automatically or run the
+installation on your local machine:
+```sh
+pip install .
+```
+> Note: It is recommended to install packages in virtual environments.
+## Usage
+### From Python
+Simply import `mcpp` and then use the extract function (or one of its variants).
+```python
+from pathlib import Path
+from mcpp import extract
+input_dir = Path("some/dir")
+in_files = list(input_dir.glob("**/*.c"))
+result = extract(in_files)
+# to extract only a subset of the metrics
+result = extract(in_files, ["V1", "C3"])
+# full list of metrics:
+from mcpp import METRICS
+print(list(METRICS.keys()))
+```
+### CLI
+Configuration parameters can be changed in `config.yaml` or directly on the CLI
+with e.g. `mcpp paths.out_root=some/dir`.
+Using all defaults:
+```sh
+mcpp                # with default params like input directory, see config.yaml
+```
+Changing params from command line:
+```sh
+mcpp in_path=/some/dir/single_source out_path=single_source_metrics.json
+mcpp metrics=\[C1,C2,V4\]
+```
+Or by passing a changed `config.yaml`:
+- `-cp` (config_path) specifies the absolute path to the directory where the config file is located
+- `-cn` (config_name) specifies the name of the config file
+```sh
+mcpp -cp /some/other/dir -cn myconfig.yaml
+```
+Try out the example:
+```sh
+mcpp in_path=examples/data/source paths.out_root=examples/data-out
+cat examples/data-out/complexity.json
+```

mcpp-1.0.0/src/mcpp.egg-info/SOURCES.txt ADDED Viewed

@@ -0,0 +1,19 @@
+LICENSE
+README.md
+pyproject.toml
+requirements.txt
+src/mcpp/__init__.py
+src/mcpp/__main__.py
+src/mcpp/complexity.py
+src/mcpp/config.py
+src/mcpp/parse.py
+src/mcpp/queries.py
+src/mcpp/vulnerability.py
+src/mcpp.egg-info/PKG-INFO
+src/mcpp.egg-info/SOURCES.txt
+src/mcpp.egg-info/dependency_links.txt
+src/mcpp.egg-info/entry_points.txt
+src/mcpp.egg-info/requires.txt
+src/mcpp.egg-info/top_level.txt
+src/mcpp/assets/__init__.py
+src/mcpp/assets/config.yaml

mcpp-1.0.0/src/mcpp.egg-info/dependency_links.txt ADDED Viewed

@@ -0,0 +1 @@
+

mcpp-1.0.0/src/mcpp.egg-info/entry_points.txt ADDED Viewed

@@ -0,0 +1,2 @@
+[console_scripts]
+mcpp = mcpp.__main__:main
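
The `console_scripts` entry above maps the `mcpp` command to the `main` callable in `mcpp.__main__`. As a rough sketch of how such an entry decomposes, the snippet below parses the same text with the standard library. The helper `parse_console_scripts` is hypothetical and only illustrates the `name = module:attr` format; it is not how pip or setuptools process the file.

```python
from configparser import ConfigParser

# Same content as the entry_points.txt shown above.
ENTRY_POINTS_TXT = """\
[console_scripts]
mcpp = mcpp.__main__:main
"""


def parse_console_scripts(text):
    """Return {command: (module, attribute)} for the console_scripts section."""
    # Only "=" separates key and value, so the ":" in the target stays intact.
    parser = ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep command names case-sensitive
    parser.read_string(text)
    scripts = {}
    for command, target in parser.items("console_scripts"):
        module, _, attr = target.partition(":")
        scripts[command] = (module.strip(), attr.strip())
    return scripts


print(parse_console_scripts(ENTRY_POINTS_TXT))
```

Running the installed `mcpp` command is therefore roughly equivalent to importing `mcpp.__main__` and calling its `main` function.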

mcpp-1.0.0/src/mcpp.egg-info/requires.txt ADDED Viewed

@@ -0,0 +1,6 @@
+hydra-core>=1.3.2
+tree-sitter>=0.22.3
+tree-sitter-c>=0.21.4
+tree-sitter-cpp>=0.22.3
+tqdm>=4.66.4
+loguru>=0.7.2

mcpp-1.0.0/src/mcpp.egg-info/top_level.txt ADDED Viewed

@@ -0,0 +1 @@
+mcpp