pystaar-1.0.0.tar.gz

Files changed (70)
  1. pystaar-1.0.0/LICENSE +6 -0
  2. pystaar-1.0.0/PKG-INFO +104 -0
  3. pystaar-1.0.0/README.md +71 -0
  4. pystaar-1.0.0/pyproject.toml +61 -0
  5. pystaar-1.0.0/setup.cfg +4 -0
  6. pystaar-1.0.0/src/pystaar/__init__.py +104 -0
  7. pystaar-1.0.0/src/pystaar/_data/DATA_SOURCE.md +144 -0
  8. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_dense_s1_b1.csv +164 -0
  9. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_dense_s1_b2.csv +164 -0
  10. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_dense_s2_b1.csv +164 -0
  11. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_dense_s2_b2.csv +164 -0
  12. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_sparse_s1_b1.csv +164 -0
  13. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_sparse_s1_b2.csv +164 -0
  14. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_sparse_s2_b1.csv +164 -0
  15. pystaar-1.0.0/src/pystaar/_data/example_ai_cov_sparse_s2_b2.csv +164 -0
  16. pystaar-1.0.0/src/pystaar/_data/example_ai_pop_groups.csv +10001 -0
  17. pystaar-1.0.0/src/pystaar/_data/example_ai_pop_weights_1_1.csv +4 -0
  18. pystaar-1.0.0/src/pystaar/_data/example_ai_pop_weights_1_25.csv +4 -0
  19. pystaar-1.0.0/src/pystaar/_data/example_geno.mtx +7488 -0
  20. pystaar-1.0.0/src/pystaar/_data/example_glm_binary_spa_cov_filter.csv +164 -0
  21. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_dense_XW.csv +4 -0
  22. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_dense_XXWX_inv.csv +10001 -0
  23. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_dense_cov_filter.csv +164 -0
  24. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_dense_fitted.csv +10001 -0
  25. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_dense_scaled_residuals.csv +10001 -0
  26. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_sparse_XW.csv +4 -0
  27. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_sparse_XXWX_inv.csv +10001 -0
  28. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_sparse_cov_filter.csv +164 -0
  29. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_sparse_fitted.csv +10001 -0
  30. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_binary_spa_sparse_scaled_residuals.csv +10001 -0
  31. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_cov.csv +164 -0
  32. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_cov_cond_dense.csv +164 -0
  33. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_cov_cond_sparse.csv +164 -0
  34. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_cov_rare_maf_0_01.csv +154 -0
  35. pystaar-1.0.0/src/pystaar/_data/example_glmmkin_scaled_residuals.csv +10001 -0
  36. pystaar-1.0.0/src/pystaar/_data/example_kins_dense.mtx +22502 -0
  37. pystaar-1.0.0/src/pystaar/_data/example_kins_sparse.mtx +22502 -0
  38. pystaar-1.0.0/src/pystaar/_data/example_pheno_related.csv +10001 -0
  39. pystaar-1.0.0/src/pystaar/_data/example_pheno_unrelated.csv +10001 -0
  40. pystaar-1.0.0/src/pystaar/_data/example_phred.csv +164 -0
  41. pystaar-1.0.0/src/pystaar/_data/nonexample601_geno.mtx +7638 -0
  42. pystaar-1.0.0/src/pystaar/_data/nonexample601_kins_dense.mtx +22502 -0
  43. pystaar-1.0.0/src/pystaar/_data/nonexample601_kins_sparse.mtx +22502 -0
  44. pystaar-1.0.0/src/pystaar/_data/nonexample601_pheno_related.csv +10001 -0
  45. pystaar-1.0.0/src/pystaar/_data/nonexample601_pheno_unrelated.csv +10001 -0
  46. pystaar-1.0.0/src/pystaar/_data/nonexample601_phred.csv +145 -0
  47. pystaar-1.0.0/src/pystaar/_data/nonexample602_geno.mtx +6176 -0
  48. pystaar-1.0.0/src/pystaar/_data/nonexample602_kins_dense.mtx +22502 -0
  49. pystaar-1.0.0/src/pystaar/_data/nonexample602_kins_sparse.mtx +22502 -0
  50. pystaar-1.0.0/src/pystaar/_data/nonexample602_pheno_related.csv +10001 -0
  51. pystaar-1.0.0/src/pystaar/_data/nonexample602_pheno_unrelated.csv +10001 -0
  52. pystaar-1.0.0/src/pystaar/_data/nonexample602_phred.csv +148 -0
  53. pystaar-1.0.0/src/pystaar/_data/r_cov.csv +164 -0
  54. pystaar-1.0.0/src/pystaar/_data/r_scaled_residuals.csv +10001 -0
  55. pystaar-1.0.0/src/pystaar/data.py +202 -0
  56. pystaar-1.0.0/src/pystaar/models.py +541 -0
  57. pystaar-1.0.0/src/pystaar/staar_core.py +1873 -0
  58. pystaar-1.0.0/src/pystaar/staar_stats.py +147 -0
  59. pystaar-1.0.0/src/pystaar/workflows.py +1454 -0
  60. pystaar-1.0.0/src/pystaar.egg-info/PKG-INFO +104 -0
  61. pystaar-1.0.0/src/pystaar.egg-info/SOURCES.txt +68 -0
  62. pystaar-1.0.0/src/pystaar.egg-info/dependency_links.txt +1 -0
  63. pystaar-1.0.0/src/pystaar.egg-info/requires.txt +7 -0
  64. pystaar-1.0.0/src/pystaar.egg-info/top_level.txt +1 -0
  65. pystaar-1.0.0/tests/test_api_contract.py +53 -0
  66. pystaar-1.0.0/tests/test_data.py +24 -0
  67. pystaar-1.0.0/tests/test_models.py +102 -0
  68. pystaar-1.0.0/tests/test_staar_core.py +73 -0
  69. pystaar-1.0.0/tests/test_staar_stats.py +33 -0
  70. pystaar-1.0.0/tests/test_workflows.py +1121 -0
pystaar-1.0.0/LICENSE ADDED
@@ -0,0 +1,6 @@
1
+ Copyright (c) 2026 pySTAAR contributors.
2
+ All rights reserved.
3
+
4
+ This software is proprietary and confidential. No permission is granted to use,
5
+ copy, modify, merge, publish, distribute, sublicense, or sell this software
6
+ without prior written authorization from the copyright holder.
pystaar-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,104 @@
1
+ Metadata-Version: 2.4
2
+ Name: pystaar
3
+ Version: 1.0.0
4
+ Summary: Python migration of the STAAR R package
5
+ Author: STAAR migration
6
+ License-Expression: LicenseRef-Proprietary
7
+ Project-URL: Homepage, https://github.com/xiaozhouwang/pySTAAR
8
+ Project-URL: Repository, https://github.com/xiaozhouwang/pySTAAR
9
+ Project-URL: Documentation, https://github.com/xiaozhouwang/pySTAAR/tree/main/docs
10
+ Project-URL: Issues, https://github.com/xiaozhouwang/pySTAAR/issues
11
+ Keywords: staar,genomics,genetics,rare-variant,association-testing
12
+ Classifier: Development Status :: 5 - Production/Stable
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: Operating System :: OS Independent
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3 :: Only
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
23
+ Requires-Python: >=3.9
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: numpy>=1.24
27
+ Requires-Dist: scipy>=1.10
28
+ Requires-Dist: pandas>=2.0
29
+ Requires-Dist: pyyaml>=6.0
30
+ Provides-Extra: dev
31
+ Requires-Dist: pytest>=7.0; extra == "dev"
32
+ Dynamic: license-file
33
+
34
+ # pySTAAR
35
+
36
+ A Python migration of the STAAR R package, aimed at Chinese-speaking users in statistical genetics and genomic analysis.
37
+
38
+ For English docs, see [`docs/README.md`](docs/README.md).
39
+
40
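+ ## Project Scope
41
+
42
+ - All planned feature migration is complete (through `STAAR-56`).
43
+ - The default workflow entry points cover STAAR, conditional analysis, Binary SPA, single-variant score tests, and AI-STAAR.
44
+ - The current parity baseline is the pure-Python path (related workflows do not depend on precomputed R covariance files).
45
+
46
+ ## Quick Install
47
+
48
+ For regular users (released version):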
49
+
50
+ ```bash
51
+ pip install pystaar
52
+ ```
53
+
54
+ Development mode for this repository:
55
+
56
+ ```bash
57
+ pip install -e '.[dev]'
58
+ ```
59
+
60
+ ## Quick Start
61
+
62
+ ```python
63
+ from pystaar import staar_unrelated_glm
64
+
65
+ res = staar_unrelated_glm(
66
+ dataset="example",
67
+ seed=600,
68
+ rare_maf_cutoff=0.05,
69
+ )
70
+ print("STAAR-O:", res["results_STAAR_O"])
71
+ ```
72
+
73
+ ## Migration Entry Points for R Users
74
+
75
+ - Full migration guide: [`docs/migration_from_r.md`](docs/migration_from_r.md)
76
+ - 15-minute migration checklist: [`docs/migration_r_quickstart_cn.md`](docs/migration_r_quickstart_cn.md)
77
+ - Data directory template: [`docs/data_directory_template_cn.md`](docs/data_directory_template_cn.md)
78
+
79
+ ## Documentation Guide
80
+
81
+ - Quick start (Chinese): [`docs/README_CN.md`](docs/README_CN.md)
82
+ - Quick start (English): [`docs/README.md`](docs/README.md)
83
+ - Installation and environment: [`docs/installation.md`](docs/installation.md)
84
+ - Performance comparison overview (Python vs R): [`docs/performance_comparison.md`](docs/performance_comparison.md)
85
+ - Performance measurement note: official cross-platform conclusions are based on the OpenBLAS backend; for a local macOS Accelerate reference, see `examples/1kg_parity/README.md`.
86
+ - Local 1KG comparison example (data-level + simulated full workflow): [`examples/1kg_parity/README.md`](examples/1kg_parity/README.md)
87
+ - Tutorials:
88
+ - [`docs/tutorials/01_basic_staar.md`](docs/tutorials/01_basic_staar.md)
89
+ - [`docs/tutorials/02_binary_spa.md`](docs/tutorials/02_binary_spa.md)
90
+ - [`docs/tutorials/03_related_samples.md`](docs/tutorials/03_related_samples.md)
91
+ - [`docs/tutorials/04_conditional.md`](docs/tutorials/04_conditional.md)
92
+ - [`docs/tutorials/05_ai_staar.md`](docs/tutorials/05_ai_staar.md)
93
+ - API documentation:
94
+ - [`docs/api/null_models.md`](docs/api/null_models.md)
95
+ - [`docs/api/staar_functions.md`](docs/api/staar_functions.md)
96
+ - [`docs/api/output_fields.md`](docs/api/output_fields.md)
97
+ - [`docs/api/utilities.md`](docs/api/utilities.md)
98
+ - [`docs/api/stability.md`](docs/api/stability.md)
99
+ - Changelog: [`CHANGELOG.md`](CHANGELOG.md)
100
+
101
+ ## Parity Notes
102
+
103
+ - The historical deviation record `DEV-001` is closed and is kept only for historical context.
104
+ - The current status and release criteria are defined by [`reports/summary.md`](reports/summary.md) and [`reports/deviations.md`](reports/deviations.md).
pystaar-1.0.0/README.md ADDED
@@ -0,0 +1,71 @@
1
+ # pySTAAR
2
+
3
+ A Python migration of the STAAR R package, aimed at Chinese-speaking users in statistical genetics and genomic analysis.
4
+
5
+ For English docs, see [`docs/README.md`](docs/README.md).
6
+
7
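+ ## Project Scope
8
+
9
+ - All planned feature migration is complete (through `STAAR-56`).
10
+ - The default workflow entry points cover STAAR, conditional analysis, Binary SPA, single-variant score tests, and AI-STAAR.
11
+ - The current parity baseline is the pure-Python path (related workflows do not depend on precomputed R covariance files).
12
+
13
+ ## Quick Install
14
+
15
+ For regular users (released version):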
16
+
17
+ ```bash
18
+ pip install pystaar
19
+ ```
20
+
21
+ Development mode for this repository:
22
+
23
+ ```bash
24
+ pip install -e '.[dev]'
25
+ ```
26
+
27
+ ## Quick Start
28
+
29
+ ```python
30
+ from pystaar import staar_unrelated_glm
31
+
32
+ res = staar_unrelated_glm(
33
+ dataset="example",
34
+ seed=600,
35
+ rare_maf_cutoff=0.05,
36
+ )
37
+ print("STAAR-O:", res["results_STAAR_O"])
38
+ ```
39
+
40
+ ## Migration Entry Points for R Users
41
+
42
+ - Full migration guide: [`docs/migration_from_r.md`](docs/migration_from_r.md)
43
+ - 15-minute migration checklist: [`docs/migration_r_quickstart_cn.md`](docs/migration_r_quickstart_cn.md)
44
+ - Data directory template: [`docs/data_directory_template_cn.md`](docs/data_directory_template_cn.md)
45
+
46
+ ## Documentation Guide
47
+
48
+ - Quick start (Chinese): [`docs/README_CN.md`](docs/README_CN.md)
49
+ - Quick start (English): [`docs/README.md`](docs/README.md)
50
+ - Installation and environment: [`docs/installation.md`](docs/installation.md)
51
+ - Performance comparison overview (Python vs R): [`docs/performance_comparison.md`](docs/performance_comparison.md)
52
+ - Performance measurement note: official cross-platform conclusions are based on the OpenBLAS backend; for a local macOS Accelerate reference, see `examples/1kg_parity/README.md`.
53
+ - Local 1KG comparison example (data-level + simulated full workflow): [`examples/1kg_parity/README.md`](examples/1kg_parity/README.md)
54
+ - Tutorials:
55
+ - [`docs/tutorials/01_basic_staar.md`](docs/tutorials/01_basic_staar.md)
56
+ - [`docs/tutorials/02_binary_spa.md`](docs/tutorials/02_binary_spa.md)
57
+ - [`docs/tutorials/03_related_samples.md`](docs/tutorials/03_related_samples.md)
58
+ - [`docs/tutorials/04_conditional.md`](docs/tutorials/04_conditional.md)
59
+ - [`docs/tutorials/05_ai_staar.md`](docs/tutorials/05_ai_staar.md)
60
+ - API documentation:
61
+ - [`docs/api/null_models.md`](docs/api/null_models.md)
62
+ - [`docs/api/staar_functions.md`](docs/api/staar_functions.md)
63
+ - [`docs/api/output_fields.md`](docs/api/output_fields.md)
64
+ - [`docs/api/utilities.md`](docs/api/utilities.md)
65
+ - [`docs/api/stability.md`](docs/api/stability.md)
66
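+ - Changelog: [`CHANGELOG.md`](CHANGELOG.md)
67
+
68
+ ## Parity Notes
69
+
70
+ - The historical deviation record `DEV-001` is closed and is kept only for historical context.
71
+ - The current status and release criteria are defined by [`reports/summary.md`](reports/summary.md) and [`reports/deviations.md`](reports/deviations.md).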
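The README's quick-start call passes `rare_maf_cutoff=0.05`, but this diff does not show how the threshold is applied. As a rough sketch, assuming the conventional rule that a variant counts as rare when its minor allele frequency (MAF) is above zero and below the cutoff, the filter might look like the following. The helper names `minor_allele_frequencies` and `rare_variant_indices` are hypothetical and are not part of pystaar.

```python
def minor_allele_frequencies(genotypes):
    """Compute per-variant MAF from a samples-by-variants table of 0/1/2 dosages."""
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    mafs = []
    for j in range(n_variants):
        # Each sample carries two alleles, hence the factor of 2.
        alt_freq = sum(row[j] for row in genotypes) / (2 * n_samples)
        mafs.append(min(alt_freq, 1 - alt_freq))
    return mafs


def rare_variant_indices(genotypes, rare_maf_cutoff=0.05):
    """Return column indices with 0 < MAF < cutoff (assumed convention)."""
    mafs = minor_allele_frequencies(genotypes)
    return [j for j, maf in enumerate(mafs) if 0 < maf < rare_maf_cutoff]


# Toy data: 20 samples x 3 variants.
toy = [[0, 0, 0] for _ in range(20)]
toy[0][0] = 1          # variant 0: one alternate allele, MAF = 1/40 = 0.025
for i in range(10):
    toy[i][1] = 1      # variant 1: ten alternate alleles, MAF = 10/40 = 0.25
print(rare_variant_indices(toy))  # variant 2 is monomorphic and is excluded
```

Under this assumed rule, only the first toy variant passes the default cutoff. The packaged covariance file `example_glmmkin_cov_rare_maf_0_01.csv` has a smaller shape than the default one, which is consistent with a stricter cutoff keeping fewer variants.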
pystaar-1.0.0/pyproject.toml ADDED
@@ -0,0 +1,61 @@
1
+ [build-system]
2
+ requires = ["setuptools>=77"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "pystaar"
7
+ version = "1.0.0"
8
+ description = "Python migration of the STAAR R package"
9
+ readme = "README.md"
10
+ requires-python = ">=3.9"
11
+ license = "LicenseRef-Proprietary"
12
+ license-files = ["LICENSE"]
13
+ authors = [
14
+ { name = "STAAR migration" }
15
+ ]
16
+ keywords = [
17
+ "staar",
18
+ "genomics",
19
+ "genetics",
20
+ "rare-variant",
21
+ "association-testing",
22
+ ]
23
+ classifiers = [
24
+ "Development Status :: 5 - Production/Stable",
25
+ "Intended Audience :: Science/Research",
26
+ "Operating System :: OS Independent",
27
+ "Programming Language :: Python :: 3",
28
+ "Programming Language :: Python :: 3 :: Only",
29
+ "Programming Language :: Python :: 3.9",
30
+ "Programming Language :: Python :: 3.10",
31
+ "Programming Language :: Python :: 3.11",
32
+ "Programming Language :: Python :: 3.12",
33
+ "Programming Language :: Python :: 3.13",
34
+ "Topic :: Scientific/Engineering :: Bio-Informatics",
35
+ ]
36
+ dependencies = [
37
+ "numpy>=1.24",
38
+ "scipy>=1.10",
39
+ "pandas>=2.0",
40
+ "pyyaml>=6.0",
41
+ ]
42
+
43
+ [project.urls]
44
+ Homepage = "https://github.com/xiaozhouwang/pySTAAR"
45
+ Repository = "https://github.com/xiaozhouwang/pySTAAR"
46
+ Documentation = "https://github.com/xiaozhouwang/pySTAAR/tree/main/docs"
47
+ Issues = "https://github.com/xiaozhouwang/pySTAAR/issues"
48
+
49
+ [project.optional-dependencies]
50
+ dev = [
51
+ "pytest>=7.0",
52
+ ]
53
+
54
+ [tool.setuptools]
55
+ package-dir = {"" = "src"}
56
+
57
+ [tool.setuptools.packages.find]
58
+ where = ["src"]
59
+
60
+ [tool.setuptools.package-data]
61
+ pystaar = ["_data/*"]
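The `[tool.setuptools.package-data]` entry ships everything under `_data/` inside the installed package. This diff does not show how `data.py` locates those files; one common approach, sketched here purely as an assumption, is `importlib.resources.files`, which is available from Python 3.9 and so fits the `requires-python` bound. The helper names `packaged_data_path` and `read_packaged_text` are hypothetical.

```python
from importlib.resources import files


def packaged_data_path(package, filename, data_dir="_data"):
    """Return a Traversable pointing at <package>/<data_dir>/<filename>.

    The package is imported by name, so it must be importable.
    """
    return files(package) / data_dir / filename


def read_packaged_text(package, filename, data_dir="_data"):
    """Read a packaged text file (for example a CSV) as a string."""
    return packaged_data_path(package, filename, data_dir).read_text(encoding="utf-8")
```

With the package installed and laid out as in the file list, a call such as `read_packaged_text("pystaar", "example_phred.csv")` would be expected to return the CSV text.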
pystaar-1.0.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
pystaar-1.0.0/src/pystaar/__init__.py ADDED
@@ -0,0 +1,104 @@
1
+ """Python migration of the STAAR R package.
2
+
3
+ Public API includes:
4
+ - R-compatible names (e.g., ``STAAR`` / ``AI_STAAR`` / ``CCT``).
5
+ - Python workflow entry points used by parity scenarios.
6
+ """
7
+
8
+ from .models import (
9
+ fit_null_glm,
10
+ fit_null_glm_binary_spa,
11
+ fit_null_glmmkin,
12
+ fit_null_glmmkin_binary_spa,
13
+ )
14
+ from .staar_core import (
15
+ _get_eigensolver_runtime_info,
16
+ ai_staar,
17
+ indiv_score_test_region,
18
+ indiv_score_test_region_cond,
19
+ matrix_flip,
20
+ staar,
21
+ staar_binary_spa,
22
+ staar_cond,
23
+ )
24
+ from .staar_stats import cct
25
+ from .workflows import (
26
+ ai_staar_related_dense_glmmkin,
27
+ ai_staar_related_dense_glmmkin_find_weight,
28
+ ai_staar_related_sparse_glmmkin,
29
+ ai_staar_related_sparse_glmmkin_find_weight,
30
+ ai_staar_unrelated_glm,
31
+ ai_staar_unrelated_glm_find_weight,
32
+ clear_runtime_caches,
33
+ get_runtime_cache_info,
34
+ indiv_score_related_dense_glmmkin,
35
+ indiv_score_related_dense_glmmkin_cond,
36
+ indiv_score_related_sparse_glmmkin,
37
+ indiv_score_related_sparse_glmmkin_cond,
38
+ indiv_score_unrelated_glm,
39
+ indiv_score_unrelated_glm_cond,
40
+ staar_related_dense_binary_spa,
41
+ staar_related_dense_glmmkin,
42
+ staar_related_dense_glmmkin_cond,
43
+ staar_related_sparse_binary_spa,
44
+ staar_related_sparse_glmmkin,
45
+ staar_related_sparse_glmmkin_cond,
46
+ staar_unrelated_binary_spa,
47
+ staar_unrelated_glm,
48
+ staar_unrelated_glm_cond,
49
+ )
50
+
51
+ # R-compatible aliases from STAAR NAMESPACE exports.
52
+ CCT = cct
53
+ fit_null_glm_Binary_SPA = fit_null_glm_binary_spa
54
+ fit_null_glmmkin_Binary_SPA = fit_null_glmmkin_binary_spa
55
+ STAAR = staar
56
+ STAAR_cond = staar_cond
57
+ Indiv_Score_Test_Region = indiv_score_test_region
58
+ Indiv_Score_Test_Region_cond = indiv_score_test_region_cond
59
+ STAAR_Binary_SPA = staar_binary_spa
60
+ AI_STAAR = ai_staar
61
+
62
+ __all__ = [
63
+ "CCT",
64
+ "fit_null_glm",
65
+ "fit_null_glmmkin",
66
+ "fit_null_glm_Binary_SPA",
67
+ "fit_null_glmmkin_Binary_SPA",
68
+ "matrix_flip",
69
+ "STAAR",
70
+ "STAAR_cond",
71
+ "STAAR_Binary_SPA",
72
+ "Indiv_Score_Test_Region",
73
+ "Indiv_Score_Test_Region_cond",
74
+ "AI_STAAR",
75
+ "staar_unrelated_glm",
76
+ "staar_related_sparse_glmmkin",
77
+ "staar_related_dense_glmmkin",
78
+ "staar_unrelated_glm_cond",
79
+ "staar_related_sparse_glmmkin_cond",
80
+ "staar_related_dense_glmmkin_cond",
81
+ "staar_unrelated_binary_spa",
82
+ "staar_related_sparse_binary_spa",
83
+ "staar_related_dense_binary_spa",
84
+ "indiv_score_unrelated_glm",
85
+ "indiv_score_related_sparse_glmmkin",
86
+ "indiv_score_related_dense_glmmkin",
87
+ "indiv_score_unrelated_glm_cond",
88
+ "indiv_score_related_sparse_glmmkin_cond",
89
+ "indiv_score_related_dense_glmmkin_cond",
90
+ "ai_staar_unrelated_glm",
91
+ "ai_staar_related_sparse_glmmkin",
92
+ "ai_staar_related_dense_glmmkin",
93
+ "ai_staar_unrelated_glm_find_weight",
94
+ "ai_staar_related_sparse_glmmkin_find_weight",
95
+ "ai_staar_related_dense_glmmkin_find_weight",
96
+ "get_runtime_cache_info",
97
+ "clear_runtime_caches",
98
+ "get_eigensolver_runtime_info",
99
+ ]
100
+
101
+
102
+ def get_eigensolver_runtime_info():
103
+     """Return backend-aware eigensolver runtime selection metadata."""
104
+     return _get_eigensolver_runtime_info()
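The `__init__.py` module binds R-style names such as `CCT` and `STAAR` to the snake_case functions and lists the public names in `__all__`. One quick way to guard this pattern against typos is to check that every exported name resolves to an attribute. The sketch below uses a stand-in module built with `types.ModuleType`; `missing_exports` and the `demo` module are hypothetical and are not part of pystaar.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that are not attributes of the module."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Stand-in for a package namespace; the real package is not imported here.
demo = types.ModuleType("demo")


def cct(pvalues):
    """Placeholder body; only the name binding matters for this check."""
    return pvalues


demo.cct = cct
demo.CCT = demo.cct  # R-compatible alias bound to the same function object
demo.__all__ = ["CCT", "cct", "not_defined"]

print(missing_exports(demo))  # → ['not_defined']
```

Because an alias is a second name for the same object, `demo.CCT is demo.cct` holds, so callers get identical behavior under either spelling.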
pystaar-1.0.0/src/pystaar/_data/DATA_SOURCE.md ADDED
@@ -0,0 +1,144 @@
1
+ # Data Source and Fingerprints
2
+
3
+ - Document date: 2026-02-07
4
+ - Scope: files under `data/` used by Python parity workflows.
5
+
6
+ ## Source
7
+
8
+ - Upstream project: `https://github.com/xihaoli/STAAR`
9
+ - Upstream commit for baseline extraction: `9db9dd504905b9f469146f670e5f6dbe3e08d01a`
10
+ - Baseline source record: `baselines/SOURCE.md`
11
+ - Raw input used to generate `data/`: `baselines/example_sim_data.rds`
12
+ - Dataset type: simulated example dataset generated by baseline extraction scripts
13
+ - License context:
14
+ - Upstream package license: `GPL-3` (`../STAAR/DESCRIPTION`)
15
+ - License text available at `../STAAR/LICENSE.md`
16
+
17
+ ## Raw Input Checksum
18
+
19
+ | file | size_bytes | sha256 |
20
+ |---|---:|---|
21
+ | `baselines/example_sim_data.rds` | 1312814 | `89498efaa1140720d512e92b4fb80d5141dfbec174dca54df29d553ef7b92fe6` |
22
+ | `baselines/nonexample601_sim_data.rds` | 1307967 | `3104c42d8fff5cd75f824bb10e282366b593c6f22ad924885263bdee63513241` |
23
+ | `baselines/nonexample602_sim_data.rds` | 1304535 | `97f2757233fd25d4fcdb3171cdcb8f341aa330f65078ad4753137bf1de2d77fb` |
24
+
25
+ ## Exported Runtime Files
26
+
27
+ | file | size_bytes | sha256 |
28
+ |---|---:|---|
29
+ | `data/example_geno.mtx` | 76674 | `c7fe50fdb539ff71601b4542c78d5d829d67ec5f9932f1dde5830b6ad3692569` |
30
+ | `data/example_kins_sparse.mtx` | 300091 | `e6dbc9c0597345876e29317b22404d69f4b64915ecce9a02299439c399cd56c2` |
31
+ | `data/example_kins_dense.mtx` | 300091 | `e6dbc9c0597345876e29317b22404d69f4b64915ecce9a02299439c399cd56c2` |
32
+ | `data/example_phred.csv` | 27924 | `272e43cfe806a2a980de6293fb353ee4cbb5d66722d8ef765b59c4eecad6d0bb` |
33
+ | `data/example_pheno_unrelated.csv` | 381409 | `e0f4caf7398060761040d4f709b546e6531a6808b713ea5681e8921b5becd8f3` |
34
+ | `data/example_pheno_related.csv` | 429554 | `3f0a8ba32fe7bcf1c8eafc6666b64b3bbbf0b9a145f58b5a065442ff2f3c6396` |
35
+ | `data/nonexample601_geno.mtx` | 77624 | `2b9de0a4d95f22a9b7d7912e3189ccfbc170efd80566aac34b53f09a6182ec5d` |
36
+ | `data/nonexample601_kins_sparse.mtx` | 300091 | `5783413a57c288fac6964946e1c4bd5d9453ca3c87a24847fd4619da5e037243` |
37
+ | `data/nonexample601_kins_dense.mtx` | 300091 | `5783413a57c288fac6964946e1c4bd5d9453ca3c87a24847fd4619da5e037243` |
38
+ | `data/nonexample601_phred.csv` | 24686 | `576e4f60f5d42653c18eb157e2c3847710f97538cf81e5fa44320adbabb16741` |
39
+ | `data/nonexample601_pheno_unrelated.csv` | 381071 | `c0fb8111ed7128fddc2b13e2ed6986d4566a4732e43960a82f925942d72f0358` |
40
+ | `data/nonexample601_pheno_related.csv` | 429309 | `26a2cc47bfb5f902a37495bad541be760a81cf8972ed306f4c82f8c67b6665c4` |
41
+ | `data/nonexample602_geno.mtx` | 62711 | `1ca6579addfa90c0cca6cbd77b4d272f2bf1b9ed36a4c881111e191783f3af15` |
42
+ | `data/nonexample602_kins_sparse.mtx` | 300091 | `5783413a57c288fac6964946e1c4bd5d9453ca3c87a24847fd4619da5e037243` |
43
+ | `data/nonexample602_kins_dense.mtx` | 300091 | `5783413a57c288fac6964946e1c4bd5d9453ca3c87a24847fd4619da5e037243` |
44
+ | `data/nonexample602_phred.csv` | 25214 | `808a8ddd4780b409ed32cd62f5a1dafb6452db83494e96e3358d8953fc6675a1` |
45
+ | `data/nonexample602_pheno_unrelated.csv` | 381102 | `4a6a588cabee87e60064354b9811cdf9181a0f5c47e27418edb49b284091c965` |
46
+ | `data/nonexample602_pheno_related.csv` | 429283 | `8692f9b8051d11dbc4d6b7173fe890c395705c2e637fc22b0fb549ff0edad750` |
47
+ | `data/example_glmmkin_cov.csv` | 555749 | `f6893ba9f034143c0b8eab7fda2fe020e8e43ac0280fbd1bd3824fa20c3311e6` |
48
+ | `data/example_glmmkin_cov_rare_maf_0_01.csv` | 495166 | `c08479c923d9207da12c4e4d8faa64cf80390291223700b423eb15a1cf649ebc` |
49
+ | `data/example_glmmkin_scaled_residuals.csv` | 182493 | `55ebe906e9c63fb25e8bbc4bf4f4ac05966401af8a66bbba972cd48a4a3c29ca` |
50
+ | `data/example_glmmkin_cov_cond_sparse.csv` | 554222 | `a816f345142065cd3a95c3616f9c00e64f433e26ea57704b9dd1670b604130b2` |
51
+ | `data/example_glmmkin_cov_cond_dense.csv` | 555130 | `05ad3b8d587404a850182310edac55fdc11e48c3092ddb954653145250c43a55` |
52
+ | `data/example_glmmkin_binary_spa_sparse_fitted.csv` | 188525 | `8c827840b987e39d853de2ef505f5d730c6085e52339d8d2f549a136e12ffa93` |
53
+ | `data/example_glmmkin_binary_spa_sparse_scaled_residuals.csv` | 197780 | `5613078915493f8152ba5b10e0bd1b66aaa218166d912122500d867e008144cd` |
54
+ | `data/example_glmmkin_binary_spa_sparse_XW.csv` | 643616 | `d89f82fa11888f012abf4806651a2f4df1f0f2fccc5cb397e3879e39945c3ea6` |
55
+ | `data/example_glmmkin_binary_spa_sparse_XXWX_inv.csv` | 614222 | `cdf1b7cf565f5ce68f00f8b77bfc37d8151a5689dc233eee1957d09656e95ae1` |
56
+ | `data/example_glmmkin_binary_spa_dense_fitted.csv` | 188469 | `82e3baef157b2d90d14aa042f117e4d2871a9041079a0eb6390951f160f5429f` |
57
+ | `data/example_glmmkin_binary_spa_dense_scaled_residuals.csv` | 197716 | `084a4bbfa11a8827af81d2625d6e8eedfbee53654a843543679992bb662b56c5` |
58
+ | `data/example_glmmkin_binary_spa_dense_XW.csv` | 557284 | `846283deb00c944da332684ccddf38c36598fe21cddce5929cce908cfd2a2f3d` |
59
+ | `data/example_glmmkin_binary_spa_dense_XXWX_inv.csv` | 614480 | `3bb01dabd6258d4c56e75b275f4ca95faa4503e89d2c258e6e68a460c5e5a36e` |
60
+ | `data/example_glm_binary_spa_cov_filter.csv` | 571828 | `5ec312432b33f86ce4774053d70e40dea641a012699298c9411b4fe541909711` |
61
+ | `data/example_glmmkin_binary_spa_sparse_cov_filter.csv` | 571750 | `30d089be8804cedb8789e51bf4572ccbfcaf36dd4734d1bbc50ad71b9232271a` |
62
+ | `data/example_glmmkin_binary_spa_dense_cov_filter.csv` | 571635 | `2349ab58cab8516b45612d09fc46554e29f33d46f6550ba798051b136f4164d1` |
63
+ | `data/example_ai_pop_groups.csv` | 60012 | `8563a26d5075f4664322997f703b40e6314666744fd3051ea94a3354d17de17c` |
64
+ | `data/example_ai_pop_weights_1_1.csv` | 148 | `e710f254e379c81ec26bcb543d4c88048ae12df46855cfcac8ebbe4192dca2a0` |
65
+ | `data/example_ai_pop_weights_1_25.csv` | 148 | `454abfff9449d29ecd6ee26a9b18c66fa1da8c698c5e94ca6481497e1b92ec51` |
66
+ | `data/example_ai_cov_sparse_s1_b1.csv` | 567327 | `518a448dbf542b663161482a2bcfc1d583651ef59d53ac4088cce7b4357529e8` |
67
+ | `data/example_ai_cov_sparse_s1_b2.csv` | 567952 | `193b1d6d8e3947c30128878d1a7f624bd30e793fc18d2bb581bcc851f5c477dd` |
68
+ | `data/example_ai_cov_sparse_s2_b1.csv` | 515145 | `1a1b0f072cbfc51d172b50b0ecf402d67ac8f9663b27e872b6eed83a24a23c62` |
69
+ | `data/example_ai_cov_sparse_s2_b2.csv` | 517885 | `0a76a6891c6e911bfa4be1ce1d6165d5f306f5ff35fbd1d4b2ec14dddafe9895` |
70
+ | `data/example_ai_cov_dense_s1_b1.csv` | 567248 | `9db30d895f8cf289eb32b8ffe7fea715fb1fea7cb8574c139add0b3c29b7e470` |
71
+ | `data/example_ai_cov_dense_s1_b2.csv` | 568040 | `2801f7b4be47f16875ba5eb5175ab84f01dc728c99b0fc9d56bfd5fa5d2242fd` |
72
+ | `data/example_ai_cov_dense_s2_b1.csv` | 515109 | `5efca5a228e054ec6c887301c545e5e832098ab3c3e42cac75e156f52c3cc9ae` |
73
+ | `data/example_ai_cov_dense_s2_b2.csv` | 517701 | `d9ccdd07c0d7a7278f45bffa96f4d15f879a676da44d806247b6d33e708504df` |
74
+
75
+ ## Structured Fingerprints
76
+
77
+ - `data/example_pheno_unrelated.csv`: rows=10000, cols=3, columns=`Y,X1,X2`
78
+ - `data/example_pheno_related.csv`: rows=10000, cols=4, columns=`Y,X1,X2,id`
79
+ - `data/example_phred.csv`: rows=163, cols=10, columns=`Z1,Z2,Z3,Z4,Z5,Z6,Z7,Z8,Z9,Z10`
80
+ - `data/example_geno.mtx`: shape=`10000x163`, nnz=7486
81
+ - `data/example_kins_sparse.mtx`: shape=`10000x10000`, nnz=35000
82
+ - `data/example_kins_dense.mtx`: shape=`10000x10000`, nnz=35000
83
+ - `data/nonexample601_pheno_unrelated.csv`: rows=10000, cols=3, columns=`Y,X1,X2`
84
+ - `data/nonexample601_pheno_related.csv`: rows=10000, cols=4, columns=`Y,X1,X2,id`
85
+ - `data/nonexample601_phred.csv`: rows=144, cols=10, columns=`Z1,Z2,Z3,Z4,Z5,Z6,Z7,Z8,Z9,Z10`
86
+ - `data/nonexample601_geno.mtx`: shape=`10000x144`, nnz=7636
87
+ - `data/nonexample601_kins_sparse.mtx`: shape=`10000x10000`, nnz=35000
88
+ - `data/nonexample601_kins_dense.mtx`: shape=`10000x10000`, nnz=35000
89
+ - `data/nonexample602_pheno_unrelated.csv`: rows=10000, cols=3, columns=`Y,X1,X2`
90
+ - `data/nonexample602_pheno_related.csv`: rows=10000, cols=4, columns=`Y,X1,X2,id`
91
+ - `data/nonexample602_phred.csv`: rows=147, cols=10, columns=`Z1,Z2,Z3,Z4,Z5,Z6,Z7,Z8,Z9,Z10`
92
+ - `data/nonexample602_geno.mtx`: shape=`10000x147`, nnz=6174
93
+ - `data/nonexample602_kins_sparse.mtx`: shape=`10000x10000`, nnz=35000
94
+ - `data/nonexample602_kins_dense.mtx`: shape=`10000x10000`, nnz=35000
95
+ - `data/example_glmmkin_cov.csv`: shape=`163x163`
96
+ - `data/example_glmmkin_cov_rare_maf_0_01.csv`: shape=`153x153`
97
+ - `data/example_glmmkin_cov_cond_sparse.csv`: shape=`163x163`
98
+ - `data/example_glmmkin_cov_cond_dense.csv`: shape=`163x163`
99
+ - `data/example_glmmkin_binary_spa_sparse_fitted.csv`: shape=`10000x1`
100
+ - `data/example_glmmkin_binary_spa_sparse_scaled_residuals.csv`: shape=`10000x1`
101
+ - `data/example_glmmkin_binary_spa_sparse_XW.csv`: shape=`3x10000`
102
+ - `data/example_glmmkin_binary_spa_sparse_XXWX_inv.csv`: shape=`10000x3`
103
+ - `data/example_glmmkin_binary_spa_dense_fitted.csv`: shape=`10000x1`
104
+ - `data/example_glmmkin_binary_spa_dense_scaled_residuals.csv`: shape=`10000x1`
105
+ - `data/example_glmmkin_binary_spa_dense_XW.csv`: shape=`3x10000`
106
+ - `data/example_glmmkin_binary_spa_dense_XXWX_inv.csv`: shape=`10000x3`
107
+ - `data/example_glm_binary_spa_cov_filter.csv`: shape=`163x163`
108
+ - `data/example_glmmkin_binary_spa_sparse_cov_filter.csv`: shape=`163x163`
109
+ - `data/example_glmmkin_binary_spa_dense_cov_filter.csv`: shape=`163x163`
110
+ - `data/example_ai_pop_groups.csv`: rows=10000, cols=1, columns=`pop_group`
111
+ - `data/example_ai_pop_weights_1_1.csv`: rows=3, cols=3, columns=`population,B1,B2`
112
+ - `data/example_ai_pop_weights_1_25.csv`: rows=3, cols=3, columns=`population,B1,B2`
113
+ - `data/example_ai_cov_sparse_s1_b1.csv`: shape=`163x163`
114
+ - `data/example_ai_cov_sparse_s1_b2.csv`: shape=`163x163`
115
+ - `data/example_ai_cov_sparse_s2_b1.csv`: shape=`163x163`
116
+ - `data/example_ai_cov_sparse_s2_b2.csv`: shape=`163x163`
117
+ - `data/example_ai_cov_dense_s1_b1.csv`: shape=`163x163`
118
+ - `data/example_ai_cov_dense_s1_b2.csv`: shape=`163x163`
119
+ - `data/example_ai_cov_dense_s2_b1.csv`: shape=`163x163`
120
+ - `data/example_ai_cov_dense_s2_b2.csv`: shape=`163x163`
121
+
122
+ ## Generation Path
123
+
124
+ `data/` exports are generated by:
125
+
126
+ - `scripts/export_example_data.R`
127
+ - `baselines/scripts/extract_related_sparse_glmmkin_cond.R` (conditional covariance export)
128
+ - `baselines/scripts/extract_related_dense_glmmkin_cond.R` (conditional covariance export)
129
+ - `baselines/scripts/extract_related_sparse_binary_spa.R` (related binary SPA null-model components)
130
+ - `baselines/scripts/extract_related_dense_binary_spa.R` (related binary SPA null-model components)
131
+ - `baselines/scripts/extract_unrelated_binary_spa_filter.R` (unrelated binary SPA prefilter covariance export)
132
+ - `baselines/scripts/extract_related_sparse_binary_spa_filter.R` (related sparse binary SPA prefilter covariance export)
133
+ - `baselines/scripts/extract_related_dense_binary_spa_filter.R` (related dense binary SPA prefilter covariance export)
134
+ - `baselines/scripts/extract_ai_staar_unrelated.R` (AI-STAAR ancestry metadata export)
135
+ - `baselines/scripts/extract_ai_staar_related_sparse.R` (AI-STAAR sparse related covariance export)
136
+ - `baselines/scripts/extract_ai_staar_related_dense.R` (AI-STAAR dense related covariance export)
137
+ - `../STAAR` one-shot extraction script (`/tmp/extract_nonexample601.R`) for non-clone `nonexample601` dataset + sentinels (`STAAR-47`)
138
+ - `../STAAR` one-shot extraction script (`/tmp/extract_nonexample602.R`) for second-seed `nonexample602` dataset (`STAAR-54`)
139
+
140
+ Command:
141
+
142
+ - `Rscript scripts/export_example_data.R`
143
+ - `Rscript /tmp/extract_nonexample601.R` (executed in `../STAAR`, then copied `nonexample601_*` outputs)
144
+ - `Rscript /tmp/extract_nonexample602.R` (executed in `../STAAR`, then copied `nonexample602_*` outputs)
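The checksum tables in `DATA_SOURCE.md` can be used to confirm that a local copy of a data file matches the recorded size and SHA-256 digest. A minimal sketch using only the standard library follows; `file_fingerprint` and `matches_record` are hypothetical helper names, and the expected values would be copied from the tables.

```python
import hashlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def matches_record(path, expected_size, expected_sha256):
    """Compare a file against one row of a checksum table."""
    size, sha256_hex = file_fingerprint(path)
    return size == expected_size and sha256_hex == expected_sha256.lower()
```

Equal digests also indicate byte-identical files: in the tables, each `*_kins_sparse.mtx` entry shares its size and SHA-256 with the matching `*_kins_dense.mtx` entry.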