mta 0.0.6__tar.gz → 0.0.8__tar.gz
Potentially problematic release.
This version of mta might be problematic.
- mta-0.0.8/MANIFEST.in +5 -0
- mta-0.0.8/PKG-INFO +344 -0
- mta-0.0.8/README.md +314 -0
- mta-0.0.8/mta/mta.py +956 -0
- mta-0.0.8/mta.egg-info/PKG-INFO +344 -0
- {mta-0.0.6 → mta-0.0.8}/mta.egg-info/SOURCES.txt +4 -1
- mta-0.0.8/mta.egg-info/requires.txt +4 -0
- mta-0.0.8/pyproject.toml +37 -0
- mta-0.0.8/requirements.txt +4 -0
- mta-0.0.8/setup.py +43 -0
- mta-0.0.6/PKG-INFO +0 -14
- mta-0.0.6/README.md +0 -52
- mta-0.0.6/mta/mta.py +0 -851
- mta-0.0.6/mta/mta_spark.py +0 -431
- mta-0.0.6/mta.egg-info/PKG-INFO +0 -14
- mta-0.0.6/setup.py +0 -18
- {mta-0.0.6 → mta-0.0.8}/mta/__init__.py +0 -0
- {mta-0.0.6 → mta-0.0.8}/mta/data/data.csv.gz +0 -0
- {mta-0.0.6 → mta-0.0.8}/mta.egg-info/dependency_links.txt +0 -0
- {mta-0.0.6 → mta-0.0.8}/mta.egg-info/top_level.txt +0 -0
- {mta-0.0.6 → mta-0.0.8}/setup.cfg +0 -0
mta-0.0.8/MANIFEST.in
ADDED
mta-0.0.8/PKG-INFO
ADDED
@@ -0,0 +1,344 @@
+Metadata-Version: 2.4
+Name: mta
+Version: 0.0.8
+Summary: Multi-Touch Attribution Models for Marketing Analytics
+Home-page: https://github.com/eeghor/mta
+Author: Igor Korostil
+Author-email: Igor Korostil <eeghor@gmail.com>
+License: MIT
+Project-URL: Homepage, https://github.com/eeghor/mta
+Project-URL: Issues, https://github.com/eeghor/mta/issues
+Keywords: attribution,marketing,multi-touch,analytics,markov,shapley
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: scikit-learn>=0.24.0
+Requires-Dist: arrow>=1.0.0
+Dynamic: author
+Dynamic: home-page
+Dynamic: requires-python
+
+# Multi-Touch Attribution (MTA)
+
+A comprehensive Python library for multi-touch attribution modeling in marketing analytics. This library implements various attribution models to help marketers understand the contribution of different touchpoints in the customer journey.
+
+## 🎯 Features
+
+### Attribution Models Implemented
+
+- **First Touch**: 100% credit to the first interaction
+- **Last Touch**: 100% credit to the last interaction before conversion
+- **Linear**: Equal credit distribution across all touchpoints
+- **Position-Based (U-Shaped)**: Customizable weights for first/last touch with remaining credit distributed to middle touches
+- **Time Decay**: Higher credit to more recent touchpoints
+- **Markov Chain**: Probabilistic model using transition matrices
+- **Shapley Value**: Game-theoretic fair allocation based on marginal contributions
+- **Shao's Model**: Probabilistic Shapley-equivalent approach
+- **Logistic Regression**: Machine learning-based ensemble attribution
+- **Additive Hazard**: Survival analysis-based attribution
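Note (not part of the diff): the three simplest models in the list above can be sketched in a few lines. This is a minimal standalone illustration, assuming journeys arrive as (channel list, conversion count) pairs. `heuristic_credit` and its `model` argument are hypothetical names, not taken from `mta/mta.py`, and the plain equal split used for the linear model may differ from what the package does with `share="proportional"`.

```python
from collections import defaultdict


def heuristic_credit(paths, model="linear"):
    """Sum conversion credit per channel over (channels, conversions) pairs.

    Hypothetical helper for illustration; not the package's implementation.
    """
    credit = defaultdict(float)
    for channels, conversions in paths:
        if model == "first_touch":
            # All credit goes to the first channel in the journey.
            credit[channels[0]] += conversions
        elif model == "last_touch":
            # All credit goes to the last channel before conversion.
            credit[channels[-1]] += conversions
        elif model == "linear":
            # Every touchpoint in the journey gets an equal share.
            share = conversions / len(channels)
            for channel in channels:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Journeys mirror the sample rows shown in the README's data format section.
journeys = [(["alpha", "beta", "gamma"], 10), (["beta", "gamma"], 5)]
first = heuristic_credit(journeys, "first_touch")
last = heuristic_credit(journeys, "last_touch")
linear = heuristic_credit(journeys, "linear")
```

Whatever the model, the credits should add up to the total number of conversions (15 for these two journeys), which is a quick sanity check when comparing models.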
+
+## 📦 Installation
+
+```bash
+pip install mta
+```
+
+Or install from source:
+
+```bash
+git clone https://github.com/eeghor/mta.git
+cd mta
+pip install -e .
+```
+
+## 🚀 Quick Start
+
+### Basic Usage
+
+```python
+from mta import MTA
+
+# Initialize with your data
+mta = MTA(data="your_data.csv", allow_loops=False, add_timepoints=True)
+
+# Run a single attribution model
+mta.linear(share="proportional", normalize=True)
+mta.show()
+
+# Chain multiple models
+(mta.linear(share="proportional")
+    .time_decay(count_direction="right")
+    .markov(sim=False)
+    .shapley()
+    .show())
+```
+
+### Using Configuration
+
+```python
+from mta import MTA, MTAConfig
+
+# Create custom configuration
+config = MTAConfig(
+    allow_loops=False,
+    add_timepoints=True,
+    sep=" > ",
+    normalize_by_default=True
+)
+
+mta = MTA(data="data.csv", config=config)
+```
+
+### Working with DataFrames
+
+```python
+import pandas as pd
+from mta import MTA
+
+# Load your data
+df = pd.read_csv("customer_journeys.csv")
+
+# Initialize MTA with DataFrame
+mta = MTA(data=df, allow_loops=False)
+
+# Run attribution models
+mta.first_touch().last_touch().linear().show()
+```
+
+## 📊 Data Format
+
+Your input data should be a CSV file or pandas DataFrame with the following columns:
+
+```
+path,total_conversions,total_null,exposure_times
+alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
+beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
+```
+
+**Required Columns:**
+
+- `path`: Customer journey as channel names separated by `>` (or custom separator)
+- `total_conversions`: Number of conversions for this path
+- `total_null`: Number of non-conversions for this path
+- `exposure_times`: Timestamps of channel exposures (optional, can be auto-generated)
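Note (not part of the diff): to make the column layout above concrete, here is a small sketch that parses the two sample rows with the standard library only. `parse_journeys` is a hypothetical helper, and the timestamp format `%Y-%m-%d %H:%M:%S` is an assumption read off the sample rows; the package itself may parse its input differently.

```python
import csv
import io
from datetime import datetime

# The sample rows from the README's data format section.
SAMPLE = """path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
"""


def parse_journeys(text, sep=">"):
    """Split each row into channels, counts and (optional) exposure times."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        channels = [c.strip() for c in record["path"].split(sep)]
        raw_times = record.get("exposure_times") or ""
        times = [
            datetime.strptime(t.strip(), "%Y-%m-%d %H:%M:%S")
            for t in raw_times.split(sep)
            if t.strip()
        ]
        # When timestamps are present there must be one per channel.
        if times and len(times) != len(channels):
            raise ValueError(f"timestamp count mismatch for path {record['path']!r}")
        rows.append({
            "channels": channels,
            "conversions": int(record["total_conversions"]),
            "nulls": int(record["total_null"]),
            "times": times,
        })
    return rows


parsed = parse_journeys(SAMPLE)
```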
+
+## 🎨 Advanced Usage
+
+### Position-Based Attribution with Custom Weights
+
+```python
+# Give 30% to first touch, 30% to last touch, 40% distributed to middle
+mta.position_based(first_weight=30, last_weight=30, normalize=True)
+```
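Note (not part of the diff): the 30/30/40 split described in the comment above can be worked through by hand. The sketch below assumes weights are given in percent and that the remaining credit is divided evenly among middle touches. How one- and two-touch journeys are handled is purely an assumption made here, and `position_based_shares` is a hypothetical name rather than the package's function.

```python
def position_based_shares(n_touches, first_weight=40.0, last_weight=40.0):
    """Return per-position credit shares (summing to 1) for one journey."""
    if n_touches < 1:
        raise ValueError("a journey needs at least one touch")
    if first_weight < 0 or last_weight < 0 or first_weight + last_weight > 100:
        raise ValueError("weights must be non-negative and sum to at most 100")
    if n_touches == 1:
        # Assumption: a single touch takes all the credit.
        return [1.0]
    if n_touches == 2:
        # Assumption: with no middle touches, rescale first/last to sum to 1.
        total = first_weight + last_weight
        if total == 0:
            return [0.5, 0.5]
        return [first_weight / total, last_weight / total]
    # Remaining percentage is shared equally by the middle touches.
    middle = (100.0 - first_weight - last_weight) / (n_touches - 2)
    return (
        [first_weight / 100.0]
        + [middle / 100.0] * (n_touches - 2)
        + [last_weight / 100.0]
    )


# Four touches with 30% first and 30% last leave 20% for each middle touch.
shares = position_based_shares(4, first_weight=30, last_weight=30)
```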
+
+### Time Decay with Direction Control
+
+```python
+# Count from left (earliest gets lowest credit)
+mta.time_decay(count_direction="left")
+
+# Count from right (latest gets highest credit - more common)
+mta.time_decay(count_direction="right")
+```
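Note (not part of the diff): one common way to express "more recent touches get more credit" is an exponential half-life weighting. The sketch below shows that generic formulation only. It does not model the package's `count_direction` parameter, and `time_decay_shares` and `half_life_hours` are hypothetical names.

```python
def time_decay_shares(hours_before_conversion, half_life_hours=1.0):
    """Weight each touch by 0.5 ** (age / half_life) and normalise to 1."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    # A touch one half-life older than another receives half its raw weight.
    raw = [0.5 ** (age / half_life_hours) for age in hours_before_conversion]
    total = sum(raw)
    return [weight / total for weight in raw]


# Touches 2 h, 1 h and 0 h before conversion: raw weights 0.25, 0.5 and 1.
decay_shares = time_decay_shares([2.0, 1.0, 0.0])
```

With a one-hour half-life the three shares work out to 1/7, 2/7 and 4/7, so the latest touch receives the most credit.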
+
+### Markov Chain Attribution
+
+```python
+# Analytical calculation (faster)
+mta.markov(sim=False, normalize=True)
+
+# Simulation-based (more flexible, handles complex scenarios)
+mta.markov(sim=True, normalize=True)
+```
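Note (not part of the diff): Markov attribution is usually explained through the removal effect, that is, how much the overall conversion probability drops when a channel is taken out of the chain. The sketch below is a generic illustration under stated assumptions: first-order transitions, artificial `(start)`, `(conversion)` and `(null)` states, positive path counts with at least one conversion, and a simple iterative solve instead of the analytical or simulated calculation the package offers. All function names are hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probabilities(paths):
    """Estimate first-order transition probabilities from path counts."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, conversions, nulls in paths:
        states = [START] + list(channels)
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += conversions + nulls
        counts[states[-1]][CONV] += conversions
        counts[states[-1]][NULL] += nulls
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_probability(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START; a removed channel never converts."""
    value = {state: 0.0 for state in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(sweeps):
        for state, outgoing in probs.items():
            if state == removed:
                continue  # visits to the removed channel count as lost
            value[state] = sum(p * value[dst] for dst, p in outgoing.items())
    return value[START]


def removal_effects(paths):
    """Relative drop in conversion probability when each channel is removed."""
    probs = transition_probabilities(paths)
    base = conversion_probability(probs)
    channels = [state for state in probs if state != START]
    return {
        ch: 1.0 - conversion_probability(probs, removed=ch) / base
        for ch in channels
    }


# (channels, conversions, nulls) taken from the README's sample rows.
effects = removal_effects([(["alpha", "beta", "gamma"], 10, 5), (["beta", "gamma"], 5, 3)])
```

Removal effects are typically rescaled to sum to 1 and multiplied by total conversions to obtain per-channel credit.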
+
+### Shapley Value Attribution
+
+```python
+# With custom coalition size
+mta.shapley(max_coalition_size=3, normalize=True)
+```
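Note (not part of the diff): the textbook Shapley value averages a channel's marginal contribution over all coalitions of the other channels. The sketch below computes it exactly for a tiny example. The characteristic function used here (a coalition is worth the conversions of every path whose channels it fully contains) is an assumption chosen for illustration, `shapley_values` and `coalition_value` are hypothetical names, and the package's `max_coalition_size` option is not modelled.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalition value function on frozensets."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding the player.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result


# Conversions per distinct channel set, from the README's sample rows.
conversions_by_set = {
    frozenset({"alpha", "beta", "gamma"}): 10,
    frozenset({"beta", "gamma"}): 5,
}


def coalition_value(coalition):
    """Assumed worth: conversions of all paths fully covered by the coalition."""
    return sum(c for chans, c in conversions_by_set.items() if chans <= coalition)


values = shapley_values(["alpha", "beta", "gamma"], coalition_value)
```

Each channel requires a pass over all 2^(n-1) subsets of the other channels, which is why exact computation becomes slow as the number of channels grows.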
+
+### Logistic Regression Ensemble
+
+```python
+# Custom sampling and iteration parameters
+mta.logistic_regression(
+    test_size=0.25,
+    sample_rows=0.5,
+    sample_features=0.5,
+    n_iterations=1000,
+    normalize=True
+)
+```
+
+### Export Results
+
+```python
+# Compare all models
+results_df = mta.compare_models()
+
+# Export to various formats
+mta.export_results("attribution_results.csv", format="csv")
+mta.export_results("attribution_results.json", format="json")
+mta.export_results("attribution_results.xlsx", format="excel")
+```
+
+## 📈 Example: Complete Analysis Pipeline
+
+```python
+from mta import MTA
+import pandas as pd
+
+# Load data
+mta = MTA(
+    data="customer_journeys.csv",
+    allow_loops=False, # Remove consecutive duplicate channels
+    add_timepoints=True # Auto-generate timestamps if missing
+)
+
+# Run all heuristic models
+(mta
+    .first_touch()
+    .last_touch()
+    .linear(share="proportional")
+    .position_based(first_weight=40, last_weight=40)
+    .time_decay(count_direction="right"))
+
+# Run algorithmic models
+(mta
+    .markov(sim=False)
+    .shapley(max_coalition_size=2)
+    .shao()
+    .logistic_regression(n_iterations=2000)
+    .additive_hazard(epochs=20))
+
+# Display and export results
+results = mta.compare_models()
+mta.export_results("full_attribution_analysis.csv")
+
+# Access specific model results
+print(f"Markov Attribution: {mta.attribution['markov']}")
+print(f"Shapley Attribution: {mta.attribution['shapley']}")
+```
+
+## 🔬 Model Comparison
+
+| Model | Type | Strengths | Use Case |
+| ------------------- | ---------------- | ------------------------ | ---------------------------- |
+| First/Last Touch | Heuristic | Simple, fast | Quick baseline |
+| Linear | Heuristic | Fair, interpretable | Equal value assumption |
+| Position-Based | Heuristic | Balances first/last | Awareness + conversion focus |
+| Time Decay | Heuristic | Recency-weighted | When recent matters more |
+| Markov Chain | Algorithmic | Considers path structure | Sequential dependency |
+| Shapley Value | Algorithmic | Game-theoretic fairness | Complex interactions |
+| Logistic Regression | Machine Learning | Data-driven | Large datasets |
+| Additive Hazard | Statistical | Time-to-event modeling | Survival analysis fans |
+
+## 🛠️ Requirements
+
+- Python >= 3.8
+- pandas >= 1.3.0
+- numpy >= 1.20.0
+- scikit-learn >= 0.24.0
+- arrow >= 1.0.0
+
+## 📝 Citation
+
+If you use this library in your research, please cite:
+
+```bibtex
+@software{mta2024,
+  author = {Igor Korostil},
+  title = {MTA: Multi-Touch Attribution Library},
+  year = {2024},
+  url = {https://github.com/eeghor/mta}
+}
+```
+
+## 📚 References
+
+This library implements models and techniques from the following research papers:
+
+1. **Nisar, T. M., & Yeung, M. (2015)**
+   _Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation_
+   [PDF](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2612997)
+
+2. **Shao, X., & Li, L. (2011)**
+   _Data-driven Multi-touch Attribution Models_
+   Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
+   [PDF](https://dl.acm.org/doi/10.1145/2020408.2020453)
+
+3. **Dalessandro, B., Perlich, C., Stitelman, O., & Provost, F. (2012)**
+   _Causally Motivated Attribution for Online Advertising_
+   Proceedings of the Sixth International Workshop on Data Mining for Online Advertising
+   [PDF](https://dl.acm.org/doi/10.1145/2351356.2351363)
+
+4. **Cano-Berlanga, S., Giménez-Gómez, J. M., & Vilella, C. (2017)**
+   _Attribution Models and the Cooperative Game Theory_
+   Expert Systems with Applications, 87, 277-286
+   [PDF](https://www.sciencedirect.com/science/article/abs/pii/S0957417417304505)
+
+5. **Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018)**
+   _Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising_
+   Proceedings of the 27th ACM International Conference on Information and Knowledge Management
+   [PDF](https://dl.acm.org/doi/10.1145/3269206.3271676)
+
+6. **Zhang, Y., Wei, Y., & Ren, J. (2014)**
+   _Multi-Touch Attribution in Online Advertising with Survival Theory_
+   2014 IEEE International Conference on Data Mining
+   [PDF](https://ieeexplore.ieee.org/document/7023387)
+
+7. **Geyik, S. C., Saxena, A., & Dasdan, A. (2014)**
+   _Multi-Touch Attribution Based Budget Allocation in Online Advertising_
+   Proceedings of the 8th International Workshop on Data Mining for Online Advertising
+   [PDF](https://dl.acm.org/doi/10.1145/2648584.2648586)
+
+### Model-to-Paper Mapping
+
+- **Linear & Position-Based**: Baseline models referenced across multiple papers
+- **Time Decay**: Nisar & Yeung (2015), Zhang et al. (2014)
+- **Markov Chain**: Shao & Li (2011), Dalessandro et al. (2012)
+- **Shapley Value**: Cano-Berlanga et al. (2017)
+- **Logistic Regression**: Dalessandro et al. (2012), Ren et al. (2018)
+- **Additive Hazard**: Zhang et al. (2014)
+
+## 🤝 Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
+
+1. Fork the repository
+2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the branch (`git push origin feature/AmazingFeature`)
+5. Open a Pull Request
+
+## 📄 License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+## 🙏 Acknowledgments
+
+- Inspired by various academic papers on marketing attribution
+- Built with pandas, numpy, and scikit-learn
+- Special thanks to the open-source community
+
+## 📧 Contact
+
+Igor Korostil - eeghor@gmail.com
+
+Project Link: [https://github.com/eeghor/mta](https://github.com/eeghor/mta)
+
+## 🐛 Known Issues
+
+- Shapley value computation can be slow for large numbers of channels
+- Additive hazard model requires evenly-spaced time points for best results
mta-0.0.8/README.md
ADDED
@@ -0,0 +1,314 @@