streamfuels 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,99 @@
1
+ Metadata-Version: 2.4
2
+ Name: streamfuels
3
+ Version: 0.1.0
4
+ Summary: Data processing and analysis tools for fuel market research
5
+ Home-page: https://github.com/streamfuels/streamfuels
6
+ Author: StreamFuels
7
+ Author-email: lucascstxv@gmail.com
8
+ License: MIT
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Operating System :: OS Independent
12
+ Requires-Python: >=3.9
13
+ Description-Content-Type: text/markdown
14
+ Requires-Dist: pandas>=1.2.0
15
+ Requires-Dist: requests>=2.25.0
16
+ Requires-Dist: beautifulsoup4>=4.9.0
17
+ Requires-Dist: unidecode>=1.1.1
18
+ Requires-Dist: numpy>=1.19.0
19
+ Requires-Dist: editdistance>=0.5.3
20
+ Requires-Dist: setuptools
21
+ Requires-Dist: tqdm==4.65.0
22
+ Dynamic: author
23
+ Dynamic: author-email
24
+ Dynamic: classifier
25
+ Dynamic: description
26
+ Dynamic: description-content-type
27
+ Dynamic: home-page
28
+ Dynamic: license
29
+ Dynamic: requires-dist
30
+ Dynamic: requires-python
31
+ Dynamic: summary
32
+
33
+ # StreamFuels
34
+
35
+ StreamFuels is a collection of tools for processing and analyzing fuel market data, focusing on petroleum derivatives, natural gas, and biofuels markets across different regions of Brazil.
36
+
37
+ ***monthly_sales_state()***:
38
+ Monthly fuel sales data by state from the ANP database
39
+
40
+ ***yearly_sales_state()***:
41
+ Yearly fuel sales data by state from the ANP database
42
+
43
+ ***yearly_sales_city()***:
44
+ Yearly fuel sales data by city from the ANP database
45
+
46
+ ***monthly_operations_state()***:
47
+ Monthly oil production, NGL production, natural gas production, reinjection, flaring and losses, self-consumption, and available natural gas. It provides a comprehensive view of petroleum and gas operations.
48
+
49
+
50
+
51
+ <!-- ## Installation
52
+
53
+ ```bash
54
+ pip install streamfuels
55
+ ``` -->
56
+
57
+
58
+ To run locally, in your target python environment and in this project folder type:
59
+ ```bash
60
+ pip install -e .
61
+ ```
62
+
63
+
64
+ After that you can import using the target python environment:
65
+
66
+ ```python
67
+ from streamfuels.datasets import DatasetLoader
68
+ loader = DatasetLoader()
69
+ result, flag = loader.yearly_sales_state()
70
+
71
+ df, metadata = loader.read_tsf(path_tsf=result)
72
+ ```
73
+
74
+ ### Yearly sales of petroleum derivatives in the states of Brazil.
75
+ ```python
76
+ result, flag = loader.yearly_sales_state()
77
+ ```
78
+ ![image](https://github.com/user-attachments/assets/ab1d0ac8-9574-4229-81e6-2e3ef32e959c)
79
+
80
+ ### Monthly sales of petroleum derivatives in the states of Brazil.
81
+ ```python
82
+ result, flag = loader.monthly_sales_state()
83
+ ```
84
+ ![image](https://github.com/user-attachments/assets/4894d0cf-eb92-421b-8b8a-d0a1522ccc0d)
85
+
86
+ ### Monthly oil and gas operations in the states of Brazil.
87
+ ```python
88
+ result, flag = loader.monthly_operations_state()
89
+ ```
90
+ ![image](https://github.com/user-attachments/assets/ab9b18b5-54ee-41f8-8948-9458b6e96343)
91
+
92
+ ### Yearly sales of petroleum derivatives in the cities of Brazil.
93
+ ```python
94
+ result, flag = loader.yearly_sales_city()
95
+ ```
96
+ ![image](https://github.com/user-attachments/assets/26ac0d96-73f9-43a8-b9bf-47106cafeba4)
97
+
98
+
99
+
@@ -0,0 +1,67 @@
1
+ # StreamFuels
2
+
3
+ StreamFuels is a collection of tools for processing and analyzing fuel market data, focusing on petroleum derivatives, natural gas, and biofuels markets across different regions of Brazil.
4
+
5
+ ***monthly_sales_state()***:
6
+ Monthly fuel sales data by state from the ANP database
7
+
8
+ ***yearly_sales_state()***:
9
+ Yearly fuel sales data by state from the ANP database
10
+
11
+ ***yearly_sales_city()***:
12
+ Yearly fuel sales data by city from the ANP database
13
+
14
+ ***monthly_operations_state()***:
15
+ Monthly oil production, NGL production, natural gas production, reinjection, flaring and losses, self-consumption, and available natural gas. It provides a comprehensive view of petroleum and gas operations.
16
+
17
+
18
+
19
+ <!-- ## Installation
20
+
21
+ ```bash
22
+ pip install streamfuels
23
+ ``` -->
24
+
25
+
26
+ To run locally, in your target python environment and in this project folder type:
27
+ ```bash
28
+ pip install -e .
29
+ ```
30
+
31
+
32
+ After that you can import using the target python environment:
33
+
34
+ ```python
35
+ from streamfuels.datasets import DatasetLoader
36
+ loader = DatasetLoader()
37
+ result, flag = loader.yearly_sales_state()
38
+
39
+ df, metadata = loader.read_tsf(path_tsf=result)
40
+ ```
41
+
42
+ ### Yearly sales of petroleum derivatives in the states of Brazil.
43
+ ```python
44
+ result, flag = loader.yearly_sales_state()
45
+ ```
46
+ ![image](https://github.com/user-attachments/assets/ab1d0ac8-9574-4229-81e6-2e3ef32e959c)
47
+
48
+ ### Monthly sales of petroleum derivatives in the states of Brazil.
49
+ ```python
50
+ result, flag = loader.monthly_sales_state()
51
+ ```
52
+ ![image](https://github.com/user-attachments/assets/4894d0cf-eb92-421b-8b8a-d0a1522ccc0d)
53
+
54
+ ### Monthly oil and gas operations in the states of Brazil.
55
+ ```python
56
+ result, flag = loader.monthly_operations_state()
57
+ ```
58
+ ![image](https://github.com/user-attachments/assets/ab9b18b5-54ee-41f8-8948-9458b6e96343)
59
+
60
+ ### Yearly sales of petroleum derivatives in the cities of Brazil.
61
+ ```python
62
+ result, flag = loader.yearly_sales_city()
63
+ ```
64
+ ![image](https://github.com/user-attachments/assets/26ac0d96-73f9-43a8-b9bf-47106cafeba4)
65
+
66
+
67
+
@@ -0,0 +1,3 @@
1
+ [build-system]
2
+ requires = ["setuptools", "wheel"]
3
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,31 @@
1
from setuptools import setup, find_packages

# Read the long description with an explicit encoding and a context manager
# (the original open('README.md').read() leaked the file handle and used the
# platform-default encoding, which breaks on non-UTF-8 locales).
with open('README.md', encoding='utf-8') as readme:
    long_description = readme.read()

setup(
    name='streamfuels',
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'pandas>=1.2.0',
        'requests>=2.25.0',
        'beautifulsoup4>=4.9.0',
        'unidecode>=1.1.1',
        'numpy>=1.19.0',
        'editdistance>=0.5.3',
        'setuptools',
        'tqdm==4.65.0'
    ],
    author='StreamFuels',
    author_email='lucascstxv@gmail.com',
    description='Data processing and analysis tools for fuel market research',
    long_description=long_description,
    long_description_content_type='text/markdown',
    url='https://github.com/streamfuels/streamfuels',
    classifiers=[
        'Programming Language :: Python :: 3',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
    ],
    python_requires='>=3.9',
    license='MIT',
    license_files='LICENSE',
)
@@ -0,0 +1 @@
1
# Package version; keep in sync with the `version` field in setup.py.
__version__ = "0.1.0"
@@ -0,0 +1,4 @@
1
"""Public entry points for the streamfuels datasets package."""
from .dataset_loader import DatasetLoader
from .extract import download_anp_data

__all__ = ['DatasetLoader', 'download_anp_data']
@@ -0,0 +1,368 @@
1
+
2
+ import os
3
+ import zipfile
4
+ from unidecode import unidecode
5
+ import re
6
+ import pandas as pd
7
+ import numpy as np
8
+
9
def znorm(x):
    """Z-normalize *x*; when the standard deviation is zero, only center it."""
    mu = np.mean(x)
    sigma = np.std(x)
    centered = x - mu
    if sigma == 0:
        return centered
    return centered / sigma
14
+
15
+ def translate_fuel_name(fuel_name):
16
+ fuel_mapping = {
17
+ 'ethanol': 'Etanol hidratado',
18
+ 'gasoline-r': 'Gasolina C',
19
+ 'gasoline-a': 'Gasolina de aviação',
20
+ 'fuel oil': 'Óleo combustível',
21
+ 'LPG': 'GLP',
22
+ 'diesel': 'Óleo diesel',
23
+ 'kerosene-i': 'Querosene iluminante',
24
+ 'kerosene-a': 'Querosene de aviação',
25
+ 'etanol': 'ethanol'
26
+ }
27
+ if fuel_name.lower() not in fuel_mapping:
28
+ print(f"Fuel name '{fuel_name}' not found in mapping.")
29
+ return fuel_mapping.get(fuel_name.lower(), "Invalid")
30
+
31
def prod_to_en(prod):
    """Translate a normalized Portuguese product key to English ("Invalid" if unknown)."""
    translations = {
        'petroleo': 'petroleum',
        'lgn': 'NGL',
        'gasnatural': 'natural gas'
    }
    return translations.get(prod.lower(), "Invalid")
39
+
40
+
41
def fuel_pt_to_en(fuel_name):
    """Map a normalized Portuguese fuel key to its English label.

    Prints a warning and returns "Invalid" when the key is unknown.
    """
    pt_to_en = {
        'etanolhidratado': 'ethanol',
        'gasolinac': 'gasoline-r',
        'gasolinadeaviacao': 'gasoline-a',
        'oleocombustivel': 'fuel oil',
        'glp': 'LPG',
        'oleodiesel': 'diesel',
        'queroseneiluminante': 'kerosene-i',
        'querosenedeaviacao': 'kerosene-a',
        'asfalto': 'asphalt',
        'etanol': 'ethanol'
    }
    key = fuel_name.lower()
    if key in pt_to_en:
        return pt_to_en[key]
    print(f"Fuel name '{fuel_name}' not found in mapping.")
    return "Invalid"
57
+
58
+
59
def get_default_download_dir():
    """Return the default directory for downloads (~/.streamfuels), creating it if needed."""
    default_dir = os.path.join(os.path.expanduser("~"), ".streamfuels")
    # exist_ok avoids the race between a separate existence check and creation
    # when two processes initialize the directory concurrently.
    os.makedirs(default_dir, exist_ok=True)
    return default_dir
65
def unzip_and_delete(zip_file_path):
    """
    Extract a ZIP archive into its own directory, then remove the archive.

    Parameters:
    - zip_file_path: Path to the ZIP file.
    """
    # Bail out early on anything that is not a readable zip archive.
    if not zipfile.is_zipfile(zip_file_path):
        print(f"The file at {zip_file_path} is not a valid zip file.")
        return

    try:
        extract_path = os.path.dirname(zip_file_path)
        with zipfile.ZipFile(zip_file_path, 'r') as archive:
            archive.extractall(extract_path)
            print(f"Extracted all contents to {extract_path}")
        os.remove(zip_file_path)
        print(f"Deleted original zip file: {zip_file_path}")
    except Exception as e:
        # Best-effort: report the failure rather than propagate it.
        print(f"An error occurred: {e}")
90
+
91
def parse_string(string):
    """Lowercase *string*, strip accents, and drop every non-alphanumeric character."""
    ascii_lower = unidecode(str(string).lower())
    return re.sub(r'[^a-zA-Z0-9]', '', ascii_lower)
93
+
94
def mes_para_numero(mes):
    """Convert a month name or number to a two-digit month string.

    Args:
        mes: Month as a Portuguese abbreviation ('JAN'..'DEZ'), a numeric
            string, a float (e.g. 3.0), or None/NaN.

    Returns:
        str: Two-digit month number; '01' for missing or unrecognized input.
    """
    # Portuguese month abbreviations plus direct numeric-string mappings.
    meses = {
        'JAN': '01', 'FEV': '02', 'MAR': '03', 'ABR': '04',
        'MAI': '05', 'JUN': '06', 'JUL': '07', 'AGO': '08',
        'SET': '09', 'OUT': '10', 'NOV': '11', 'DEZ': '12',
        '1': '01', '2': '02', '3': '03', '4': '04',
        '5': '05', '6': '06', '7': '07', '8': '08',
        '9': '09', '10': '10', '11': '11', '12': '12'
    }

    # Missing values default to January.
    if mes is None or pd.isna(mes):
        return '01'

    texto = mes if isinstance(mes, str) else str(mes)
    # Drop any decimal tail (e.g. '3.0' -> '3'); no-op when there is no dot.
    texto = texto.partition('.')[0]

    return meses.get(texto.upper(), '01')
132
+
133
def ensure_folder_exists(parts):
    """
    Build a path under the default download directory and make sure it exists.

    Parameters:
    - parts: Iterable of path components joined beneath the download directory.
      (The original docstring documented a nonexistent `folder_path` parameter.)

    Returns:
    - str: The path of the (possibly just created) folder.
    """
    base = get_default_download_dir()
    target = os.path.join(base, *parts)
    # exist_ok avoids the race between a separate existence check and creation.
    os.makedirs(target, exist_ok=True)
    return target
147
+
148
def estado_para_sigla(estado):
    """Map a normalized Brazilian state name to its two-letter abbreviation.

    Returns 'Undefined' when the name is not recognized.
    """
    siglas = {
        'acre': 'ac',
        'alagoas': 'al',
        'amapa': 'ap',
        'amazonas': 'am',
        'bahia': 'ba',
        'ceara': 'ce',
        'distritofederal': 'df',
        'espiritosanto': 'es',
        'goias': 'go',
        'maranhao': 'ma',
        'matogrosso': 'mt',
        'matogrossodosul': 'ms',
        'minasgerais': 'mg',
        'para': 'pa',
        'paraiba': 'pb',
        'parana': 'pr',
        'pernambuco': 'pe',
        'piaui': 'pi',
        'riodejaneiro': 'rj',
        'riograndedonorte': 'rn',
        'riograndedosul': 'rs',
        'rondonia': 'ro',
        'roraima': 'rr',
        'santacatarina': 'sc',
        'saopaulo': 'sp',
        'sergipe': 'se',
        'tocantins': 'to'
    }
    if estado in siglas:
        return siglas[estado]
    return 'Undefined'
181
+
182
def obter_max_min_datas(df, col_data, mes_ou_ano):
    """Get the maximum and minimum dates held in one DataFrame column.

    Args:
        df: DataFrame containing the data
        col_data: Column name with the date values
        mes_ou_ano: 'ano' for yearly data; any other value means monthly data

    Returns:
        tuple: (max_date, min_date) — numeric when parsing succeeds, string
        defaults when the column holds no valid dates.
    """
    # Work on a copy so the caller's DataFrame is never mutated.
    valores = df[col_data].copy()
    valores = valores[~pd.isna(valores)]

    if valores.empty:
        print(f"Warning: No valid dates found in column {col_data}")
        # Fall back to fixed defaults when nothing is parseable.
        if mes_ou_ano == 'ano':
            return ('2020', '2000')
        return ('202001', '200001')

    if mes_ou_ano == 'ano':
        try:
            # Fast path: the column already holds plain integer years.
            valores = valores.astype(int)
        except ValueError:
            # Slow path: strip decimals and non-digits, then coerce.
            valores = valores.astype(str)
            valores = valores.apply(lambda v: v.split('.')[0] if '.' in v else v)
            valores = valores.str.extract(r'(\d+)', expand=False)
            valores = pd.to_numeric(valores, errors='coerce')
            valores = valores.dropna()
        return valores.max(), valores.min()

    # Monthly data: normalize 'YYYY-MM' to numeric YYYYMM.
    try:
        valores = valores.astype(str)
        numericas = pd.to_numeric(valores.str.replace("-", ""), errors='coerce')
        numericas = numericas.dropna()
        return numericas.max(), numericas.min()
    except Exception as e:
        print(f"Error processing dates: {e}")
        print("Sample dates:", valores.head())
        # Defaults when even coercion fails.
        return 202001, 200001
245
+
246
def kg_to_m3(material, kg):
    """Convert a mass to a volume using per-material density factors.

    Returns a float volume when the material is known, otherwise a Portuguese
    error string — NOTE(review): mixed return types; callers must check for
    the string case.
    """
    # Density factors sourced from the ANP conversion-factors table:
    # https://www.gov.br/anp/pt-br/centrais-de-conteudo/publicacoes/anuario-estatistico/arquivos-anuario-estatistico-2022/outras-pecas-documentais/fatores-conversao-2022.pdf
    # NOTE(review): the units here look inconsistent — the two ethanol entries
    # (~0.8) appear to be in t/m³ while the remaining entries (~550-1025) look
    # like kg/m³; the original comment ("em TERA / M3") is garbled. Confirm
    # against the ANP table before trusting results, especially for ethanol.
    densidades = {
        'etanolanidro': 0.79100,
        'etanolhidratado': 0.80900,
        'asfalto': 1025.00,
        'biodieselb100': 880.00,
        'gasolinac': 754.25,
        'gasolinadeaviacao': 726.00,
        'glp': 552.00,
        'lgn': 580.00,
        'oleodiesel': 840.00,
        'oleocombustivel': 1013.00,
        'petroleo': 849.76,
        'querosenedeaviacao': 799.00,
        'queroseneiluminante': 799.00,
        'solventes': 741.00
    }

    if material in densidades:
        # Original comment said "Convertendo para kg/m³" (converting to kg/m³);
        # dividing e.g. 754.25 by 1e3 actually yields ~0.754, i.e. t/m³ —
        # TODO confirm the intended unit of the result.
        densidade = densidades[material] / 1e3
        m3 = kg / densidade
        return m3
    else:
        return "Material não encontrado na lista."
271
+
272
def registrar_meses_duplicados(df, produto, local, tempo):
    """Append rows with a repeated 'timestamp' to timestamps_duplicadas_<tempo>.csv."""
    registro = df.copy()
    # Count how many *other* rows share each timestamp.
    contagem = registro.groupby('timestamp')['timestamp'].transform('count')
    registro['duplicatas'] = contagem - 1
    registro = registro[registro['duplicatas'] >= 1]
    registro['derivado'] = produto
    registro['local'] = local
    registro.to_csv(f'timestamps_duplicadas_{tempo}.csv', mode='a', header=False, index=False)
280
+
281
def combinar_valores_unicos_colunas(df, colunas):
    """Return the distinct combinations of values in *colunas* as a list of tuples."""
    unicas = df[colunas].drop_duplicates().reset_index(drop=True)
    return [tuple(linha) for linha in unicas.values]
289
+
290
def first_non_nan_value(df, column_name):
    """
    Find the first non-NaN value in the specified column of a DataFrame.

    Args:
        df (DataFrame): The pandas DataFrame.
        column_name (str): The name of the column to search for non-NaN values.

    Returns:
        The first non-NaN value in the specified column, or None if no non-NaN values are found.
    """
    first_valid = df[column_name].first_valid_index()
    if first_valid is None:
        return None
    # first_valid_index() returns an index *label*; the original used .iloc,
    # which treats it as a position and breaks on non-default indexes.
    return df[column_name].loc[first_valid]
306
+
307
def last_non_nan_value(df, column_name):
    """
    Find the last non-NaN value in the specified column of a DataFrame.

    Args:
        df (DataFrame): The pandas DataFrame.
        column_name (str): The name of the column to search for non-NaN values.

    Returns:
        The last non-NaN value in the specified column, or None if no non-NaN values are found.
    """
    last_valid = df[column_name].last_valid_index()
    if last_valid is None:
        return None
    # last_valid_index() returns an index *label*; the original used .iloc,
    # which treats it as a position and breaks on non-default indexes.
    return df[column_name].loc[last_valid]
323
+
324
def find_first_sequence(arr):
    """
    Return the leading run of consecutive integers from *arr*.

    Args:
        arr (list): The input list of integers.

    Returns:
        list: The first maximal run where each element equals the previous + 1
        (empty for empty input).
    """
    if not arr:
        return []

    run = [arr[0]]
    for value in arr[1:]:
        if value != run[-1] + 1:
            # The run is broken; everything collected so far is the answer.
            break
        run.append(value)
    return run
345
+
346
def find_last_sequence(arr):
    """
    Return the trailing run of consecutive integers from *arr*.

    Args:
        arr (list): The input list of integers.

    Returns:
        list: The last maximal run where each element equals the previous + 1,
        in ascending order (empty for empty input).
    """
    if not arr:
        return []

    # Walk backwards collecting values that descend by exactly 1.
    run = [arr[-1]]
    for value in reversed(arr[:-1]):
        if value != run[-1] - 1:
            break
        run.append(value)
    run.reverse()  # restore ascending order
    return run
368
+