PyPI - eegdash - Versions diffs - 0.3.5.dev183002612__tar.gz → 0.3.6.dev97__tar.gz - Mend

eegdash 0.3.5.dev183002612__tar.gz → 0.3.6.dev97__tar.gz


Potentially problematic release.

This version of eegdash might be problematic.

Files changed (58)

{eegdash-0.3.5.dev183002612/eegdash.egg-info → eegdash-0.3.6.dev97}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eegdash
-Version: 0.3.5.dev183002612
+Version: 0.3.6.dev97
 Summary: EEG data for machine learning
 Author-email: Young Truong <dt.young112@gmail.com>, Arnaud Delorme <adelorme@gmail.com>, Aviv Dotan <avivd220@gmail.com>, Oren Shriki <oren70@gmail.com>, Bruno Aristimunha <b.aristimunha@gmail.com>
 License-Expression: GPL-3.0-only
@@ -59,6 +59,7 @@ Requires-Dist: numpydoc; extra == "docs"
 Requires-Dist: memory_profiler; extra == "docs"
 Requires-Dist: ipython; extra == "docs"
 Requires-Dist: lightgbm; extra == "docs"
+Requires-Dist: plotly; extra == "docs"
 Provides-Extra: all
 Requires-Dist: eegdash[docs]; extra == "all"
 Requires-Dist: eegdash[dev]; extra == "all"

eegdash-0.3.6.dev97/docs/source/dataset_summary.rst ADDED

@@ -0,0 +1,201 @@
+.. meta::
+   :hide_sidebar: true
+:html_theme.sidebar_secondary.remove:
+:html_theme.sidebar_primary.remove:
+.. _data_summary:
+EEGDash
+========
+To leverage recent and ongoing advancements in large-scale computational methods and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.
+The archive is currently still in :bdg-danger:`beta testing` mode, so be kind.
+.. raw:: html
+  <figure class="eegdash-figure" style="margin: 0 0 1.25rem 0;">
+.. raw:: html
+  :file: ../build/dataset_bubble.html
+.. raw:: html
+  <figcaption class="eegdash-caption">
+    Figure: Dataset landscape. Each bubble represents a dataset: x-axis shows the number of records,
+    y-axis the number of subjects, bubble area encodes on-disk size, and color indicates sampling frequency band.
+    Hover for details and use the legend to highlight groups.
+  </figcaption>
+  </figure>
+.. raw:: html
+  <figure class="eegdash-figure" style="margin: 1.0rem 0 0 0;">
+MEEG Datasets Table
+===================
+The data in EEG-DaSh originates from a collaboration involving 25 laboratories, encompassing 27,053 participants. This extensive collection includes MEEG data, which is a combination of EEG and MEG signals. The data is sourced from various studies conducted by these labs,
+involving both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis. Additionally, data spans different mental states like sleep, meditation, and cognitive tasks.
+In addition, EEG-DaSh will incorporate a subset of the data converted from `NEMAR <https://nemar.org/>`__, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.
+.. raw:: html
+  :file: ../build/dataset_summary_table.html
+.. raw:: html
+  <figcaption class="eegdash-caption">
+    Table: Sortable catalogue of EEG‑DaSh datasets. Use the “Filters” button to open column filters;
+    click a column header to jump directly to a filter pane. The Total row is pinned at the bottom.
+    * means that we use the median value across multiple recordings in the dataset, and empty cells
+    when the metainformation is not extracted yet.
+  </figcaption>
+  </figure>
+.. raw:: html
+  <!-- jQuery + DataTables core -->
+  <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
+  <link rel="stylesheet" href="https://cdn.datatables.net/v/bm/dt-1.13.4/datatables.min.css"/>
+  <script src="https://cdn.datatables.net/v/bm/dt-1.13.4/datatables.min.js"></script>
+  <!-- Buttons + SearchPanes (+ Select required by SearchPanes) -->
+  <link rel="stylesheet" href="https://cdn.datatables.net/buttons/2.4.2/css/buttons.dataTables.min.css">
+  <script src="https://cdn.datatables.net/buttons/2.4.2/js/dataTables.buttons.min.js"></script>
+  <link rel="stylesheet" href="https://cdn.datatables.net/select/1.7.0/css/select.dataTables.min.css">
+  <link rel="stylesheet" href="https://cdn.datatables.net/searchpanes/2.3.1/css/searchPanes.dataTables.min.css">
+  <script src="https://cdn.datatables.net/select/1.7.0/js/dataTables.select.min.js"></script>
+  <script src="https://cdn.datatables.net/searchpanes/2.3.1/js/dataTables.searchPanes.min.js"></script>
+  <style>
+    /* Styling for the Total row (placed in tfoot) */
+    table.sd-table tfoot td {
+      font-weight: 600;
+      border-top: 2px solid rgba(0,0,0,0.2);
+      background: #f9fafb;
+      /* Match body cell padding to keep perfect alignment */
+      padding: 8px 10px !important;
+      vertical-align: middle;
+    }
+    /* Right-align numeric-like columns (2..8) consistently for body & footer */
+    table.sd-table tbody td:nth-child(n+2),
+    table.sd-table tfoot td:nth-child(n+2) {
+      text-align: right;
+    }
+    /* Keep first column (Dataset/Total) left-aligned */
+    table.sd-table tbody td:first-child,
+    table.sd-table tfoot td:first-child {
+      text-align: left;
+    }
+  </style>
+  <script>
+  // Helper: robustly extract values for SearchPanes when needed
+  function tagsArrayFromHtml(html) {
+    if (html == null) return [];
+    // If it's numeric or plain text, just return as a single value
+    if (typeof html === 'number') return [String(html)];
+    if (typeof html === 'string' && html.indexOf('<') === -1) return [html.trim()];
+    // Else parse any .tag elements inside HTML
+    var tmp = document.createElement('div');
+    tmp.innerHTML = html;
+    var tags = Array.from(tmp.querySelectorAll('.tag')).map(function(el){
+      return (el.textContent || '').trim();
+    });
+    return tags.length ? tags : [tmp.textContent.trim()];
+  }
+  // Helper: parse human-readable sizes like "4.31 GB" into bytes (number)
+  function parseSizeToBytes(text) {
+    if (!text) return 0;
+    var s = String(text).trim();
+    var m = s.match(/([\d,.]+)\s*(TB|GB|MB|KB|B)/i);
+    if (!m) return 0;
+    var value = parseFloat(m[1].replace(/,/g, ''));
+    var unit = m[2].toUpperCase();
+    var factor = { B:1, KB:1024, MB:1024**2, GB:1024**3, TB:1024**4 }[unit] || 1;
+    return value * factor;
+  }
+  $(function () {
+    // 1) Move the "Total" row into <tfoot> so sorting/filtering never moves it
+    $('.sortable').each(function(){
+      var $t = $(this);
+      var $tbody = $t.find('tbody');
+      var $total = $tbody.find('tr').filter(function(){
+        return $(this).find('td').eq(0).text().trim() === 'Total';
+      });
+      if ($total.length) {
+        var $tfoot = $t.find('tfoot');
+        if (!$tfoot.length) $tfoot = $('<tfoot/>').appendTo($t);
+        $total.appendTo($tfoot);
+      }
+    });
+    // 2) Initialize DataTable with SearchPanes button
+    var FILTER_COLS = [1,2,3,4,5,6];
+    // Detect the index of the size column by header text
+    var sizeIdx = (function(){
+      var idx = -1;
+      $('.sortable thead th').each(function(i){
+        var t = $(this).text().trim().toLowerCase();
+        if (t === 'size on disk' || t === 'size') idx = i;
+      });
+      return idx;
+    })();
+    var table = $('.sortable').DataTable({
+      dom: 'Blfrtip',
+      paging: false,
+      searching: true,
+      info: false,
+      language: {
+        search: 'Filter dataset:',
+        searchPanes: { collapse: { 0: 'Filters', _: 'Filters (%d)' } }
+      },
+      buttons: [{
+        extend: 'searchPanes',
+        text: 'Filters',
+        config: { cascadePanes: true, viewTotal: true, layout: 'columns-4', initCollapsed: false }
+      }],
+      columnDefs: (function(){
+        var defs = [
+          { searchPanes: { show: true }, targets: FILTER_COLS }
+        ];
+        if (sizeIdx !== -1) {
+          defs.push({
+            targets: sizeIdx,
+            render: function(data, type) {
+              if (type === 'sort' || type === 'type') {
+                return parseSizeToBytes(data);
+              }
+              return data;
+            }
+          });
+        }
+        return defs;
+      })()
+    });
+    // 3) UX: click a header to open the relevant filter pane
+    $('.sortable thead th').each(function (i) {
+      if ([1,2,3,4].indexOf(i) === -1) return;
+      $(this).css('cursor','pointer').attr('title','Click to filter this column');
+      $(this).on('click', function () {
+        table.button('.buttons-searchPanes').trigger();
+        setTimeout(function () {
+          var idx = [1,2,3,4].indexOf(i);
+          var $container = $(table.searchPanes.container());
+          var $pane = $container.find('.dtsp-pane').eq(idx);
+          var $title = $pane.find('.dtsp-title');
+          if ($title.length) $title.trigger('click');
+        }, 0);
+      });
+    });
+  });
+  </script>
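
The ``parseSizeToBytes`` helper in the script above turns strings such as "4.31 GB" into a byte count so the size column sorts numerically. The same rule can be checked outside the browser with a short Python sketch. The function name ``parse_size_to_bytes`` and its handling of malformed numbers (returning 0 rather than NaN) are assumptions made for this illustration and are not part of the package.

```python
import re

# Binary multiples, matching the factors used by the JavaScript helper.
_UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE_RE = re.compile(r"([\d,.]+)\s*(TB|GB|MB|KB|B)", re.IGNORECASE)


def parse_size_to_bytes(text):
    """Hypothetical Python counterpart of parseSizeToBytes (illustration only)."""
    if not text:
        return 0
    match = _SIZE_RE.search(str(text).strip())
    if match is None:
        return 0
    try:
        # Drop thousands separators before converting, e.g. "1,024" -> 1024.0
        value = float(match.group(1).replace(",", ""))
    except ValueError:
        # Assumption: treat malformed numbers such as "1.2.3" as 0.
        return 0
    return value * _UNIT_FACTORS[match.group(2).upper()]


sizes = ["4.31 GB", "512 MB", "1,024 KB", "2 TB"]
print(sorted(sizes, key=parse_size_to_bytes))
```

Sorting on the parsed value rather than on the display string is what keeps "512 MB" ahead of "4.31 GB" in the ordering.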

{eegdash-0.3.5.dev183002612 → eegdash-0.3.6.dev97}/eegdash/__init__.py RENAMED

@@ -7,4 +7,4 @@ __init__mongo_client()
 __all__ = ["EEGDash", "EEGDashDataset", "EEGChallengeDataset"]
-__version__ = "0.3.5.dev183002612"
+__version__ = "0.3.6.dev97"

{eegdash-0.3.5.dev183002612 → eegdash-0.3.6.dev97}/eegdash/api.py RENAMED

@@ -90,12 +90,16 @@ class EEGDash:
     ) -> list[Mapping[str, Any]]:
         """Find records in the MongoDB collection.
-        This method can be called in two ways:
+        This method supports four usage patterns:
         1. With a pre-built MongoDB query dictionary (positional argument):
            >>> eegdash.find({"dataset": "ds002718", "subject": {"$in": ["012", "013"]}})
         2. With user-friendly keyword arguments for simple and multi-value queries:
            >>> eegdash.find(dataset="ds002718", subject="012")
            >>> eegdash.find(dataset="ds002718", subject=["012", "013"])
+        3. With an explicit empty query to return all documents:
+           >>> eegdash.find({})  # fetches all records (use with care)
+        4. By combining a raw query with kwargs (merged via logical AND):
+           >>> eegdash.find({"dataset": "ds002718"}, subject=["012", "013"])  # yields {"$and":[{"dataset":"ds002718"}, {"subject":{"$in":["012","013"]}}]}
         Parameters
         ----------
@@ -110,26 +114,34 @@ class EEGDash:
         list:
             A list of DB records (string-keyed dictionaries) that match the query.
-        Raises
-        ------
-        ValueError
-            If both a `query` dictionary and keyword arguments are provided.
         """
-        if query is not None and kwargs:
-            raise ValueError(
-                "Provide either a positional 'query' dictionary or keyword arguments, not both."
-            )
-        final_query = {}
-        if query is not None:
-            final_query = query
-        elif kwargs:
-            final_query = self._build_query_from_kwargs(**kwargs)
+        final_query: dict[str, Any] | None = None
+        # Accept explicit empty dict {} to mean "match all"
+        raw_query = query if isinstance(query, dict) else None
+        kwargs_query = self._build_query_from_kwargs(**kwargs) if kwargs else None
+        # Determine presence, treating {} as a valid raw query
+        has_raw = isinstance(raw_query, dict)
+        has_kwargs = kwargs_query is not None
+        if has_raw and has_kwargs:
+            # Detect conflicting constraints on the same field (e.g., task specified
+            # differently in both places) and raise a clear error instead of silently
+            # producing an empty result.
+            self._raise_if_conflicting_constraints(raw_query, kwargs_query)
+            # Merge with logical AND so both constraints apply
+            if raw_query:  # non-empty dict adds constraints
+                final_query = {"$and": [raw_query, kwargs_query]}
+            else:  # {} adds nothing; use kwargs_query only
+                final_query = kwargs_query
+        elif has_raw:
+            # May be {} meaning match-all, or a non-empty dict
+            final_query = raw_query
+        elif has_kwargs:
+            final_query = kwargs_query
         else:
-            # By default, an empty query {} returns all documents.
-            # This can be dangerous, so we can either allow it or raise an error.
-            # Let's require an explicit query for safety.
+            # Avoid accidental full scans
             raise ValueError(
                 "find() requires a query dictionary or at least one keyword argument. "
                 "To find all documents, use find({})."
@@ -224,9 +236,12 @@ class EEGDash:
         return record
     def _build_query_from_kwargs(self, **kwargs) -> dict[str, Any]:
-        """Builds and validates a MongoDB query from user-friendly keyword arguments.
+        """Build and validate a MongoDB query from user-friendly keyword arguments.
-        Translates list values into MongoDB's `$in` operator.
+        Improvements:
+        - Reject None values and empty/whitespace-only strings
+        - For list/tuple/set values: strip strings, drop None/empties, deduplicate, and use `$in`
+        - Preserve scalars as exact matches
         """
         # 1. Validate that all provided keys are allowed for querying
         unknown_fields = set(kwargs.keys()) - self._ALLOWED_QUERY_FIELDS
@@ -239,19 +254,108 @@ class EEGDash:
         # 2. Construct the query dictionary
         query = {}
         for key, value in kwargs.items():
-            if isinstance(value, (list, tuple)):
-                if not value:
+            # None is not a valid constraint
+            if value is None:
+                raise ValueError(
+                    f"Received None for query parameter '{key}'. Provide a concrete value."
+                )
+            # Handle list-like values as multi-constraints
+            if isinstance(value, (list, tuple, set)):
+                cleaned: list[Any] = []
+                for item in value:
+                    if item is None:
+                        continue
+                    if isinstance(item, str):
+                        item = item.strip()
+                        if not item:
+                            continue
+                    cleaned.append(item)
+                # Deduplicate while preserving order
+                cleaned = list(dict.fromkeys(cleaned))
+                if not cleaned:
                     raise ValueError(
                         f"Received an empty list for query parameter '{key}'. This is not supported."
                     )
-                # If the value is a list, use the `$in` operator for multi-search
-                query[key] = {"$in": value}
+                query[key] = {"$in": cleaned}
             else:
-                # Otherwise, it's a direct match
+                # Scalars: trim strings and validate
+                if isinstance(value, str):
+                    value = value.strip()
+                    if not value:
+                        raise ValueError(
+                            f"Received an empty string for query parameter '{key}'."
+                        )
                 query[key] = value
         return query
+    # --- Query merging and conflict detection helpers ---
+    def _extract_simple_constraint(self, query: dict[str, Any], key: str):
+        """Extract a simple constraint for a given key from a query dict.
+        Supports only top-level equality (key: value) and $in (key: {"$in": [...]})
+        constraints. Returns a tuple (kind, value) where kind is "eq" or "in". If the
+        key is not present or uses other operators, returns None.
+        """
+        if not isinstance(query, dict) or key not in query:
+            return None
+        val = query[key]
+        if isinstance(val, dict):
+            if "$in" in val and isinstance(val["$in"], (list, tuple)):
+                return ("in", list(val["$in"]))
+            return None  # unsupported operator shape for conflict checking
+        else:
+            return ("eq", val)
+    def _raise_if_conflicting_constraints(
+        self, raw_query: dict[str, Any], kwargs_query: dict[str, Any]
+    ) -> None:
+        """Raise ValueError if both query sources define incompatible constraints.
+        We conservatively check only top-level fields with simple equality or $in
+        constraints. If a field appears in both queries and constraints are mutually
+        exclusive, raise an explicit error to avoid silent empty result sets.
+        """
+        if not raw_query or not kwargs_query:
+            return
+        # Only consider fields we generally allow; skip meta operators like $and
+        raw_keys = set(raw_query.keys()) & self._ALLOWED_QUERY_FIELDS
+        kw_keys = set(kwargs_query.keys()) & self._ALLOWED_QUERY_FIELDS
+        dup_keys = raw_keys & kw_keys
+        for key in dup_keys:
+            rc = self._extract_simple_constraint(raw_query, key)
+            kc = self._extract_simple_constraint(kwargs_query, key)
+            if rc is None or kc is None:
+                # If either side is non-simple, skip conflict detection for this key
+                continue
+            r_kind, r_val = rc
+            k_kind, k_val = kc
+            # Normalize to sets when appropriate for simpler checks
+            if r_kind == "eq" and k_kind == "eq":
+                if r_val != k_val:
+                    raise ValueError(
+                        f"Conflicting constraints for '{key}': query={r_val!r} vs kwargs={k_val!r}"
+                    )
+            elif r_kind == "in" and k_kind == "eq":
+                if k_val not in r_val:
+                    raise ValueError(
+                        f"Conflicting constraints for '{key}': query in {r_val!r} vs kwargs={k_val!r}"
+                    )
+            elif r_kind == "eq" and k_kind == "in":
+                if r_val not in k_val:
+                    raise ValueError(
+                        f"Conflicting constraints for '{key}': query={r_val!r} vs kwargs in {k_val!r}"
+                    )
+            elif r_kind == "in" and k_kind == "in":
+                if len(set(r_val).intersection(k_val)) == 0:
+                    raise ValueError(
+                        f"Conflicting constraints for '{key}': disjoint sets {r_val!r} and {k_val!r}"
+                    )
     def load_eeg_data_from_s3(self, s3path: str) -> xr.DataArray:
         """Load an EEGLAB .set file from an AWS S3 URI and return it as an xarray DataArray.
@@ -676,10 +780,8 @@ class EEGDashDataset(BaseConcatDataset):
             # If list is provided, let _build_query_from_kwargs turn it into $in later.
             query_kwargs.setdefault("dataset", dataset)
-        if query and query_kwargs:
-            raise ValueError(
-                "Provide either a 'query' dictionary or keyword arguments for filtering, not both."
-            )
+        # Allow mixing raw DB query with additional keyword filters. Both will be
+        # merged by EEGDash.find() (logical AND), so we do not raise here.
         try:
             if records is not None:
@@ -723,7 +825,7 @@ class EEGDashDataset(BaseConcatDataset):
                                 **base_dataset_kwargs,
                             )
                         )
-            elif query or query_kwargs:
+            elif query is not None or query_kwargs:
                 # This is the DB query path that we are improving
                 datasets = self.find_datasets(
                     query=query,
@@ -786,6 +888,10 @@ class EEGDashDataset(BaseConcatDataset):
         """
         datasets: list[EEGDashBaseDataset] = []
+        # Build records using either a raw query OR keyword filters, but not both.
+        # Note: callers may accidentally pass an empty dict for `query` along with
+        # kwargs. In that case, treat it as if no query was provided and rely on kwargs.
+        # Always delegate merging of raw query + kwargs to EEGDash.find
         self.records = self.eeg_dash.find(query, **query_kwargs)
         for record in self.records:
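
The query handling changed in this file (keyword cleaning, conflict detection, and the AND-merge in ``find()``) can be sketched as three standalone functions. This is a simplified illustration and not the package's code. The names ``build_kwargs_query``, ``admitted``, ``conflicts`` and ``merge_query`` are hypothetical. The sketch skips the ``_ALLOWED_QUERY_FIELDS`` check, compares constraints as sets (so it assumes hashable values), and never touches MongoDB.

```python
from __future__ import annotations

from typing import Any


def build_kwargs_query(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the keyword-cleaning step (no field whitelist)."""
    query: dict[str, Any] = {}
    for key, value in kwargs.items():
        if value is None:
            raise ValueError(f"None is not a valid value for '{key}'")
        if isinstance(value, (list, tuple, set)):
            cleaned = []
            for item in value:
                if item is None:
                    continue
                if isinstance(item, str):
                    item = item.strip()
                    if not item:
                        continue
                cleaned.append(item)
            cleaned = list(dict.fromkeys(cleaned))  # dedupe, keep first-seen order
            if not cleaned:
                raise ValueError(f"no usable values for '{key}'")
            query[key] = {"$in": cleaned}
        else:
            if isinstance(value, str):
                value = value.strip()
                if not value:
                    raise ValueError(f"empty string for '{key}'")
            query[key] = value
    return query


def admitted(constraint: Any) -> set | None:
    """Values a simple constraint admits; None for operators other than $in."""
    if isinstance(constraint, dict):
        members = constraint.get("$in")
        return set(members) if isinstance(members, (list, tuple)) else None
    return {constraint}


def conflicts(raw: dict[str, Any], kw: dict[str, Any]) -> list[str]:
    """Shared top-level keys whose simple constraints cannot both hold."""
    clashing = []
    for key in raw.keys() & kw.keys():
        a, b = admitted(raw[key]), admitted(kw[key])
        if a is not None and b is not None and not (a & b):
            clashing.append(key)
    return sorted(clashing)


def merge_query(raw: Any = None, **kwargs: Any) -> dict[str, Any]:
    """Mirror the branch structure of find(): {} counts as a present raw query."""
    kw = build_kwargs_query(**kwargs) if kwargs else None
    has_raw = isinstance(raw, dict)
    if has_raw and kw is not None:
        clashing = conflicts(raw, kw)
        if clashing:
            raise ValueError(f"conflicting constraints for {clashing}")
        return {"$and": [raw, kw]} if raw else kw
    if has_raw:
        return raw
    if kw is not None:
        return kw
    raise ValueError("a query dictionary or at least one keyword filter is required")


print(merge_query({"dataset": "ds002718"}, subject=[" 012", "012", None, "013"]))
```

Raising on disjoint constraints, instead of sending an unsatisfiable ``$and`` to the database, turns a silently empty result into an explicit error, which is the motivation stated in the diff's own comments.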

{eegdash-0.3.5.dev183002612 → eegdash-0.3.6.dev97}/eegdash/dataset.py RENAMED

@@ -321,7 +321,9 @@ class EEGChallengeDataset(EEGDashDataset):
             )
         if self.mini:
-            if query and "subject" in query:
+            # Disallow mixing subject selection with mini=True since mini already
+            # applies a predefined subject subset.
+            if (query and "subject" in query) or ("subject" in kwargs):
                 raise ValueError(
                     "Query using the parameters `subject` with the class EEGChallengeDataset and `mini==True` is not possible."
                     "Please don't use the `subject` selection twice."
