ltfmselector 0.1.13__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/.gitignore +6 -1
  2. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/PKG-INFO +1 -1
  3. ltfmselector-0.2.1/doc/00Introduction.tex +14 -0
  4. ltfmselector-0.2.1/doc/01ReinforcementLearning.tex +37 -0
  5. ltfmselector-0.2.1/doc/02MDP.tex +58 -0
  6. ltfmselector-0.2.1/doc/03DQL.tex +89 -0
  7. ltfmselector-0.2.1/doc/04ExampleDQL.tex +112 -0
  8. ltfmselector-0.2.1/doc/06PatSpecFMS_Reconstruction.tex +0 -0
  9. ltfmselector-0.2.1/doc/07Results.tex +15 -0
  10. ltfmselector-0.2.1/doc/08Discussion.tex +0 -0
  11. ltfmselector-0.2.1/doc/09Conclusion.tex +0 -0
  12. ltfmselector-0.2.1/doc/Makefile +42 -0
  13. ltfmselector-0.2.1/doc/abstract.tex +17 -0
  14. ltfmselector-0.2.1/doc/figures/InversedPendulum.eps +0 -0
  15. ltfmselector-0.2.1/doc/figures/ReinforcementLearning.eps +0 -0
  16. ltfmselector-0.2.1/doc/figures/fig1.eps +0 -0
  17. ltfmselector-0.2.1/doc/history.txt +139 -0
  18. ltfmselector-0.2.1/doc/llncs.cls +1218 -0
  19. ltfmselector-0.2.1/doc/llncsdoc.pdf +0 -0
  20. ltfmselector-0.2.1/doc/main.tex +115 -0
  21. ltfmselector-0.2.1/doc/readme.txt +19 -0
  22. ltfmselector-0.2.1/doc/references.bib +292 -0
  23. ltfmselector-0.2.1/doc/samplepaper.pdf +0 -0
  24. ltfmselector-0.2.1/doc/samplepaper.tex +150 -0
  25. ltfmselector-0.2.1/doc/splncs04.bst +1548 -0
  26. ltfmselector-0.2.1/dqntutorial/PendulumDuration.eps +10011 -0
  27. ltfmselector-0.2.1/dqntutorial/QValuesPendulum.eps +17919 -0
  28. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/pyproject.toml +1 -1
  29. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/src/ltfmselector/env.py +73 -2
  30. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/src/ltfmselector/ltfmselector.py +66 -4
  31. ltfmselector-0.2.1/src/ltfmselector/py.typed +0 -0
  32. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/.github/workflows/release.yaml +0 -0
  33. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/LICENSE +0 -0
  34. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/README.md +0 -0
  35. /ltfmselector-0.1.13/src/ltfmselector/py.typed → /ltfmselector-0.2.1/doc/05PatSpecFMS_AgentEnv.tex +0 -0
  36. {ltfmselector-0.1.13/dqntutorial → ltfmselector-0.2.1/doc/figures}/PendulumDuration.eps +0 -0
  37. {ltfmselector-0.1.13/dqntutorial → ltfmselector-0.2.1/doc/figures}/QValuesPendulum.eps +0 -0
  38. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/dqntutorial/DQNTutorial_byPaszke.ipynb +0 -0
  39. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/dqntutorial/EpsDecay.eps +0 -0
  40. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/dqntutorial/Q_TargetValuesPendulum.eps +0 -0
  41. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/examples/00_Classification.ipynb +0 -0
  42. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/examples/01_Classification_wCustomPredictionModels.ipynb +0 -0
  43. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/examples/02_Classification_wGridSearch.ipynb +0 -0
  44. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/examples/03_Regression.ipynb +0 -0
  45. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/icons/icon.png +0 -0
  46. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/icons/icon.svg +0 -0
  47. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/src/ltfmselector/__init__.py +0 -0
  48. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/src/ltfmselector/utils.py +0 -0
  49. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/tests/test_regression_tol.py +0 -0
  50. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/tests/test_save_load.py +0 -0
  51. {ltfmselector-0.1.13 → ltfmselector-0.2.1}/tests/utils_fortesting.py +0 -0
@@ -17,4 +17,9 @@ runs*
17
17
 
18
18
  # Dev
19
19
  predictionModels.py
20
- stdutils
20
+ stdutils
21
+ examples/train.py
22
+ RBRHX_ModalScores.xlsx
23
+
24
+ # LaTeX
25
+ *converted-to.pdf
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: ltfmselector
3
- Version: 0.1.13
3
+ Version: 0.2.1
4
4
  Summary: Locally-Tailored Feature and Model Selector with Deep Q-Learning
5
5
  Project-URL: GitHub, https://github.com/RenZhen95/ltfmselector/
6
6
  Author-email: RenZhen95 <j-liaw@hotmail.com>
@@ -0,0 +1,14 @@
1
+ Poststroke gait rehabilitation requires personalized therapy, usually designed by an interdisciplinary medical team via time-consuming assessments \cite{raab2020,liaw2025}. An automated gait assessment tool based on gait measurements and interdisciplinary knowledge could allow for faster poststroke evaluation, while providing relevant feedback via objective analysis of a patient's status. One major challenge of using gait data for this purpose is its high dimensionality, which is usually addressed by carrying out feature selection on a fixed feature set. Owing to the uniqueness of each patient in terms of physical and functional status \cite{lee2020}, we present a dynamic feature and model selection approach using reinforcement learning (RL).
2
+
3
+ For instance, \cite{lee2020} showed that when performing the ``Bring a Hand to Mouth'' exercise during stroke rehabilitation, different stroke patients compensate for the affected motion in different ways. Beyond inter-patient variability, the relevant biomarkers of a patient have also been shown to change with the severity of the patient's condition. \cite{pistacchi2017} showed, for instance, how reduced step length appeared to be a specific feature of Parkinson's disease in its early stages; as the disease progresses to its moderate stage, gait asymmetry, double-limb support, and increased cadence become more characteristic, followed by freezing of gait and reduced balance in its advanced stages. Notably, research \cite{huang2016,biase2020} has highlighted the necessity of adapting the analyzed gait parameters in tandem with the progression of the disease.
4
+
5
+ Therefore, it is hypothesized that a clinical decision support system (CDSS) which dynamically selects salient features for each individual patient should be more beneficial than classically selecting a fixed subset of informative features \cite{lee2020,lee2021}. It is, after all, the therapist's goal to design a therapy plan \emph{tailored to a specific patient}. Moreover, simply presenting a multitude of variables can easily overwhelm a therapist and hinder them from obtaining useful insights \cite{lee2021}. A CDSS that can automatically identify a subset of patient-specific relevant features should thus greatly help a therapist save precious time \cite{lee2021}, especially in light of the current shortage of medical staff \cite{healthcareburden}.
6
+
7
+ Inspired by the application of RL in controlling mechanical systems (e.g. balancing an inverted pendulum), the Deep $Q$-Learning (DQL) algorithm is applied in this work to select an optimal subset of the extracted gait features, coupled with a corresponding prediction (supervised learning) model, to automatically assess gait poststroke. Earlier works by \cite{lee2020,lee2021} have applied DQL to develop a CDSS that automatically assesses a patient's ability to perform functional exercises, while delivering patient-specific relevant features for each corresponding task. In contrast, the method developed here is $(i)$ applied to the context of poststroke gait assessment and $(ii)$ extended to include model selection. Model selection in this work covers both the selection of a \emph{learning algorithm} and the subsequent \emph{hyperparameter tuning}.
8
+
9
+ The issue of model selection is often described as more an art than a strict science \cite{raschka2020}, a notion underpinned by the ``No Free Lunch'' theorem of \cite{nofreelunch}, which proves that no single optimization algorithm can outperform all others across all possible problem spaces or datasets. Consequently, the selection of a model should be guided by the characteristics of the dataset, instead of a reliance on a universal ``best'' model. The common practice among practitioners is to employ hold-out and/or cross-validation techniques to evaluate and select the optimal model hyperparameters and learning algorithm \cite{nestedcv,raschka2020}. In this work, the ability of model-free RL to learn an optimal policy purely from the saved experiences of the agent-environment interaction is leveraged to dynamically select
10
+ \begin{enumerate}
11
+ \item a subset of features, alongside
12
+ \item a learning algorithm and its corresponding optimal hyperparameters,
13
+ \end{enumerate}
14
+ based on each individual patient's extracted gait features. The learning algorithm and its corresponding optimal hyperparameters will be referred to as the \emph{prediction model} (PM) in the remainder of this paper.
@@ -0,0 +1,37 @@
1
+ RL is a branch of machine learning that deals with learning control laws and policies to interact with an environment from experience \cite{brunton2022}. In contrast to supervised learning, RL does not require a training dataset to learn, but rather an environment to interact with, via which it learns by trial and error the best course of action to achieve a long-term objective \cite{mnih2015,sutton2018}. The learning is guided via feedback, or formally \emph{reinforcement}, a concept biologically inspired by the study of animal psychology, where animals have been observed to be ``hardwired'' to recognize pain and hunger as negative rewards and food intake as a positive reward, and thus mold their behavior to best maximize (positive) rewards \cite{pavlov2010,brunton2022}. As shown in Figure \ref{fig:RLSchematic}, an agent senses the \emph{state} of its \emph{environment} and learns to take \emph{actions} that maximize cumulative future rewards. Specifically, the agent arrives at a sequence of different states $\nvec{s}$ by performing actions $a$, which lead to either positive or negative rewards $R$ used for learning \cite{brunton2022}.
2
+ \begin{figure}[h!]
3
+ \centering
4
+ \begin{overpic}[width=1.0\columnwidth]{ReinforcementLearning.eps}
5
+ % Nouns
6
+ \put(66, 33){\emph{ENVIRONMENT}}
7
+ \put(8, 26){\emph{AGENT}}
8
+ \put(1.3, 17){\emph{STATE}, $\nvec{s}$}
9
+ \put(38, 27){\emph{POLICY}}
10
+ \put(39, 25){$\policy$}
11
+ \put(35, 38){\emph{REWARD}, $R$}
12
+ % Verbs
13
+ \put(40, 1){Observe \emph{STATE}, $\nvec{s}$}
14
+ \put(51.5, 23){Perform}
15
+ \put(51.5, 20){\emph{ACTION}, $a$}
16
+ % Variables
17
+ \put(19, 24){$\dot{x}$}
18
+ \put(19, 20){$x$}
19
+ \put(19, 16){$\varphi$}
20
+ \put(19, 12){$\dot{\varphi}$}
21
+ % Environment
22
+ \put(96, 9){$x$}
23
+ \put(71, 30){$y$}
24
+ \put(77, 4){$x$}
25
+ \put(89.8, 4.5){$\dot{x}$}
26
+ \put(76, 11){$F$}
27
+ \put(44, 19.5){$+F$}
28
+ \put(44, 14){$-F$}
29
+ \put(90, 28){$\varphi$}
30
+ \put(95, 21){$\dot{\varphi}$}
31
+ \put(72, 21){$\nvec{s}=\begin{bmatrix}x \\ \dot{x} \\ \varphi \\ \dot{\varphi}\end{bmatrix}$}
32
+ % Parameters
33
+
34
+ \end{overpic}
35
+ \caption{Schematic of RL, where an agent senses its environmental state $\nvec{s}$ and performs an action $a$, according to a policy $\policy$ that is optimized through learning to maximize cumulative future rewards $R$. In recent works, a typical approach to represent the policy $\policy$ is to use a deep neural network. Such a policy is known as a \emph{deep policy network}. Figure adapted from \cite{brunton2022}.}
36
+ \label{fig:RLSchematic}
37
+ \end{figure}
@@ -0,0 +1,58 @@
1
+ The \emph{environment} is represented by the state $\nvec{s}_t$ at the current time-step $t$. The agent then performs an \emph{action} $a_t$ according to a learned \emph{policy} $\policy$, which results in the current state $\nvec{s}_t$ evolving to the \emph{next state} $\nvec{s}_{t+1}$, and the agent receiving an appropriate \emph{reward} $R_{t+1} \in \mathbb{R}$ one time-step later \cite{sutton2018,brunton2022}. These collectively form an \emph{experience} $\boldsymbol{e}_t$, which describes the knowledge an agent has amassed from interacting with the environment, usually expressed for a given time-step $t$ as a tuple $\boldsymbol{e}_t = \left( \nvec{s}_t \, , a_t \, , \nvec{s}_{t+1} \, , R_{t+1} \right)$ \cite{almahamid2021}. The agent-environment interaction thereby yields a \emph{trajectory}, as shown in (\ref{eq:Trajectory}) \cite{sutton2018}
2
+ \begin{equation}
3
+ \nvec{s}_0 \,, a_0 \,, R_1 \,, \nvec{s}_1 \,, a_1 \,, R_2 \,, \nvec{s}_2 \,, a_2 \,, R_3 \,, \cdots \,,
4
+ \label{eq:Trajectory}
5
+ \end{equation}
6
+ or in terms of experiences \cite{brunton2022}
7
+ \begin{equation}
8
+ \boldsymbol{e}_0 \,, \boldsymbol{e}_1 \,, \boldsymbol{e}_2 \,, \cdots \,.
9
+ \label{eq:TrajectoryExperiences}
10
+ \end{equation}
11
+ Formally described, the environment evolves according to a \emph{Markov decision process} (MDP), where the random variables $R_t \in \mathcal{R}$ and $\nvec{s}_{t} \in \mathcal{S}$ each have defined discrete probability distributions that depend only on the preceding state and action, and not on earlier states or hidden variables \cite{sutton2018,brunton2022}. Specifically, the probability of particular values of these random variables $r \in \mathcal{R} \subset \mathbb{R}$ and $\nvec{s}' \in \mathcal{S}$ at the next time-step $t+1$, given the state $\nvec{s}$ and action $a$ at the current time-step $t$, is given as
12
+ \begin{equation}
13
+ P(\nvec{s}', r \,|\, \nvec{s}, a) = \mathrm{Pr}\left\{ \nvec{s}_{t+1} = \nvec{s}' , R_{t+1} = r \,|\, \nvec{s}_t = \nvec{s}, a_t = a \right\} \,.
14
+ \label{eq:MDPDynamics}
15
+ \end{equation}
16
+ The expected reward given a state-action pair can in turn be computed with (\ref{eq:RewardFunction})
17
+ \begin{equation}
18
+ r(\nvec{s}, a) = \mathbb{E} \left[ R_{t+1} \,|\, \nvec{s}_{t}=\nvec{s}, a_t = a \right] = \sum_{r \in \mathcal{R}} r \sum_{\nvec{s}' \in \mathcal{S}} P(\nvec{s}', r \,|\, \nvec{s}, a) \,.
19
+ \label{eq:RewardFunction}
20
+ \end{equation}
21
+ Depending on the application, ``time-steps'' do not necessarily have to be fixed intervals of real time, but may refer to successive stages of decision making.
22
+
23
+ In short, the goal of RL is to maximize the sum of rewards in the long run \cite{sutton2018}. Formally, this can be described as maximizing the expected discounted \emph{return} as defined in (\ref{eq:Return})
24
+ \begin{equation}
25
+ G_{t} = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+1+k} \,,
26
+ \label{eq:Return}
27
+ \end{equation}
28
+ where $\gamma$ denotes the \emph{discount rate} \cite{sutton2018,brunton2022}. For the agent to learn a policy that maximizes the return $G$, it is necessary for the agent to be able to gauge the ``desirability'' of being in a given state $\nvec{s}_t$, which can be quantified via the \emph{value function} (\ref{eq:ValueFunction})
29
+ \begin{equation}
30
+ \begin{split}
31
+ V_{\policy} (\nvec{s}) &= \mathbb{E}_{\policy} \left[ G_{t} \left. \, \right\rvert \, \nvec{s}_t = \nvec{s} \right] \\
32
+ &= \mathbb{E}_{\policy} \left[ R_{t+1} + \gamma G_{t+1} \left. \, \right\rvert \, \nvec{s}_t = \nvec{s} \right] \\
33
+ &= \sum_{a} \policy(a \,|\, \nvec{s}) \sum_{\nvec{s}'} \sum_{r} P(\nvec{s}', r \,|\, \nvec{s}, a) \left[ r + \gamma\mathbb{E}_{\policy} \left[ G_{t+1}|\nvec{s}_{t+1}=\nvec{s}' \right] \right] \\
34
+ &= \sum_{a} \policy(a \,|\, \nvec{s}) \sum_{\nvec{s}', r} P(\nvec{s}', r \,|\, \nvec{s}, a) \left[ r + \gamma V_{\policy}(\nvec{s}') \right] \,,
35
+ \text{ for all } \nvec{s} \in \mathcal{S} \,.
36
+ \end{split}
37
+ \label{eq:ValueFunction}
38
+ \end{equation}
39
+ Equation (\ref{eq:ValueFunction}) describes the expected discounted return when starting from $\nvec{s}$ and following $\policy$ thereafter \cite{sutton2018}. The value function under the optimal policy $\optpolicy$ can in turn be written as
40
+ \begin{equation}
41
+ \begin{split}
42
+ V_{\optpolicy} (\nvec{s}) &= \max_{a} \mathbb{E} \left[R_{t+1} + \gamma V_{\optpolicy}(\nvec{s}') \, \rvert \, \nvec{s}_{t}=\nvec{s} \,, a_{t}=a \right] \\
43
+ &= \max_{a} \sum_{\nvec{s}', r} P(\nvec{s}', r \,|\, \nvec{s}, a) \left[ r + \gamma V_{\optpolicy}(\nvec{s}') \right] \,.
44
+ \end{split}
45
+ \label{eq:BellmanEq}
46
+ \end{equation}
47
+ Equation (\ref{eq:BellmanEq}), also known as the \emph{Bellman equation}, can be broken down and written recursively for every subsequence of steps. This property is crucial because it implies that an optimal control policy for a multi-step procedure must also be locally optimal for every subsequence of steps, thus allowing a large optimization problem to be solved by recursively dividing and locally optimizing the entire sequence \cite{bellman1966,sutton2018,brunton2022}.
48
+
49
+ The discount factor $\gamma$ helps guide the agent towards learning a behavior that balances the trade-off between immediate gratification and long-term strategic gains \cite{sutton2018}. This is crucial in helping the agent deal with problem domains where the optimal solutions involve a multi-step procedure \cite{brunton2022}. Consider the example of chess, where the ultimate goal is to checkmate the opponent at a later point in time. To achieve this, the agent may accept sacrifices, even if this temporarily results in an unfavorable immediate position \cite{huegle2022}.
50
+
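As a brief worked illustration (not part of the original text): for a constant reward of $R = 1$ at every time-step and a non-terminating episode, as approximately holds in the later cart-pole example, the return in (\ref{eq:Return}) reduces to a geometric series,
\begin{equation*}
G_t = \sum_{k=0}^{\infty} \gamma^{k} = \frac{1}{1-\gamma} \,,
\end{equation*}
so with the discount rate $\gamma = 0.99$ used later, the agent effectively weighs rewards over a horizon on the order of $1/(1-\gamma) = 100$ time-steps.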
51
+ Classically, the value function $V$ is computed iteratively and used to search for better policies via methods of dynamic programming, such as policy and value iteration \cite{sutton2018}. Classical dynamic programming is however of limited utility for two main reasons \cite{sutton2018,brunton2022}, namely
52
+ \begin{enumerate}
53
+ \item the assumption of a perfect model, i.e. \emph{a priori} knowledge of the environmental transition dynamics $P(\nvec{s}', r \,|\, \nvec{s}, a)$, and
54
+ \item memory and computational constraints when handling large and combinatorial state spaces.
55
+ \end{enumerate}
56
+ In most modern applications, the environmental transition dynamics $P(\nvec{s}', r \,|\, \nvec{s}, a)$ are not known beforehand. Moreover, not only are the state spaces too enormous to be stored in tables, but such tables would also require too much time and data to be filled accurately \cite{sutton2018,brunton2022}. One approach to dealing with this issue is to model the value function with a function approximator, such as a deep neural network, based on gathered experiences \cite{sutton2018,mnih2015}.
57
+
58
+ There exist various RL algorithms in the literature \cite{almahamid2021}, each suited to different environment types. The choice of RL algorithm depends chiefly on $(i)$ the number of states and $(ii)$ the type of actions. For this work, which involves $(i)$ an environment with an \emph{unlimited} number of states, $(ii)$ an agent that performs \emph{discrete} actions, and $(iii)$ no a priori knowledge of the environment dynamics, the Deep $Q$-Learning algorithm by \cite{mnih2015} is best suited. Moving forward, the equations in the next sections will only be formulated deterministically (i.e. $P(\nvec{s}', r \,|\, \nvec{s}, a)=1$).
@@ -0,0 +1,89 @@
1
+ To begin, the \emph{quality function}, also referred to as the \emph{action-value} function \cite{watkins1992,mnih2015,sutton2018,almahamid2021} is defined as
2
+ \begin{equation}
3
+ \begin{split}
4
+ Q_{\policy} (\nvec{s}, a) = R(\nvec{s}, a) + \gamma V_{\policy}(\nvec{s}')\,,
5
+ \end{split}
6
+ \label{eq:QualityFunction}
7
+ \end{equation}
8
+ which describes the \emph{joint desirability} of being in the state $\nvec{s}$ and performing the action $a$. Following this formulation, the agent selects the action $a$ that yields the maximum $Q$-value for the given state $\nvec{s}$, as shown in (\ref{eq:QLearningAction})
9
+ \begin{equation}
10
+ \policy (\nvec{s}) = \argmax_{a} Q_{\policy} (\nvec{s}, a) \, .
11
+ \label{eq:QLearningAction}
12
+ \end{equation}
13
+ The goal in \emph{Q-Learning} \cite{watkins1992,almahamid2021} is for the agent to learn an optimal policy that maximizes the action-value function
14
+ \begin{equation}
15
+ \begin{aligned}
16
+ Q_{\optpolicy} (\nvec{s}, a) = \max_{\policy} Q_{\policy} (\nvec{s}, a) &= R(\nvec{s}, a) & &+ \, \gamma \max_{a'} Q_{\optpolicy} (\nvec{s}', a') \\
17
+ &= r & &+ \, \gamma \max_{a'} Q_{\optpolicy} (\nvec{s}', a') \,,
18
+ \end{aligned}
19
+ \end{equation}
20
+ which yields the following intuition. If the optimal value $Q_{\optpolicy} (\nvec{s}', a')$ for the state $\nvec{s}'$ at the next time-step is known for all possible actions $a'$, then the optimal strategy is to simply select the action $a'$ that maximizes the value of $r + \gamma Q_{\optpolicy} (\nvec{s}', a')$ \cite{mnih2015}.
21
+
22
+ In the original $Q$-Learning, the optimal action-value function is obtained via a value iteration algorithm, where the $Q$-values are essentially maintained in a $Q$-Table and updated iteratively \cite{watkins1992,almahamid2021}. In the seminal work by \cite{mnih2015}, Deep $Q$-Learning (DQL) was introduced, where a deep convolutional neural network was used to approximate the action-value $Q_{\policy}$ through some parameterization $\boldsymbol{\theta}$.
23
+ \begin{equation}
24
+ Q_{\policy} (\nvec{s}, a) \approx Q_{\policy} (\nvec{s}, a; \boldsymbol{\theta}) \,.
25
+ \end{equation}
26
+ The neural network function approximator with parameters $\boldsymbol{\theta}$ is referred to as the \emph{$Q$-network}, where $\boldsymbol{\theta}$ is updated by minimizing the loss function (\ref{eq:DQNLossFunction})
27
+ \begin{equation}
28
+ \min_{\boldsymbol{\theta}} \dfrac{1}{|\mathcal{B}|} \sum_{\boldsymbol{e} \in \mathcal{B}} \left[ \left( r + \gamma \max_{a'} Q_{\policy} (\nvec{s}', a'; \boldsymbol{\theta}) \right) - Q_{\policy} (\nvec{s}, a; \boldsymbol{\theta}) \right]^2 \, ,
29
+ \label{eq:DQNLossFunction}
30
+ \end{equation}
31
+ over a batch of samples $\mathcal{B}$ \cite{mnih2015}, where each sample pertains to a sampled experience $\boldsymbol{e} = \left( \nvec{s} \, , a \, , \nvec{s}' \, , r \right)$. The bracketed term $\left( r + \gamma \max_{a'} Q_{\policy} (\nvec{s}', a'; \boldsymbol{\theta}) \right)$ is referred to as the \emph{target value}. To deal with the well-known instability of using deep neural networks in RL, \cite{mnih2015} introduced two key ideas, namely $(i)$ updating the neural network over \emph{randomly sampled experiences} of the agent-environment interactions and $(ii)$ only \emph{periodically updating} the neural network towards the target values. The first idea involves storing the agent's experiences $\boldsymbol{e}_t$ at each time-step $t$ into a \emph{replay memory} $D_t = \left\{ \boldsymbol{e}_1\,, \boldsymbol{e}_2\,, \cdots\,, \boldsymbol{e}_t \right\}$, from which a batch of experiences $\mathcal{B}_{D} \subseteq D$ is then randomly sampled to update the $Q$-network \cite{mnih2015}. This helps break the correlations between consecutive experiences, thus preventing undesired feedback loops during learning \cite{mnih2015}.
32
+
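A minimal Python sketch of such a replay memory is given below. It assumes experiences are stored as (state, action, next_state, reward) tuples as described above; the class and method names are illustrative and not part of the ltfmselector API.

```python
import random
from collections import deque


class ReplayMemory:
    """Finite-sized cache of experiences for experience replay."""

    def __init__(self, capacity: int):
        # deque with maxlen discards the oldest experiences automatically
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        # store one experience e_t = (s_t, a_t, s_{t+1}, r_{t+1})
        self.memory.append((state, action, next_state, reward))

    def sample(self, batch_size: int):
        # uniformly sample a batch to break correlations between
        # consecutive experiences
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```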
33
+ The second idea is implemented by using a clone of the $Q$-network, termed the \emph{target network} $\hat{Q}_{\policy}$, to generate the target values. Its parameters $\boldsymbol{\theta}^{-}$ follow the parameters $\boldsymbol{\theta}$ of the $Q$-network with a slight delay, which helps the learning converge \cite{mnih2015}. Following the suggestion by \cite{lillicrap2015}, the parameters $\boldsymbol{\theta}^{-}$ are updated \emph{softly} according to (\ref{eq:TargetUpdates})
34
+ \begin{equation}
35
+ \boldsymbol{\theta}^{-} = \tau \boldsymbol{\theta} + \left(1 - \tau \right) \boldsymbol{\theta}^{-} \,,
36
+ \label{eq:TargetUpdates}
37
+ \end{equation}
38
+ where $\tau$ denotes the \emph{soft target update rate}. The $Q$-network's weights are thereby updated according to
39
+ \begin{equation}
40
+ \min_{\boldsymbol{\theta}} \dfrac{1}{|\mathcal{B}_{D}|} \sum_{\boldsymbol{e} \in \mathcal{B}_D} \left[ \left( r + \gamma \max_{a'} \hat{Q}_{\policy} (\nvec{s}', a'; \boldsymbol{\theta}^{-}) \right) - Q_{\policy} (\nvec{s}, a; \boldsymbol{\theta}) \right]^2 \, .
41
+ \label{eq:DQNLossFunction2}
42
+ \end{equation}
43
+ To promote exploration, the agent's action is selected according to an \emph{$\epsilon$-greedy} algorithm, where the parameter $\epsilon$ denotes the probability of the agent performing a random action instead of the maximizing action according to (\ref{eq:QLearningAction}) \cite{mnih2015,brunton2022}. Intuitively, as the $Q$-function improves over the course of training, $\epsilon$ should gradually decay, allowing the agent to increasingly choose the maximizing action \cite{brunton2022}. For this work, $\epsilon$ is implemented to decay exponentially according to (\ref{eq:EpsilonExpDecay})
44
+ \begin{equation}
45
+ \epsilon = \left( {\epsilon}_{\text{initial}} - {\epsilon}_{\text{final}} \right) e^{-\frac{t_c}{{\epsilon}_{\text{decay}}}} + {\epsilon}_{\text{final}} \, ,
46
+ \label{eq:EpsilonExpDecay}
47
+ \end{equation}
48
+ where $t_c$ denotes the cumulative time-steps over episodes, and ${\epsilon}_{\text{initial}}$, ${\epsilon}_{\text{final}}$, and ${\epsilon}_{\text{decay}}$, the initial value, final value, and decay rate of $\epsilon$, respectively.
49
+
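This schedule is a direct transcription of (\ref{eq:EpsilonExpDecay}) into Python; the default values below follow the cart-pole example in Table \ref{tab:CartPoleParameters}, and the function name is illustrative.

```python
import math


def epsilon(t_c: int, eps_initial: float = 0.9, eps_final: float = 0.05,
            eps_decay: float = 1000.0) -> float:
    # Exponentially decaying exploration probability over the cumulative
    # time-step counter t_c, as in eq. (EpsilonExpDecay).
    return (eps_initial - eps_final) * math.exp(-t_c / eps_decay) + eps_final
```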
50
+ Algorithm \ref{alg:DQN} shows how DQL is implemented in this work, combined with experience replay and an $\epsilon$-greedy algorithm for selecting actions. The $Q$-network is updated at every time-step $t$, provided the memory $D$ contains at least $|\mathcal{B}_D|$ experiences, i.e. the user-specified batch size for training. Moreover, the memory $D$ is implemented in practice as a finite-sized cache which stores only the $N$ most recent experiences, discarding the oldest samples as new ones are added \cite{mnih2015,lillicrap2015}.
51
+ \begin{algorithm}[!t]
52
+ \caption{DQL with experience replay, combined with an $\epsilon$-greedy algorithm for promoting random exploration \cite{mnih2015}. The notations $\nvec{s}_{k,t}$, $a_{k,t}$, $\nvec{s}_{k,t+1}$, $r_{k,t}$, and $y_{k,t}$ denote the state, action, next state, reward, and target value of the $k$-th episode at time-step $t$, respectively.}
53
+ \label{alg:DQN}
54
+ \begin{algorithmic}
55
+ \State Initialize number of episodes $K$
56
+ \State Initialize replay memory $D$ with capacity $N$
57
+ \State Initialize discount rate $\gamma$
58
+ \State Initialize batch size $|\mathcal{B}_{D}|$ for updating parameters $\boldsymbol{\theta}$
59
+ \State Initialize $Q$-network $Q_{\policy}$ with random weights $\boldsymbol{\theta}$
60
+ \State Initialize target network $\hat{Q}_{\policy}$ with weights $\boldsymbol{\theta}^{-} = \boldsymbol{\theta}$
61
+ \State Initialize soft target update rate $\tau$
62
+ \State Initialize $\epsilon$ with parameters ${\epsilon}_{\text{initial}}$, ${\epsilon}_{\text{final}}$, and ${\epsilon}_{\text{decay}}$ for random exploration
63
+ \State Initialize counter for cumulative time-steps over episodes $t_c = 1$
64
+ \For{$k := 1$ to $K$} \Comment for each $k$-th episode
65
+ \State Initialize time-step $t=1$
66
+ \State Initialize initial state $\nvec{s}_{k,t=1}$
67
+ \While{$T_{\text{end}}$ is false} \Comment termination condition for $k$-th episode not fulfilled
68
+ \State With probability of $\epsilon$ select a random action $a_{k,t}$,
69
+ \State $\quad$ otherwise select $a_{k,t} = \argmax_{a} Q_{\policy}(\nvec{s}_{k,t}, a; \boldsymbol{\theta})$
70
+ \State Execute action $a_{k,t}$, and observe reward $r_{k,t}$ and next state $\nvec{s}_{k,t+1}$
71
+ \State Store experience $\boldsymbol{e}_{k,t} = \left( \nvec{s}_{k,t} \, , a_{k,t} \, , \nvec{s}_{k,t+1} \, , r_{k,t} \right)$ in replay memory $D$
72
+ \If{$|D| \geq |\mathcal{B}_{D}|$} \Comment if the number of stored experiences is at least the batch size
73
+ \State Sample minibatch of random experiences $\boldsymbol{e}_{j} = \left( \nvec{s}_{j} \, , a_{j} \, , \nvec{s}_{j}' \, , r_{j} \right)$ from $D$
74
+ \If{$T_{\text{end}}$ is true} \Comment termination condition for $k$-th episode fulfilled
75
+ \State $y_{j} = r_{j}$
76
+ \Else
77
+ \State $y_{j} = r_{j} + \gamma \max_{a_{j}'} \hat{Q}_{\policy} (\nvec{s}_{j}', a_{j}'; \boldsymbol{\theta}^{-})$
78
+ \EndIf
79
+ \State Perform a gradient descent step on $\left( y_{j} - Q_{\policy} (\nvec{s}_j, a_j; \boldsymbol{\theta}) \right)^2$
80
+ \State $\quad$ with respect to $Q$-network parameters $\boldsymbol{\theta}$
81
+ \EndIf
82
+ \State Update parameters of target network $\boldsymbol{\theta}^{-}$ towards $\boldsymbol{\theta}$ according to (\ref{eq:TargetUpdates})
83
+ \State Update $\epsilon$ according to (\ref{eq:EpsilonExpDecay})
84
+ \State Update time-step counter $t = t + 1$
85
+ \State Update cumulative time-step counter $t_c = t_c + 1$
86
+ \EndWhile
87
+ \EndFor
88
+ \end{algorithmic}
89
+ \end{algorithm}
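The inner update of Algorithm \ref{alg:DQN} can be sketched in PyTorch roughly as follows. This is a hedged illustration rather than the package's actual implementation: it assumes `policy_net` and `target_net` are `torch.nn.Module` $Q$-networks, that the sampled minibatch has already been collated into batched tensors, and that a `dones` flag marks terminal transitions (the $T_{\text{end}}$ case of the algorithm).

```python
import torch
import torch.nn.functional as F


def optimize_step(policy_net, target_net, optimizer, batch,
                  gamma: float = 0.99, tau: float = 0.005):
    # batch: (states, actions, next_states, rewards, dones) as tensors,
    # with actions of dtype long and dones of dtype float (0.0 or 1.0)
    states, actions, next_states, rewards, dones = batch

    # Q(s, a; theta) for the actions that were actually taken
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # target values: y = r for terminal transitions,
    # else y = r + gamma * max_a' Q_hat(s', a'; theta^-)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    # gradient descent step on the squared error (y - Q(s, a; theta))^2
    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # soft update of the target-network parameters, eq. (TargetUpdates)
    with torch.no_grad():
        for p, p_target in zip(policy_net.parameters(),
                               target_net.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p)
```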
@@ -0,0 +1,112 @@
1
+ Consider the classical example of balancing an inverted pendulum on a cart by applying a series of forces to the cart. The \emph{environment} at a given time-step $t$ is represented by the state $\nvec{s}_t = \begin{smallmatrix} \begin{bmatrix} x_{t} & \dot{x}_{t} & \varphi_{t} & \dot{\varphi}_{t} \end{bmatrix}^T \end{smallmatrix}$, where the variables $x_{t}$, $\dot{x}_{t}$, $\varphi_{t}$, and $\dot{\varphi}_{t}$ denote the cart's position and velocity in the $x$-direction, the pendulum's angle with respect to the vertical, and the pendulum's angular velocity at the time-step $t$, respectively. The available choices of action $a \in \mathcal{A} = \left\{ -F, +F \right\}$ are applying a constant force $F$ to the cart in either the left or right direction. According to the performed action $a_{t} = {\policy}(\nvec{s}_t)$, the environmental state evolves to the next state $\nvec{s}_{t+1}$, as governed by the dynamical equations of the cart and pendulum shown in (\ref{eq:AngularAccCartPole}) and (\ref{eq:AccCartPole}). Frictional effects are neglected for the sake of simplicity.
2
+ \begin{equation}
3
+ \begin{split}
4
+ \ddot{\varphi}_{t} &= \frac{a_t \cos{{\varphi}_t} + m_p \dot{{\varphi}_t}^2 \ell \sin{{\varphi}_t} \cos{{\varphi}_t} - (m_c + m_p)g\sin{{\varphi}_t}}
5
+ { \dfrac{4}{3} \ell (m_c + m_p) - m_p \ell \cos{{\varphi}_t}^2} \,, \\
6
+ a_t &=
7
+ \begin{cases}
8
+ +F & \text{force F applied to the right of the cart} \\
9
+ -F & \text{force F applied to the left of the cart}
10
+ \end{cases}
11
+ \end{split}
12
+ \label{eq:AngularAccCartPole}
13
+ \end{equation}
14
+ \begin{equation}
15
+ \ddot{x}_{t} = \frac{1}{\cos{{\varphi}_t}} \left[ \dfrac{4}{3} \ell \ddot{{\varphi}_t} + g\sin{{\varphi}_t} \right]
16
+ \label{eq:AccCartPole}
17
+ \end{equation}
18
+ The mass of the cart is denoted by $m_c$, and the pendulum is modelled by a massless rod of length $\ell$ with a point mass $m_p$ fixed on one end, as shown in Figure \ref{fig:RLSchematic}, while the other end is attached to the cart by a revolute joint. Applying, for instance, the Euler method over a step size $\Delta t$, the next state $\nvec{s}_{t+1}$ is obtained as shown in (\ref{eq:EulerIntegrationCartPole})
19
+ \begin{equation}
20
+ \nvec{s}_{t+1} =
21
+ \begin{bmatrix}
22
+ x_{t+1} \\ \dot{x}_{t+1} \\ \varphi_{t+1} \\ \dot{\varphi}_{t+1}
23
+ \end{bmatrix} =
24
+ \begin{bmatrix}
25
+ x_{t} + \dot{x}_{t} \cdot \Delta t\\
26
+ \dot{x}_{t} + \ddot{x}_{t} \cdot \Delta t\\
27
+ \varphi_{t} + \dot{\varphi}_{t} \cdot \Delta t\\
28
+ \dot{\varphi}_{t} + \ddot{\varphi}_{t} \cdot \Delta t
29
+ \end{bmatrix} \,.
30
+ \label{eq:EulerIntegrationCartPole}
31
+ \end{equation}
32
+ The reward function is defined as shown in (\ref{eq:RewardFunctionCartPole})
33
+ \begin{equation}
34
+ R(\nvec{s}_{t+1}) =
35
+ \begin{cases}
36
+ +1 & \text{if } |{\varphi}_{t+1}| < \varphi^* \\
37
+ 0 & \text{if } |{\varphi}_{t+1}| \geq \varphi^*
38
+ \end{cases} \,,
39
+ \label{eq:RewardFunctionCartPole}
40
+ \end{equation}
41
+ and the termination condition in (\ref{eq:TerminationConditionCartPole})
42
+ \begin{equation}
43
+ T_{\text{end}} =
44
+ \left\{
45
+ \begin{array}{rll}
46
+ \text{true} & \text{if } |{\varphi}_{t+1}| \geq \varphi^* & \text{(pendulum falls over)} \\
47
+ \text{true} & \text{if } |{x}_{t+1}| \geq x^* & \text{(positional limit of cart due to physical constraints)} \\
48
+ \text{true} & \text{if } t = 500 & \text{(end simulation due to time-constraint)} \\
49
+ \text{false} & \text{otherwise} & \text{(pendulum kept balanced)}
50
+ \end{array}
51
+ \right. \,,
52
+ \label{eq:TerminationConditionCartPole}
53
+ \end{equation}
54
+ where $\varphi^*$ and $x^*$ denote thresholds for the angle of the pendulum with respect to the vertical and position of the cart in the $x$-direction, respectively. The DQL algorithm has since been adapted to develop other algorithms such as the Deep Deterministic Policy Gradient algorithm by \cite{lillicrap2015}, which allows for continuous (real-valued) and high-dimensional action spaces.
55
+
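For illustration, one environment step of this cart-pole example could be sketched in Python as follows; the function signature is illustrative, the parameter defaults are taken from Table \ref{tab:CartPoleParameters}, and the actual implementation may differ in detail.

```python
import math


def cartpole_step(state, action, t, m_c=1.0, m_p=0.1, ell=1.0, F=10.0,
                  g=9.8, dt=0.02, phi_max=math.radians(12.0), x_max=2.4):
    """One frictionless cart-pole step: dynamics, Euler integration,
    reward, and termination, following the equations above."""
    x, x_dot, phi, phi_dot = state
    a = +F if action == 1 else -F  # discrete action set {-F, +F}

    # angular acceleration of the pendulum, eq. (AngularAccCartPole)
    num = (a * math.cos(phi)
           + m_p * phi_dot ** 2 * ell * math.sin(phi) * math.cos(phi)
           - (m_c + m_p) * g * math.sin(phi))
    den = 4.0 / 3.0 * ell * (m_c + m_p) - m_p * ell * math.cos(phi) ** 2
    phi_ddot = num / den

    # acceleration of the cart, eq. (AccCartPole)
    x_ddot = (4.0 / 3.0 * ell * phi_ddot + g * math.sin(phi)) / math.cos(phi)

    # explicit Euler integration over the step size dt
    next_state = (x + x_dot * dt,
                  x_dot + x_ddot * dt,
                  phi + phi_dot * dt,
                  phi_dot + phi_ddot * dt)

    # reward and termination, eqs. (RewardFunctionCartPole) and
    # (TerminationConditionCartPole)
    reward = 1.0 if abs(next_state[2]) < phi_max else 0.0
    done = (abs(next_state[2]) >= phi_max
            or abs(next_state[0]) >= x_max
            or t == 500)
    return next_state, reward, done
```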
56
+ To implement this example, the environmental parameters, as well as the DQL agent hyperparameters, are initialized with the values shown in Table \ref{tab:CartPoleParameters}. The policy and target networks were implemented as multilayer perceptrons (MLPs) with two hidden layers of 128 neurons each. The agent is subsequently trained according to Algorithm \ref{alg:DQN} over 750 episodes, where the parameters $\boldsymbol{\theta}$ of the $Q$-network are optimized using an AdamW optimizer \cite{loshchilov2017} with the learning rate $l_r$. The learning was carried out on a 5.3 GHz Intel\textsuperscript{\textregistered{}} Core\texttrademark{} i9-10900K CPU, and implemented with the deep learning framework PyTorch \cite{pytorch}, as well as other libraries for applications in science and data analysis (e.g. pandas \cite{pandas}, SciPy \cite{scipy}, NumPy \cite{numpy}) in the Python programming language.
57
+
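A minimal PyTorch sketch of such a network and optimizer is shown below; the ReLU activations are an assumption (the text does not specify the activation function), and the variable names are illustrative.

```python
import torch.nn as nn
from torch.optim import AdamW


class QNetwork(nn.Module):
    """MLP Q-network with two hidden layers of 128 neurons, mapping the
    4 state variables to one Q-value per discrete action."""

    def __init__(self, n_states: int = 4, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s):
        return self.net(s)


policy_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())  # theta^- := theta
optimizer = AdamW(policy_net.parameters(), lr=1e-4)   # l_r from Table 1
```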
58
+ As shown in Figure \ref{fig:PendulumDuration}, the duration (i.e. total time-steps) for which the pendulum is kept balanced increases over the course of training and ultimately reaches the maximum permitted number of time-steps set in (\ref{eq:TerminationConditionCartPole}). One can also observe how the agent progressively improves and its learning eventually converges to an optimal policy $\optpolicy$, as implied by the converging $Q$-values in Figure \ref{fig:QValuesProgression}.
59
+ \begin{table}[H]
60
+ \centering
61
+ \begin{tabular}{p{0.65\textwidth}p{0.055\textwidth}p{0.1\textwidth}}
62
+ \hline
63
+ \multicolumn{3}{l}{\textbf{Environmental Parameters}} \\
64
+ \hline
65
+ Mass of cart & $m_c$ & 1.0 \si{kg} \\
66
+ Mass of point mass on end of pole & $m_p$ & 0.1 \si{kg} \\
67
+ Length of pole & $\ell$ & 1.0 \si{m} \\
68
+ Magnitude of force applied to cart in $x$-direction & $F$ & 10 \si{N} \\
69
+ Gravitational acceleration & $g$ & 9.8 \si{m/s^2} \\
70
+ Threshold angle with respect to vertical & $\varphi^*$ & \ang{12} \\
71
+ Threshold position of cart in $x$-direction & $x^*$ & 2.4 \si{m} \\
72
+ Step size & $\Delta t$ & 0.02 \si{s} \\
73
+ \hline
74
+ \multicolumn{3}{l}{\textbf{Agent Hyperparameters}} \\
75
+ \hline
76
+ Number of episodes & $K$ & 750 \\
77
+ Number of experiences stored in replay memory & $N$ & 10000 \\
78
+ Discount rate & $\gamma$ & 0.99 \\
79
+ Batch size of experiences drawn from replay memory & $|\mathcal{B}_{D}|$ & 128 \\
80
+ Learning rate of AdamW optimizer & $l_r$ & \num{1e-4} \\
81
+ Soft target update rate & $\tau$ & 0.005 \\
82
+ Initial probability $\epsilon$ for random exploration & $\epsilon_{\text{initial}}$ & 0.9 \\
83
+ Final probability $\epsilon$ for random exploration & $\epsilon_{\text{final}}$ & 0.05 \\
84
+ Decay rate of probability $\epsilon$ for random exploration & $\epsilon_{\text{decay}}$ & 1000 \\
85
+ \hline
86
+ \end{tabular}
87
+ \caption{Environmental parameters for the example of balancing an inverted pendulum on a cart, and the learning hyperparameters of the DQL agent.}
88
+ \label{tab:CartPoleParameters}
89
+ \end{table}
90
+ \begin{figure}[H]
91
+ \centering
92
+ \vspace{-1.0em}
93
+ \includegraphics[width=1.0\textwidth]{PendulumDuration}
94
+ % Matplotlib Customized Settings
95
+ % figsize=(6.25, 3.5)
96
+ % loc='left', fontsize='large'
97
+ \vspace{-1.5em}
98
+ \caption{Duration of pendulum kept upright over training episodes.}
99
+ \label{fig:PendulumDuration}
100
+ \end{figure}
101
+ % \begin{figure}[H]
102
+ % \centering
103
+ % \vspace{-1.0em}
104
+ % \includegraphics[width=1.0\textwidth]{QValuesPendulum}
105
+ % % Matplotlib Customized Settings
106
+ % % figsize=(5.5, 3.5)
107
+ % % ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
108
+ % % loc='left', fontsize='large
109
+ % \vspace{-1.5em}
110
+ % \caption{Progression of $Q$-values over the course of training.}
111
+ % \label{fig:QValuesProgression}
112
+ % \end{figure}
File without changes
@@ -0,0 +1,15 @@
1
+ \begin{table}
2
+ \caption{Table captions should be placed above the
3
+ tables.}\label{tab1}
4
+ \begin{tabular}{|l|l|l|}
5
+ \hline
6
+ Heading level & Example & Font size and style\\
7
+ \hline
8
+ Title (centered) & {\Large\bfseries Lecture Notes} & 14 point, bold\\
9
+ 1st-level heading & {\large\bfseries 1 Introduction} & 12 point, bold\\
10
+ 2nd-level heading & {\bfseries 2.1 Printing Area} & 10 point, bold\\
11
+ 3rd-level heading & {\bfseries Run-in Heading in Bold.} Text follows & 10 point, bold\\
12
+ 4th-level heading & {\itshape Lowest Level Heading.} Text follows & 10 point, italic\\
13
+ \hline
14
+ \end{tabular}
15
+ \end{table}
File without changes
File without changes
@@ -0,0 +1,42 @@
1
+ # === === === === === === === === === === ===
2
+ # @Author : Liaw
3
+ # @Date : 08.07.2025
4
+ #
5
+ # Makefile to compile a LaTeX project using
6
+ # your own customized Docker image based on
7
+ # texlive's official Docker image
8
+ #
9
+ # === === === === === === === === === === ===
10
+
11
+ # Name of the "main" file to compile
12
+ NAME=main
13
+
14
+ DOCKER_LATEXCOMPILER=liawlatex
15
+
16
+ all:
17
+ docker run --rm -v "$(shell pwd):/project" -w /project $(DOCKER_LATEXCOMPILER) latexmk -pdf -file-line-error -bibtex $(NAME).tex
18
+
19
+ clean:
20
+ rm -f *.toc
21
+ rm -f $(NAME).dvi
22
+ rm -f *.log
23
+ rm -f *.aux
24
+ rm -f $(NAME).ps
25
+ rm -f $(NAME).pdf
26
+ rm -f *~
27
+ rm -f *.fls
28
+ rm -f *.fdb_latexmk
29
+ rm -f *.bbl
30
+ rm -f *.blg
31
+ rm -f *.glo
32
+ rm -f *.ist
33
+ rm -f *.acn
34
+ rm -f *.bcf*
35
+ rm -f *.bbl*
36
+ rm -f *.run*
37
+ rm -f *.acr
38
+ rm -f *.alg
39
+ rm -f *.glg
40
+ rm -f *.gls
41
+ rm -f *.out
42
+ rm -f *.glsdefs
@@ -0,0 +1,17 @@
1
+ % Abstract should be limited to 150--250 words.
2
+ Designing personalized therapy for poststroke gait rehabilitation often involves the effort of an interdisciplinary medical team and tedious assessments. An automated gait assessment tool based on gait measurements and interdisciplinary knowledge could help experts with faster gait assessments, while providing objective feedback. Gait measurements are however high-dimensional, making the development of such tools challenging. Inspired by the application of Deep Q-Learning in solving physical problems, this study presents a method for dynamic feature and model selection. The search space is formulated as a partially observable Markov Decision Process, where the agent iteratively explores the 680 extracted gait features and various prediction models to learn optimal patient-specific combinations of feature subsets and prediction models. The model was developed using a dataset of 904 stride pairs from 100 hemiparetic stroke patients. Each patient was evaluated by an interdisciplinary board using the Stroke Mobility Score, a multiple-cue clinical observational score comprising six subscores, each pertaining to a functional criterion of gait. The agent was trained to approximate optimal decision-making, receiving rewards for accurate predictions and efficient feature selection. Results demonstrated excellent predictive performance, achieving a coefficient of determination ($R^2$) of 0.83 on the test set. Crucially, the tool identifies patient-specific key features that could help clinicians by highlighting specific therapeutic targets tailored to individual needs, thus potentially providing a solution for personalized poststroke therapy.
3
+
4
+ %% ORIGINAL
5
+ % Designing personalized therapy for poststroke gait rehabilitation often involves the effort of an interdisciplinary medical team and tedious assessments. An automated gait-assessment tool based on gait measurements and interdisciplinary knowledge could help experts with faster gait assessments, while providing objective feedback. However, developing such a tool based on gait data can be challenging due to the high dimensionality of the training datasets typically derived from gait measurements.
6
+ %
7
+ % While a common approach involves carrying out feature selection on a fixed feature set, this work presents a dynamic feature and model selection approach using reinforcement learning. Hereby, the task of selecting the optimal feature set and corresponding prediction model to map gait data to expert gait-assessment is formulated as a partially observable Markov Decision Process (POMDP), where an agent learns by autonomously exploring different options for each available sample iteratively.
8
+ %
9
+ % This approach adds a patient-specific component to the gait assessment tool, which could assist clinicians in tailoring personalized therapy.
10
+ %
11
+ % To achieve this, a dataset is first obtained from 100 hemiparetic stroke-patients which received a clinical examination and a full-body instrumented gait-analysis. An interdisciplinary board of medical experts assigned each patient a Stroke Mobility Score, a multiple-cue clinical observational score comprised of six sub-scores, each pertaining to a functional criterion of gait.
12
+ %
13
+ % From the measurements, 904 measured stride pairs of 100 patients were obtained, 690 gait features extracted, and the dataset split 70/30 for training and testing. As a preprocessing step, expert knowledge was used to trim the features accordingly, followed by filtering out statistically non-discriminatory features. Within the setting of a POMDP, the agent is allowed to either query a feature or a prediction model, or make a prediction based on the selected feature set and prediction model.
14
+ %
15
+ % The agent is then rewarded accordingly for each action, before transitioning onto a next state where the process is repeated iteratively. Over the course of many iterations, the agent eventually learns to select an optimal set of actions, given a set of features of a stride pair measurement.
16
+ %
17
+ % The agent is trained using a Deep Q-Learning algorithm that approximates the Bellman equation by training a deep neural network iteratively on a batch of randomly chosen transitions. The trained agent tested on the test dataset yielded excellent predictive performance, showing a coefficient of determination of 0.85, while delivering patient-specific key features. The delivered patient-specific key features could help clinicians focus on key therapeutic targets, specifically tailored to a patient's needs.
Binary file
@@ -0,0 +1,139 @@
1
+ Version history for the LLNCS LaTeX2e class
2
+
3
+ date filename version action/reason/acknowledgements
4
+ ----------------------------------------------------------------------------
5
+ 29.5.96 letter.txt beta naming problems (subject index file)
6
+ thanks to Dr. Martin Held, Salzburg, AT
7
+
8
+ subjindx.ind renamed to subjidx.ind as required
9
+ by llncs.dem
10
+
11
+ history.txt introducing this file
12
+
13
+ 30.5.96 llncs.cls incompatibility with new article.cls of
14
+ 1995/12/20 v1.3q Standard LaTeX document class,
15
+ \if@openbib is no longer defined,
16
+ reported by Ralf Heckmann and Graham Gough
17
+ solution by David Carlisle
18
+
19
+ 10.6.96 llncs.cls problems with fragile commands in \author field
20
+ reported by Michael Gschwind, TU Wien
21
+
22
+ 25.7.96 llncs.cls revision a corrects:
23
+ wrong size of text area, floats not \small,
24
+ some LaTeX generated texts
25
+ reported by Michael Sperber, Uni Tuebingen
26
+
27
+ 16.4.97 all files 2.1 leaving beta state,
28
+ raising version counter to 2.1
29
+
30
+ 8.6.97 llncs.cls 2.1a revision a corrects:
31
+ unbreakable citation lists, reported by
32
+ Sergio Antoy of Portland State University
33
+
34
+ 11.12.97 llncs.cls 2.2 "general" headings centered; two new elements
35
+ for the article header: \email and \homedir;
36
+ complete revision of special environments:
37
+ \newtheorem replaced with \spnewtheorem,
38
+ introduced the theopargself environment;
39
+ two column parts made with multicol package;
40
+ add ons to work with the hyperref package
41
+
42
+ 07.01.98 llncs.cls 2.2 changed \email to simply switch to \tt
43
+
44
+ 25.03.98 llncs.cls 2.3 new class option "oribibl" to suppress
45
+ changes to the thebibliograpy environment
46
+ and retain pure LaTeX codes - useful
47
+ for most BibTeX applications
48
+
49
+ 16.04.98 llncs.cls 2.3 if option "oribibl" is given, extend the
50
+ thebibliograpy hook with "\small", suggested
51
+ by Clemens Ballarin, University of Cambridge
52
+
53
+ 20.11.98 llncs.cls 2.4 pagestyle "titlepage" - useful for
54
+ compilation of whole LNCS volumes
55
+
56
+ 12.01.99 llncs.cls 2.5 counters of orthogonal numbered special
57
+ environments are reset each new contribution
58
+
59
+ 27.04.99 llncs.cls 2.6 new command \thisbottomragged for the
60
+ actual page; indention of the footnote
61
+ made variable with \fnindent (default 1em);
62
+ new command \url that copys its argument
63
+
64
+ 2.03.00 llncs.cls 2.7 \figurename and \tablename made compatible
65
+ to babel, suggested by Jo Hereth, TU Darmstadt;
66
+ definition of \url moved \AtBeginDocument
67
+ (allows for url package of Donald Arseneau),
68
+ suggested by Manfred Hauswirth, TU of Vienna;
69
+ \large for part entries in the TOC
70
+
71
+ 16.04.00 llncs.cls 2.8 new option "orivec" to preserve the original
72
+ vector definition, read "arrow" accent
73
+
74
+ 17.01.01 llncs.cls 2.9 hardwired texts made polyglot,
75
+ available languages: english (default),
76
+ french, german - all are "babel-proof"
77
+
78
+ 20.06.01 splncs.bst public release of a BibTeX style for LNCS,
79
+ nobly provided by Jason Noble
80
+
81
+ 14.08.01 llncs.cls 2.10 TOC: authors flushleft,
82
+ entries without hyphenation; suggested
83
+ by Wiro Niessen, Imaging Center - Utrecht
84
+
85
+ 23.01.02 llncs.cls 2.11 fixed footnote number confusion with
86
+ \thanks, numbered institutes, and normal
87
+ footnote entries; error reported by
88
+ Saverio Cittadini, Istituto Tecnico
89
+ Industriale "Tito Sarrocchi" - Siena
90
+
91
+ 28.01.02 llncs.cls 2.12 fixed footnote fix; error reported by
92
+ Chris Mesterharm, CS Dept. Rutgers - NJ
93
+
94
+ 28.01.02 llncs.cls 2.13 fixed the fix (programmer needs vacation)
95
+
96
+ 17.08.04 llncs.cls 2.14 TOC: authors indented, smart \and handling
97
+ for the TOC suggested by Thomas Gabel
98
+ University of Osnabrueck
99
+
100
+ 07.03.06 splncs.bst fix for BibTeX entries without year; patch
101
+ provided by Jerry James, Utah State University
102
+
103
+ 14.06.06 splncs_srt.bst a sorting BibTeX style for LNCS, feature
104
+ provided by Tobias Heindel, FMI Uni-Stuttgart
105
+
106
+ 16.10.06 llncs.dem 2.3 removed affiliations from \tocauthor demo
107
+
108
+ 11.12.07 llncs.doc note on online visibility of given e-mail address
109
+
110
+ 15.06.09 splncs03.bst new BibTeX style compliant with the current
111
+ requirements, provided by Maurizio "Titto"
112
+ Patrignani of Universita' Roma Tre
113
+
114
+ 30.03.10 llncs.cls 2.15 fixed broken hyperref interoperability;
115
+ patch provided by Sven Koehler,
116
+ Hamburg University of Technology
117
+
118
+ 15.04.10 llncs.cls 2.16 fixed hyperref warning for informatory TOC entries;
119
+ introduced \keywords command - finally;
120
+ blank removed from \keywordname, flaw reported
121
+ by Armin B. Wagner, IGW TU Vienna
122
+
123
+ 15.04.10 llncs.cls 2.17 fixed missing switch "openright" used by \backmatter;
124
+ flaw reported by Tobias Pape, University of Potsdam
125
+
126
+ 27.09.13 llncs.cls 2.18 fixed "ngerman" incompatibility; solution provided
127
+ by Bastian Pfleging, University of Stuttgart
128
+
129
+ 04.09.17 llncs.cls 2.19 introduced \orcidID command
130
+
131
+ 10.03.18 llncs.cls 2.20 adjusted \doi according to CrossRef requirements;
132
+ TOC: removed affiliation numbers
133
+
134
+ splncs04.bst added doi field;
135
+ bold journal numbers
136
+
137
+ samplepaper.tex new sample paper
138
+
139
+ llncsdoc.pdf new LaTeX class documentation