superlab 0.1.13 → 0.1.15
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +15 -3
- package/README.zh-CN.md +15 -3
- package/bin/superlab.cjs +38 -0
- package/lib/auto_contracts.cjs +7 -3
- package/lib/auto_runner.cjs +33 -52
- package/lib/auto_state.cjs +27 -21
- package/lib/context.cjs +15 -0
- package/lib/i18n.cjs +122 -37
- package/lib/install.cjs +1 -0
- package/package-assets/claude/commands/lab/auto.md +3 -0
- package/package-assets/claude/commands/lab/write.md +1 -1
- package/package-assets/claude/commands/lab.md +15 -0
- package/package-assets/codex/prompts/lab-auto.md +3 -0
- package/package-assets/codex/prompts/lab-write.md +1 -1
- package/package-assets/codex/prompts/lab.md +15 -0
- package/package-assets/shared/lab/.managed/templates/final-report.md +12 -0
- package/package-assets/shared/lab/.managed/templates/main-tables.md +37 -0
- package/package-assets/shared/lab/config/workflow.json +3 -1
- package/package-assets/shared/lab/context/auto-mode.md +8 -1
- package/package-assets/shared/lab/context/auto-outcome.md +3 -0
- package/package-assets/shared/skills/lab/SKILL.md +6 -2
- package/package-assets/shared/skills/lab/references/paper-writing/abstract.md +7 -1
- package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract/template-a.md +21 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract/template-b.md +34 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract/template-c.md +28 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract-examples.md +13 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/index.md +21 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/novel-task-challenge-decomposition.md +18 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-not-recommended-abstract-only.md +30 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-1-one-contribution-multi-advantages.md +30 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-2-two-contributions.md +34 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-3-new-module-on-existing-pipeline.md +18 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-4-observation-driven.md +16 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-1-existing-task.md +32 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-2-existing-task-insight-backed-by-traditional.md +33 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-3-novel-task.md +21 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-1-task-then-application.md +14 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-2-application-first.md +10 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-3-general-to-specific-setting.md +14 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-4-open-with-challenge.md +20 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction-examples.md +25 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/example-of-the-three-elements.md +67 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/method-writing-common-issues-note.md +10 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/module-design-instant-ngp.md +55 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/module-motivation-patterns.md +15 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/module-triad-neural-body.md +19 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/neural-body-annotated-figure-text.md +66 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/overview-template.md +30 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/pre-writing-questions.md +17 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method/section-skeleton.md +9 -0
- package/package-assets/shared/skills/lab/references/paper-writing/examples/method-examples.md +24 -0
- package/package-assets/shared/skills/lab/references/paper-writing/introduction.md +7 -1
- package/package-assets/shared/skills/lab/references/paper-writing/method.md +6 -2
- package/package-assets/shared/skills/lab/references/paper-writing-integration.md +26 -0
- package/package-assets/shared/skills/lab/stages/auto.md +29 -1
- package/package-assets/shared/skills/lab/stages/report.md +5 -1
- package/package-assets/shared/skills/lab/stages/write.md +16 -1
- package/package.json +1 -1
package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract/template-b.md
ADDED
@@ -0,0 +1,34 @@
# Abstract Template B Examples (Challenge -> Insight -> Contribution)

```latex
\section{Abstract}
% Task
%% Example 1: In recent years, generative models have undergone significant advancement due to the success of diffusion models.
%% Example 2: This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.

% Technical challenge for previous methods (discuss around the technical challenge that we solved)
%% Example 1: The success of these models is often attributed to their use of guidance techniques, such as classifier and classifier-free methods, which provides effective mechanisms to tradeoff between fidelity and diversity. However, these methods are not capable of guiding a generated image to be aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require a certain level of depth awareness.
%% Example 2: Some recent works have shown that learning implicit neural representations of 3D scenes achieves remarkable view synthesis quality given dense input views. However, the representation learning will be ill-posed if the views are highly sparse.

% Introduce the insight for solving the challenge in one sentence
%% Example 1: To address this limitation, we propose a novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models.
%% Example 2: To solve this ill-posed problem, our key idea is to integrate observations over video frames.

% Introduce the technical contribution that implements the insight in one to two sentences (usually mention the technical term/name only, without describing every detailed step. The term should be easy to understand and should not create a jump in reading. This ability is very important for writing a good abstract.)
%% Example 1: To do this, we first present a label-efficient depth estimation framework using the internal representations of diffusion models. At the sampling phase, we utilize two guidance techniques to self-condition the generated image using the estimated depth map, the first of which uses pseudo-labeling, and the subsequent one uses a depth-domain diffusion prior.
%% Example 2: To this end, we propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh

% Introduce the benefits of technical novelty
%% Example 2: so that the observations across frames can be naturally integrated. The deformable mesh also provides geometric guidance for the network to learn 3D representations more efficiently.

% Experiment
```

## Given example pattern 2

1. `This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.`
2. `... representation learning will be ill-posed if the views are highly sparse.`
3. `To solve this ill-posed problem, our key idea is to integrate observations over video frames.`
4. `To this end, we propose Neural Body ...`
5. `... observations across frames can be naturally integrated ... provides geometric guidance ...`
6. `Experiments show [main result].`
package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract/template-c.md
ADDED
@@ -0,0 +1,28 @@
# Abstract Template C Examples (Multiple Contributions)

```latex
% Task
%% This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation.

%% Unlike some recent methods that directly regress the coordinates of the object boundary points from an image

% Introduce technical contribution and technical advantage in one sentence (this ability is very important for writing a good abstract.)
%% deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach.

% Introduce technical contribution and technical advantage in one sentence
%% For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution.

% Introduce technical contribution and technical advantage in one sentence
%% Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization.

% Experiment
```

## Given example pattern (Deep Snake style from your text)

1. `This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation.`
2. `Unlike some recent methods that directly regress the coordinates of the object boundary points from an image ...`
3. `deep snake uses a neural network to iteratively deform an initial contour ...`
4. `For structured feature learning on the contour, we propose circular convolution ...`
5. `Based on deep snake, we develop a two-stage pipeline ...`
6. `Experiments show [main result].`
package/package-assets/shared/skills/lab/references/paper-writing/examples/abstract-examples.md
ADDED
@@ -0,0 +1,13 @@
# Abstract Examples Index

All abstract example cites should point to the local files below.

1. Version 1 (Challenge -> Contribution)
`Version 1: Introduce the technical challenge, then use one to two sentences to present the technical contribution that solves the challenge.`
`references/examples/abstract/template-a.md`
2. Version 2 (Challenge -> Insight -> Contribution)
`Version 2: Introduce the technical challenge, then use one to two sentences to present the insight for solving the challenge, and then one sentence to present the technical contribution that implements this insight. (Personally recommended.)`
`references/examples/abstract/template-b.md`
3. Version 3 (Multiple Contributions)
`Version 3: When there are multiple technical contributions, describe each contribution together with its technical advantage.`
`references/examples/abstract/template-c.md`
package/package-assets/shared/skills/lab/references/paper-writing/examples/index.md
ADDED
@@ -0,0 +1,21 @@
# Example Bank Index

Use this folder for concrete writing patterns and locally organized cite targets.

## Files

1. Abstract examples index: `references/examples/abstract-examples.md`
2. Introduction examples index: `references/examples/introduction-examples.md`
3. Abstract template files: `references/examples/abstract/template-a.md`, `references/examples/abstract/template-b.md`, `references/examples/abstract/template-c.md`
4. Introduction task/application files: `references/examples/introduction/version-1-task-then-application.md`, `references/examples/introduction/version-2-application-first.md`, `references/examples/introduction/version-3-general-to-specific-setting.md`, `references/examples/introduction/version-4-open-with-challenge.md`
5. Introduction technical-challenge files: `references/examples/introduction/technical-challenge-version-1-existing-task.md`, `references/examples/introduction/technical-challenge-version-2-existing-task-insight-backed-by-traditional.md`, `references/examples/introduction/technical-challenge-version-3-novel-task.md`, `references/examples/introduction/novel-task-challenge-decomposition.md`
6. Introduction pipeline files: `references/examples/introduction/pipeline-version-1-one-contribution-multi-advantages.md`, `references/examples/introduction/pipeline-version-2-two-contributions.md`, `references/examples/introduction/pipeline-version-3-new-module-on-existing-pipeline.md`, `references/examples/introduction/pipeline-version-4-observation-driven.md`, `references/examples/introduction/pipeline-not-recommended-abstract-only.md`
7. Method examples index: `references/examples/method-examples.md`
8. Method detail files: `references/examples/method/pre-writing-questions.md`, `references/examples/method/module-triad-neural-body.md`, `references/examples/method/neural-body-annotated-figure-text.md`, `references/examples/method/module-design-instant-ngp.md`, `references/examples/method/module-motivation-patterns.md`, `references/examples/method/section-skeleton.md`, `references/examples/method/overview-template.md`, `references/examples/method/example-of-the-three-elements.md`, `references/examples/method/method-writing-common-issues-note.md`

## Usage

1. Pick one template from a section guide.
2. Open the matching examples file.
3. Reuse the sentence logic, not exact wording.
4. Keep citation links in your notes for traceability.
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/novel-task-challenge-decomposition.md
ADDED
@@ -0,0 +1,18 @@
# Introduction Novel-Task Challenge Decomposition

`For novel tasks without direct methods, decompose the challenge into clear requirement/challenge points.`

```latex
% To achieve xx goal, several requirements must be satisfied (or several challenges must be handled).
%% Example: In this work, our goal is to build a model that captures such object intrinsics from a single image. This problem is challenging for three reasons.

% Describe point 1
%% Example: First, we only have a single image. This makes our work fundamentally different from existing works on 3D-aware image generation models [8, 9, 27, 28], which typically require a large dataset of thousands of instances for training. In comparison, the single image contains at most a few dozen instances, making the inference problem highly under-constrained.

% Describe point 2
%% Example: Second, these already limited instances may vary significantly in pixel values. This is because they have different poses and illumination conditions, but neither of these factors are annotated or known. We also cannot resort to existing tools for pose estimation based on structure from motion, such as COLMAP [35], because the appearance variations violate the assumptions of epipolar geometry.

% Describe point 3
%% Example: Finally, the object intrinsics we aim to infer are probabilistic, not deterministic: no two roses in the natural world are identical, and we want to capture a distribution of their geometry, texture, and material to exploit the underlying multi-view information.
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-not-recommended-abstract-only.md
ADDED
@@ -0,0 +1,30 @@
# Not Recommended: Abstract-Only Method Description in Introduction

`Not recommended: if the method is simple, do not hide the concrete method details in the Introduction and discuss only the abstract insight to make the work look novel.`

Expert note (faithful translation):

1. The craft of this writing template is how to make a simple pipeline look novel.
2. Note: this is not about making the insight look novel, but about making the pipeline steps look novel.
3. In most cases this is not recommended.
4. The better target is to clearly explain in the Introduction how the core contribution is implemented.

```latex
% To tackle this problem, we propose a novel 3D GAN training method to generate photo-realistic images irrespective of the viewing angle.

% Introduce key idea
% Our key idea is as follows. To ease the challenging problem of learning photorealistic and multi-view consistent image synthesis, we cast the problem into two subproblems, each of which can be solved more easily.

% Explain why the key idea works, but without concretely discussing the full pipeline (or only discuss abstract benefit)
%% Example: Specifically, we formulate the problem as a combination of two simple discrimination problems, one of which learns to discriminate whether a synthesized image looks real or not, and the other learns to discriminate whether a synthesized image agrees with the camera pose. Unlike the formulations of the previous methods, which try to learn the real image distribution for each pose, or to learn pose estimation, our subproblems are much easier as each of them is analogous to a basic binary classification problem.

% Introduce pipeline modules with new terms but without clearly explaining the full pipeline (or skip concrete pipeline details)
%% Example: Based on this key idea, we propose a dual-branched discriminator, which has two branches for learning photorealism and pose consistency, respectively. As these branches are supervised explicitly for their respective purposes, high-quality images with pose consistency can be produced at each viewing angle, and consequently, the generator creates high-quality images and shapes. (This paragraph does not clearly explain how the pipeline works.)

% Introduce another contribution
%% Example: In addition, we propose a pose-matching loss to give supervision to the discriminator for the pose consistency, by considering a positive pose (i.e., rendering pose or ground truth pose) and a negative pose (i.e., irrelevant pose) for a given image. (This paragraph does not clearly explain how the pipeline works.)

% Explain expected benefit over prior methods
%% Example: For example, the frontal viewpoint is one of the irrelevant poses for a side-view image. As reported in the experiments, this loss helps improve image and shape quality. This can be interpreted as a simplification of a classification problem from a large number of classes into binary, which is composed of positive and negative pairs.
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-1-one-contribution-multi-advantages.md
ADDED
@@ -0,0 +1,30 @@
# Pipeline Version 1 (One Contribution, Multiple Advantages)

`Version 1: One contribution with multiple advantages, and one teaser figure to present the basic idea.`

```latex
% In this paper, we propose a novel framework …
%% Example: In this paper, we introduce a novel implicit neural representation for dynamic humans, named Neural Body, to solve the challenge of novel view synthesis from sparse views.
In this paper, we propose a novel framework/representation, named [method name] for [xxx task].

% Teaser for basic idea
%% Example: The basic idea is illustrated in Figure 2.
The basic idea is illustrated in [xxx Figure].

% One-sentence key novelty/contribution (very important ability)
%% Example: For the implicit fields at different frames, instead of learning them separately, Neural Body generates them from the same set of latent codes.
Our innovation is in [one sentence for key novelty].

% Method details
%% Example: Specifically, we anchor a set of latent codes to the vertices of a deformable human model (SMPL in this work), namely that their spatial locations vary with the human pose. To obtain the 3D representation at a frame, we first transform the code locations based on the human pose, which can be reliably estimated from sparse camera views. Then, a network is designed to regress the density and color for any 3D point based on these latent codes. Both the latent codes and the network are jointly learned from images of all video frames during the reconstruction.
Specifically, [how it works in detail].

% Advantage 1
%% Example: This model is inspired by the latent variable model in statistics, which enables us to effectively integrate observations at different frames.
In contrast to previous methods, [our advantage].

% Advantage 2
%% Example: Another advantage of the proposed method is that the deformable model provides a geometric prior (rough surface location) to enable more efficient learning of implicit fields.
Another advantage of the proposed method is that [another advantage].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-2-two-contributions.md
ADDED
@@ -0,0 +1,34 @@
# Pipeline Version 2 (Two Contributions)

`Version 2: Two contributions, and one teaser figure to present the basic idea.`

```latex
% In this paper, we propose a novel framework …
%% Example: In this paper, we introduce a novel implicit neural representation for dynamic humans, named Neural Body, to solve the challenge of novel view synthesis from sparse views.
In this paper, we propose a novel framework/representation, named [method name] for [xxx task].

% One-sentence key novelty
%% Example: To that end, we propose techniques to represent a given subject with rare token identifiers and fine-tune a pre-trained, diffusion-based text-to-image framework that operates in two steps; generating a low-resolution image from text and subsequently applying super-resolution (SR) diffusion models.
Our innovation is in [one sentence for key novelty].

% Teaser
%% Example: The basic idea is illustrated in Figure 2.
The basic idea is illustrated in [xxx Figure].

% Contribution 1 details
%% Example: We first fine-tune the low-resolution text-to-image model with the input images and text prompts containing a unique identifier followed by the class name of the subject (e.g., “A [V] dog”).
Specifically, [how contribution 1 works].

% Advantage of contribution 1
%% Example: This model is inspired by the latent variable model in statistics, which enables us to effectively integrate observations at different frames.
In contrast to previous methods, [advantage of contribution 1].

% Challenge motivating contribution 2
%% Example: In order to prevent overfitting and language drift [35, 40] that cause the model to associate the class name (e.g., “dog”) with the specific instance
However, [another technical challenge].

% Contribution 2 details
%% Example: we propose an autogenous, class-specific prior preservation loss, which leverages the semantic prior on the class that is embedded in the model, and encourages it to generate diverse instances of the same class as our subject.
Specifically, [how contribution 2 works].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-3-new-module-on-existing-pipeline.md
ADDED
@@ -0,0 +1,18 @@
# Pipeline Version 3 (New Module on Existing Pipeline)

`Version 3: Build on a prior pipeline and introduce one new module, with a teaser figure for the basic idea.`

```latex
% In this paper, we propose a learning-based snake algorithm, named deep snake, for real-time instance segmentation.

% Inspired by previous methods [21, 25], deep snake takes an initial contour as input and deforms it by regressing vertex-wise offsets.

% Our innovation is introducing the circular convolution for efficient feature learning on a contour, as illustrated in Figure 1.

% We observe that the contour is a cycle graph that consists of a sequence of vertices connected in a closed cycle. Since every vertex has the same degree equal to two, we can apply the standard 1D convolution on the vertex features.

% Considering that the contour is periodic, deep snake introduces the circular convolution, which indicates that an aperiodic function (1D kernel) is convolved in the standard way with a periodic function (features defined on the contour).

% The kernel of circular convolution encodes not only the feature of each vertex but also the relationship among neighboring vertices. In contrast, the generic GCN performs pooling to aggregate information from neighboring vertices. The kernel function in our circular convolution amounts to a learnable aggregation function, which is more expressive and results in better performance than using a generic GCN, as demonstrated by our experimental results in Section 5.2.
```
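For readers who want to see concretely what the circular convolution above computes, here is a minimal NumPy sketch (illustrative only, not part of the packaged assets; `circular_conv`, `features`, and `kernel` are hypothetical names). Pipeline Version 4 below reuses the same passage, so the sketch applies there as well.

```python
import numpy as np

def circular_conv(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve an aperiodic 1D kernel with periodic per-vertex features.

    features: shape (N,), one scalar feature per vertex of the closed contour.
    kernel:   shape (K,), K odd; plays the role of a learnable aggregation
              function over each vertex's neighborhood (cross-correlation
              form, as is conventional in deep learning).
    """
    n, k = len(features), len(kernel)
    half = k // 2
    out = np.empty(n)
    for i in range(n):
        # The contour is a cycle graph, so the neighborhood wraps around
        # (mode="wrap") instead of zero-padding as in ordinary 1D convolution.
        idx = np.arange(i - half, i + half + 1)
        out[i] = np.take(features, idx, mode="wrap") @ kernel
    return out
```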
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/pipeline-version-4-observation-driven.md
ADDED
@@ -0,0 +1,16 @@
# Pipeline Version 4 (Observation-Driven Contribution)

`Version 4: Contribution comes from one important observation. Introduce key innovation first, then intuitive observation as motivation, then method details, then benefits.`

```latex
% In this paper, we propose a learning-based snake algorithm, named deep snake, for real-time instance segmentation.

% Our innovation is introducing the circular convolution for efficient feature learning on a contour, as illustrated in Figure 1.

% We observe that the contour is a cycle graph that consists of a sequence of vertices connected in a closed cycle. Since every vertex has the same degree equal to two, we can apply the standard 1D convolution on the vertex features.

% Considering that the contour is periodic, deep snake introduces the circular convolution, which indicates that an aperiodic function (1D kernel) is convolved in the standard way with a periodic function (features defined on the contour).

% The kernel of circular convolution encodes not only the feature of each vertex but also the relationship among neighboring vertices. In contrast, the generic GCN performs pooling to aggregate information from neighboring vertices. The kernel function in our circular convolution amounts to a learnable aggregation function, which is more expressive and results in better performance than using a generic GCN, as demonstrated by our experimental results in Section 5.2.
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-1-existing-task.md
ADDED
@@ -0,0 +1,32 @@
# Technical Challenge Version 1 (Existing Task, Existing Methods)

`Version 1: For existing tasks with existing methods, discuss the challenge chain from traditional methods to recent methods and finally to the challenge we solve.`

```latex
% Discuss general technical challenges of this task (to lead into recent methods)
%% Example 1: This problem is quite challenging from many perspectives, including object detection under severe occlusions, variations in lighting and appearance, and cluttered background objects.
%% Example 2: This problem is particularly challenging due to the inherent ambiguity on acquiring human geometry, materials and motions from images.
This problem is particularly challenging due to several factors, including [xxx reason], [xxx reason], and [xxx reason].

% Briefly introduce one class of traditional methods, then discuss their technical challenge
%% Example: Traditional methods have shown that pose estimation can be achieved by establishing the correspondences between an object image and the object model.
To overcome these challenges, traditional methods [how they work], [what they achieve].

%% Example: They rely on hand-crafted features, which are not robust to image variations and background clutters.
However, they [technical challenge they face].

% Briefly introduce one class of recent methods 1 (optional), then discuss their challenge
%% Example: Deep learning based methods train end-to-end neural networks that take an image as input and output its corresponding pose.
Recently, [xxx methods] [how they work], [what they achieve].

%% Example: However, generalization remains as an issue, as it is unclear that such end-to-end methods learn sufficient feature representations for pose estimation.
However, they [limitation], because [xxx technical reason].

% Briefly introduce one class of recent methods 2, then discuss their challenge (must lead to our solved challenge)
%% Example: Some recent methods use CNNs to first regress 2D keypoints and then compute 6D pose parameters using the Perspective-n-Point (PnP) algorithm. In other words, the detected keypoints serve as an intermediate representation for pose estimation. Such two-stage approaches achieve state-of-the-art performance, thanks to robust detection of keypoints.
To overcome this challenge, [xxx methods] [how they work], [what they achieve].

%% Example: However, these methods have difficulty in tackling occluded and truncated objects, since part of their keypoints are invisible. Although CNNs may predict these unseen keypoints by memorizing similar patterns, generalization remains difficult.
However, they [limitation], because [xxx technical reason].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-2-existing-task-insight-backed-by-traditional.md
ADDED
@@ -0,0 +1,33 @@
# Technical Challenge Version 2 (Existing Task, Insight Backed by Traditional Methods)

`Version 2: For existing tasks, if our technical insight was used in traditional methods, discuss that line to provide conceptual backing.`

```latex
% Introduce one class of traditional/recent methods and discuss their technical challenge (to lead to our insight)
%% Example (Deep Snake): Most of the state-of-the-art instance segmentation methods perform pixel-wise segmentation within a bounding box given by an object detector.
%% Example (ManhattanSDF): Given input images, traditional methods generally estimate the depth map for each image based on the multi-view stereo (MVS) algorithms and then fuse estimated depth maps into 3D models.
Traditional/recent methods [how they work], [what they achieve].

%% Example (Deep Snake): They may be sensitive to the inaccurate bounding box. Moreover, representing an object shape as dense binary pixels generally results in costly post-processing.
%% Example (ManhattanSDF): Although these methods achieve successful reconstruction in most cases, they have difficulty in handling low-textured regions, e.g., floors and walls of indoor scenes, due to the unreliable stereo matching in these regions.
However, they [limitation], because [xxx technical reason].

% Discuss traditional methods that used an insight similar to ours (implicitly backing our idea)
%% Example (Deep Snake): An alternative shape representation is the object contour, which is a set of vertices along the object silhouette. In contrast to pixel-based representation, a contour is not limited within a bounding box and has fewer parameters. Such a contour-based representation has long been used in image segmentation since the seminal work by Kass et al., which is well known as snakes or active contours.
%% Example (ManhattanSDF): To improve the reconstruction of low-textured regions, a typical approach is leveraging the planar prior of man-made scenes, which has long been explored in literature. A renowned example is the Manhattan-world assumption, i.e., the surfaces of man-made scenes should be aligned with three dominant directions.
To overcome this problem, a typical approach is [xxx insight], which has long been explored in literature.

These methods [how they work].

%% Example (Deep Snake): While many variants have been developed in literature, these methods are prone to local optima as the objective functions are handcrafted and typically nonconvex.
%% Example (ManhattanSDF): However, all of them focus on optimizing per-view depth maps instead of the full scene models in 3D space. As a result, depth estimation and plane segmentation could still be inconsistent among views, yielding suboptimal reconstruction quality as demonstrated by our experimental results in Section 5.3.
However, they [limitation], because [xxx technical reason].

% Then discuss newer methods and their remaining challenge (must lead to our solved challenge)
%% Example: There is a recent trend to represent 3D scenes as implicit neural representations and learn the representations from images with differentiable renderers. In particular, [49, 54, 55] use a signed distance field (SDF) to represent the scene and render it into images based on the sphere tracing or volume rendering. Thanks to the well-defined surfaces of SDFs, they recover high-quality 3D geometries from images.
To overcome this challenge, [xxx methods] [how they work], [what they achieve].

%% Example: However, these methods essentially rely on the multi-view photometric consistency to learn the SDFs. So they still suffer from poor performance in low-textured planar regions, as shown in Figure 1, as many plausible solutions may satisfy the photometric constraint in low-textured planar regions.
However, they [limitation], because [xxx technical reason].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/technical-challenge-version-3-novel-task.md
ADDED
@@ -0,0 +1,21 @@
# Technical Challenge Version 3 (Novel Task)

`Version 3: For novel tasks without direct methods, define the challenge directly and decompose it by requirement/challenge points.`

```latex
% To achieve xx goal, several requirements/challenges must be satisfied.
%% Example: In this work, our goal is to build a model that captures such object intrinsics from a single image. This problem is challenging for three reasons.

% Describe point 1
%% Example: First, we only have a single image. This makes our work fundamentally different from existing works on 3D-aware image generation models [8, 9, 27, 28], which typically require a large dataset of thousands of instances for training. In comparison, the single image contains at most a few dozen instances, making the inference problem highly under-constrained.

% Describe point 2
%% Example: Second, these already limited instances may vary significantly in pixel values. This is because they have different poses and illumination conditions, but neither of these factors are annotated or known. We also cannot resort to existing tools for pose estimation based on structure from motion, such as COLMAP [35], because the appearance variations violate the assumptions of epipolar geometry.

% Describe point 3
%% Example: Finally, the object intrinsics we aim to infer are probabilistic, not deterministic: no two roses in the natural world are identical, and we want to capture a distribution of their geometry, texture, and material to exploit the underlying multi-view information.
```

See also:
1. `references/examples/introduction/novel-task-challenge-decomposition.md`
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-1-task-then-application.md
ADDED
@@ -0,0 +1,14 @@
# Introduction Version 1: Task First, Then Application

`Version 1: If the task is relatively niche, introduce the task first, then introduce applications.`

```latex
% Introduce Task (if the task is very familiar, this part can be skipped)
%% Example: Object pose estimation aims to estimate object's orientation and translation relative to a canonical frame from a single image.
[xxx task] aims at recovering/reconstructing/estimating [xxx output] from [xxx input].

% Introduce Application
%% Example: Accurate pose estimation is essential for a variety of applications such as augmented reality, autonomous driving and robotic manipulation.
[xxx task] has a variety of applications such as [xxx], [xxx], and [xxx].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-2-application-first.md
ADDED
@@ -0,0 +1,10 @@
# Introduction Version 2: Application First

`Version 2: If the task is already familiar to most readers, introduce applications directly.`

```latex
% Introduce Application
%% Example: Accurate pose estimation is essential for a variety of applications such as augmented reality, autonomous driving and robotic manipulation.
[xxx task] has a variety of applications such as [xxx], [xxx], and [xxx].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-3-general-to-specific-setting.md
ADDED
@@ -0,0 +1,14 @@
# Introduction Version 3: General Application -> Specific Setting

`Version 3: Introduce applications of the general task first, then introduce the specific task setting. (Personally recommended when the setting is relatively new.)`

```latex
% Introduce applications of the general task
%% Example: Accurate pose estimation is essential for a variety of applications such as augmented reality, autonomous driving and robotic manipulation.
[xxx task] has a variety of applications such as [xxx], [xxx], and [xxx].

% Introduce the specific task setting
%% Example: This paper focuses on the specific setting of recovering the 6DoF pose of an object, i.e., rotation and translation in 3D, from a single RGB image of that object.
This paper focuses on the specific setting of recovering/reconstructing/estimating [xxx output] from [xxx input].
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction/version-4-open-with-challenge.md
ADDED
@@ -0,0 +1,20 @@
# Introduction Version 4: Open with Application and Challenge

`Version 4: If the task is familiar, introduce applications directly and expose the target technical challenge in the opening paragraph via previous methods.`

Expert notes (faithful translation):

1. It is often good if the opening paragraph already states what we want to solve.
2. But this style requires suitable conditions and is less common.
3. Usually, several prior-method paragraphs are still needed before the target challenge becomes clear.

```latex
% Introduce Application
%% Example 1: Reconstructing 3D scenes from multi-view images is a cornerstone of many applications such as augmented reality, robotics, and autonomous driving.
%% Example 2: Instance segmentation is the cornerstone of many computer vision tasks, such as video analysis, autonomous driving, and robotic grasping, which require both accuracy and efficiency.

% Use previous methods to expose the target technical challenge
%% Example 1: Given input images, traditional methods [43, 44, 59] generally estimate the depth map for each image based on the multi-view stereo (MVS) algorithms and then fuse estimated depth maps into 3D models. Although these methods achieve successful reconstruction in most cases, they have difficulty in handling low-textured regions, e.g., floors and walls of indoor scenes, due to the unreliable stereo matching in these regions.
%% Example 2: Most of the state-of-the-art instance segmentation methods [18, 27, 5, 19] perform pixel-wise segmentation within a bounding box given by an object detector [36], which may be sensitive to the inaccurate bounding box. Moreover, representing an object shape as dense binary pixels generally results in costly post-processing.
```
package/package-assets/shared/skills/lab/references/paper-writing/examples/introduction-examples.md
ADDED
@@ -0,0 +1,25 @@
# Introduction Examples Index

All introduction example cites should point to the local files below.

## A. Task and Application Versions

1. Version 1: `references/examples/introduction/version-1-task-then-application.md`
2. Version 2: `references/examples/introduction/version-2-application-first.md`
3. Version 3: `references/examples/introduction/version-3-general-to-specific-setting.md`
4. Version 4: `references/examples/introduction/version-4-open-with-challenge.md`

## B. Technical Challenge Versions

1. Version 1 (existing task): `references/examples/introduction/technical-challenge-version-1-existing-task.md`
2. Version 2 (existing task + traditional insight backing): `references/examples/introduction/technical-challenge-version-2-existing-task-insight-backed-by-traditional.md`
3. Version 3 (novel task): `references/examples/introduction/technical-challenge-version-3-novel-task.md`
4. Novel-task decomposition examples: `references/examples/introduction/novel-task-challenge-decomposition.md`

## C. Pipeline-Introduction Versions

1. Version 1: `references/examples/introduction/pipeline-version-1-one-contribution-multi-advantages.md`
2. Version 2: `references/examples/introduction/pipeline-version-2-two-contributions.md`
3. Version 3: `references/examples/introduction/pipeline-version-3-new-module-on-existing-pipeline.md`
4. Version 4: `references/examples/introduction/pipeline-version-4-observation-driven.md`
5. Not recommended pattern: `references/examples/introduction/pipeline-not-recommended-abstract-only.md`
package/package-assets/shared/skills/lab/references/paper-writing/examples/method/example-of-the-three-elements.md
ADDED
@@ -0,0 +1,67 @@
# Example of the Three Elements

This example uses `%` comments as annotations.
Each `% ...` annotation explains the paragraph(s) immediately below it.

```latex
\begin{quote}
\textbf{Annotation rule.} In this example, each line starting with \% labels the role of the paragraph(s) directly below it.
\end{quote}

\begin{itemize}
\item Module design (data structure)
\item Motivation of this module
\item Technical advantages of this module
\item Module design (forward process)
\end{itemize}

\subsection{3.1. Structured latent codes}

% Module design: introduce the module's data structure
To control the spatial locations of latent codes with the human pose, we anchor these latent codes to a deformable human body model (SMPL) [38]. SMPL is a skinned vertex-based model, which is defined as a function of shape parameters, pose parameters, and a rigid transformation relative to the SMPL coordinate system. The function outputs a posed 3D mesh with 6890 vertices. Specifically, we define a set of latent codes \( Z = \{z_1, z_2, ..., z_{6890}\} \) on vertices of the SMPL model. For the frame \( t \), SMPL parameters \( S_t \) are estimated from the multi-view images \( \{I_t^c \mid c = 1, ..., N_c\} \) using [26]. The spatial locations of the latent codes are then transformed based on the human pose \( S_t \) for the density and color regression. Figure 3 shows an example. The dimension of latent code \( z \) is set to 16 in our experiments.

% Technical advantages of this module
Similar to the local implicit representations [25, 5, 18], the latent codes are used with a neural network to represent the local geometry and appearance of a human. Anchoring these codes to a deformable model enables us to represent a dynamic human. With the dynamic human representation, we establish a latent variable model that maps the same set of latent codes to the implicit fields of density and color at different frames, which naturally integrates observations at different frames.

\subsection{3.2. Code diffusion}

% Motivation of this module
Figure 3(a) shows the process of code diffusion. The implicit fields assign the density and color to each point in the 3D space, which requires us to query the latent codes at continuous 3D locations. This can be achieved with the trilinear interpolation. However, since the structured latent codes are relatively sparse in the 3D space, directly interpolating the latent codes leads to zero vectors at most 3D points. To solve this problem, we diffuse the latent codes defined on the surface to nearby 3D space.

% Module design: introduce module design by describing the module forward process
Inspired by [65, 56, 49], we choose the SparseConvNet [21] to efficiently process the structured latent codes, whose architecture is described in Table 1. Specifically, based on the SMPL parameters, we compute the 3D bounding box of the human and divide the box into small voxels with voxel size of \( 5mm \times 5mm \times 5mm \). The latent code of a non-empty voxel is the mean of latent codes of SMPL vertices inside this voxel. SparseConvNet utilizes 3D sparse convolutions to process the input volume and output latent code volumes with \( 2\times, 4\times, 8\times, 16\times \) downsampled sizes. With the convolution and downsampling, the input codes are diffused to nearby space. Following [56], for any point in 3D space, we interpolate the latent codes from multi-scale code volumes of network layers 5, 9, 13, 17, and concatenate them into the final latent code. Since the code diffusion should not be affected by the human position and orientation in the world coordinate system, we transform the code locations to the SMPL coordinate system.

For any point \( \mathbf{x} \) in 3D space, we query its latent code from the latent code volume. Specifically, the point \( \mathbf{x} \) is first transformed to the SMPL coordinate system, which aligns the point and the latent code volume in 3D space. Then, the latent code is computed using the trilinear interpolation. For the SMPL parameters \( S_t \), we denote the latent code at point \( \mathbf{x} \) as \( \psi(\mathbf{x}, Z, S_t) \). The code vector is passed into MLP networks to predict the density and color for point \( \mathbf{x} \).

\subsection{3.3. Density and color regression}

Figure 3(b) overviews the regression of density and color for any point in 3D space. The density and color fields are represented by MLP networks. Details of network architectures are described in the supplementary material.

% Module design: introduce module design by describing the module forward process
\textbf{Density model.} For the frame \( t \), the volume density at point \( \mathbf{x} \) is predicted as a function of only the latent code \( \psi(\mathbf{x}, Z, S_t) \), which is defined as:

\[
\sigma_t(\mathbf{x}) = M_{\sigma}(\psi(\mathbf{x}, Z, S_t)),
\tag{1}
\]

where \( M_{\sigma} \) represents an MLP network with four layers.

% Module design: introduce the module's data structure
\textbf{Color model.} Similar to [37, 44], we take both the latent code \( \psi(\mathbf{x}, Z, S_t) \) and the viewing direction \( \mathbf{d} \) as input for the color regression. To model the location-dependent incident light, the color model also takes the spatial location \( \mathbf{x} \) as input. We observe that temporally-varying factors affect the human appearance, such as secondary lighting and self-shadowing. Inspired by the auto-decoder [48], we assign a latent embedding \( \ell_t \) for each video frame \( t \) to encode the temporally-varying factors.

% Module design: introduce module design by describing the module forward process
Specifically, for the frame \( t \), the color at \( \mathbf{x} \) is predicted as a function of the latent code \( \psi(\mathbf{x}, Z, S_t) \), the viewing direction \( \mathbf{d} \), the spatial location \( \mathbf{x} \), and the latent embedding \( \ell_t \). Following [51, 44], we apply the positional encoding to both the viewing direction \( \mathbf{d} \) and the spatial location \( \mathbf{x} \), which enables better learning of high frequency functions. The color model at frame \( t \) is defined as:

\[
c_t(\mathbf{x}) = M_c(\psi(\mathbf{x}, Z, S_t), \gamma_d(\mathbf{d}), \gamma_x(\mathbf{x}), \ell_t),
\tag{2}
\]

where \( M_c \) represents an MLP network with two layers, and \( \gamma_d \) and \( \gamma_x \) are positional encoding functions for viewing direction and spatial location, respectively. We set the dimension of \( \ell_t \) to 128 in experiments.

\subsection{3.4. Volume rendering}

% Module design: introduce module design by describing the module forward process
Given a viewpoint, we utilize the classical volume rendering techniques to render the Neural Body into a 2D image. The pixel colors are estimated via the volume rendering integral equation [27] that accumulates volume densities and colors along the corresponding camera ray. In practice, the integral is approximated using numerical quadrature [41, 44]. Given a pixel, we first compute its camera ray \( \mathbf{r} \) using the camera parameters and sample \( N_k \) points \( \{\mathbf{x}_k\}_{k=1}^{N_k} \) along camera ray \( \mathbf{r} \) between near and far bounds. The scene bounds are estimated based on the SMPL model. Then, Neural Body predicts volume densities and colors at these points. For the video frame \( t \), the rendered color \( \hat{C}_t(\mathbf{r}) \) ...
```
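Equations (1) and (2) and the quadrature mentioned in Section 3.4 follow the standard volume rendering recipe; here is a rough sketch of one ray (illustrative only, not part of the packaged assets: `density_mlp` and `color_mlp` stand in for the paper's `M_sigma` and `M_c`, and the per-frame embedding `ell_t` is folded into `color_mlp` for brevity).

```python
import numpy as np

def render_ray(codes, deltas, view_dir, points, density_mlp, color_mlp):
    """Numerical quadrature of the volume rendering integral along one ray.

    codes:    (N_k, D) latent codes psi(x, Z, S_t) interpolated at the samples.
    deltas:   (N_k,) distances between consecutive samples along the ray.
    view_dir: (3,) viewing direction d; points: (N_k, 3) sample locations x.
    """
    sigma = density_mlp(codes)                # Eq. (1): per-sample densities
    rgb = color_mlp(codes, view_dir, points)  # Eq. (2): per-sample colors, (N_k, 3)
    alpha = 1.0 - np.exp(-sigma * deltas)     # opacity of each ray segment
    # Transmittance: probability that the ray reaches sample k unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = trans * alpha                   # quadrature weights along the ray
    return (weights[:, None] * rgb).sum(axis=0)  # rendered pixel color C_hat(r)
```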
package/package-assets/shared/skills/lab/references/paper-writing/examples/method/method-writing-common-issues-note.md
ADDED
@@ -0,0 +1,10 @@
# Method Writing Common Issues (Reference Note)

Original source mentioned in your notes:

1. `Method writing common issues (PDF in your source notes)`

Usage recommendation:

1. Use this reference as a troubleshooting checklist after drafting Method.
2. Prioritize unclear motivation, broken flow, missing implementation details, and inconsistent terms.
package/package-assets/shared/skills/lab/references/paper-writing/examples/method/module-design-instant-ngp.md
ADDED
@@ -0,0 +1,55 @@
# Module Design Example

This example uses `%` comments as annotations.
Each `% ...` annotation explains the paragraph(s) immediately below it.

```latex
\begin{quote}
\textbf{Annotation rule.} In this example, each line starting with \% labels the role of the paragraph(s) directly below it.
\end{quote}

\begin{itemize}
\item Motivation of this module
\item Module design (data structure)
\item Module design (forward process)
\end{itemize}

\section{3 \quad MULTIRESOLUTION HASH ENCODING}

% Motivation of this module
Given a fully connected neural network \(m(y;\Phi)\), we are interested in an encoding of its inputs \(y=\operatorname{enc}(x;\theta)\) that improves the approximation quality and training speed across a wide range of applications without incurring a notable performance overhead.

% Module design: introduce the module's data structure
Our neural network not only has trainable weight parameters \(\Phi\), but also trainable encoding parameters \(\theta\). These are arranged into \(L\) levels, each containing up to \(T\) feature vectors with dimensionality \(F\). Typical values for these hyperparameters are shown in Table 1. Figure 3 illustrates the steps performed in our multiresolution hash encoding. Each level (two of which are shown as red and blue in the figure) is independent and conceptually stores feature vectors at the vertices of a grid, the resolution of which is chosen to be a geometric progression between the coarsest and finest resolutions \([N_{\min},N_{\max}]\):

\[
N_l := \left\lfloor N_{\min}\cdot b^l \right\rfloor, \tag{2}
\]

\[
b := \exp\!\left(\frac{\ln N_{\max}-\ln N_{\min}}{L-1}\right). \tag{3}
\]

\(N_{\max}\) is chosen to match the finest detail in the training data. Due to the large number of levels \(L\), the growth factor is usually small. Our use cases have \(b\in[1.26,2]\).

% Module design: introduce module design by describing the module forward process
Consider a single level \(l\). The input coordinate \(x\in\mathbb{R}^d\) is scaled by that level's grid resolution before rounding down and up:
\[
\lfloor x_l \rfloor := \lfloor x\cdot N_l \rfloor,\quad
\lceil x_l \rceil := \lceil x\cdot N_l \rceil.
\]

\(\lfloor x_l \rfloor\) and \(\lceil x_l \rceil\) span a voxel with \(2^d\) integer vertices in \(\mathbb{Z}^d\). We map each corner to an entry in the level's respective feature vector array, which has fixed size of at most \(T\). For coarser levels where a dense grid requires fewer than \(T\) parameters, i.e. \((N_l+1)^d \le T\), this mapping is 1:1. At finer levels, we use a hash function \(h:\mathbb{Z}^d\rightarrow\mathbb{Z}_T\) to index into the array, effectively treating it as a hash table, although there is no explicit collision handling. We rely instead on the gradient-based optimization to store appropriate sparse detail in the array, and the subsequent neural network \(m(y;\Phi)\) for collision resolution. The number of trainable encoding parameters \(\theta\) is therefore \(O(T)\) and bounded by \(T\cdot L\cdot F\), which in our case is always \(T\cdot16\cdot2\) (Table 1).

We use a spatial hash function [Teschner et al. 2003] of the form
\[
h(x)=\left(\bigoplus_{i=1}^{d} x_i\pi_i\right)\bmod T, \tag{4}
\]
where \(\oplus\) denotes the bit-wise XOR operation and \(\pi_i\) are unique, large prime numbers. Effectively, this formula XORs the results of a per-dimension linear congruential (pseudo-random) permutation [Lehmer 1951], \emph{decorrelating} the effect of the dimensions on the hashed value. Notably, to achieve (pseudo-)independence, only \(d-1\) of the \(d\) dimensions must be permuted, so we choose \(\pi_1:=1\) for better cache coherence, \(\pi_2=2{,}654{,}435{,}761\), and \(\pi_3=805{,}459{,}861\).

Lastly, the feature vectors at each corner are \(d\)-linearly interpolated according to the relative position of \(x\) within its hypercube, i.e. the interpolation weight is \(w_l := x_l-\lfloor x_l \rfloor\).

Recall that this process takes place independently for each of the \(L\) levels. The interpolated feature vectors of each level, as well as auxiliary inputs \(\xi\in\mathbb{R}^E\) (such as the encoded view direction and textures in neural radiance caching), are concatenated to produce \(y\in\mathbb{R}^{LF+E}\), which is the encoded input \(\operatorname{enc}(x;\theta)\) to the MLP \(m(y;\Phi)\).

\textbf{Performance vs. quality.} Choosing the hash table size \(T\) provides a trade-off between performance, memory and quality. Higher values of \(T\) result in higher quality and lower performance. The memory ...
```
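Equations (2)-(4) above translate almost directly into code; here is a rough Python sketch (not part of the packaged assets; the three primes are the ones quoted in the text, all other names are illustrative). At coarse levels where (N_l + 1)^d <= T the quoted text uses a 1:1 mapping instead of the hash, so the sketch covers only the hashed case.

```python
import numpy as np

# pi_1 := 1 for cache coherence; pi_2 and pi_3 as quoted in the text.
PRIMES = (1, 2_654_435_761, 805_459_861)

def level_resolutions(n_min: int, n_max: int, num_levels: int) -> list[int]:
    """Eq. (2)-(3): grid resolutions form a geometric progression."""
    b = np.exp((np.log(n_max) - np.log(n_min)) / (num_levels - 1))
    return [int(np.floor(n_min * b ** l)) for l in range(num_levels)]

def spatial_hash(vertex: tuple[int, ...], table_size: int) -> int:
    """Eq. (4): XOR the per-dimension products with unique primes, then mod T.

    vertex is an integer grid corner in Z^d. Collisions are not handled
    explicitly; as described above, gradient-based optimization and the
    subsequent MLP are relied upon to resolve them.
    """
    h = 0
    for coord, prime in zip(vertex, PRIMES):
        h ^= coord * prime
    return h % table_size
```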
package/package-assets/shared/skills/lab/references/paper-writing/examples/method/module-motivation-patterns.md
ADDED
@@ -0,0 +1,15 @@
# Module Motivation Writing Patterns

`Module motivation is usually problem-driven: because a problem exists, we design xx to solve it.`

Typical opening sentences:

1. `A remaining problem/challenge is ...`
2. `However, we ...`
3. `Previous methods have difficulty in ...`

Usage note:

1. State the specific failure before introducing the module.
2. Keep motivation independent from implementation details.