====== Incremental Learning Papers ======
Refer to: [[https://
===== Survey =====

All survey papers are sorted by first submission date.
==== Online Continual Learning in Image Classification: An Empirical Survey ====
**Methods**:
  * **Regularization**-based methods: **[[:deep_learning:
  * **Memory**-based methods: **A-GEM** (Averaged GEM; see the gradient-projection sketch after this list), **[[:deep_learning:
  * **Parameter-isolation**-based methods: **CN-DPM** (Continual Neural Dirichlet Process Mixture)
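A minimal sketch of the A-GEM gradient projection mentioned in the memory-based bullet above, assuming gradients are already flattened into vectors (function and variable names are mine, not from the survey):

<code python>
import numpy as np

def agem_project(grad, grad_ref):
    """A-GEM-style update: if the gradient on the current batch conflicts with the
    reference gradient computed on a batch sampled from episodic memory, project it
    so that the loss on the memory batch is not increased (to first order)."""
    dot = float(np.dot(grad, grad_ref))
    if dot >= 0.0:                      # no conflict: keep the gradient unchanged
        return grad
    return grad - (dot / float(np.dot(grad_ref, grad_ref))) * grad_ref

# toy usage: g from the current task batch, g_ref from the memory batch
g = np.array([1.0, -2.0, 0.5])
g_ref = np.array([-1.0, 1.0, 0.0])
print(agem_project(g, g_ref))
</code>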
**Related Survey**:

  * Not empirical
    * Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges
    * Online continual learning on sequences, in: Recent Trends in Learning From Data
  * Empirical
    * Three scenarios for continual learning
    * Towards robust evaluations of continual learning
    * Riemannian walk for incremental learning: Understanding forgetting and intransigence
    * A comprehensive, application-oriented study of catastrophic forgetting in DNNs
    * Measuring catastrophic forgetting in neural networks
    * [[:deep_learning:
**Focus**
**Trends**

  * Raw-Data-Free Methods. Regularization (has theoretical limitations in the class-incremental setting and cannot be used alone to reach decent performance),
  * Meta Learning, such as MER, OML, iTAML, La-MAML, MERLIN, Continual-MAML.
  * CL in Other Areas: object detection, RNNs, language learning, dialogue systems, image captioning, sentiment classification,
==== A Comprehensive Study of Class Incremental Learning Algorithms for Visual Tasks ====
**Methods**:
  * Model-Growth based methods: PNN (Progressive Neural Networks), DAN (Deep Adaptation Networks), PackNet, [[:deep_learning:
  * Fixed-Representation based methods: **DeeSIL** (Deep Shallow Incremental Learning), SVMs (Support Vector Machines), FearNet, **Deep-SLDA** (Deep Streaming Linear Discriminant Analysis), **REMIND** (REplay using Memory INDexing), ART (Adaptive Resonance Theory), **FR** (Fixed Representation)
  * Fine-Tuning based methods: **[[:deep_learning:
**Related Survey**
  * Handling class IL as an imbalanced learning problem provides very interesting results with or without the use of a distillation component. Here, the authors introduce a competitive method where classification bias in favor of new classes is reduced by using prior class probabilities (a toy rectification sketch follows this list). It would be interesting to **investigate more sophisticated bias reduction schemes to improve performance further**.
  * **A more in-depth investigation of why distillation fails to work for large-scale datasets is needed**. The empirical findings reported here should be complemented with a more theoretical analysis to improve its usefulness. Already, the addition of inter-class separation is promising. More powerful distillation formulations,
  * The results obtained with herding-based selection of exemplars are better compared to a random selection for all methods tested. **Further work in this direction could follow up on Mnemonics training and investigate in more depth which exemplar distribution is optimal for replay**.
  * **The evaluation scenario should be made more realistic** by: (1) dropping the strong hypothesis that new data are readily annotated when they are streamed; (2) using a variable number of classes for the incremental states; and (3) working with imbalanced datasets, which are more likely to occur in real-life applications than the controlled datasets tested until now.
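The prior-probability idea in the first bullet could be illustrated roughly as below; this is only a generic sketch of rectifying predictions with class priors, not necessarily the exact scheme used in the survey:

<code python>
import numpy as np

def rectify_with_priors(probs, train_counts):
    """Divide the softmax scores by the empirical training prior of each class and
    renormalize, so classes that dominated the incremental training data (the new
    classes) are no longer systematically favored."""
    prior = train_counts / train_counts.sum()
    rectified = probs / prior
    return rectified / rectified.sum()

# old classes were rehearsed from a few exemplars, new classes from the full data
probs = np.array([0.05, 0.10, 0.40, 0.45])            # biased towards classes 2 and 3
counts = np.array([20.0, 20.0, 500.0, 500.0])
print(rectify_with_priors(probs, counts))
</code>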
==== Class-incremental learning: survey and performance evaluation on image classification ====
  * Regularization methods
    * Data regularization:
    * Recent developments:
  * Rehearsal methods: **[[:deep_learning:
  * Bias-correction methods
    * **BiC** (Bias Correction; see the sketch after this list)
    * **LUCIR** (Learning a Unified Classifier Incrementally via Rebalancing)
    * **IL2M** (Class-IL with Dual Memory)
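A minimal sketch of a BiC-style correction, as mentioned in the bias-correction bullets above: two scalars applied only to the logits of the newly added classes, typically fit on a small balanced validation set while the rest of the network is frozen (shapes and names here are illustrative):

<code python>
import torch
import torch.nn as nn

class BiasCorrection(nn.Module):
    """Rescale new-class logits: z_new -> alpha * z_new + beta."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, new_class_slice):
        corrected = logits.clone()
        corrected[:, new_class_slice] = self.alpha * logits[:, new_class_slice] + self.beta
        return corrected

# toy usage: 10 classes in total, the last 2 were added in the current task
bic = BiasCorrection()
logits = torch.randn(4, 10)
corrected = bic(logits, new_class_slice=slice(8, 10))
</code>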
**Related Survey**

  * [[:deep_learning:
  * Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges
  * A comprehensive, application-oriented study of catastrophic forgetting in DNNs
  * [[:deep_learning:
**Focus**

  * Class Incremental Learning

**Trends**

  * Exemplar learning: an exciting new direction has emerged that parametrizes exemplars and optimizes them to prevent forgetting. Optimizing the available storage by computing more efficient exemplars is expected to attract more research in the coming years.
  * Feature rehearsal: moving away from image replay towards different variants of feature replay is expected to gain traction.
  * Meta-learning:
  * Task-free settings: the transition to the task-free setting is not straightforward,
==== A continual learning survey: Defying forgetting in classification tasks ====
  * Replay methods
    * Constrained methods: **GEM**, A-GEM, GSS.
    * Pseudo-Rehearsal methods: DGR, PR, CCLUGM, LGM
  * Regularization-based methods
    * Prior-focused methods (an EWC-style penalty sketch follows this list): **[[:deep_learning:
  * Parameter isolation methods
    * Dynamic Architecture:
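For the prior-focused branch above, a minimal sketch of an EWC-style quadratic penalty, assuming a diagonal Fisher estimate and a snapshot of the previous-task parameters are already stored (keyed by parameter name):

<code python>
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """EWC regularizer: lam/2 * sum_i F_i * (theta_i - theta_i_star)^2, summed over
    all parameters for which a Fisher estimate from the previous task is available."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# total loss on the new task: loss = task_loss + ewc_penalty(model, fisher, old_params)
</code>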
**Related Survey**
  * No task boundaries
  * Online learning without demanding offline training of large batches or separate tasks introduces fast acquisition of new information.
  * Forward transfer or zero-shot learning indicates the importance of previously acquired knowledge to aid the learning of new tasks by increased data efficiency.
  * Backward transfer aims at retaining previous knowledge and preferably improving it when learning future related tasks.
  * Problem agnostic: continual learning is not limited to a specific setting (e.g. only classification).
  * Adaptive systems learn from available unlabeled data as well, opening doors for adaptation to specific user data.
  * No test-time oracle providing the task label should be required for prediction.
**Methods**:
  * Task-specific components (sub-network per task): **XDG** (Context-dependent Gating; see the gating sketch after this list)
  * Regularized optimization (differently regularizing parameters):
  * Modifying Training Data (pseudo-data,
  * Using Exemplars (store data from previous tasks): **[[:deep_learning:
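A minimal sketch of the context-dependent gating idea behind XDG, as referenced in the first bullet above: each task gets a fixed random binary mask that silences most hidden units (the layer sizes and the fraction of kept units are arbitrary choices, not taken from the paper):

<code python>
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Two-layer MLP whose hidden activations are multiplied by a per-task mask."""
    def __init__(self, n_tasks, in_dim=784, hidden=256, out_dim=10, keep=0.2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # one fixed random binary mask per task, keeping roughly `keep` of the units
        self.register_buffer("masks", (torch.rand(n_tasks, hidden) < keep).float())

    def forward(self, x, task_id):
        h = torch.relu(self.fc1(x)) * self.masks[task_id]
        return self.fc2(h)

model = GatedMLP(n_tasks=5)
logits = model(torch.randn(8, 784), task_id=0)
</code>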
**Related Survey**
None
==== Continual Lifelong Learning with Neural Networks: A Review ====
**Methods**:
  * Regularization methods: **[[:deep_learning:
**Related Survey**
**Code**: [[https://

**First Submission**:

**Latest Submission**:

**Focus**: image classification problems with Convolutional Neural Network classifiers.
**Related work**
**Feature extraction**: $\theta_s$ and $\theta_o$ are unchanged, and the outputs of one or more layers are used as features for the new task, training only $\theta_n$.

**Drawback**: the shared representation is not adapted, so the features may not be discriminative for the new task.

**Fine-tuning**: $\theta_s$ and $\theta_n$ are optimized for the new task, while $\theta_o$ is fixed.

**Drawback**: performance on previously learned tasks degrades, since the shared parameters change without any constraint from the old tasks (catastrophic forgetting).

**Joint Training**: **All parameters $\theta_s$, $\theta_o$, $\theta_n$ are jointly optimized.**

**Drawback**: requires the training data of all previously learned tasks to be available, which becomes increasingly cumbersome as tasks accumulate.
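To make the three baselines concrete, a hedged sketch of which parameter groups would be optimized in each case; the module shapes are illustrative, only the split into $\theta_s$, $\theta_o$, $\theta_n$ follows the notation above:

<code python>
import torch.nn as nn
from torch.optim import SGD

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # shared parameters, theta_s
old_head = nn.Linear(256, 10)                              # old-task classifier, theta_o
new_head = nn.Linear(256, 5)                               # new-task classifier, theta_n

# feature extraction: only theta_n is trained
opt_feature_extraction = SGD(new_head.parameters(), lr=0.01)

# fine-tuning: theta_s and theta_n are trained, theta_o stays fixed
opt_fine_tuning = SGD(list(backbone.parameters()) + list(new_head.parameters()), lr=0.01)

# joint training: all parameters are optimized (requires access to old-task data)
opt_joint = SGD(list(backbone.parameters()) + list(old_head.parameters())
                + list(new_head.parameters()), lr=0.01)
</code>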
{{deep_learning:

**Algorithm of LwF**

{{deep_learning:
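A minimal sketch of the LwF objective behind the algorithm above: cross-entropy on the new task plus a temperature-scaled distillation term that keeps the old-task outputs close to those recorded before training on the new data (T=2 is the commonly used value, not something stated on this page):

<code python>
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_targets, old_logits, recorded_old_logits, T=2.0, lam=1.0):
    """new_logits: outputs of the new-task head; old_logits: current outputs of the
    old-task head(s); recorded_old_logits: old-task outputs saved before training
    on the new task, used as soft distillation targets."""
    ce = F.cross_entropy(new_logits, new_targets)
    soft_targets = F.softmax(recorded_old_logits / T, dim=1)
    log_probs = F.log_softmax(old_logits / T, dim=1)
    distill = -(soft_targets * log_probs).sum(dim=1).mean()
    return ce + lam * distill
</code>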
**Backbone**
==== iCaRL: Incremental Classifier and Representation Learning ====

**Code**: [[https://

**First Submission**:

**Latest Submission**:

**First definition of class-incremental learning**:

An algorithm must satisfy the following three properties to qualify as class-incremental:

  * It should be trainable from a stream of data in which examples of different classes occur at different times.
  * It should at any time provide a competitive multi-class classifier for the classes observed so far.
  * Its computational requirements and memory footprint should remain bounded, or at least grow very slowly, with respect to the number of classes observed so far.
**Components of iCaRL**

  * classification by a nearest-mean-of-exemplars rule instead of the CNN outputs
  * prioritized exemplar selection based on herding
  * representation learning using knowledge distillation and prototype rehearsal
**Introduction**

Classification

{{deep_learning:
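A minimal sketch of the nearest-mean-of-exemplars rule used for classification: prototypes are the means of the L2-normalized exemplar features, and a sample is assigned to the class with the closest prototype (helper names are mine):

<code python>
import torch
import torch.nn.functional as F

def class_prototypes(feature_fn, exemplar_sets):
    """exemplar_sets: one tensor of exemplar inputs per class; returns (n_classes, d)."""
    protos = []
    for exemplars in exemplar_sets:
        feats = F.normalize(feature_fn(exemplars), dim=1)
        protos.append(F.normalize(feats.mean(dim=0), dim=0))
    return torch.stack(protos)

def nme_classify(feature_fn, x, prototypes):
    """Assign each sample to the class whose prototype is nearest in feature space."""
    feats = F.normalize(feature_fn(x), dim=1)
    distances = torch.cdist(feats, prototypes)
    return distances.argmin(dim=1)
</code>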
Training

{{deep_learning:
**Why nearest-mean-of-exemplars classification**

NME overcomes two major problems of the incremental learning setting:

  * CNN outputs will change uncontrollably when the feature representation changes; the nearest-mean-of-exemplars classifier adapts automatically because the class prototypes are recomputed from the current representation.
  * We cannot make use of the true class mean, since all training data would have to be stored in order to recompute this quantity after a representation change. Instead, iCaRL uses the average over a flexible number of exemplars that are chosen in a way to provide a good approximation to the class mean.
**Why representation learning**

  * The representation learning step resembles ordinary network fine-tuning: starting from previously learned network weights, it minimizes a loss function over a training set.
  * Modifications to fine-tuning: the training set is augmented with the stored exemplars, and the loss function adds distillation terms that preserve the outputs for previously learned classes (sketched below).
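A hedged sketch of that modified objective: one-hot (sigmoid) classification targets for the new classes are combined with distillation targets taken from the previous network's outputs for the old classes, and everything is trained with binary cross-entropy:

<code python>
import torch
import torch.nn.functional as F

def icarl_representation_loss(logits, targets_onehot, recorded_old_logits, n_old):
    """logits: (batch, n_classes) current network outputs; targets_onehot: one-hot
    labels for the batch (new samples plus rehearsed exemplars); recorded_old_logits:
    outputs of the previous network on the same batch, soft targets for old classes."""
    targets = targets_onehot.clone().float()
    targets[:, :n_old] = torch.sigmoid(recorded_old_logits[:, :n_old])
    return F.binary_cross_entropy_with_logits(logits, targets)
</code>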
**Exemplar management**

Overall, iCaRL's steps for exemplar selection and reduction fit the incremental learning setting exactly: the selection step is required for each class only once, when it is first observed and its training data is available. At later times, only the reduction step is called, which does not need access to any earlier training data.
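A minimal sketch of the two management steps described above: herding-style selection greedily picks exemplars whose running mean tracks the class mean, and reduction keeps only the first m exemplars of each (already prioritized) set; the feature tensors are assumed to be L2-normalized:

<code python>
import torch
import torch.nn.functional as F

def construct_exemplar_set(features, m):
    """features: (n, d) normalized features of one class. Returns the indices of m
    exemplars chosen greedily so that their mean stays close to the class mean."""
    class_mean = F.normalize(features.mean(dim=0), dim=0)
    chosen, running_sum = [], torch.zeros_like(class_mean)
    for k in range(1, m + 1):
        # candidate mean of the exemplar set if each remaining sample were added next
        candidate_means = (running_sum + features) / k
        dists = (class_mean - candidate_means).norm(dim=1)
        if chosen:                                   # never pick the same sample twice
            dists[torch.tensor(chosen)] = float("inf")
        idx = int(dists.argmin())
        chosen.append(idx)
        running_sum = running_sum + features[idx]
    return chosen

def reduce_exemplar_set(exemplar_indices, m):
    """Exemplars are stored in priority order, so reduction just keeps the first m."""
    return exemplar_indices[:m]
</code>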
**Related work**

  * fixed data representation
  * representation learning
  * LwF

**Dataset**

CIFAR-100, ImageNet ILSVRC 2012

**Future work**

  * analyze the reasons for low performance in more detail with the goal of closing the remaining performance gap.
  * study related scenarios in which the classifier cannot store any of the training data in raw form for privacy reasons.