====== Incremental Learning Papers ======
Refer to: [[https://
===== Survey =====

All survey papers are sorted by first submission date.
==== Online Continual Learning in Image Classification: An Empirical Survey ====
**Methods**:
  * **Regularization**-based methods: **[[:deep_learning:
  * **Memory**-based methods: **A-GEM** (Averaged GEM; see the gradient-projection sketch after this list), **[[:deep_learning:
  * **Parameter-isolation**-based methods: **CN-DPM** (Continual Neural Dirichlet Process Mixture)
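A minimal sketch of the A-GEM gradient projection mentioned in the memory-based bullet above, assuming gradients are already flattened into vectors (function and variable names are mine, not from the survey):

<code python>
import numpy as np

def agem_project(grad, grad_ref):
    """A-GEM-style update: if the gradient on the current batch conflicts with the
    reference gradient computed on a batch sampled from episodic memory, project it
    so that the loss on the memory batch is not increased (to first order)."""
    dot = float(np.dot(grad, grad_ref))
    if dot >= 0.0:                      # no conflict: keep the gradient unchanged
        return grad
    return grad - (dot / float(np.dot(grad_ref, grad_ref))) * grad_ref

# toy usage: g from the current task batch, g_ref from the memory batch
g = np.array([1.0, -2.0, 0.5])
g_ref = np.array([-1.0, 1.0, 0.0])
print(agem_project(g, g_ref))
</code>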
**Related Survey**:

  * Not empirical
    * Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges
    * Online continual learning on sequences, in: Recent Trends in Learning From Data
  * Empirical
    * Three scenarios for continual learning
    * Towards robust evaluations of continual learning
    * Riemannian walk for incremental learning: Understanding forgetting and intransigence
    * A comprehensive, application-oriented study of catastrophic forgetting in DNNs
    * Measuring catastrophic forgetting in neural networks
    * [[:deep_learning:
**Focus**
**Trends**

  * Raw-Data-Free Methods. Regularization (has theoretical limitations in the class-incremental setting and cannot be used alone to reach decent performance),
  * Meta Learning, such as MER, OML, iTAML, La-MAML, MERLIN, Continual-MAML.
  * CL in Other Areas: object detection, RNNs, language learning, dialogue systems, image captioning, sentiment classification,
==== A Comprehensive Study of Class Incremental Learning Algorithms for Visual Tasks ====
**Methods**:
  * Model-Growth based methods: PNN (Progressive Neural Networks), DAN (Deep Adaptation Networks), PackNet, [[:deep_learning:
  * Fixed-Representation based methods: **DeeSIL** (Deep Shallow Incremental Learning), SVMs (Support Vector Machines), FearNet, **Deep-SLDA** (Deep Streaming Linear Discriminant Analysis), **REMIND** (REplay using Memory INDexing), ART (Adaptive Resonance Theory), **FR** (Fixed Representation)
  * Fine-Tuning based methods: **[[:deep_learning:
**Related Survey**
  * Handling class IL as an imbalanced learning problem provides very interesting results with or without the use of a distillation component. Here, the authors introduce a competitive method where classification bias in favor of new classes is reduced by using prior class probabilities (a toy rectification sketch follows this list). It would be interesting to **investigate more sophisticated bias reduction schemes to improve performance further**.
  * **A more in-depth investigation of why distillation fails to work for large-scale datasets is needed**. The empirical findings reported here should be complemented with a more theoretical analysis to improve its usefulness. Already, the addition of inter-class separation is promising. More powerful distillation formulations,
  * The results obtained with herding-based selection of exemplars are better compared to a random selection for all methods tested. **Further work in this direction could follow up on Mnemonics training and investigate in more depth which exemplar distribution is optimal for replay**.
  * **The evaluation scenario should be made more realistic** by: (1) dropping the strong hypothesis that new data are readily annotated when they are streamed; (2) using a variable number of classes for the incremental states; and (3) working with imbalanced datasets, which are more likely to occur in real-life applications than the controlled datasets tested until now.
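The prior-probability idea in the first bullet could be illustrated roughly as below; this is only a generic sketch of rectifying predictions with class priors, not necessarily the exact scheme used in the survey:

<code python>
import numpy as np

def rectify_with_priors(probs, train_counts):
    """Divide the softmax scores by the empirical training prior of each class and
    renormalize, so classes that dominated the incremental training data (the new
    classes) are no longer systematically favored."""
    prior = train_counts / train_counts.sum()
    rectified = probs / prior
    return rectified / rectified.sum()

# old classes were rehearsed from a few exemplars, new classes from the full data
probs = np.array([0.05, 0.10, 0.40, 0.45])            # biased towards classes 2 and 3
counts = np.array([20.0, 20.0, 500.0, 500.0])
print(rectify_with_priors(probs, counts))
</code>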
==== Class-incremental learning: survey and performance evaluation on image classification ====
  * Regularization methods
    * Data regularization:
    * Recent developments:
  * Rehearsal methods: **[[:deep_learning:
  * Bias-correction methods
    * **BiC** (Bias Correction; see the sketch after this list)
    * **LUCIR** (Learning a Unified Classifier Incrementally via Rebalancing)
    * **IL2M** (Class-IL with Dual Memory)
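A minimal sketch of a BiC-style correction, as mentioned in the bias-correction bullets above: two scalars applied only to the logits of the newly added classes, typically fit on a small balanced validation set while the rest of the network is frozen (shapes and names here are illustrative):

<code python>
import torch
import torch.nn as nn

class BiasCorrection(nn.Module):
    """Rescale new-class logits: z_new -> alpha * z_new + beta."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, new_class_slice):
        corrected = logits.clone()
        corrected[:, new_class_slice] = self.alpha * logits[:, new_class_slice] + self.beta
        return corrected

# toy usage: 10 classes in total, the last 2 were added in the current task
bic = BiasCorrection()
logits = torch.randn(4, 10)
corrected = bic(logits, new_class_slice=slice(8, 10))
</code>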
**Related Survey**

  * [[:deep_learning:
  * Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges
  * A comprehensive, application-oriented study of catastrophic forgetting in DNNs
  * [[:deep_learning:
**Focus**

  * Class Incremental Learning

**Trends**

  * Exemplar learning: an exciting new direction has emerged that parametrizes exemplars and optimizes them to prevent forgetting. Optimizing the available storage by computing more efficient exemplars is expected to attract more research in the coming years.
  * Feature rehearsal: moving away from image replay towards different variants of feature replay is expected to gain traction.
  * Meta-learning:
  * Task-free settings: the transition to the task-free setting is not straightforward,
==== A continual learning survey: Defying forgetting in classification tasks ====
  * Replay methods
    * Constrained methods: **GEM**, A-GEM, GSS.
    * Pseudo-Rehearsal methods: DGR, PR, CCLUGM, LGM
  * Regularization-based methods
    * Prior-focused methods (an EWC-style penalty sketch follows this list): **[[:deep_learning:
  * Parameter isolation methods
    * Dynamic Architecture:
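For the prior-focused branch above, a minimal sketch of an EWC-style quadratic penalty, assuming a diagonal Fisher estimate and a snapshot of the previous-task parameters are already stored (keyed by parameter name):

<code python>
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """EWC regularizer: lam/2 * sum_i F_i * (theta_i - theta_i_star)^2, summed over
    all parameters for which a Fisher estimate from the previous task is available."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# total loss on the new task: loss = task_loss + ewc_penalty(model, fisher, old_params)
</code>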
**Related Survey**
  * No task boundaries
  * Online learning without demanding offline training of large batches or separate tasks introduces fast acquisition of new information.
  * Forward transfer or zero-shot learning indicates the importance of previously acquired knowledge to aid the learning of new tasks by increased data efficiency.
  * Backward transfer aims at retaining previous knowledge and preferably improving it when learning future related tasks.
  * Problem agnostic: continual learning is not limited to a specific setting (e.g. only classification).
  * Adaptive systems learn from available unlabeled data as well, opening doors for adaptation to specific user data.
  * No test-time oracle providing the task label should be required for prediction.
**Methods**:
  * Task-specific components (sub-network per task): **XDG** (Context-dependent Gating; see the gating sketch after this list)
  * Regularized optimization (differently regularizing parameters):
  * Modifying Training Data (pseudo-data,
  * Using Exemplars (store data from previous tasks): **[[:deep_learning:
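A minimal sketch of the context-dependent gating idea behind XDG, as referenced in the first bullet above: each task gets a fixed random binary mask that silences most hidden units (the layer sizes and the fraction of kept units are arbitrary choices, not taken from the paper):

<code python>
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Two-layer MLP whose hidden activations are multiplied by a per-task mask."""
    def __init__(self, n_tasks, in_dim=784, hidden=256, out_dim=10, keep=0.2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # one fixed random binary mask per task, keeping roughly `keep` of the units
        self.register_buffer("masks", (torch.rand(n_tasks, hidden) < keep).float())

    def forward(self, x, task_id):
        h = torch.relu(self.fc1(x)) * self.masks[task_id]
        return self.fc2(h)

model = GatedMLP(n_tasks=5)
logits = model(torch.randn(8, 784), task_id=0)
</code>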
**Related Survey**
None
==== Continual Lifelong Learning with Neural Networks: A Review ====
**Methods**:
  * Regularization methods: **[[:deep_learning:
**Related Survey**
**Code**: [[https://

**First Submission**:

**Latest Submission**:

**Focus**: image classification problems with Convolutional Neural Network classifiers.
**Related work**
**Feature extraction**: $\theta_s$ and $\theta_o$ are unchanged, and the outputs of one or more layers are used as features for the new task, training only $\theta_n$.

**Drawback**: the shared representation is not adapted, so the features may not be discriminative for the new task.

**Fine-tuning**: $\theta_s$ and $\theta_n$ are optimized for the new task, while $\theta_o$ is fixed.

**Drawback**: performance on previously learned tasks degrades, since the shared parameters change without any constraint from the old tasks (catastrophic forgetting).

**Joint Training**: **All parameters $\theta_s$, $\theta_o$, $\theta_n$ are jointly optimized.**

**Drawback**: requires the training data of all previously learned tasks to be available, which becomes increasingly cumbersome as tasks accumulate.
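To make the three baselines concrete, a hedged sketch of which parameter groups would be optimized in each case; the module shapes are illustrative, only the split into $\theta_s$, $\theta_o$, $\theta_n$ follows the notation above:

<code python>
import torch.nn as nn
from torch.optim import SGD

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # shared parameters, theta_s
old_head = nn.Linear(256, 10)                              # old-task classifier, theta_o
new_head = nn.Linear(256, 5)                               # new-task classifier, theta_n

# feature extraction: only theta_n is trained
opt_feature_extraction = SGD(new_head.parameters(), lr=0.01)

# fine-tuning: theta_s and theta_n are trained, theta_o stays fixed
opt_fine_tuning = SGD(list(backbone.parameters()) + list(new_head.parameters()), lr=0.01)

# joint training: all parameters are optimized (requires access to old-task data)
opt_joint = SGD(list(backbone.parameters()) + list(old_head.parameters())
                + list(new_head.parameters()), lr=0.01)
</code>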
{{deep_learning:

**Algorithm of LwF**

{{deep_learning:
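A minimal sketch of the LwF objective behind the algorithm above: cross-entropy on the new task plus a temperature-scaled distillation term that keeps the old-task outputs close to those recorded before training on the new data (T=2 is the commonly used value, not something stated on this page):

<code python>
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_targets, old_logits, recorded_old_logits, T=2.0, lam=1.0):
    """new_logits: outputs of the new-task head; old_logits: current outputs of the
    old-task head(s); recorded_old_logits: old-task outputs saved before training
    on the new task, used as soft distillation targets."""
    ce = F.cross_entropy(new_logits, new_targets)
    soft_targets = F.softmax(recorded_old_logits / T, dim=1)
    log_probs = F.log_softmax(old_logits / T, dim=1)
    distill = -(soft_targets * log_probs).sum(dim=1).mean()
    return ce + lam * distill
</code>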
**Backbone**
==== iCaRL: Incremental Classifier and Representation Learning ====

**Code**: [[https://

**First Submission**:

**Latest Submission**:

**First definition of class-incremental learning**:

An algorithm must satisfy the following three properties to qualify as class-incremental:

  * It should be trainable from a stream of data in which examples of different classes occur at different times.
  * It should at any time provide a competitive multi-class classifier for the classes observed so far.
  * Its computational requirements and memory footprint should remain bounded, or at least grow very slowly, with respect to the number of classes observed so far.
**Components of iCaRL**

  * classification by a nearest-mean-of-exemplars rule instead of the CNN outputs
  * prioritized exemplar selection based on herding
  * representation learning using knowledge distillation and prototype rehearsal
**Introduction**

Classification

{{deep_learning:
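A minimal sketch of the nearest-mean-of-exemplars rule used for classification: prototypes are the means of the L2-normalized exemplar features, and a sample is assigned to the class with the closest prototype (helper names are mine):

<code python>
import torch
import torch.nn.functional as F

def class_prototypes(feature_fn, exemplar_sets):
    """exemplar_sets: one tensor of exemplar inputs per class; returns (n_classes, d)."""
    protos = []
    for exemplars in exemplar_sets:
        feats = F.normalize(feature_fn(exemplars), dim=1)
        protos.append(F.normalize(feats.mean(dim=0), dim=0))
    return torch.stack(protos)

def nme_classify(feature_fn, x, prototypes):
    """Assign each sample to the class whose prototype is nearest in feature space."""
    feats = F.normalize(feature_fn(x), dim=1)
    distances = torch.cdist(feats, prototypes)
    return distances.argmin(dim=1)
</code>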
Training

{{deep_learning:
**Why nearest-mean-of-exemplars classification**

NME overcomes two major problems of the incremental learning setting:

  * CNN outputs will change uncontrollably when the feature representation changes; the nearest-mean-of-exemplars classifier adapts automatically because the class prototypes are recomputed from the current representation.
  * We cannot make use of the true class mean, since all training data would have to be stored in order to recompute this quantity after a representation change. Instead, iCaRL uses the average over a flexible number of exemplars that are chosen in a way to provide a good approximation to the class mean.
**Why representation learning**

  * The representation learning step resembles ordinary network fine-tuning: starting from previously learned network weights, it minimizes a loss function over a training set.
  * Modifications to fine-tuning: the training set is augmented with the stored exemplars, and the loss function adds distillation terms that preserve the outputs for previously learned classes (sketched below).
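A hedged sketch of that modified objective: one-hot (sigmoid) classification targets for the new classes are combined with distillation targets taken from the previous network's outputs for the old classes, and everything is trained with binary cross-entropy:

<code python>
import torch
import torch.nn.functional as F

def icarl_representation_loss(logits, targets_onehot, recorded_old_logits, n_old):
    """logits: (batch, n_classes) current network outputs; targets_onehot: one-hot
    labels for the batch (new samples plus rehearsed exemplars); recorded_old_logits:
    outputs of the previous network on the same batch, soft targets for old classes."""
    targets = targets_onehot.clone().float()
    targets[:, :n_old] = torch.sigmoid(recorded_old_logits[:, :n_old])
    return F.binary_cross_entropy_with_logits(logits, targets)
</code>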
**Exemplar management**

Overall, iCaRL's steps for exemplar selection and reduction fit the incremental learning setting exactly: the selection step is required for each class only once, when it is first observed and its training data is available. At later times, only the reduction step is called, which does not need access to any earlier training data.
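A minimal sketch of the two management steps described above: herding-style selection greedily picks exemplars whose running mean tracks the class mean, and reduction keeps only the first m exemplars of each (already prioritized) set; the feature tensors are assumed to be L2-normalized:

<code python>
import torch
import torch.nn.functional as F

def construct_exemplar_set(features, m):
    """features: (n, d) normalized features of one class. Returns the indices of m
    exemplars chosen greedily so that their mean stays close to the class mean."""
    class_mean = F.normalize(features.mean(dim=0), dim=0)
    chosen, running_sum = [], torch.zeros_like(class_mean)
    for k in range(1, m + 1):
        # candidate mean of the exemplar set if each remaining sample were added next
        candidate_means = (running_sum + features) / k
        dists = (class_mean - candidate_means).norm(dim=1)
        if chosen:                                   # never pick the same sample twice
            dists[torch.tensor(chosen)] = float("inf")
        idx = int(dists.argmin())
        chosen.append(idx)
        running_sum = running_sum + features[idx]
    return chosen

def reduce_exemplar_set(exemplar_indices, m):
    """Exemplars are stored in priority order, so reduction just keeps the first m."""
    return exemplar_indices[:m]
</code>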
**Related work**

  * fixed data representation
  * representation learning
  * LwF

**Dataset**

CIFAR-100, ImageNet ILSVRC 2012

**Future work**

  * analyze the reasons for low performance in more detail with the goal of closing the remaining performance gap.
  * study related scenarios in which the classifier cannot store any of the training data in raw form for privacy reasons.