Transfer learning (TL) is a technique in
machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task.[1] For example, for
image classification, knowledge gained while learning to
recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on
transfer of learning, although practical ties between the two fields are limited. Reusing/transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency.[2]
In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in
neural network training.[4][5] The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning.[6]
In 1992, Pratt formulated the discriminability-based transfer (DBT) algorithm.[7]
In 1997, Pratt and
Thrun guest-edited a special issue of Machine Learning devoted to transfer learning,[8] and by 1998, the field had advanced to include
multi-task learning,[9] along with more formal theoretical foundations.[10] Learning to Learn,[11] edited by Thrun and Pratt, is a 1998 review of the subject.
Transfer learning has been applied in
cognitive science. Pratt guest-edited an issue of Connection Science on reuse of neural networks through transfer in 1996.[12]
In the 2020 paper "Rethinking Pre-training and Self-training",[16] Zoph et al. reported that pre-training can hurt accuracy, and advocated self-training instead.
In 2020, it was discovered that, because of their similar physical natures, transfer learning is possible between classifying electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves: knowledge learned in the gesture recognition domain could be reused in the mental state recognition domain. It was noted that this relationship worked in both directions, showing that EEG classifiers can likewise be used to classify EMG.[27] The experiments noted that the accuracy of neural networks and convolutional neural networks was improved[28] through transfer learning both prior to any learning (compared to standard random weight initialization) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of its fully-connected layers to improve performance.[29]
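The gain from reused weights both at initialization and after fine-tuning can be illustrated with a minimal, self-contained sketch (pure Python, no ML libraries; the two toy tasks, the single-neuron logistic model, and the hyperparameters are illustrative assumptions, not the setups of the cited studies). A classifier trained on a source task is copied as the starting point for a related target task, so it begins well above chance and needs only brief fine-tuning:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, task, rng):
    """Generate labeled 2-D points; tasks 'A' and 'B' have similar boundaries."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        if task == "A":                      # source task: label depends on x[0]
            y = 1 if x[0] > 0 else 0
        else:                                # related target task: tilted boundary
            y = 1 if x[0] + 0.5 * x[1] > 0 else 0
        data.append((x, y))
    return data

def train(data, w, b, epochs=50, lr=0.5):
    """Plain SGD on the logistic loss; returns the updated (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y   # dLoss/dz
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b

def accuracy(data, w, b):
    correct = sum(
        (sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5) == (y == 1)
        for x, y in data
    )
    return correct / len(data)

rng = random.Random(0)
task_a = make_data(400, "A", rng)
task_b = make_data(400, "B", rng)

# Train on the source task from scratch.
w_a, b_a = train(task_a, [0.0, 0.0], 0.0)
acc_a = accuracy(task_a, w_a, b_a)

# Transfer: reuse the source weights as the initialization for the target task.
acc_b_before = accuracy(task_b, list(w_a), b_a)         # before any fine-tuning
w_b, b_b = train(task_b, list(w_a), b_a, epochs=5)      # brief fine-tuning
acc_b_after = accuracy(task_b, w_b, b_b)
```

Because the two boundaries are similar, the transferred weights classify the target task far above chance before any fine-tuning, and a few extra epochs close most of the remaining gap. The same idea scales up in practice: a network pre-trained on one domain is reused as the initialization for another, often with only the final fully-connected layers retrained.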
Software
Several compilations of transfer learning and domain adaptation algorithms have been implemented.
^Stevo Bozinovski and Ante Fulgosi (1976). "The influence of pattern similarity and transfer learning upon the training of a base perceptron B2." (original in Croatian) Proceedings of Symposium Informatica 3-121-5, Bled.
^S. Bozinovski (1981). "Teaching space: A representation concept for adaptive pattern classification." COINS Technical Report, the University of Massachusetts at Amherst, No 81-28 [available online: UM-CS-1981-028.pdf]
^Mihalkova, Lilyana; Huynh, Tuyen; Mooney, Raymond J. (July 2007), "Mapping and Revising Markov Logic Networks for Transfer Learning" (PDF), Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007), Vancouver, BC, pp. 608–614, retrieved 2007-08-05
^Niculescu-Mizil, Alexandru; Caruana, Rich (March 21–24, 2007),
"Inductive Transfer for Bayesian Network Structure Learning" (PDF), Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), retrieved 2007-08-05
^Hajiramezanali, E.; Dadaneh, S. Z.; Karbalayghareh, A.; Zhou, Z.; Qian, X. "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data." 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. arXiv:1810.09433
^Arief-Ang, I.B.; Hamilton, M.; Salim, F.D. (2018-12-01). "A Scalable Room Occupancy Prediction with Transferable Time Series Decomposition of CO2 Sensor Data". ACM Transactions on Sensor Networks. 14 (3–4): 21:1–21:28. doi:10.1145/3217214. S2CID 54066723.
^Maitra, D. S.; Bhattacharya, U.; Parui, S. K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR). pp. 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.