In machine learning, I believe there is one overarching premise: there is never enough data. Despite all the hype around big data, labeled data is especially scarce in NLP, and annotation quality is very hard to control. Under these circumstances, data augmentation is essential, and it matters greatly for a model's robustness and generalization.
Different NLP subfields each have their own task-specific data augmentation methods.
Task-independent data augmentation for NLP
However, in NLP, data augmentation is not widely used. In my mind, this is for two reasons:
There are a few research directions that would be interesting to pursue:
Data augmentation with style transfer: Investigate if style transfer can be used to modify various attributes of training examples for more robust learning.
Learn the augmentation: Similar to Dong et al. (2017), we could learn either to paraphrase or to generate transformations for a particular task.
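Before learning the augmentation, it helps to see what even a trivial, task-independent baseline looks like. The sketch below is my own minimal illustration (not from any particular paper): two simple token-level perturbations, random swap and random deletion, of the kind used as cheap NLP augmentation baselines. Function names and parameters here are hypothetical choices for the example.

```python
import random

def random_swap(tokens, n_swaps=1, seed=None):
    """Return a copy of `tokens` with `n_swaps` random position pairs swapped.

    The multiset of tokens is preserved; only word order changes.
    """
    rng = random.Random(seed)
    out = list(tokens)
    for _ in range(n_swaps):
        i = rng.randrange(len(out))
        j = rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, p=0.1, seed=None):
    """Drop each token independently with probability `p`, keeping at least one."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    # Never return an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]

sentence = "the quick brown fox jumps over the lazy dog".split()
swapped = random_swap(sentence, n_swaps=2, seed=0)
shortened = random_delete(sentence, p=0.3, seed=1)
```

Such surface-level perturbations are exactly what the learned approaches above try to improve on: a random swap can easily change or destroy the meaning of a sentence, whereas a learned paraphraser should preserve it.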
Tutorial
(To be continued...)