Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases
Nov 1, 2017
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead,...
Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases
Jul 1, 2015
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledg...