Tuesday, July 23, 2013

Tuesday's Tip: Crowdsourcing at Digital Databases


"Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a 'crowd'), through an open call." --Wikipedia

"Metadata describes other data. It provides information about a certain item's content. For example...a text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Web pages often include metadata in the form of meta tags...Most search engines use this data when adding pages to their search index." --TechTerms.com

Recently, a number of digital archive websites have been inviting crowdsourcing to add metadata to their databases, making those databases easier to search. In case the previous sentence makes no sense to you, let me state it again in non-techy terms: Websites with databases full of historical digital images have been inviting volunteers to index the images by adding descriptions and comments and by tagging the digitized documents with names, dates, and places. Because these tend to be such vast projects, crowdsourcing helps all of us by making the databases attached to these digital images searchable and more usable. We can find what we want by searching on a name, date, or location, rather than having to browse through thousands or even millions of images.

One of the most notable examples of crowdsourcing was FamilySearch's 1940 U.S. Federal Census Indexing Project. Originally projected to take seven months, the name index of the census was completed in four months with the help of 150,000 volunteers. Another well-known crowdsourcing project in the genealogy world is Ancestry's World Archives Project, which indexes genealogical records from around the globe. Fold3 has annotation and comment features for its databases, although these are limited to subscribers.


Additionally, a couple of state digital archives have started newer projects. The first is the Washington State Digital Archives, which has launched Scribe, a digital tool for adding metadata to its collections. A tutorial video can be viewed here. I tried Scribe out, and although I liked it, I have to admit it was not as efficient as FamilySearch's indexing tools. For example, the first document I indexed had three marriage licenses on it, yet there was only room in the Scribe tool to index one. I tried to find a way to add metadata for the other licenses on that page, without success. Also, I was unsure of what to do with initials in place of names. Should "J. B. Smith" be indexed in the First, Middle, and Last Name fields as "J., B., Smith" or "J, B, Smith"? Some clarification and basic rules for complex situations would be beneficial. Nevertheless, I was excited to see the possibilities this has!

Another site with recent meta tagging capabilities is the California Digital Newspaper Collection. By creating a free account, one can add keyword tags and leave comments on various articles in the digitized newspaper collection. Correcting text misread by OCR software is also an option.


Crowdsourcing is definitely a way to give back to the genealogical and historical community. If you don't yet have a favorite volunteer project, I encourage you to pick one! Also, do you know of other similar crowdsourcing projects not mentioned? Please leave info and links in the comments below.
