Colloquium: Rebekah Overdorf, topic: Blogs, Comments, and Twitter Feeds: A Study of Domain Adaptation in Stylometry
15.7.2014, 2:00 pm, FMI 03.07.023 (MI Building, Campus Garching). This talk will be held in English.
Abstract: Stylometry is a form of authorship attribution that relies on the linguistic information found in documents. This paper focuses on the cross-domain case, in which the known and suspect documents were written in different settings. These domains include Twitter feeds, blog entries, Reddit comments, and emails. We find that state-of-the-art stylometry methods perform worse in cross-domain situations than in in-domain situations, and we propose a method that increases accuracy in the cross-domain setting, raising the accuracy of cross-domain stylometry to as high as 80%. Identifying authors across domains makes it possible to link identities across the Internet, which makes this a key security and privacy concern: users can take other measures to protect their anonymity, but because of their unique writing style, they may not be as anonymous as they believe.