Publication Date

9-21-2021

Document Type

Article

Department

Justice Studies

Disciplines

Criminology and Criminal Justice | Forensic Science and Technology

Publication Title

Child Abuse & Neglect

Volume

122

DOI

10.1016/j.chiabu.2021.105336

Abstract

Background
Automated detection of child sexual abuse images (CSAI) often relies on image attributes, such as hash values. However, electronic service providers and others without access to hash value databases are limited in their ability to detect CSAI. Additionally, the increasing amount of CSA content being distributed means that a large percentage of images are not yet cataloged in hash value databases. Therefore, additional detection criteria need to be determined to improve identification of non-hashed CSAI.
Objective
We aim to identify patterns in the locations and folder/file naming practices of websites hosting and displaying CSAI, to use as additional detection criteria for non-hashed CSAI.
Methods
Using a custom-designed web crawler and snowball sampling, we analyzed the locations and naming practices of 103 Surface Web websites hosting and/or displaying 8108 known CSAI hash values.
Results
Websites specialize in either hosting or displaying CSAI, with only 20% doing both. Neither hosting nor displaying websites fear repercussions. Over 27% of CSAI were displayed in the home directory (i.e., main page), with only 6% located in a fourth-level or deeper sub-folder. Websites focused more on organizing images than on hiding them, with 68% of hosted and 54% of displayed CSAI found in folders formatted year/month. Qualitatively, hosting websites were likely to use alphanumeric or disguised folder and file names to conceal images, while displaying websites were more explicit.
Conclusion
File and folder naming patterns can be combined with existing criteria to improve automated detection of websites and website locations likely hosting and/or displaying CSAI.

Keywords

Child sexual exploitation images, Child sexual abuse images, Child pornography, Limitation of hash values, Websites, Automated data collection

Comments

This is the Version of Record.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
