Projects with this topic
Sort by:
-
Plain text boilerplate removal using character n-gram frequency across a corpus. Builds a template model from a sample, filters files in a single linear pass, and validates automatically. Includes an obfuscated mode where the model is a set of integers and output filenames are hashed: the operator never reads the content. AWK for character processing, Bash for orchestration, Lisp layer planned for positional classification.
Updated