"Comparing algorithms for extracting content from web pages."

Remember, kids, it's only legal to extract content from web pages if the Terms of Service permit it.

That said, extractors compared: BTE (Python), Goose3 (Python), jusText (Python), Newspaper3k (Python), Readability (JavaScript), Resiliparse (Python), Trafilatura (Python), news-please (Python), Boilerpipe (Java), Dragnet (Python), ExtractNet (Python), Go DOM Distiller (Go), BoilerNet (Python + JavaScript), and Web2Text (Python).

Looks like if you want to extract content from web pages, you should be using Python.
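
To make "extracting content" a bit more concrete, here is a minimal sketch of the idea behind BTE (Body Text Extraction), the first tool on the list. It treats the page as a flat sequence of tokens, one per tag and one per word, and keeps the single contiguous run that maximizes the words inside it plus the tags outside it. Scoring each word +1 and each tag -1 turns this into a maximum-subarray problem, which Kadane's algorithm solves in one pass. This is a simplified, standard-library-only illustration, not the BTE implementation benchmarked in the comparison. The function name `bte_extract` is hypothetical, and the sketch does not strip `<script>`/`<style>` contents or decode HTML entities the way a real extractor would.

```python
import re

# One token per HTML tag (or comment) and one per whitespace-separated word.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def bte_extract(html: str) -> str:
    """Hypothetical BTE-style extractor (simplified sketch).

    Maximizing (tags before span) + (words in span) + (tags after span)
    is equivalent to finding the maximum-sum run of tokens when words
    score +1 and tags score -1, because the total tag count is constant.
    Note: text inside <script>/<style> is counted as words here.
    """
    tokens = TOKEN_RE.findall(html)

    best_sum, best_start, best_end = 0, 0, -1  # empty span by default
    run_sum, run_start = 0, 0

    for k, tok in enumerate(tokens):
        weight = -1 if tok.startswith("<") else 1
        # Kadane's algorithm: restart the run when it stops helping.
        if run_sum <= 0:
            run_sum, run_start = weight, k
        else:
            run_sum += weight
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, k

    # Keep only the words from the winning span.
    span = tokens[best_start:best_end + 1]
    return " ".join(t for t in span if not t.startswith("<"))
```

On a toy page with two navigation links, one paragraph, and a footer link, the paragraph wins because its run of words outweighs the tags around it:

```python
page = (
    "<html><body><div><a>Home</a><a>About</a></div>"
    "<p>This is the main article text with many words.</p>"
    "<div><a>Footer</a></div></body></html>"
)
print(bte_extract(page))  # This is the main article text with many words.
```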

#solidstatelife #developers