pystemon's People

Contributors

andrec10002, asjidkalam, certxlm, chervaliery, cvandeplas, desoulter, deventual, goncalor, gurulhu, jalewis, lesleyxyz, mathieubaeumler, obert01, osagit, pclr, rafa-dot-el, rafiot, rommelfs, timgates42, trolldbois, ubaze, yaleman

pystemon's Issues

HTTPlib error proxy error http://pastie.org/pastes

Hoping this is just a temporary issue, as I have just set up pystemon:

Error 521 Ray ID: 4508f07cc11ba6d7 • 2018-08-26
Web server is down

Failed to download the page because of other HTTPlib error proxy error http://pastie.org/pastes trying again.
[2018-08-26 ] Retry 1/100 for http://pastie.org/pastes
[2018-08-26 ] Failed to download the page because of other HTTPlib error proxy error http://pastie.org/pastes trying again.

Is this a regular issue with the above website?
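Error 521 is Cloudflare's "Web server is down" status: the origin server behind Cloudflare was not answering, so firing immediate retries at it is unlikely to help. A minimal sketch of retrying with exponential backoff instead, assuming a hypothetical `fetch` callable that raises on failure (this is not pystemon's actual download code):

```python
import time


def retry_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting base_delay * 2**attempt between tries.

    fetch is a hypothetical zero-argument callable that raises on failure
    (standing in for a page download). Returns fetch()'s result, or re-raises
    the last exception once max_retries attempts have all failed.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Exponential backoff (1s, 2s, 4s, ...) gives a down server time to recover
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits are 1, 2, 4 and 8 seconds before giving up, instead of a long run of back-to-back attempts.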

pastebin.com regex error

pystemon[19253]: No last pasties matches for regular expression site:pastebin.com regex:<a href="/(\w{8})">.+</a></td>. Error in your regex? Dumping htmlPage #012 <!DOCTYPE HTML>#012#011<head>#012#011#011<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />#012#011#011<title>Pastes Archive - Pastebin.com</title>#012#011#011<link rel="shortcut icon" href="/favicon.ico" />#012#011#011<script src="/js/jquery.min.js"></script>#012#011#011<script src="/js/pastebin.min.js"></script>#012#011#011<link href="/i/pastebin.min.css" rel="stylesheet" type="text/css" />#012#011#011<!--[if lt IE 10]>#012#011#011#011<link href="/i/pastebin.ie8.css" rel="stylesheet" type="text/css" />#012#011#011<![endif]-->#012#012 #012#011#011<style>body{-webkit-text-size-adjust:none;}</style>#012#011#011#011#011<meta property="fb:app_id" content="231493360234820" />#012#011#011<meta property="og:title" content="Pastes Archive - Pastebin.com" />#012#011#011<meta property="og:type" content="article" />#012#011#011<meta property="og:url" content="https://pastebin.com/archive" />#012#011#011<meta property="og:image" content="https://pastebin.com/i/facebook.png" />#012#011#011<meta property="og:site_name" content="Pastebin" />#012#011#011<meta name="google-site-verification" content="jkUAIOE8owUXu8UXIhRLB9oHJsWBfOgJbZzncqHoF4A" />#012#011#011<link rel="canonical" href="https://pastebin.com/archive" />#012#011#011#011#011<meta name="viewport" content="width=device-width, initial-scale=0.70, maximum-scale=1.0, user-scalable=yes">#012#011#011#012#011#011<script>#012#011#011#011(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){#012#011#011#011(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),#012#011#011#011m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)#012#011#011#011})(window,document,'script','//www.google-analytics.com/analytics.js','ga');#012#012#011#011#011ga('create', 'UA-58643-34', 
'auto');#012#011#011#011ga('require', 'displayfeatures');#012#011#011#011ga('send', 'pageview');#012#011#011</script>#012#011#011<script type="text/javascript">#012#011#011#011if (top != self)#012#011#011#011#011top.location.href = location.href;#012#011#011</script>#012#011</head>#012#011<body>#012#011<div id="main_frame">#012#011#011<div id="jq-dropdown-1" class="jq-dropdown jq-dropdown-anchor-right jq-dropdown-scroll">#012#011#011#011<ul class="jq-dropdown-menu">#012#011#011#011#011#015#012#011#011#011#011<li class="lih_640">#015#012#011#011#011#011#011<form class="search_form_li" name="search_form_li" method="get" action="/search" id="cse-search-box-li">#015#012#011#011#011#011#011#011<input class="search_input_li" type="text" name="q" size="5" value="" placeholder="search..." />#015#012#011#011#011#011#011</form>#015#012#015#012#011#011#011#011</li>#015#012#011#011#011#011<li class="lih_div"></li>#015#012#011#011#011#011<li onclick="location.href='/signup'" class="dd_su">Sign Up</li>#015#012#011#011#011#011<li onclick="location.href='/login'" class="dd_lo">Login</li>#015#012#011#011#011#011<li class="lih_div"></li>#015#012#011#011#011#011<li onclick="location.href='/api'" class="lih_640">API</li>#015#012#011#011#011#011<li onclick="location.href='/faq'" class="lih_640">FAQ</li>#015#012#011#011#011#011<li onclick="location.href='/tools'" class="lih_640">Tools</li>#015#012#011#011#011#011<li onclick="location.href='/trends'" class="lih_640">Trends</li>#015#012#011#011#011#011<li onclick="location.href='/archive'" class="lih_640">Archive</li>#011#011#011</ul>#012#011#011</div>#012#011#011<div id="header">#012#011#011#011<div id="header_wrap">#012#011#011#011#011<div id="header_top">#012#011#011#011#011#011<div id="header_logo" onclick="location.href='/'">PASTEBIN</div>#012#011#011#011#011#011<div id="header_new_paste" class="new_paste_button" onclick="location.href='/'">new paste</div>#012#011#011#011#011#011<div id="header_links">#012#011#011#011#011#011#011<a 
href="/trends">trends</a>#012#011#011#011#011#011#011<a href="/api" class="mmh">API</a>#012#011#011#011#011#011#011<a href="/tools" class="mmh">tools</a>#012#011#011#011#011#011#011<a href="/faq" class="mmh">faq</a>#012#011#011#011#011#011</div>#012#011#011#011#011#011<div id="header_search">#012#011#011#011#011#011#011<form class="search_form" name="search_form" method="get" action="/search" id="cse-search-box">#012#011#011#011#011#011#011#011<input class="search_input" type="text" name="q" size="5" value="" placeholder="search..." />#012#011#011#011#011#011#011</form>#012#011#011#011#011#011</div>#012#011#011#011#011#011#015#012#011#011#011#011#011<div id="header_members">#015#012#011#011#011#011#011#011<div id="header_dropdown" data-jq-dropdown="#jq-dropdown-1">&nbsp;</div>#015#012#011#011#011#011#011#011<div id="header_icon"><a href="/login"><img src="/i/guest.png" class="header_icon" alt="" /></a></div>#015#012#011#011#011#011#011#011<div id="header_user_frame">#015#012#011#011#011#011#011#011#011<div id="header_username">Guest User</div>#015#012#011#011#011#011#011#011#011<div id="header_user_status">-</div>#015#012#011#011#011#011#011#011</div>#015#012#011#011#011#011#011#011<div id="header_icons">#015#012#011#011#011#011#011#011#011<a href="/login" title="My Pastebin"><img src="/i/t.gif" class="header_icons hi_mypastebin" alt="" /></a>#015#012#011#011#011#011#011#011#011<a href="/messages" title="My Messages"><img src="/i/t.gif" class="header_icons hi_messages" alt="" /></a>#015#012#011#011#011#011#011#011#011<a href="/alerts" title="My Alerts"><img src="/i/t.gif" class="header_icons hi_alerts" alt="" /></a>#015#012#011#011#011#011#011#011#011<a href="/settings" title="My Settings"><img src="/i/t.gif" class="header_icons hi_settings" alt="" /></a>#015#012#011#011#011#011#011#011</div>#015#012#011#011#011#011#011</div>#011#011#011#011</div>#012#011#011#011</div>#012#011#011</div>#012#011#011<div id="super_frame">#012#011#011#011<div 
id="monster_frame">#012#011#011#011#011<div id="content_frame">#012#011#011#011#011#011<div id="content_right">#011#011#011#011#011#011#012#011#011#011#011#011#011#011#011#011#011#011#011<div class="content_right_menu">#015#012#011#011#011#011#011#011#011#011#011<div class="content_right_title"><a href="/archive">Public Pastes</a></div>#015#012#011#011#011#011#011#011#011#011#011<div id="menu_2">#015#012#011#011#011#011#011#011#011#011#011#011<ul class="right_menu"><li><a href="/aJFbuCy2">Untitled</a><span>T-SQL | 15 sec ago</span></li><li><a href="/0W5mCKcJ">Untitled</a><span>PHP | 15 sec ago</span></li><li><a href="/ETVPpL2C">Untitled</a><span>21 sec ago</span></li><li><a href="/U2c9t6w6">Untitled</a><span>22 sec ago</span></li><li><a href="/a8jZ7dzF">Untitled</a><span>24 sec ago</span></li><li><a href="/EEpjM3LS">Untitled</a><span>31 sec ago</span></li><li><a href="/41RLME91">Untitled</a><span>32 sec ago</span></li><li><a href="/C6bv8q3q">Untitled</a><span>32 sec ago</span></li></ul></div></div>#011#011#011#011#011#011<div id="abrpm2"></div>#012#011#011#011#011#011#011#015#012#011#011#011<div style="padding: 0; width:160px;margin: 10px 0;clear:left;">#015#012#011#011#011#011<script type="text/javascript"><!--#015#012#011#011#011#011#011e9 = new Object();#015#012#011#011#011#011 e9.size = "160x600,120x600";#015#012#011#011#011#011//--></script>#015#012#011#011#011#011<script type="text/javascript" src="https://tags.expo9.exponential.com/tags/Pastebincom/Unsure/tags.js"></script>#015#012#011#011#011</div>#011#011#011#011#011#011<div id="steadfast" title="Pastebin is proudly hosted by Steadfast.net" onclick="location.href='http://steadfast.net/?utm_source=pastebin.com&amp;utm_medium=referral&amp;utm_content=hosting_by_banner&amp;utm_campaign=referral_20140118_x_x_pastebin_partner&amp;source=referral_20140118_x_x_pastebin_partner'"></div>#012#011#011#011#011#011</div>#012#011#011#011#011#011<div id="content_left"><div 
id="ie_msg"></div>#012#011#011#015#012#011#011#011<div id="abrpm"></div>#015#012#011#011#011<div class="banner_728">#015#012#011#011#011#011<script type="text/javascript"><!--#015
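One way to check whether a configured archive regex still fits a site's markup is to run it against a saved copy of the page. The sketch below uses an abbreviated fragment of the "Public Pastes" sidebar from the dump above. The configured pattern requires `</td>` after the link, which this markup does not contain, so it finds nothing; the looser pattern (a hypothetical alternative, which on the full page may need tightening to avoid unrelated links) captures the 8-character IDs.

```python
import re

# Abbreviated fragment of the sidebar HTML from the dumped page
html = ('<ul class="right_menu"><li><a href="/aJFbuCy2">Untitled</a>'
        '<span>T-SQL | 15 sec ago</span></li><li><a href="/0W5mCKcJ">Untitled</a>'
        '<span>PHP | 15 sec ago</span></li></ul>')

configured = r'<a href="/(\w{8})">.+</a></td>'  # regex quoted in the log message
looser = r'<a href="/(\w{8})">'                 # hypothetical alternative

configured_hits = re.findall(configured, html)  # → []
looser_hits = re.findall(looser, html)          # → ['aJFbuCy2', '0W5mCKcJ']
print(configured_hits, looser_hits)
```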

pystemon not receiving any matches

Hi,

Thanks for the great tool. However, I have been running it for some time now and I can't seem to get any matches.

It is downloading the pasties, but the alerts folder is still empty and I have not received a single match from any of the websites. The regexes are as simple as searching for the word "function", just to test, and there are still no matches.

Could you please help ?

implement block notification

Pastebin informs users when they access the site too actively.

Implement a general function that matches some keywords in the HTML, such as:

  • Pastebin: "has temporarily blocked your computer"

Actions to implement:

  • In the OO model, on the PastieSite object
  • Log the blocking and send an email notification (if email alerting is on)
  • In the longer term: temporarily remove the proxy IP and throttle access.
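A minimal sketch of the general keyword-matching function described above. Only the Pastebin phrase comes from this issue; `BLOCK_MARKERS` and `detect_block` are hypothetical names, and other sites would need their own entries.

```python
# Phrases that indicate a site has blocked us. Only the Pastebin phrase comes
# from the issue; other sites would need their own entries.
BLOCK_MARKERS = {
    "pastebin.com": ["has temporarily blocked your computer"],
}


def detect_block(site_name, html):
    """Return the first block marker found in html for site_name, else None."""
    page = html.lower()  # case-insensitive comparison
    for marker in BLOCK_MARKERS.get(site_name, []):
        if marker.lower() in page:
            return marker
    return None
```

A PastieSite method could call `detect_block` on each downloaded page, log any hit, send the email alert when alerting is enabled, and later feed the proxy-removal and throttling logic.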

sleep time between download of pasties

Allow a dynamic sleep time between pastie downloads.
Related to #15 (queue statistics), so users can see whether their sleep timings are letting the queue grow indefinitely.
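A minimal sketch of a dynamic sleep, assuming hypothetical `min_sleep`, `max_sleep` and `backlog` settings: it draws a random delay and lowers the upper bound as the download queue grows, so a backlog is drained faster instead of growing indefinitely.

```python
import random


def next_sleep(queue_size, min_sleep=1.0, max_sleep=10.0, backlog=100):
    """Pick a sleep time (seconds) before the next pastie download.

    Draws uniformly from [min_sleep, upper], where upper shrinks linearly from
    max_sleep (empty queue) to min_sleep (queue_size >= backlog).
    All parameter names and defaults are hypothetical.
    """
    pressure = min(queue_size / backlog, 1.0)  # 0.0 = empty queue, 1.0 = backlog reached
    upper = max_sleep - (max_sleep - min_sleep) * pressure
    return random.uniform(min_sleep, upper)
```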

include yara scan

Implement yara scanning of the pastie.
Options would thus be: regex OR yara-file
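A sketch of how the "regex OR yara-file" option could be wired, with hypothetical option names. The YARA branch relies on the third-party yara-python package (`yara.compile(filepath=...)` and `rules.match(data=...)`), imported only when a rule file is configured.

```python
import re


def build_matcher(regex=None, yara_file=None):
    """Return a callable content -> bool backed by a regex OR a YARA rule file.

    Option names are hypothetical. yara-python is imported lazily so that
    regex-only setups do not need it installed.
    """
    if (regex is None) == (yara_file is None):
        raise ValueError("give exactly one of regex or yara_file")
    if yara_file is not None:
        import yara
        rules = yara.compile(filepath=yara_file)
        # rules.match() returns a (possibly empty) list of matching rules
        return lambda content: bool(rules.match(data=content))
    pattern = re.compile(regex)
    return lambda content: pattern.search(content) is not None
```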

Error importing BeautifulSoup

Hi,

Nothing major, but worth fixing if you do a bug-fix update at some point.

Running pystemon throws this error:
ERROR: Cannot import the BeautifulSoup 3 Python library. Are you sure you installed it?

This is due to the import being:
from BeautifulSoup import BeautifulSoup

BS4 uses:
from bs4 import BeautifulSoup

Thanks,
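A version-tolerant import that prefers bs4 and falls back to BeautifulSoup 3 is one way to support both; a sketch, not the project's committed fix:

```python
try:
    from bs4 import BeautifulSoup  # BeautifulSoup 4 (package: beautifulsoup4)
except ImportError:
    try:
        from BeautifulSoup import BeautifulSoup  # legacy BeautifulSoup 3
    except ImportError:
        BeautifulSoup = None  # neither installed; the caller can print the existing error
```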

No such file or Directory

A few minutes after launching the feeder, it raises this error and everything stops:

Traceback (most recent call last):
  File "pystemon-feeder.py", line 64, in
    messagedata = open(pystemonpath+paste).read()
IOError: [Errno 2] No such file or directory: '/home/gt/pystemon/archive/codepad.org/2018/01/03/8RsQZOlJ.gz'

The directory really doesn't exist. I ran FLUSHALL on Redis, but the problem persists.

Any idea?
Thanks,
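A minimal sketch of a more tolerant read in the feeder, written for Python 3: it skips a missing pastie with a warning instead of crashing the loop. `read_pastie` is a hypothetical helper, and returning the decompressed bytes (via `gzip.open`, since archived pasties end in `.gz`) is an assumption about what the feeder should forward.

```python
import gzip
import logging
import os


def read_pastie(pystemonpath, paste):
    """Return the decompressed content of an archived pastie, or None if missing."""
    path = os.path.join(pystemonpath, paste)
    try:
        with gzip.open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        # The path was announced but the file is not on disk: log and move on
        logging.warning("Pastie %s not found, skipping", path)
        return None
```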

string indices must be integers, not str

Hello,

CentOS 6.4 64-bit, Python 2.7.3, and the latest PyYAML and BeautifulSoup installed with easy_install.

After launching pystemon I get a whole lot of this:

Found 10 new pasties for site nopaste.me
ThreadPasties for codepad.org crashed unexpectectly, recovering...: string indices must be integers, not str
Found 30 new pasties for site cdv.lt
Found 20 new pasties for site pastie.org
Found 20 new pasties for site snipt.net
ThreadPasties for pastebin.com crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for codepad.org crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for slexy.org crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for pastie.org crashed unexpectectly, recovering...: string indices must be integers, not str
Found 13 new pasties for site pastesite.com
ThreadPasties for pastebin.com crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for codepad.org crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for cdv.lt crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for pastie.org crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for slexy.org crashed unexpectectly, recovering...: string indices must be integers, not str
ThreadPasties for pastebin.com crashed unexpectectly, recovering...: string indices must be integers, not str

Am I to assume things are working and I can ignore the crashed part?
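This TypeError is what Python raises when code indexes a string with a string key, as if the string were a dict. One common way to hit it is a configuration entry parsed as a plain string instead of a mapping, though the report does not establish that this is the cause here. A defensive check, with hypothetical names:

```python
def site_option(site_config, key, default=None):
    """Fetch one option from a site's config entry, rejecting malformed entries.

    site_config should be a dict (a YAML mapping). If it is a plain string,
    site_config[key] would raise "string indices must be integers", so raise
    a clearer error instead.
    """
    if not isinstance(site_config, dict):
        raise ValueError("site entry should be a mapping, got %r" % (site_config,))
    return site_config.get(key, default)
```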

pastebin.ru error

Failed to download the page because of other HTTPlib error proxy error http://pastebin.ru/ trying again.

pastebin pro - httplib

Hi,

I'm getting:
Failed to download the page because of other HTTPlib error proxy error http://pastebin.com/api_scrape_item.php?i=2BgPDRi1 trying again

but if I run:
curl "https://scrape.pastebin.com/api_scraping.php?limit=250"
it works

Any ideas what could be going wrong?
I've also experimented with the network option in the YAML file, but I always get:

[2020-01-28 11:51:33,018] Error in configuration file:
[2020-01-28 11:51:33,018] error position: (1:9)

Any ideas?

pastebin.ca regex error

No last pasties matches for regular expression site:pastebin.ca regex:rel="/preview.php\?id=(\d+). Error in your regex? Dumping htmlPage #012 <?xml version="1.0" encoding="utf-8"?>#012<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">#012<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="" lang="">#012<head>#012 <title>pastebin - Type, paste, share.</title>#012 <meta name="microid" content="ca0462a24e49b118730aa3ba02c4e6cc5a55cd2d"/>#012 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>#012 <script type="text/javascript">#012//<![CDATA[#012try{if (!window.CloudFlare) {var CloudFlare=[{verbose:0,p:0,byc:0,owlid:"cf",bag2:1,mirage2:0,oracle:0,paths:{cloudflare:"/cdn-cgi/nexp/dok3v=1613a3a185/"},atok:"562c088a39c1bb7971cde1dfe7a5cc2a",petok:"015d3265ec133392a1e4c2c915eb50a5492261f3-1491187013-1800",zone:"pastebin.ca",rocket:"m",apps:{}}];document.write('<script type="text/javascript" src="//ajax.cloudflare.com/cdn-cgi/nexp/dok3v=f2befc48d1/cloudflare.min.js"><'+'\/script>');}}catch(e){};#012//]]>#012</script>#012<link rel="stylesheet" href="https://pastebin.ca/pb-g.css" type="text/css"/>#012 <link rel="icon" href="https://pastebin.ca/pastebin.ico" type="image/x-icon"/>#012 <link rel="shortcut icon" href="https://pastebin.ca/pastebin.ico"#012 type="image/x-icon"/>#012 <link rel="alternate" href="http://en.pastebin.ca/"#012 hreflang="en" title="English Translation"/>#012 <link rel="alternate" href="http://fr.pastebin.ca/"#012 hreflang="fr" title="French Translation"/>#012 <link rel="alternate" href="http://de.pastebin.ca/"#012 hreflang="de" title="German Translation"/>#012 <link rel="alternate" href="http://ja.pastebin.ca/"#012 hreflang="ja" title="Japanese Translation"/>#012 <link href="mailto:[email protected]" rev="made"/>#012 <link rel="alternate" type="application/rss+xml" title="Posts" href="/rss/posts.rss"/>#012 <link rel="alternate" type="application/rss+xml" title="News" 
href="/rss/news.rss"/>#012 <link rel="Help" href="/what.php"/>#012 <script type="text/javascript" src="https://code.jquery.com/jquery-1.12.4.min.js"></script>#012 <script type="text/javascript" src="https://code.jquery.com/ui/1.12.0/jquery-ui.min.js"></script>#012 <script type="text/javascript" src="/jquery.cluetip.min.js"></script>#012 <script src="https://pastebin.ca/pb-h.js?2" type="text/javascript"></script>#012<script type="text/javascript" async src="https://www.google.com/recaptcha/api.js"></script>#012<script type="text/javascript">#012 var _paq = _paq || [];#012 _paq.push(["setDomains", ["*.pastebin.ca"]]);#012 _paq.push(['trackPageView']);#012 _paq.push(['enableLinkTracking']);#012 (function() {#012 var u="//pw.vocti.ca/";#012 _paq.push(['setTrackerUrl', u+'piwik.php']);#012 _paq.push(['setSiteId', '3']);#012 var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];#012 g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);#012 })();#012</script>#012</head>#012<body>#012 <div id="header">#012 <h1><span style="color:#003366">paste</span>bin - Type, paste, share.</h1>#012 </div>#012 <div id="grprun">Part of <a href="http://slepp.ca/">Slepp's Projects</a> &mdash; <a href="http://pastebin.ca/">Pastebin</a> &mdash;#012 <a href="http://turl.ca/">TURL</a> &mdash; <a href="http://imagebin.ca/">Imagebin</a> &mdash; <a#012 href="http://filebin.ca/">Filebin</a></div>#012 <div id="runner"><a href="/feedback.php">Feedback</a> --#012 <a href="http://en.pastebin.ca/"#012 class="sprite sprite-ca">English</a>#012 <a href="http://fr.pastebin.ca/"#012 class="sprite sprite-fr">French</a>#012 <a href="http://de.pastebin.ca/"#012 class="sprite sprite-de">German</a>#012 <a href="http://ja.pastebin.ca/"#012 class="sprite sprite-jp">Japanese</a>#012 </div>#012 <script type="text/javascript">showRunnerMenu();</script>#012 <form method="get" action="/search.php">#012 <div id="topmenu"><a href="new.php" 
title="Create a new Paste|Follow this link to create a brand new paste."#012 class="jt">Create</a> <a href="upload.php"#012 title="Upload Text, Images or Files|By following this link, you can upload a a text or source file, upload an image, or upload a file!"#012 class="jt">Upload</a> <a href="newest.php">Newest</a> <a#012 href="tools.php">Tools</a> <a href="donate.php">Donate</a> <input type="text" name="q"#012 size="10"/><input type="submit"#012 value="Go"/>#012 </div>#012 </form>#012 <div id="body">#012 <div id="sl"><div class="bl"><div class="br"><div class="tl"><div class="tr"><div class="menu" id="idmenu0"><div class="menutitle"><h2>Stuff to Do</h2></div><div class="items" id="idmenu0-collapse"><div class="link"><a href="/new.php" class="sprite sprite-tab_new">New Post</a>#012</div>#012<div class="link"><a href="/upload.php" class="sprite sprite-top">Upload a Post</a>#012</div>#012<div class="link"><a href="/newest.php" class="sprite sprite-recur">Goto Newest</a>#012</div>#012<div class="link"><a href="/search.php" class="sprite sprite-search">Search</a>#012</div>#012<div class="link"><a href="/tools.php" class="sprite sprite-runprog">Tools / APIs</a>#012</div>#012<div class="link"><a href="/donate.php" class="sprite sprite-emoticon">Donate</a>#012</div>#012</div>#012</div>#012<div class="menu" id="idmenu1"><div class="menutitle"><h2>Information</h2></div><div class="items" id="idmenu1-collapse"><div class="link"><a href="/news.php" class="sprite sprite-comment">Site News</a>#012</div>#012<div class="link"><a href="/what.php" class="sprite sprite-documentinfo">What is This?</a>#012</div>#012</div>#012</div>#012#012 <div class="menu" id="id2243084">#012 <div class="menutitle">#012 <h2>Quick Search</h2>#012 </div>#012#012 <div id="id2243084-collapse">#012 <form method="get" action="/search.php">#012 <fieldset id="searchbar" class="searchbar">#012 <input type="text" name="q" size="15" style="width:10em" class="input-box"/>#012 <br/>#012 <input type="submit" 
value="Search" class="submit-button"#012 onclick="this.value='Searching...'"/>#012 <br/>#012 </fieldset>#012 </form>#012 <form action="http://pastebin.ca/google.php" id="cse-search-box">#012 <fieldset id="googlebar" class="searchbar">#012 <input type="hidden" name="cx" value="partner-pub-0367252804969302:1yimxphzru5"/>#012 <input type="hidden" name="cof" value="FORID:10"/>#012 <input type="hidden" name="ie" value="UTF-8"/>#012 <input type="text" name="q" id="sbi" style="width:10em" class="input-box"/>#012 <input type="submit" name="sa" value="Google Search" id="sbb" class="submit-button"/>#012 </fieldset>#012 </form>#012 </div>#012 </div>#012 <div class="menu" id="idmenurecent">#012 <div class="menutitle"><h2>Recent Posts</h2></div>#012 <div class="items" id="idmenurecent-collapse">#012 </div></div></div></div></div></div></div><div id="content"><div style="text-align:center;width:100%;ba

file not found always ... PastieSite[x]

archive/PastieSite[codepad.org]/2021/08/25/ZavEsjK5.gz
Error: /home/project/pystemon/archive/PastieSite[codepad.org]/2021/08/25/ZavEsjK5.gz, file not found
archive/PastieSite[ideone.com]/2021/08/25/I2G0lF.gz
Error: /home/project/pystemon/archive/PastieSite[ideone.com]/2021/08/25/I2G0lF.gz, file not found
archive/PastieSite[ideone.com]/2021/08/25/oiDxl6.gz
Error: /home/project/pystemon/archive/PastieSite[ideone.com]/2021/08/25/oiDxl6.gz, file not found
archive/PastieSite[paste.org.ru]/2021/08/25/cg38do.gz
Error: /home/project/pystemon/archive/PastieSite[paste.org.ru]/2021/08/25/cg38do.gz, file not found
archive/PastieSite[pastebin.fr]/2021/08/25/94467.gz
Error: /home/project/pystemon/archive/PastieSite[pastebin.fr]/2021/08/25/94467.gz, file not found
archive/PastieSite[ideone.com]/2021/08/25/61Oxkk.gz
Error: /home/project/pystemon/archive/PastieSite[ideone.com]/2021/08/25/61Oxkk.gz, file not found
archive/PastieSite[codepad.org]/2021/08/25/KpsDHlre.gz
Error: /home/project/pystemon/archive/PastieSite[codepad.org]/2021/08/25/KpsDHlre.gz, file not found
archive/PastieSite[codepad.org]/2021/08/25/qmk33o1O.gz
Error: /home/project/pystemon/archive/PastieSite[codepad.org]/2021/08/25/qmk33o1O.gz, file not found
archive/PastieSite[ideone.com]/2021/08/25/P0nYhp.gz
Error: /home/project/pystemon/archive/PastieSite[ideone.com]/2021/08/25/P0nYhp.gz, file not found
archive/PastieSite[gist.github.com]/2021/08/25/sammolk_85fa80406634fac1360f72ce74c79866.gz
Error: /home/project/pystemon/archive/PastieSite[gist.github.com]/2021/08/25/sammolk_85fa80406634fac1360f72ce74c79866.gz, file not found
archive/PastieSite[pastebin.fr]/2021/08/25/94468.gz
Error: /home/project/pystemon/archive/PastieSite[pastebin.fr]/2021/08/25/94468.gz, file not found
archive/PastieSite[codepad.org]/2021/08/25/SsbJC2MY.gz
Error: /home/project/pystemon/archive/PastieSite[codepad.org]/2021/08/25/SsbJC2MY.gz, file not found
archive/PastieSite[codepad.org]/2021/08/25/b3i76O8R.gz
Error: /home/project/pystemon/archive/PastieSite[codepad.org]/2021/08/25/b3i76O8R.gz, file not found
archive/PastieSite[paste.org.ru]/2021/08/25/ic2wbj.gz
Error: /home/project/pystemon/archive/PastieSite[paste.org.ru]/2021/08/25/ic2wbj.gz, file not found
archive/PastieSite[ideone.com]/2021/08/25/0VJcxK.gz

Inside /pystemon/archive/ you will only find directories named after the site, without the PastieSite[] wrapper.

Does anyone have a fix for this?
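The listing suggests that the archive is written under the bare site name, while the lookup builds paths from the site object's string form, which yields `PastieSite[...]`. A minimal sketch of the difference, assuming a site object whose repr has that shape (`PastieSite` here is a hypothetical stand-in, and `archive_path` a hypothetical helper):

```python
import os


class PastieSite:
    """Hypothetical stand-in for pystemon's site object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "PastieSite[{}]".format(self.name)


def archive_path(root, site, date_path, pastie_id):
    # Use site.name, not the object: formatting the object embeds its repr
    # ("PastieSite[codepad.org]") in the path, which matches no directory on disk.
    return os.path.join(root, site.name, date_path, pastie_id + ".gz")
```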

UTF-8 handling of pasties?

It seems UTF-8 handling fails, for example:

ThreadPasties for pastesite.com crashed unexpectectly, recovering...: 'ascii' codec can't encode character u'\ufffd' in position 29: ordinal not in range(128)
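The message most likely comes from Python 2 implicitly encoding a unicode string as ASCII when it contains U+FFFD (the Unicode replacement character). Encoding explicitly as UTF-8 wherever pastie text is written avoids this; a minimal Python 3 sketch with a hypothetical `save_pastie` helper:

```python
def save_pastie(path, content):
    """Write pastie text as UTF-8, which can encode any code point (incl. U+FFFD)."""
    # errors="replace" only matters for unencodable input such as lone surrogates
    with open(path, "w", encoding="utf-8", errors="replace") as f:
        f.write(content)
```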

Python 3 Support

Hi,

I may well be wrong, but I think I have found an issue in pystemon.py (specifically in the fork in the CIRCL repository, after Python 3 support was added). The problems are at lines 329 and 338: when the variable descriptions is a list, decode cannot be called on it, because decode is only defined for binary data (bytes). Calling decode on a list raises an error, so I think the solution is:
replace: return '[{}]'.format(', '.join(descriptions.decode('utf-8', 'ignore')))
with: return '[{}]'.format(', '.join(descriptions))

The same applies to line 339.

If I am wrong, sorry; I am just trying to help.

Thank you very much for continuing the development of this project.
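The suggested change works when every element is already a `str`. If the list might still contain `bytes` elements, decoding each element rather than the list covers both cases; a sketch with a hypothetical function name:

```python
def format_descriptions(descriptions):
    """Render a list of descriptions as "[a, b, ...]", decoding any bytes items."""
    items = [
        d.decode("utf-8", "ignore") if isinstance(d, bytes) else d
        for d in descriptions
    ]
    return "[{}]".format(", ".join(items))
```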

Pastebin Pro Feed Crashing

Is anyone else having issues with their Pastebin Pro account?
Everything worked for a few weeks, and then I noticed AIL was not receiving any pastes from my Pastebin Pro account.
Pastes from other sites are downloading successfully (slexy.org, kpaste.net, codepad.org, gist.github.com).
I have triple checked and my IP is whitelisted on Pastebin's site.

My pastebin pro configuration in pystemon.yaml:

pastebin.com_pro:
    archive-url: 'https://scrape.pastebin.com/api_scraping.php?limit=250'
    archive-regex: '"key": "(.+)",'
    download-url: 'https://scrape.pastebin.com/api_scrape_item.php?i={id}'
    public-url: 'https://pastebin.com/raw/{id}'
    update-max: 50
    update-min: 40

The following error repeats over and over until it reaches 100 retries, and then the thread crashes. It eventually recovers on its own, but crashes again after another 100 tries.

[2018-10-23 21:15:08,671] Failed to download the page because of other HTTPlib error proxy error https://scrape.pastebin.com/api_scraping.php?limit=250 trying again.
[2018-10-23 21:15:08,671] Retry 99/100 for https://scrape.pastebin.com/api_scraping.php?limit=250
[2018-10-23 21:15:08,718] Failed to download the page because of other HTTPlib error proxy error https://scrape.pastebin.com/api_scraping.php?limit=250 trying again.
[2018-10-23 21:15:08,719] Retry 100/100 for https://scrape.pastebin.com/api_scraping.php?limit=250
[2018-10-23 21:15:08,875] Thread for pastebin.com_pro crashed unexpectectly, recovering...: 'NoneType' object has no attribute 'text'

Here is the error when running "./pystemon.py -v":

[2018-10-23 21:34:46,930] Retry 99/100 for https://scrape.pastebin.com/api_scraping.php?limit=250
[2018-10-23 21:34:46,930] Downloading url: https://scrape.pastebin.com/api_scraping.php?limit=250 with proxy: None and user-agent: None
[2018-10-23 21:34:47,039] Failed to download the page because of other HTTPlib error proxy error https://scrape.pastebin.com/api_scraping.php?limit=250 trying again.
[2018-10-23 21:34:47,039] Retry 100/100 for https://scrape.pastebin.com/api_scraping.php?limit=250
[2018-10-23 21:34:47,453] Thread for pastebin.com_pro crashed unexpectectly, recovering...: 'NoneType' object has no attribute 'text'
[2018-10-23 21:34:47,464] Traceback (most recent call last):
  File "./pystemon.py", line 127, in run
    last_pasties = self.get_last_pasties()
  File "./pystemon.py", line 147, in get_last_pasties
    htmlPage = response.text
AttributeError: 'NoneType' object has no attribute 'text'
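The traceback shows that `response` was `None` once the retries were exhausted, and `get_last_pasties` then accessed `.text` on it. A minimal guard, with `download_url` as a hypothetical stand-in for pystemon's downloader:

```python
import logging


def get_html(download_url, url):
    """Return the page body, or None if the download gave up.

    download_url is a hypothetical downloader that returns a response object
    with a .text attribute, or None after exhausting its retries.
    """
    response = download_url(url)
    if response is None:
        logging.error("Giving up on %s after repeated download failures", url)
        return None
    return response.text
```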

string indices must be integers, not str

Hello,
Any help with the following error would be appreciated:
ThreadPasties for pastebin.com_pro crashed unexpectectly, recovering...: string indices must be integers, not str

My IP address is already whitelisted on pastebin.
Thank you.

Add timestamp to output

Instead of:

Downloading pasties from cdv.lt. Next download scheduled in 17 seconds
Downloading pasties from slexy.org. Next download scheduled in 21 seconds

You get:

[2013-04-24 15:55:00] Downloading pasties from cdv.lt. Next download scheduled in 17 seconds
[2013-04-24 15:55:02] Downloading pasties from slexy.org. Next download scheduled in 21 seconds
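With Python's standard `logging` module, this format comes from a formatter with `%(asctime)s` and a matching `datefmt`. A sketch; the logger name `"pystemon"` is an assumption:

```python
import logging

# "[YYYY-MM-DD HH:MM:SS] message", as in the example above
formatter = logging.Formatter("[%(asctime)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S")

handler = logging.StreamHandler()
handler.setFormatter(formatter)

log = logging.getLogger("pystemon")  # logger name is an assumption
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("Downloading pasties from %s. Next download scheduled in %d seconds", "cdv.lt", 17)
```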

Define logging level

Hello,

It would be nice to be able to choose your logging level in the configuration file.
I am only interested in error logs, and I am spammed with info messages.

I'll propose a PR for this.
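A sketch of mapping a level name from the configuration to a `logging` constant. The `logging-level` key is hypothetical; `logging.getLevelName` returns an int for known level names and a string such as `"Level LOUD"` otherwise, which the check below turns into an error.

```python
import logging


def level_from_config(config, default="INFO"):
    """Map a level name such as 'error' from the config to a logging constant."""
    name = str(config.get("logging-level", default)).upper()
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        raise ValueError("unknown logging level: %r" % name)
    return level
```

The result would then be applied with `logging.getLogger().setLevel(level)`, so that only errors are shown when the config says `error`.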

Proxy Question

AIL version 1.6
Ubuntu 16.04

I'm trying to use a proxy with pystemon. How do you specify the proxy settings? At the bottom of the pystemon.yaml file there is the following proxy configuration:

proxy:
    random: no
    file: 'proxies.txt'

I added my proxy address in the proxies.txt file but this did not help.

telegram

How can I fix this error: "Failed to alert through telegram: 'Pastie' object has no attribute 'pastie_id'"?

Send content in attachment

I got an email from pystemon with a match on a huge paste, and my email client crashed opening it.

Maybe we could send the content of the paste as a text attachment if its size exceeds a limit defined in the config file.

I'll work on a pull request for this.
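A minimal sketch of the proposal using the standard library's `email.message.EmailMessage`. The `max_inline` threshold stands in for the proposed config option; its name and default are hypothetical.

```python
from email.message import EmailMessage


def build_alert(subject, content, max_inline=100_000):
    """Build an alert email, attaching the pastie instead of inlining it when large.

    max_inline (in characters) stands in for the proposed config option.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    if len(content) <= max_inline:
        msg.set_content(content)
    else:
        msg.set_content("Pastie is %d characters long; content attached." % len(content))
        # add_attachment turns the message into multipart/mixed
        msg.add_attachment(content, subtype="plain", filename="pastie.txt")
    return msg
```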

No HTML content

ERROR: HTTP Error ############################# http://pastebin.com/archive
No HTML content for page http://pastebin.com/archive

Whenever a trigger successfully hits, I receive this error. I know it was mentioned before that this is a throttling issue, but I was wondering whether it has been resolved. Is the throttling on pystemon's side, where you need to change update-max and update-min, or is Pastebin itself throttling? Apologies for reposting.

pastebin - scrape API URL change

The current URLs of the scraping API of pastebin will be discontinued on 2018-04-27. Details about the new address(es) are available at:

https://pastebin.com/doc_scraping_api

The file pystemon.yaml needs to be modified accordingly. My understanding is that all they did was add the host name 'scrape' to the API URLs.
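Under the reporter's understanding that only the host name changes, old URLs could be migrated mechanically. A sketch with a hypothetical helper name; the doc_scraping_api page remains the reference for the new addresses:

```python
from urllib.parse import urlsplit, urlunsplit


def to_scrape_host(url):
    """Move pastebin.com scraping-API URLs (api_scraping.php, api_scrape_item.php)
    to the scrape.pastebin.com host; other URLs are returned unchanged."""
    parts = urlsplit(url)
    if parts.hostname == "pastebin.com" and parts.path.startswith("/api_scrap"):
        parts = parts._replace(netloc="scrape.pastebin.com")
    return urlunsplit(parts)
```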

save state when stopping

Pystemon keeps a list of seen pasties in memory for performance reasons.
When pystemon is stopped and restarted immediately, it fetches all the data again.
It'd be great if pystemon could save its state to a file and reuse that state on startup. This way, pasties that were already seen would not be downloaded again.
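A minimal sketch of persisting seen-pastie IDs between runs, assuming they are kept as a set of IDs per site and stored as JSON (the file format and function names are hypothetical):

```python
import json
import os


def save_seen(path, seen):
    """Write {site: set(ids)} to path via a temp file plus atomic rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({site: sorted(ids) for site, ids in seen.items()}, f)
    os.replace(tmp, path)  # an interrupted save never leaves a truncated state file


def load_seen(path):
    """Load previously seen IDs, or start empty if no state file exists yet."""
    try:
        with open(path) as f:
            return {site: set(ids) for site, ids in json.load(f).items()}
    except FileNotFoundError:
        return {}
```

`load_seen` would run at startup and `save_seen` at shutdown.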

Split pystemon.yaml config file

How about splitting the pystemon.yaml config file into two files?
For instance, sources.yaml for maintaining the site sources, and pystemon.yaml for everything else, such as logging and network settings.

In production, this addresses two kinds of contributors:

  • system administrators, who handle the software installation (pystemon.yaml)
  • operators or business administrators, who keep the sources up to date (sources.yaml)

What do you think?
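A sketch of recombining the two files after each is loaded (for example with `yaml.safe_load`). It assumes the sources file holds a top-level `site` mapping; that layout and the function name are hypothetical.

```python
def merge_configs(main, sources):
    """Combine parsed pystemon.yaml (main) and sources.yaml (sources) into one dict.

    Site entries from sources.yaml override same-named sites left in
    pystemon.yaml; neither input dict is modified.
    """
    merged = dict(main)
    sites = dict(main.get("site") or {})
    sites.update(sources.get("site") or {})
    merged["site"] = sites
    return merged
```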

Weird Output

It also messes up my console prompt.

Sometimes it becomes completely unresponsive and I have to run killall python.
Bug or Feature?

Pastebin_Pro - Error in your regex

I signed up for a Pastebin Pro account and get the error below after adding the Developer API Key to the configuration.
The feed dumps the paste to the screen and does not add them to AIL.
Slexy and codepad feeds are working successfully. The issue is only with the pastebin pro config.

"No last pasties matches for regular expression site:pastebin.com_pro regex:"123456790ABCDEFGHIJ": "(.+)",. Error in your regex? Dumping htmlPage"

My pastebin.com pro account configuration in pystemon.yaml:

pastebin.com_pro:
    archive-url: 'https://scrape.pastebin.com/api_scraping.php?limit=500'
    archive-regex: '"123456790ABCDEFGHIJ": "(.+)",'
    download-url: 'https://scrape.pastebin.com/api_scrape_item.php?i={id}'
    public-url: 'https://pastebin.com/raw/{id}'
    update-max: 50
    update-min: 40
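In this configuration, `archive-regex` appears to have the API key where the working configuration in the earlier report has the JSON field name (`'"key": "(.+)",'`). A quick check against a hypothetical, pretty-printed sample shaped like the scraping API's JSON output:

```python
import json
import re

# Hypothetical sample shaped like a pretty-printed scraping API response
sample = json.dumps(
    [{"key": "AbCd1234", "title": ""}, {"key": "EfGh5678", "title": ""}],
    indent=4,
)

by_api_key = re.findall(r'"123456790ABCDEFGHIJ": "(.+)",', sample)  # → []
by_field = re.findall(r'"key": "(.+)",', sample)                   # → ['AbCd1234', 'EfGh5678']
parsed = [item["key"] for item in json.loads(sample)]              # → ['AbCd1234', 'EfGh5678']
print(by_api_key, by_field, parsed)
```

The regex form also relies on `"key"` being followed by a comma (i.e. not the last field of an object), which is one reason parsing the JSON may be more robust.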

pystemon does not work

Hi, I'm trying to use pystemon. I downloaded it and ran it with the default .yaml config, but I get this error. Any suggestions?

[2020-10-23 11:05:00,603] Retry client=0/5, server=33/100 for http://pastebin.gr/paste.php?download&id=1
[2020-10-23 11:05:00,695] Failed to download the page because of other HTTPlib error proxy error: http://pastebin.gr/paste.php?download&id=1
[2020-10-23 11:05:00,696] Traceback (most recent call last):
  File "./pystemon.py", line 993, in __download_url__
    res = __parse_http__(url, session, random_proxy)
  File "./pystemon.py", line 956, in __parse_http__
    response.raise_for_status()
  File "/home/parallels/Develop/pystemon/venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://pastebin.gr/paste.php?download&id=1

Thank you in advance!
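In this log a 403 Forbidden, which will not change on retry, is still being retried (server=33/100). One possible approach is to classify the status code before retrying. A minimal sketch; the function name and the set of "permanent" codes are assumptions, and the limits only mirror the client/server counters shown in the log:

```python
# Hypothetical limits mirroring "Retry client=0/5, server=33/100" in the log.
MAX_CLIENT_RETRIES = 5
MAX_SERVER_RETRIES = 100

# Assumed set of 4xx codes that retrying will not fix:
# 403 Forbidden, 404 Not Found, 410 Gone.
PERMANENT_CLIENT_ERRORS = {403, 404, 410}


def next_action(status, client_retries, server_retries):
    """Return 'ok', 'retry' or 'give_up' for an HTTP status code."""
    if status < 400:
        return "ok"
    if status in PERMANENT_CLIENT_ERRORS:
        # The site is refusing us; give up immediately instead of looping.
        return "give_up"
    if status < 500:
        # Other 4xx (e.g. 429 Too Many Requests) get a small retry budget.
        return "retry" if client_retries < MAX_CLIENT_RETRIES else "give_up"
    # 5xx: the server may recover, so allow a larger budget.
    return "retry" if server_retries < MAX_SERVER_RETRIES else "give_up"
```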

Regex Match Email Addresses

There seems to be a bug with regex pattern matching for email addresses when the regex pattern is set to search only for the domain, as in the following example:

  • search: 'domain.com'

If a paste contains an email address at domain.com, no match is triggered.
I tested this on several paste websites.

Can you replicate this issue?
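A quick check with Python's re module suggests the pattern itself should hit email addresses, assuming pystemon applies each search value as a regular expression to the decoded paste text; if so, a missed hit points elsewhere (for example at how the paste is fetched or decoded). The paste content below is a placeholder:

```python
import re

# Placeholder paste body containing an address on the searched domain.
paste = "Leaked credentials:\nuser@domain.com:hunter2\n"

# Search term from the report, used as an unescaped regex.
# Note that "." matches any character, so it also matches "domainXcom".
loose = re.compile(r"domain.com")
# Escaped dot plus word boundaries restrict the hit to the literal domain.
strict = re.compile(r"\bdomain\.com\b", re.IGNORECASE)

print(bool(loose.search(paste)))          # True
print(bool(strict.search(paste)))         # True
print(bool(loose.search("domainXcom")))   # True
print(bool(strict.search("domainXcom")))  # False
```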

YAMLLoadWarning

When running under Python 3:

pystemon.py:1391: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
yamlconfig = yaml.load(open(configfile))
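The warning itself names the fix: pass an explicit Loader or use yaml.safe_load, which only builds plain Python objects. A minimal sketch of how the config could be loaded instead; the helper name load_config is hypothetical:

```python
import yaml  # PyYAML


def load_config(configfile):
    """Load pystemon.yaml without triggering YAMLLoadWarning."""
    # safe_load only constructs basic types (dicts, lists, strings, numbers),
    # and the with-block closes the file handle the original one-liner leaked.
    with open(configfile, encoding="utf-8") as f:
        return yaml.safe_load(f)
```

yaml.load(f, Loader=yaml.SafeLoader) is equivalent if an explicit Loader argument is preferred.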

automatically spawn additional download threads

If, for whatever reason, the download threads cannot keep up with newly arriving pasties, the queues grow and end up consuming a lot of memory.

Add a new option, threads: auto, that automatically manages the creation of additional download threads for each website.
The user could then either let pystemon decide (auto) or hard-configure the number of threads per paste site, as today.
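The threads: auto idea could look roughly like this: size the worker pool from the current queue backlog, within fixed bounds. A minimal sketch; desired_thread_count, AutoDownloaderPool and the per_thread/max_threads parameters are all hypothetical names, not existing pystemon options:

```python
import logging
import math
import queue
import threading


def desired_thread_count(backlog, per_thread=50, min_threads=1, max_threads=10):
    """One download thread per `per_thread` queued pasties, clamped to bounds."""
    wanted = math.ceil(backlog / per_thread)
    return max(min_threads, min(max_threads, wanted))


class AutoDownloaderPool:
    """Download workers for one paste site that grow as its queue backs up.

    This sketch only adds threads; shrinking an idle pool is left out.
    """

    def __init__(self, download, per_thread=50, max_threads=10):
        self.queue = queue.Queue()
        self.download = download  # callable that fetches one pastie URL
        self.per_thread = per_thread
        self.max_threads = max_threads
        self.workers = []
        self._lock = threading.Lock()

    def _worker(self):
        while True:
            url = self.queue.get()
            try:
                self.download(url)
            except Exception:
                # Keep the worker alive so one bad pastie does not shrink the pool.
                logging.exception("Download failed for %s", url)
            finally:
                self.queue.task_done()

    def rebalance(self):
        """Start extra workers if the backlog calls for them; return the worker count."""
        with self._lock:
            target = desired_thread_count(
                self.queue.qsize(), self.per_thread, 1, self.max_threads
            )
            while len(self.workers) < target:
                worker = threading.Thread(target=self._worker, daemon=True)
                worker.start()
                self.workers.append(worker)
            return len(self.workers)
```

A site's scheduler thread would call rebalance() after enqueuing newly found pasties; with 120 queued and per_thread=50, three workers are started.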

pystemon not searching pastebin.com

Hi. Your tool is just great, but I am encountering a problem:

When I search for some specific patterns, I get errors from slexy.org that I do not get with other patterns:

[2015-11-21 20:17:34,760] Found 21 new pasties for site snipt.net. There are now 20 pasties to be downloaded.
[2015-11-21 20:17:36,885] Found hit for ['--------'] in pastie http://slexy.org/raw/s2SNyQD6FM
[2015-11-21 20:17:36,887] ThreadPasties for slexy.org crashed unexpectectly, recovering...: 'ascii' codec can't encode characters in position 1171-1172: ordinal not in range(128)
[2015-11-21 20:17:38,810] Found hit for ['---------'] in pastie http://slexy.org/raw/s21C3nxB2v
[2015-11-21 20:17:38,811] ThreadPasties for slexy.org crashed unexpectectly, recovering...: 'ascii' codec can't encode characters in position 1171-1172: ordinal not in range(128)

Most importantly, I get no results at all from pastebin.com, no matter what the pattern is.

I am behind Tor through the "delegated" daemon.

Thanks
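The slexy.org crashes above ("'ascii' codec can't encode characters") happen right after a hit is found, which suggests the paste text is being encoded or written with an implicit ASCII codec, as Python 2 does by default. A minimal Python 3 sketch, assuming the crash occurs while decoding or saving the paste; to_text and save_paste are hypothetical helpers:

```python
def to_text(raw_bytes):
    """Decode a downloaded paste as UTF-8, replacing invalid bytes instead of crashing."""
    return raw_bytes.decode("utf-8", errors="replace")


def save_paste(path, content):
    """Write a paste to disk as UTF-8 rather than the interpreter's default codec."""
    # An explicit encoding avoids "'ascii' codec can't encode characters"
    # when the paste contains non-ASCII text.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


paste = to_text("Found hit: -------- café ✓".encode("utf-8"))
try:
    paste.encode("ascii")  # what an implicit ASCII codec effectively does
except UnicodeEncodeError as exc:
    print(type(exc).__name__)  # UnicodeEncodeError
```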

Pastebin.com error

The rest of the scraping appears to work fine, but I noticed Pastebin was having problems today.

ERROR: URL Error ############################# http://pastebin.com/archive
Thread for pastebin.com crashed unexpectectly, recovering...: 'NoneType' object is not iterable
Traceback (most recent call last):
  File "pystemon.py", line 92, in run
    last_pasties = self.getLastPasties()
  File "pystemon.py", line 106, in getLastPasties
    htmlPage, headers = downloadUrl(self.archive_url)
TypeError: 'NoneType' object is not iterable

and

Downloading pasties from pastebin.com. Next download scheduled in 34 seconds
Downloading url: http://pastebin.com/archive with proxy: None and user-agent: None
ERROR: HTTP Error ############################# http://pastebin.com/archive
No HTML content for page http://pastebin.com/archive
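The traceback shows the thread crashing because downloadUrl returned None after the URL error, and "htmlPage, headers = ..." cannot unpack None. A minimal sketch of guarding that call, assuming the downloader returns a (html_page, headers) tuple on success and None on failure; get_last_pasties and its parse argument are hypothetical stand-ins for the real method:

```python
import logging


def get_last_pasties(archive_url, download_url, parse):
    """Return the pastie IDs found on an archive page, or [] if the download failed.

    `download_url` is assumed to return (html_page, headers) on success and
    None on a URL/HTTP error, as the traceback suggests downloadUrl did.
    """
    result = download_url(archive_url)
    if result is None:
        # Skip this round instead of raising
        # "TypeError: 'NoneType' object is not iterable" on the unpacking below.
        logging.warning("No response for %s, will retry on next schedule", archive_url)
        return []
    html_page, headers = result
    if not html_page:
        logging.warning("No HTML content for page %s", archive_url)
        return []
    return parse(html_page)
```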
