f047d8 No.4475
How about a thread for discussing/creating/sharing parsing scripts?
I made one for md5 lookup on e621.net (actually I just modified Hydrus_dev's danbooru script). Let me know if I did anything wrong with it, I'm pretty clueless… but it seems to work fine.
[32, "e621 md5", 1, ["http://e621.net/post/show", 0, 1, 1, "md5", {}, [[30, 1, ["we got sent back to main gallery page -- title test", 8, [27, 1, [[["head", {}, 0], ["title", {}, 0]], null]], [true, true, "Image List"]]], [30, 1, ["", 0, [27, 1, [[["li", {"class": "tag-type-general"}, null], ["a", {}, 1]], null]], ""]], [30, 1, ["", 0, [27, 1, [[["li", {"class": "tag-type-copyright"}, null], ["a", {}, 1]], null]], "series"]], [30, 1, ["", 0, [27, 1, [[["li", {"class": "tag-type-artist"}, null], ["a", {}, 1]], null]], "creator"]], [30, 1, ["", 0, [27, 1, [[["li", {"class": "tag-type-character"}, null], ["a", {}, 1]], null]], "character"]], [30, 1, ["", 0, [27, 1, [[["li", {"class": "tag-type-species"}, null], ["a", {}, 1]], null]], "species"]], [30, 1, ["we got sent back to main gallery page -- page links exist", 8, [27, 1, [[["div", {}, null]], "class"]], [true, true, "pagination"]]]]]]
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.
f3728d No.7822
Booru tag parsing script isn't grabbing the full rez image from Danbooru
These are all variations of the same image and they parsed correctly
http://danbooru.donmai.us/posts/2813183
http://danbooru.donmai.us/posts/2824474
https://gelbooru.com/index.php?page=post&s=view&id=3820627
https://gelbooru.com/index.php?page=post&s=view&id=3820897
https://gelbooru.com/index.php?page=post&s=view&id=3836020
https://yande.re/post/show/405601
This did not parse correctly; it somehow downloaded a sample size of it. It's worth noting that Hydrus itself is unable to parse and download it, but the parsing script at least gets the sample rez
http://danbooru.donmai.us/posts/2812948
bf8458 No.9278
Is there a way to scrape files and tags from Zerochan?
b461bd No.9280
>>4475
Hitomi, Tsumino, Hentai2Read, HentaiCafe, NHentai, HBrowse and Goddess are what /a/ recommends when avoiding SadPanda
aed163 No.11124
Here's an e621 pool lookup.
Seems to work for me, images appear in correct order in the browser pane. I just need to find a better way of tagging page:* and title:*
atm I drag the files onto Krename which outputs to /tmp/hydrus/<title>/<page>.<ext> and use the tag based on file name import option.
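A minimal sketch of how the page:* and title:* tags could be derived from that path layout, assuming files really do land at /tmp/hydrus/<title>/<page>.<ext>. The function name `tags_from_path` is made up for illustration and is not part of Hydrus or Krename.

```python
from pathlib import PurePosixPath


def tags_from_path(path: str) -> list[str]:
    """Derive title: and page: tags from a .../<title>/<page>.<ext> path."""
    p = PurePosixPath(path)
    title = p.parent.name  # the directory name holds the pool title
    page = p.stem          # the file name without extension holds the page
    tags = [f"title:{title}"]
    if page.isdigit():
        # int() drops zero padding such as 007
        tags.append(f"page:{int(page)}")
    return tags


print(tags_from_path("/tmp/hydrus/some pool/007.png"))
```

Files whose name is not a plain number only get the title tag, so nothing is guessed for them.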
915732 No.11590
Just made a realbooru one.
Fucking parsers are a pain in the ass.
915732 No.11610
>>11590
Wait, I fucked up. Here's the fixed version.
f5ba1f No.11616
>>4475
Can any custom parsers handle logins? The twitter gallery situation is still out of the picture and has been for a few months now. Even if parsers are made for Fur Affinity and InkBunny, without logins they will barely scrape any content either. I know Hdev said an FA gallery parser is coming, but without login support it's hardly worth the work to make one imo.
aed163 No.11696
>>11616
You can make your own login scripts but IMO it's not worth it, especially when the site makes heavy use of javascript or captchas.
Instead, just copy the cookies from your browser session to get logged in.
>network>data>review session cookies
Inkbunny needs "PHPSESSID"
For other sites, just copy any cookie that looks login-related (a username, or base64 or hex string values) until it works.
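A rough sketch of that "login looking" filter, assuming you have copied the raw Cookie header out of your browser's dev tools. The name hints and the 16-character hex rule are guesses made up for illustration, not anything Hydrus or a particular site defines, and base64 values are not detected here.

```python
import re
from http.cookies import SimpleCookie

# Hypothetical heuristic: name fragments that often mark login cookies.
LOGIN_HINTS = ("sess", "user", "auth", "login", "token")


def login_candidates(cookie_header: str) -> dict[str, str]:
    """Return the cookies from a copied Cookie header that look login-related."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parses "name=value; name2=value2"
    picked = {}
    for name, morsel in jar.items():
        looks_named = any(hint in name.lower() for hint in LOGIN_HINTS)
        looks_hex = re.fullmatch(r"[0-9a-fA-F]{16,}", morsel.value) is not None
        if looks_named or looks_hex:
            picked[name] = morsel.value
    return picked


header = "PHPSESSID=0123456789abcdef0123456789abcdef; theme=dark"
for name, value in login_candidates(header).items():
    print(name, value)
```

Whatever it picks out is what you would then enter by hand in the review session cookies dialog.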
bf8458 No.11698
What do I need to learn about HTML or JSON so I can make downloaders?
a643ac No.11814
I'm trying to use the iqdb-tagger python script, but there is a PermissionError when it tries to write to windows temp folder. Anyone know how to fix? I tried setting the iqdb-tagger-server.exe, iqdb-tagger.exe and python.exe to run as administrator but it doesn't help. I'm on Windows 10.
https://github.com/rachmadaniHaryono/iqdb_tagger
915732 No.11886
7b2523 No.12763
>>7394
I've been using the tag parser and server (https://github.com/JetBoom/boorutagparser) fine until recently: the random place he decided to host the sound went down, breaking a lot of shit. Thought I'd leave a note for anyone having problems: just right-click on the script to edit it, then comment out (//) anything to do with the sound or the variable it's stored in. That should get it working again.
bf8458 No.12848
>>12763
I only use the parser, and just deleted the link to the audio file itself. Everything still works in the parser even with it there, but you get that stupid login prompt. And here I thought the boorus got hit with some new malware or something
bf8458 No.12849
What's the deal with it not working on derpibooru anymore?
fb1a41 No.12997
I installed and tried using iqdb_tagger but it complains that the 'hydra-python-core' distribution was not found and is required by hydrus. What gives?
419f98 No.13042
Has Pixiv parsing stopped working for anyone else recently?
bf8458 No.13044
>>13042
What do you mean? There was no parser for pixiv. If you mean those extensions that let you direct load the images, then those have been broken for 1 year+ since pixiv keeps editing its site
419f98 No.13045
>>13044
I might have found a custom set from CuddleBear92's GitHub repo (I sure as fuck didn't write them) but I had been reliably importing pixiv urls just days ago and now they error out; can't find anything. I haven't looked into it too hard yet but was wondering if I'm alone
bf8458 No.13048
>>13045
I think it's just you (I'm using the Hydrus default pixiv parser). I made my 32 artist subs check now and they went through with no errors. But they had already checked recently so there weren't any files to snag.
No idea why it would work yesterday but not today, unless that was made before they revised their site and they just happened to leave the old code running as a fallback till now.
303955 No.13052
The built in script for using iqdb to look up tags from danbooru works for me. There are many more like it on cuddlebear92's website, but they are 2 years old and don't seem to work at all anymore. I just want something that works the same as the built in function for other sites like sancom, gelbooru, etc., but it seems I'm left high and dry. I don't understand why it doesn't work anymore either. I went through the logic of the iqdb gelbooru script, for instance, and compared it with the HTML actually sent back by the website, the logic still seems sound.
59b96b No.13053
>>13042
Not just you, same happened to me on both the default hydrus parsers and the custom pixiv all-in-one set. Everything gets ignored and has been for a couple of days. All the pages come in as if I'm not logged in for whatever reason
303955 No.13054
>>13052
Hmm, I've found that the gelbooru one actually works off and on. Sometimes it oddly just returns a list with 4 crosses instead of a list of actual tags, though. Now then, what I'd really like to do is automate running file lookup scripts on more than one file and automatically apply all tags to each file. There doesn't seem to be a way to do this through the interface when more than one file is selected, but there has to be a way, right?
7fead0 No.13056
Hit the same pixiv issue just now. The login itself doesn't seem to be the issue, I reset and redid the login within Hydrus but that seems to have changed nothing.
12d78e No.13057
Pixiv changed their API so the parser had to be redone. You can replace the old one with this one or wait until Wednesday as it should be in the next release. Also pixiv added captcha to login so you have to import cookies manually now. The login in hydrus won't work.
566fd9 No.13138
The sankaku parser someone posted on this board that was supposed to remove the 2000 files limit didn't work properly for me, due to the naive way the parser fetched the next gallery page data I think, so I made a fix some while ago that works on my machine (TM). Please let me know if it works on yours, too.
303955 No.13162
>>13138
Working a treat right now. I understand a bit of html, but these parsers make no sense to me. Maybe I'll sit down and spend time to figure out how to do this myself sometime.
69fa49 No.13525
I'm not sure if this has been fixed yet, but I modified the default 8ch parsers to allow hydrus to download 8kun threads with filenames.
085c83 No.13616
The JSON API for boards like gelbooru returns all the tags, as well as the path to the files, hash, source, updated time, etc.
Example
https://gelbooru.com/index.php?page=dapi&json=1&s=post&q=index&limit=50&tags=cat%20rating:safe&pid=2
(The tags are HTML-escaped, but I don't know about other entries)
So why do the gallery downloaders scrape HTML for each page instead of using all the information obtained from a search request?
If I do a search for a set of tags, the downloader has to download the HTML for every single post's page just to check for duplicates and tags.
It's a lot of wasted resources/effort for both client and server.
If I already have all 50 files that turn up in the linked search, in total I did 1 request instead of 51 to verify that.
Similarly, if I had to download all the images, in total it was 51 requests instead of 101, with the bonus that no HTML scraping had to be performed.
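The request counts above can be checked with a small sketch. It only models the claim as stated (one search request, one HTML page per post when scraping, one request per file still missing); the function name and parameters are made up for illustration.

```python
def requests_needed(posts: int, missing: int, via_api: bool) -> int:
    """Count HTTP requests for one search page under the model described above."""
    search = 1                             # the search request itself
    post_pages = 0 if via_api else posts   # HTML scraping fetches every post page
    downloads = missing                    # one request per file not already owned
    return search + post_pages + downloads


# All 50 files already owned: 1 request via the API, 51 via HTML scraping.
print(requests_needed(50, 0, via_api=True), requests_needed(50, 0, via_api=False))
# All 50 files missing: 51 requests via the API, 101 via HTML scraping.
print(requests_needed(50, 50, via_api=True), requests_needed(50, 50, via_api=False))
```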
085c83 No.13620
I noticed gelbooru's JSON API returns tags as a single string with each tag delimited by spaces.
Is there a way to split a JSON string match into multiple entries?
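For reference, the transformation being asked for looks like this in plain Python. This is only a sketch of the split itself, not a Hydrus feature, and the sample string is invented; it assumes tags are separated by whitespace and HTML-escaped as described.

```python
import html


def split_tags(tag_string: str) -> list[str]:
    """Split a space-delimited tag string and undo the HTML escaping per tag."""
    # Split first, then unescape, so an escaped entity can never create a new split.
    return [html.unescape(tag) for tag in tag_string.split()]


print(split_tags("cat rating:safe don&#039;t_care"))
```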
2f0bc3 No.13762
So I'm a newcomer to making downloaders. I made a bunch of url classes: one for the HTML page of an album that contains many images, which redirects to an API call, and one for that API call as well. I made parsers for the API response, selected which API query element corresponds to the next page (such as offset) and even added a next page URL in the parser.
But no matter what I do, when I drag & drop an album's URL into Hydrus, it only downloads the first page worth of images and never goes further.
Is it supposed to work like that? Do I have to make something like a GUG to make the continuous downloading work?
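To make "next page by offset" concrete, here is a generic sketch of what such a next page URL amounts to. It says nothing about how Hydrus handles this internally; the parameter name `offset`, the page size and the example.com URL are all placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url: str, page_size: int, param: str = "offset") -> str:
    """Return the same URL with the offset parameter advanced by one page."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    current = int(query.get(param, ["0"])[0])  # a missing offset means page one
    query[param] = [str(current + page_size)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(next_page_url("https://example.com/api/album?id=42&offset=20", 20))
```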
1be05b No.13953
Friendly neighborhood anon here - e621 seems to have added 'lore' and 'meta' tagtypes which the default parser can't catch - this updated parser can catch them.
dfaa46 No.14001
I previously used a modified version of saucenao's generic script to automatically(-ish) reverse image search untagged images that show up, but now that e621 has their own reverse search, I whipped up my own python script. e621's reverse search also doesn't have a cap on searches done in 30s/24hr (it does require an account tho).
https://gist.github.com/corposim/b7ccb6a2c8814032ddd65db91b371dc2
655b38 No.14013
a51669 No.14320
This might have been asked before, but is there a downloader for NicoSeiga? If not, does anybody know other tools for that?
0efa77 No.14328
I'm trying to get Hydrus to download from smugloli.net. I have made url classes that match the URL and created an API URL for the json, but when I try to watch a thread it instantly says "DEAD" with the log message saying there was no parser. It should work if the "4chan-style API parser" is used, but I have no clue how to make it use that.
939c32 No.14372
Anyone know what the situation is with gfycat redirecting NSFW content to some sort of sister site? I guess they intend for you to browse their new site "redgifs" but following old nsfw gfycat links takes me to "gifdeliverynetwork"
Anyway in short I got some sort of gfycat/redgifs downloader bundle from cuddlebear's hydrus scripts git repo but I'm not really sure what to do with them and I can't download videos straight from redgifs like I used to with gfy, anyone else in a similar spot?
7c774d No.14403
With the number of artists attempting to migrate to pillowfort from twitter, I tried my hand at building something to parse pillowfort posts. It could probably still use some cleanup and correction, but figured it was worth putting out there since I've gotten it to work for me pretty well so far.
655b38 No.14437
here's an updated realbooru downloader; includes a gug, post and gallery urls and a parser
tags work well.
0891ac No.14466
Can the nijie parser download video and manga?
It doesn't look like it from what I saw, but I may have missed a step. While I'm asking, how would I automatically fetch the nijie work:# ?
7d3571 No.14471
Friendly neighborhood anon here. Someone once asked for an agn.ph downloader. This is an all-in-one that should work for the site.
5a9c6e No.14474
is there a parser for the FA Onion Archive?
b4a834 No.14558
anything for rule34hentai?
963f0c No.14624
^wrong one, here is the one that works, tagging kindaaa works but location tags are busted
(instagram)
7907b3 No.14710
>>14437
I think this has changed again, I'll give it a look but I am not good at it at all.
3fc0bd No.14739
>>14710
Anything on this, anon?
6c7e0e No.14748
realbooru parser that functions at least
8c2d2c No.14782
Sankaku is now hiding lolis. Is there some way to get around this?
4920bd No.14837
I'm not sure if GUGs can make these, but anyone have a module for setting up Youtube subscriptions?
a382d2 No.14847
>>14782
They're not hiding lolis. I don't understand why I keep hearing this. Did you check the mature content option in settings and clear your account blacklist? Do you have an account in the first place?
f5d248 No.15140
Can someone help me understand what parsing scripts are for, and how to use them?
Are they to improve the amount of tags that are found for images? Like a reverse search?