hbag
Feb 13, 2021



im making a web toy that builds a youtube player out of all the vids posted in a thread. decided to make this thread to post updates about it instead of cluttering up the meta thread
so far ive managed to make new endpoints for the unofficial somethingawful API (check it out) for viewing the posts in threads, so that i dont have to write an entire web scraper for the project and can just make an API call

thankfully the youtube API seems slightly less useless than i first anticipated so getting the actual player to work shouldnt be TOO horrible


hbag
Feb 13, 2021



should mention that the endpoint for viewing full threads absolutely shits itself if the thread is too long (tested it on the cjs thread in yospos and it took so long that cloudflare thought it died). so im probably going to add a version that makes playlists out of specific pages and/or page ranges, for when you want to grab poo poo from a particularly long thread
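A page-range option needs some way to turn user input into a list of pages. Here is a minimal sketch, assuming a comma-separated spec like "1-3,7" (that format and the name `parse_page_spec` are made up for illustration, not part of the existing API):

```python
def parse_page_spec(spec: str) -> list[int]:
    """Turn a hypothetical page spec like "1-3,7" into sorted, de-duplicated page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                start, end = end, start  # accept backwards ranges like "5-3"
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_page_spec("1-3,7"))  # [1, 2, 3, 7]
```

Bad input (like "abc" or "-3") raises `ValueError`, which a Flask route could turn into a 400 instead of a crash.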

but first im going to focus on the Main Thing, and do those two things after thats up and running

hbag
Feb 13, 2021



got an incredibly simple player up and running, but now i need to wrestle with jinja to figure out how to get this loving array from my flask backend into js
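The usual Flask/Jinja approach is the `tojson` filter inside a script tag, e.g. `const vidIds = {{ vid_ids|tojson }};`. Jinja serializes the value to JSON and escapes `<`, `>`, `&` and `'` so the data can't close the script tag early. The stdlib sketch below roughly mirrors that escaping so you can see what ends up in the page. `to_js_literal` is a hypothetical helper; in a real template you'd just use the filter.

```python
import json


def to_js_literal(value) -> str:
    """Serialize value to JSON that is safe to drop inside a <script> tag.

    Mirrors the escaping Jinja's tojson filter applies: <, >, & and ' are
    written as \\u escapes so a stray "</script>" in the data can't end the tag.
    """
    text = json.dumps(value)
    for char, escaped in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026"), ("'", "\\u0027")):
        text = text.replace(char, escaped)
    return text


vid_ids = ["dQw4w9WgXcQ", "</script><b>"]
print(f"<script>const vidIds = {to_js_literal(vid_ids)};</script>")
```

On the Flask side nothing changes: pass the list to `render_template(..., vid_ids=id_array)` and the template turns it into a JS array literal.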

hot cocoa on the couch
Dec 8, 2009



thanks for the updates hbag. looking forward to seeing this in action

hbag
Feb 13, 2021



i actually have literally the entire player working but now i just need to add a shuffle function and also make it look nicer than
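The player runs in JS, so a shuffle would probably end up living there. Another option is to shuffle the ID list in Python before it reaches the template. A minimal sketch of a Fisher-Yates shuffle that returns a copy. `shuffled_playlist` and the optional seed are assumptions for illustration. `random.shuffle` does the same thing in place.

```python
import random


def shuffled_playlist(vid_ids, seed=None):
    """Return a shuffled copy of the playlist, leaving the original order intact.

    Fisher-Yates: walk from the end, swapping each slot with a random slot at
    or before it, so every ordering is equally likely.
    """
    rng = random.Random(seed)  # a seed makes the order reproducible (e.g. for a shareable link)
    result = list(vid_ids)
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


print(shuffled_playlist(["a", "b", "c", "d"], seed=42))
```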
uh
this

hbag
Feb 13, 2021





seems good enough for now
i noticed that a lot of videos seem to not work with the iframe API so they display as "not available" and get skipped, so rip lmao

insane anime
Aug 5, 2018


drat. thats loving sweet. God bless you

hbag
Feb 13, 2021



seems the API can handle, at most, 5 pages at a time so im probably going to make a new endpoint that just gets the last 5 pages of a thread and use THAT for the playlist generator
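If the API really tops out at 5 pages per call, a long thread can still be covered by splitting it into batches of at most 5 pages. A sketch, assuming that 5-page limit. `page_batches` is a hypothetical name.

```python
def page_batches(first_page, last_page, batch_size=5):
    """Split first_page..last_page (inclusive) into consecutive batches of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    start = first_page
    while start <= last_page:
        end = min(start + batch_size - 1, last_page)
        batches.append(list(range(start, end + 1)))
        start = end + 1
    return batches


print(page_batches(1, 12))  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
```

The "last 5 pages only" variant is just `range(max(1, last_page - 4), last_page + 1)`.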

hbag
Feb 13, 2021



lol nvm turns out i can use async functions instead of waiting for each individual page like a psycho
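A sketch of fetching pages concurrently with `asyncio.gather`, capped by a semaphore so only a few requests are in flight at once (the cap of 5 assumes the API limit mentioned above also applies here). `fetch_page` is a hypothetical stand-in. `requests` is blocking, so a real version would need an async HTTP client or would wrap the call with `asyncio.to_thread(requests.get, ...)` (Python 3.9+).

```python
import asyncio


async def fetch_page(thread_id, page):
    """Hypothetical stand-in for one API call; swap in a real async HTTP request."""
    await asyncio.sleep(0.01)  # pretend network latency
    return {"page": page, "vids": [f"{thread_id}-vid{page}"]}


async def fetch_pages(thread_id, pages, max_concurrent=5):
    """Fetch pages concurrently, with at most max_concurrent requests in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(page):
        async with semaphore:
            return await fetch_page(thread_id, page)

    # gather returns results in the same order as `pages`, whatever order they finish in
    return await asyncio.gather(*(fetch_one(page) for page in pages))


results = asyncio.run(fetch_pages(1234, range(1, 13)))
print([r["page"] for r in results])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```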

hbag
Feb 13, 2021



async functions are proving to be dumb and lame so i might just add a function to the site where you can start monitoring a thread and itll build the playlist from a list of links stored in a db or some poo poo

Larry Parrish
Jul 9, 2012



im not much of a computer guy and i have no clue how you set this up on your end, but wouldn't a good way to cut down on scraping and performance problems be to just archive the results and show a list by thread name or some poo poo? then only run a scrape when someone enters a threadid, either to add a new one or update an existing one. this would also make it easier to use because it's kind of a pita on my phone. and this data seems like it should be easy to store as a table or text file of some kind, so it's not like it would eat up a bunch of your server space.


edit: I guess that's kind of exactly what you said in your last post lol

hbag
Feb 13, 2021



Larry Parrish posted:

im not much of a computer guy and i have no clue how you set this up on your end, but wouldn't a good way to cut down on scraping and performance problems be to just archive the results and show a list by thread name or some poo poo? then only run a scrape when someone enters a threadid, either to add a new one or update an existing one. this would also make it easier to use because it's kind of a pita on my phone. and this data seems like it should be easy to store as a table or text file of some kind, so it's not like it would eat up a bunch of your server space.


edit: I guess that's kind of exactly what you said in your last post lol

yeah lmao
kinda already up and running - it doesnt scrape the thread when you search it unless its a brand new thread. if the thread has been scraped before, it'll just load the results from a database and use that
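A self-contained sketch of that cache-or-scrape idea. It uses an in-memory `sqlite3` database instead of the MySQL/pymysql setup, borrows the column names from the INSERT in `playlist_actual`, and guesses the column types. `get_playlist` and the `scrape_thread` callable are hypothetical. Note sqlite3 uses `?` placeholders where pymysql uses `%(name)s`.

```python
import sqlite3


def get_playlist(conn, thread_id, scrape_thread):
    """Return (title, vid_ids) for a thread, scraping only if it isn't cached yet.

    scrape_thread is a hypothetical callable: thread_id -> (title, list of vid IDs).
    """
    row = conn.execute(
        "SELECT thread_title, vid_ids FROM thread_vids WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    if row is not None:
        title, vid_ids = row
        return title, (vid_ids.split(",") if vid_ids else [])

    title, vid_ids = scrape_thread(thread_id)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO thread_vids (thread_title, thread_id, vid_ids, last_page_viewed) "
            "VALUES (?, ?, ?, 1)",
            (title, thread_id, ",".join(vid_ids)),
        )
    return title, vid_ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_vids (thread_title TEXT, thread_id INTEGER PRIMARY KEY, "
    "vid_ids TEXT, last_page_viewed INTEGER)"
)

calls = []


def fake_scrape(thread_id):
    calls.append(thread_id)  # record every real scrape
    return "music thread", ["abc", "def"]


print(get_playlist(conn, 42, fake_scrape))  # scrapes and stores
print(get_playlist(conn, 42, fake_scrape))  # served from the table
print(len(calls))  # 1
```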

Larry Parrish
Jul 9, 2012



u should make an index so my lazy rear end can just click link :cheers:

hbag
Feb 13, 2021



Larry Parrish posted:

u should make an index so my lazy rear end can just click link :cheers:

yeah ill probably do something like that once ive made sure everything's working alright and not doxing me or something

hbag
Feb 13, 2021



got more ideas. scribbling them down for later. loving Diabolical

hbag
Feb 13, 2021



also lmao at this loving janky-rear end function for retrieving the playlists

Python code:
import os

import flask
import pymysql
import requests

############################################
# generate playlist from provided threadid #
############################################

def playlist_actual(thread_id):
    id_array = []
    thread_title = "N/A"

    connection = pymysql.connect(
        host=os.getenv('HOSTIP'),
        user=os.getenv('DBUSER'),
        password=os.getenv('DBPASS'),
        database='playlist_gen',  # replaces the separate USE statement
    )

    try:
        with connection.cursor() as cursor:
            # if the thread exists in the DB, grab its title and video list in one query
            cursor.execute(
                'SELECT thread_title, vid_ids FROM thread_vids WHERE thread_id=%(threadid)s;',
                {'threadid': thread_id},
            )
            row = cursor.fetchone()

            if row is not None:
                thread_title, vid_ids = row
                id_array = vid_ids.split(",")
            else:
                # brand new thread: scrape page 1 through the API
                response = requests.get(
                    f'https://api.fyad.club/threaddata/{thread_id}',
                    params={'token': os.getenv('PRIVTOKEN'), 'page': 1},
                    timeout=30,
                )
                response.raise_for_status()
                jsonResponse = response.json()

                if "error" not in jsonResponse:
                    thread_title = jsonResponse['thread_title']
                    for post in jsonResponse.values():
                        # skip entries that aren't posts with a 'vids' field (e.g. thread_title)
                        try:
                            videos = post['vids'].split(',')
                        except (KeyError, TypeError, AttributeError):
                            continue
                        for video in videos:
                            try:
                                # extract video ID from URLs
                                vid_id = video.split('/')[4].split('?')[0]  # regex can eat my rear end
                            except IndexError:
                                continue
                            # if the ID isn't already in the array, add it
                            if vid_id not in id_array:
                                id_array.append(vid_id)

                    # add thread to DB
                    cursor.execute(
                        'INSERT INTO thread_vids(thread_title,thread_id,vid_ids,last_page_viewed) '
                        'VALUES (%(threadtitle)s,%(threadid)s,%(vidids)s,1)',
                        {'threadtitle': thread_title, 'threadid': thread_id, 'vidids': ','.join(id_array)},
                    )
                    connection.commit()
    finally:
        connection.close()

    return flask.make_response(
        flask.render_template('player.html', vid_ids=id_array, thread_title=thread_title)
    )
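About that `split('/')[4]` trick: it only finds an ID in embed-style URLs like `https://www.youtube.com/embed/ID`. A `watch?v=ID` link or a `youtu.be/ID` link splits into just 4 pieces, so it hits `IndexError` and gets silently dropped. If the API ever hands back those shapes, a no-regex option is `urllib.parse`. A sketch (`extract_video_id` is a hypothetical name, and the list of URL shapes it handles is an assumption, not exhaustive):

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Pull a YouTube video ID out of a URL, or return None if it isn't a shape we recognise.

    Handles youtu.be/ID, youtube.com/watch?v=ID and youtube.com/embed|v|shorts/ID,
    with optional www. or m. prefixes. Everything else returns None.
    """
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.startswith("www.") or host.startswith("m."):
        host = host.split(".", 1)[1]
    path_parts = [part for part in parsed.path.split("/") if part]

    if host == "youtu.be" and path_parts:
        return path_parts[0]
    if host in ("youtube.com", "youtube-nocookie.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(path_parts) >= 2 and path_parts[0] in ("embed", "v", "shorts"):
            return path_parts[1]
    return None


print(extract_video_id("https://youtu.be/dQw4w9WgXcQ?t=5"))  # dQw4w9WgXcQ
```

In the loop it would replace the split-and-catch: `vid_id = extract_video_id(video)`, then append it only if it's not `None` and not already in `id_array`.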


hbag
Feb 13, 2021



the thread scraper that the playlist generator uses has a few bugs that cause it to crash, so im gonna be working on fixing those
letting you know because it means new playlists arent gonna be generated beyond the first page for now
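The `thread_vids` table already has a `last_page_viewed` column (the INSERT in `playlist_actual` sets it to 1). Here's a sketch of how updates past the first page might use it, assuming the scraper re-reads from that page onward and merges in any new IDs. Both function names are hypothetical. The merge uses a set for lookups instead of `vid_id not in id_array`, which scans the whole list each time.

```python
def merge_new_ids(existing_ids, new_ids):
    """Append IDs from new_ids that aren't already in existing_ids, keeping first-seen order."""
    seen = set(existing_ids)
    merged = list(existing_ids)
    for vid_id in new_ids:
        if vid_id not in seen:
            seen.add(vid_id)
            merged.append(vid_id)
    return merged


def pages_to_rescan(last_page_viewed, current_last_page):
    """Pages to fetch on an update.

    Re-reads last_page_viewed itself, since it may have gained posts since the
    last scrape, then everything after it up to the thread's current last page.
    """
    return list(range(max(1, last_page_viewed), current_last_page + 1))


print(merge_new_ids(["a", "b"], ["b", "c", "a", "d"]))  # ['a', 'b', 'c', 'd']
print(pages_to_rescan(3, 5))  # [3, 4, 5]
```

After a successful update, `last_page_viewed` would be set to the thread's current last page.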
