vodkat
Jun 30, 2012



cannot legally be sold as vodka
terrible programmer status:

Before I left last night I wrote a quick and nasty script to process a bunch of data. This morning I found out it took 9 hours to run. Out of embarrassment I've just rewritten it properly and got that down to 5 minutes :cripes:


vodkat
Jun 30, 2012



cannot legally be sold as vodka
I need to make a simple website that will be updated occasionally. Is using Jekyll for this a good or bad idea?

vodkat
Jun 30, 2012



cannot legally be sold as vodka

MALE SHOEGAZE posted:

it's pretty incredible that the cpu loop needs to run ~200k times per second in order to hit the target of 2mhz, which is insanely slow by modern standards. i've definitely never written a loop that hot. and yet i still need to clamp it.

anyhow, here's how things look in action:
https://imgur.com/a/oNQBA0G


this is incredibly cool :cheers:

vodkat
Jun 30, 2012



cannot legally be sold as vodka
I've got an issue that is driving me crazy and I can't figure out what is causing it. I need to take a cell from a DataFrame, split it into a list, and then return a copy of that DataFrame with a new row for each element of the split. At present I'm getting errors that the size of the new DataFrame and the size of the list don't match, and I don't know how this can happen as all I am doing is row * size of list. Anyone got any ideas of what is going wrong here?

code:
def simple_splitter(row, column_name, string):
    column_list = re.split(string, row[column_name].values[0])  # splits cell of df based on string
    arr = np.array(np.concatenate([row] * len(column_list)))  # clones the row to the length of the list
    df = pd.DataFrame(arr, columns=row.columns)  #put back into a df, needed for column names
    df[column_name] = column_list # Adds in the split address <- somehow a mismatch between the length of the df and the column list keeps happening!
    return df

vodkat
Jun 30, 2012



cannot legally be sold as vodka

cinci zoo sniper posted:

Away from pc but I don’t think arr does what you think

I will have a look at this again in the morning but I also tried pd.DataFrame([[row]] * len(column_list)) when trying to fix this and encountered the same error, so all I could think of was that for some reason the way in which the list is being counted is off? From having looked at some output this is normally by a factor of two, but not always, so totally baffled as to what this could be.

vodkat
Jun 30, 2012



cannot legally be sold as vodka

vodkat posted:

code:
def simple_splitter(row, column_name, string):
    column_list = re.split(string, row[column_name].values[0])  # splits cell of df based on string
    arr = np.array(np.concatenate([row] * len(column_list)))  # clones the row to the length of the list
    df = pd.DataFrame(arr, columns=row.columns)  #put back into a df, needed for column names
    df[column_name] = column_list # Adds in the split address <- somehow a mismatch between the length of the df and the column list keeps happening!
    return df


vodkat posted:

I will have a look at this again in the morning but I also tried pd.DataFrame([[row]] * len(column_list)) when trying to fix this and encountered the same error, so all I could think of was that for some reason the way in which the list is being counted is off? From having looked at some output this is normally by a factor of two, but not always, so totally baffled as to what this could be.

Think I've got to the bottom of this. It wasn't this code that was the issue but the way I was applying the function using groupby: due to a duplicate index or something (which I didn't bother trying to figure out), it was sometimes passing more than one row to this function. I think I've fixed this by no longer trying to be clever with groupby and apply, and using the much slower df.iterrows() instead.
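
A minimal sketch of the same split-into-rows step done on the whole DataFrame at once, so there is no per-row function that can be handed more than one row. It assumes pandas 0.25 or newer (for DataFrame.explode) and that every cell in the column is a string. The name split_to_rows and the sample columns are made up for illustration.

```python
import re

import pandas as pd


def split_to_rows(df, column_name, pattern):
    """Return a copy of df with one row per piece of the split cell."""
    out = df.copy()
    # Turn each string cell into a list of pieces.
    out[column_name] = out[column_name].map(lambda cell: re.split(pattern, cell))
    # explode() repeats the other columns once per list element.
    # It also repeats the index labels, so rebuild the index afterwards.
    return out.explode(column_name).reset_index(drop=True)


if __name__ == "__main__":
    example = pd.DataFrame({"id": [1, 2], "address": ["a; b", "c"]})
    print(split_to_rows(example, "address", r";\s*"))
```

Because the split and the row cloning happen in one step, the number of rows and the number of split values cannot drift apart, which is the mismatch the original function ran into when it received several rows.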

vodkat
Jun 30, 2012



cannot legally be sold as vodka
Anyone have any experience scraping data from public tableau dashboards?

Basically, I think this dashboard is set up to stop anyone scraping it, but that’s exactly what I need to do, and I’m racking my brains trying to figure out how to do this.

The dashboard itself is a hot mess of java objects so a simple html scrape won’t do it. With selenium I can get the data up, but I basically need to click on every cell to get the data out as it won’t display full cells, which makes automating the navigation a nightmare. I thought about automating selenium to screenshot all the sheets I need and then OCR them, but the dashboard is a fixed width (and too small to display all the data) and none of the tricks I have used to try to fool it into displaying bigger have worked.

Honestly don’t know what else to try. Anyone got any ideas of how to crack open these stupid loving dashboards?

(inb4 pay mechanical Turk workers to click on and copy paste all 10,000 cells)

vodkat
Jun 30, 2012



cannot legally be sold as vodka

xtal posted:

By the second or third paragraph I was going to suggest Turking it.

Here's another idea. Turn on the network inspector in your browser or ngrep and use the app. It may be downloading the data from an API and then you can just query or snoop on that.

Also, if you mean Javascript and not Java, you can use the dev tools to pull it out. You can make a query in the dev console to read out all the information from the DOM, or look in the JS source for where it's making network requests or rendering them, and insert breakpoints there where you read the values from Javascript variables.

:worship: thank you! this sounds like it might be my best bet. will have a play around with this tomorrow and see if I can get at it this way.

vodkat
Jun 30, 2012



cannot legally be sold as vodka
So I have looked at this more and what is happening is truly mind boggling.

The public dashboard itself is not loading any data, just PNGs that look like a spreadsheet of the data. When you click on a cell, it sends the location of the click on the PNG to the server, and the server sends back a tooltip for that cell with the actual data in it.

So I think it will be possible to get the data out by automating POST requests for all the cells based on their location in the PNGs. Not sure how to do this with selenium though, which I think I will still need for all the cookies, session ids etc, but which doesn't seem to have a function for making POST requests?
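
One way this could work, as a rough sketch: let selenium handle the login and session, then copy its cookies into a plain HTTP request. driver.get_cookies() is a real selenium call that returns a list of dicts with "name" and "value" keys. Everything else here is an assumption for illustration: the URL, the form field names x and y, and the urlencoded body (the real dashboard may expect a different encoding, which the network inspector would show).

```python
import urllib.parse
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from driver.get_cookies() output."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_click_request(url, selenium_cookies, x, y):
    """Prepare (but do not send) a POST describing a click at pixel (x, y).

    The field names "x" and "y" are hypothetical placeholders.
    """
    body = urllib.parse.urlencode({"x": x, "y": y}).encode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Cookie", cookie_header(selenium_cookies))
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


if __name__ == "__main__":
    # In a real run these would come from driver.get_cookies().
    cookies = [{"name": "session", "value": "abc"}, {"name": "tok", "value": "123"}]
    req = build_click_request("https://example.invalid/tooltip", cookies, 10, 20)
    print(req.get_method(), req.get_header("Cookie"), req.data)
    # To actually send it: urllib.request.urlopen(req).read()
```

Looping this over a grid of (x, y) positions, one per cell in the PNG, would give one tooltip response per cell, assuming the server needs nothing beyond the cookies and the click location.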

vodkat
Jun 30, 2012



cannot legally be sold as vodka
Yep this is very 2020, I think it's probably designed that way to try and stop anyone copying data that is being openly published, with as much website fuckery as possible. That said, maybe it's not intentional and is just good ol' bad programming :shrug:

vodkat
Jun 30, 2012



cannot legally be sold as vodka
I posted a few pages ago about a data table visualisation I am trying to scrape which is hiding the data behind image maps. I got around to automating this scrape only to find that there is more to it than this: moving location on the image map isn't enough to get a new response back from the server. Instead it looks like it sends the location and a hash value?!? and without the two lining up it won't send the data from that location. I've included a screenshot of the diff between two get requests below.



The whole thing is a javascript hot mess, but my thinking is this value is determined client side at the moment you click, so there must be a way of figuring out how to derive this value? Anyone got any ideas how I might do this?

vodkat
Jun 30, 2012



cannot legally be sold as vodka

Soricidus posted:

that’s just the mime multipart delimiter. it’s picked randomly to avoid matching anything in the content. I doubt the server application logic ever sees it.

Randomly changing the delimiters along with changing the click location still seems to return the same data each time. No idea how this is working, as this is the only information that changes with each get request, but then changing that returns the same data? Could there be some fuckery with cookies or something?
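
To illustrate the point about the delimiter, here is a small hand-rolled multipart/form-data encoder. It is a sketch only: the field names are hypothetical and a real client library would normally build this body. It shows that two bodies built with different boundaries carry exactly the same fields, so the boundary itself cannot be what selects the cell.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    # The boundary only has to be a string that does not occur in the data,
    # which is why clients usually pick a random one.
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body_a, type_a = encode_multipart({"x": 10, "y": 20}, boundary="AAA")
    body_b, type_b = encode_multipart({"x": 10, "y": 20}, boundary="BBB")
    # Same payload, different separator text.
    print(body_a.replace(b"AAA", b"BBB") == body_b)
```

If the response stays the same even when the click location changes, the state that picks the cell is presumably somewhere other than this body, for example in cookies or in server-side session state set by an earlier request.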


vodkat
Jun 30, 2012



cannot legally be sold as vodka
I have two datasets: one which contains all the data, and one which is a non-random subset of this data. I do not know the variables which were used to make the subset. Is there some ML library I can throw this at to find out the variables/data selections which were used to make the subset?
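
A first step that needs only pandas, as a hedged sketch: label each row of the full dataset by whether it appears in the subset, then compare inclusion rates per value of each column. It assumes both DataFrames share the same column names and that rows match exactly on those columns. The function names and the sample data are invented for illustration.

```python
import pandas as pd


def label_membership(full, subset):
    """Return a copy of full with a boolean in_subset column."""
    # With no `on` given, merge joins on every column the two frames share.
    merged = full.merge(subset.drop_duplicates(), how="left", indicator=True)
    merged["in_subset"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")


def inclusion_rates(labelled, column):
    """Fraction of rows that made it into the subset, per value of column."""
    return labelled.groupby(column)["in_subset"].mean()


if __name__ == "__main__":
    full = pd.DataFrame({
        "id": [1, 2, 3, 4],
        "region": ["n", "n", "s", "s"],
        "size": [1, 2, 1, 2],
    })
    subset = full[full["region"] == "s"]
    labelled = label_membership(full, subset)
    print(inclusion_rates(labelled, "region"))
    print(inclusion_rates(labelled, "size"))
```

Columns whose values show rates near 0 or 1 are candidates for the selection rule, while columns with flat rates probably were not used. The same in_subset label could also be fed to a classifier such as a decision tree to recover rules that combine several columns.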
