|
terrible programmer status: Before I left last night I wrote a quick and nasty script to process a bunch of data. This morning I found out it took 9 hours to run. Out of embarrassment I've just rewritten it properly and got that down to 5 minutes
|
# ¿ Jul 24, 2018 15:59 |
|
|
I need to make a simple website that will be updated occasionally. Is using Jekyll for this a good or bad idea?
|
# ¿ Aug 16, 2018 12:23 |
|
MALE SHOEGAZE posted:it's pretty incredible that the cpu loop needs to run ~200k times per second in order to hit the target of 2mhz, which is insanely slow by modern standards. i've definitely never written a loop that hot. and yet i still need to clamp it. this is incredibly cool
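The clamping the quote describes can be sketched in a few lines of Python. This is a toy stand-in, not the poster's emulator: `step` and `target_hz` are hypothetical names for the per-cycle function and clock rate, and the idea is just to execute only as many calls as the wall clock says are due, yielding when ahead of schedule.

```python
import time

def run_throttled(step, target_hz, duration_s, batch=1000):
    """Call step() repeatedly, clamped to roughly target_hz calls per second.

    step/target_hz are hypothetical stand-ins for an emulator's
    single-cycle function and its clock rate.
    """
    start = time.perf_counter()
    executed = 0
    while True:
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            break
        # how many calls we *should* have made by now
        budget = int(elapsed * target_hz)
        if executed < budget:
            n = min(batch, budget - executed)
            for _ in range(n):
                step()
            executed += n
        else:
            time.sleep(0.0005)  # ahead of schedule: yield instead of spinning
    return executed
```

Running in batches keeps the `perf_counter` overhead off the hot path, which matters once the loop itself is the bottleneck.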
|
# ¿ Aug 22, 2018 23:46 |
|
I've got an issue that is driving me crazy and I can't figure out what is causing it. I need to take a cell from a DataFrame, split it into a list, and then return a copy of that DataFrame with a new row for each element of the split. At present I'm getting errors that the size of the new DataFrame and the size of the list don't match, and I don't see how this can happen, as all I am doing is row * size of list. Anyone got any ideas of what is going wrong here?
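For what it's worth, newer pandas (0.25+) can do this whole split-and-expand in one go with `Series.str.split` plus `DataFrame.explode`, no manual row multiplication needed. A sketch with made-up column names and delimiter:

```python
import pandas as pd

# hypothetical frame: one delimited string per row
df = pd.DataFrame({"id": [1, 2], "tags": ["a;b", "c;d;e"]})

# split the cell into a list, then emit one row per list element,
# repeating the other columns alongside
out = (df.assign(tags=df["tags"].str.split(";"))
         .explode("tags")
         .reset_index(drop=True))
```

`explode` handles rows with different list lengths, which is exactly the case where a hand-rolled `row * len(list)` tends to go wrong.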
|
# ¿ Feb 16, 2019 19:15 |
|
cinci zoo sniper posted:Away from pc but I don’t think arr does what you think I will have a look at this again in the morning, but I also tried pd.DataFrame([[row]] * len(column_list)) when trying to fix this and encountered the same error, so all I can think is that the way the list is being counted is somehow off? From looking at some output it is normally off by a factor of two, but not always, so I'm totally baffled as to what this could be.
|
# ¿ Feb 16, 2019 22:35 |
|
vodkat posted:I will have a look at this again in the morning, but I also tried pd.DataFrame([[row]] * len(column_list)) when trying to fix this and encountered the same error, so all I can think is that the way the list is being counted is somehow off? From looking at some output it is normally off by a factor of two, but not always, so I'm totally baffled as to what this could be. Think I've got to the bottom of this: it wasn't this code that was the issue but the way I was applying the function using groupby, which, due to a duplicate index or something (which I didn't bother to figure out), was sometimes passing more than one row to this function. I think I've fixed it by no longer trying to be clever with groupby and apply, and using the much slower df.iterrows() instead.
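A toy repro of that duplicate-index failure mode (column names made up): with a duplicated index, groupby hands the applied function every row sharing that index value at once, so a function written for a single row suddenly sees two, and counts come out off by a factor of two.

```python
import pandas as pd

# duplicated index: two rows both labeled 0
df = pd.DataFrame({"val": ["a;b", "c;d"]}, index=[0, 0])

# grouping on the index: the single group contains BOTH rows,
# so a per-row function receives a 2-row frame
group_sizes = df.groupby(level=0).apply(len)

# resetting the index first restores one row per group
fixed_sizes = df.reset_index(drop=True).groupby(level=0).apply(len)
```

`reset_index(drop=True)` before the groupby is usually a cheaper fix than falling back to `iterrows`.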
|
# ¿ Feb 17, 2019 14:57 |
|
Anyone have any experience scraping data from public Tableau dashboards? Basically, I think this dashboard is set up to stop anyone scraping it, but that's exactly what I need to do, and I'm wracking my brains trying to figure out how. The dashboard itself is a hot mess of JavaScript objects, so a simple HTML scrape won't do it. With Selenium I can get the data up, but I'd basically need to click on every cell to get the data out, as it won't display full cells, which makes automating navigation a nightmare. I thought about automating Selenium to screenshot all the sheets I need and then OCR them, but the dashboard is a fixed width (too small to display all the data) and none of the tricks I've used to try and fool it into displaying bigger have worked. Honestly don't know what else to try? Anyone got any ideas of how to crack open these stupid loving dashboards? (inb4 pay Mechanical Turk workers to click on and copy-paste all 10,000 cells)
|
# ¿ Sep 7, 2020 20:19 |
|
xtal posted:By the second or third paragraph I was going to suggest Turking it. thank you! this sounds like it might be my best bet. will have a play around with this tomorrow and see if I can get at it this way.
|
# ¿ Sep 7, 2020 20:36 |
|
So I have looked at this more and what is happening is truly mind boggling. The public dashboard itself is not loading any data, just PNGs that look like a spreadsheet of the data. When you click on a cell it sends the location of the click on the PNG to the server, which sends back a tooltip for that cell with the actual data in it. So I think it will be possible to get the data out by automating POST requests for all the cells based on their locations in the PNGs. I'm not sure how to do this with Selenium though, which I think I'll still need for all the cookies, session IDs etc., but it doesn't seem to have a function for making POST requests?
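Selenium indeed has no raw-POST API; one common workaround is to copy the driver's cookies into a `requests.Session` and script the POSTs from there. A sketch assuming a standard Selenium webdriver object (`session_from_selenium` and the URL/payload are made-up names):

```python
import requests

def session_from_selenium(driver):
    """Build a requests.Session carrying the Selenium driver's cookies,
    so POSTs can be scripted outside the browser."""
    s = requests.Session()
    for c in driver.get_cookies():
        s.cookies.set(c["name"], c["value"],
                      domain=c.get("domain"), path=c.get("path", "/"))
    # match the browser's User-Agent so the server sees a consistent client
    s.headers["User-Agent"] = driver.execute_script("return navigator.userAgent")
    return s

# hypothetical usage: let Selenium log in / load the dashboard, then
#   s = session_from_selenium(driver)
#   r = s.post("https://example.com/tooltip", data={"x": 10, "y": 20})
```

If the server rotates session cookies mid-session, the copy goes stale and has to be re-synced from the driver.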
|
# ¿ Sep 8, 2020 08:09 |
|
Yep, this is very 2020. I think it's probably designed that way to try and stop anyone copying data that is being openly published, with as much website fuckery as possible. That said, maybe it's not intentional and it's just good ol' bad programming.
|
# ¿ Sep 8, 2020 13:06 |
|
I posted a few pages ago about a data table visualisation I am trying to scrape, which is hiding the data behind image maps. I got around to automating this scrape only to find there is more to it than that: changing location on the image map isn't enough to get a new response from the server. Instead it looks like it sends the location and a hash value?!? and without the two lining up it won't send the data for that location. I've included a screenshot of the diff between two GET requests below. The whole thing is a JavaScript hot mess, but my thinking is that this value is determined client side at the moment you click, so there must be a way of figuring out how to derive it? Anyone got any ideas how I might do this?
|
# ¿ Sep 15, 2020 09:28 |
|
Soricidus posted:that’s just the MIME multipart delimiter. it’s picked randomly to avoid matching anything in the content. I doubt the server application logic ever sees it. Randomly changing the delimiters along with the click location still seems to return the same data each time. No idea how this is working, as this is the only information that changes with each request, but changing it returns the same data? Could there be some fuckery with cookies or something?
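To illustrate Soricidus's point with a stdlib sketch: the boundary is just a random token the client invents per request so it can't collide with the payload, and two bodies with different boundaries encode identical form data. (Real HTTP clients do this internally; `multipart_body` here is a made-up helper.)

```python
import uuid

def multipart_body(fields):
    """Build a multipart/form-data body; the boundary is freshly random
    each call and carries no meaning to the server application."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "",
                  value]
    lines.append(f"--{boundary}--")
    return boundary, "\r\n".join(lines)

b1, body1 = multipart_body({"x": "10", "y": "20"})
b2, body2 = multipart_body({"x": "10", "y": "20"})
# b1 != b2, yet the two bodies differ only in the boundary token
```

So if the boundary is the only thing varying between captured requests, the actual cell-selection state almost certainly lives elsewhere (cookies, a session store, or server-side state).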
|
# ¿ Sep 15, 2020 17:00 |
|
|
I have two datasets, one which contains all the data, and one which is a non-random subset of this data. I do not know the variables which were used to make the subset. Is there like some ML library I can throw this at to find out the variables/data selections which were used to make the subset?
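One cheap trick: label every row of the full set with whether it made the subset, fit an interpretable classifier on it, and read the selection rule off the splits. A sketch with scikit-learn, made-up columns, and the assumption that the two frames share an index (or a key) you can match rows on:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# toy stand-in: the real rule (age >= 40) is what we're trying to recover
full = pd.DataFrame({"age": range(20, 60), "score": [i % 10 for i in range(40)]})
subset = full[full["age"] >= 40]

# label membership, then let a shallow tree find the separating variables
labeled = full.copy()
labeled["in_subset"] = labeled.index.isin(subset.index).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(labeled[["age", "score"]], labeled["in_subset"])
rules = export_text(tree, feature_names=["age", "score"])
# the printed rules name the variable(s) and thresholds that define the subset
```

A shallow tree keeps the output human-readable; if it can't separate the classes, the subset was probably made on variables you don't have.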
|
# ¿ Mar 4, 2021 12:55 |