QuarkJets
Sep 8, 2008

Tayter Swift posted:

Something I've kinda-sorta struggled with over the last couple of years is reconciling PEP8 and other best practices with the rise of notebooks. Does splitting sections into functions make sense when those sections are in individual cells? Do you avoid referencing results from previous cells when you don't have to, because of what can be a non-linear way of executing code (sometimes I'll just reread data from disk each cell if it's quick)? Docstrings are redundant when we have Markdown, right? What about line lengths?

I don't apply pep8 to notebooks at all - I'll prototype some code there and then the code gets copied into an IDE and then I add the bells and whistles (organization, docstrings, unit tests). If I want to share code with someone who doesn't have access to my repo then I might copy code into a notebook, one function definition per cell, but then it's all already PEP8'd cause it came out of my repo

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I've got a weird corner case for distributing wheels that I'm bashing my head against. I'm checking out https://github.com/pypa/cibuildwheel as a way to build wheels for a whole host of different targets using manylinux so that nobody has to compile poo poo themselves from a source distribution, but I noticed that cibuildwheel is directly calling wheel and not building a source distribution first with the build module.

This means that it would fail to detect problems with the sdist, because it never uses the sdist to build a wheel. For example, there were some files that were moved around without updating the MANIFEST.in, which meant that whenever anyone tried to compile from the sdist it would fail due to missing headers.

Am I wrong in thinking that cibuildwheel should build the sdist first and then build a wheel from that vs just building a wheel straight from the repository code? Is this project in python packaging hell and never should have gotten here? I'm going to read more about configuring cibuildwheel and see where I get.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Oysters Autobio posted:

Notebooks live in an entirely different mode of coding.

There's using code to make a tool, package, or something else reusable: you're creating a system.

Then there's using code to analyze data, explore, test, or share lessons learned: you're creating a product, something to be consumed once by any number of people.

But after recently working in a data platform designed by data scientists who were inspired by Netflix circa 2015, I think it's hell. Every ETL job is mashed into a notebook. Parameterized notebooks that generate other notebooks. Absolutely zero tests.

If I'm writing an analytical report it's fantastic because I can weave in visualizations and text and images.

Or often if I'm testing and building a number of functions, it's nice to call them in the REPL on demand and debug things that way. But once that's finished, it goes into a Python file. VS Code has a great extension that can convert .ipynb to a .py file.

But for straight code it's a mess and frankly I find it slows you down. With a plain .py file I just navigate up and down, comment as I need, etc.

Finally, once you've tried a full-featured IDE like VS Code, you'll never want to go back to JupyterLab. The tab completion is way snappier, you've got great variable viewers, and you can leverage a big ecosystem of extensions.

I'm a complete amateur at python and am only a data analyst, but I'm so glad I moved to VS Code.
Nothing wrong with using both. Set up %autoreload 2 as the first step in your notebook, then modularize any part of your analysis that you actually care about the correctness of as you go

I usually end up with the data-transformation parts of my code living as Python modules in VSCode, and the interactive visualization parts in Jupyter where the ecosystem is just a lot better
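For anyone who hasn't tried that setup, a minimal sketch of it looks like this (the module and function names are just placeholders):

Python code:
# first cell of the notebook: reload edited modules automatically before each execution
%load_ext autoreload
%autoreload 2

# later cells import from your own package; edits to my_analysis/transforms.py
# are picked up the next time a cell runs, no kernel restart needed
from my_analysis import transforms  # hypothetical module
df = transforms.clean(raw_df)       # hypothetical function and variable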

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
So I write a fair amount of CLI tools, and I'm getting a little frustrated by the debug pattern for that in VSCode, was wondering if anyone had any advice or tips. Let me explain a bit.

So let's say I have a series of different scenarios I need to run - either in a runbook or just for manual testing/debug purposes. In VSCode, each and every one of those needs to be its own JSON blob, which requires going into the file, finding the right JSON dict, and editing or copy/pasting it.

From a human perspective, these are just...multiple strings, really. cmd --arg_one x --arg_two y, etc. They're reasonably easy to swap around between scenarios; but in debug configurations, each has to be a separate JSON list of strings, which is just kind of a pain-in-the-rear-end extra step. Also the debug name dropdown just...is too small? So the names are cut off? I realize this is a minor complaint but it's kind of just one more thing going on.

Now, I can just go to where my argparser is invoked and construct a fake one using a scratch driver script or something like that, but that has its own downsides, namely that it gets a lot trickier if I'm using Click or another framework, and also that I could accidentally fail to check some odd argparser edge case.

Is there any better way of handling these? An extension or something that would improve this behavior? I'd frankly love if I could start a debug session and have VSCode prompt me for the args as a string or something as a specific debug setup.

necrotic
Aug 2, 2005
I owe my brother big time for this!
Instead of a full-on argparse "driver", a maybe-hacky solution is a debug entry-point script that prompts for args, modifies sys.argv, and then invokes the main function? It would be transparent to any arg-parsing library.
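Something like this minimal sketch, assuming your real entry point is importable (the mytool names are made up):

Python code:
# debug_entry.py -- scratch driver, never shipped
import shlex
import sys

from mytool.cli import main  # assumption: wherever the real CLI entry point lives

if __name__ == "__main__":
    raw = input("args> ")                     # e.g. --arg_one x --arg_two y
    sys.argv = ["mytool", *shlex.split(raw)]  # fake argv; argparse/click parses it as usual
    main()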

shame on an IGA
Apr 8, 2005

try using promptString input variables inside launch.json

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

shame on an IGA posted:

try using promptString input variables inside launch.json

I don't think this will help much, since the args need to be split into a list. I'll poke around though.

Tacos Al Pastor
Jun 20, 2003

Maybe this doesn't belong here but it is Python related so here it goes: How do I use a conditional within a regex? The string is like this:

{some text} {timestamp in the form: '12:22:59:987654'} {more text} {text I want to match} {numerical data}

I only want to pull timestamps for the text I want to match.

I'm using this regex string to pull the timestamp: r'\d{2}:\d{2}:\d{2}.\d{6}'

Never done this before so thought I would ask the group.

The Fool
Oct 16, 2003


should be able to use capture groups: https://docs.python.org/3/library/re.html#re.Match.group
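Roughly like this, where WANTED stands in for whatever the text you're matching on actually is, and assuming the last timestamp separator is a colon or dot as in your example:

Python code:
import re

line = "some text 12:22:59:987654 more text WANTED 42"
# group(1) is just the timestamp; the match only succeeds when WANTED appears later on the line
m = re.search(r"(\d{2}:\d{2}:\d{2}[:.]\d{6}).*\bWANTED\b", line)
if m:
    print(m.group(1))  # '12:22:59:987654'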

Tacos Al Pastor
Jun 20, 2003


That's awesome, but I don't want to store them in a tuple but rather in a list.

necrotic
Aug 2, 2005
I owe my brother big time for this!

Tacos Al Pastor posted:

That's awesome, but I don't want to store them in a tuple but rather in a list.

Pass the tuple to list() then?

Tacos Al Pastor
Jun 20, 2003

necrotic posted:

Pass the tuple to list() then?

Yeah that's cool. I just want to store the time given certain text in the line of a file. An example is r'(\b\d{2}:\d{2}:\d{2}.\d{6}).+(\b[uplink]+\b)', the first part being the timestamp and the second part being the text I'm interested in filtering by. Any subsequent lines that don't contain that text, I'm not interested in collecting. It's those times I want to store in a list.

The Fool
Oct 16, 2003


Tacos Al Pastor posted:

That's awesome, but I don't want to store them in a tuple but rather in a list.

iterate the lines, if it doesn't match it returns None, if it does it returns group(0) as a string, append to your list
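In other words, something along these lines (the log file name is made up and 'uplink' is taken from your post; adjust as needed). Note that with two groups in the pattern, group(1) is just the timestamp, while group(0) would be the whole match:

Python code:
import re

pattern = re.compile(r"(\d{2}:\d{2}:\d{2}[:.]\d{6}).*\buplink\b")
timestamps = []
with open("capture.log") as f:  # hypothetical log file name
    for line in f:
        m = pattern.search(line)
        if m is None:           # line doesn't mention uplink (or has no timestamp): skip it
            continue
        timestamps.append(m.group(1))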

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
yes but i want to embed all of my logic in the regex don't you see?

Oysters Autobio
Mar 13, 2017
Fairly new to Python here so excuse the naive question: are you trying to extract these strings using only regex, and if so, why? By no means am I judging you on this, god knows we all have to do hacky poo poo all the time.

Seems like trying to do this in pure regex is just a massive headache given the complexity of the pattern you're trying to hit. Personally, I'd do what someone else mentioned above and first filter for your keys (ie the strings you have), then extract the timestamps.

The only benefit I could see is if the keys themselves are complex strings where you'd have to build something complex in regex to search for them anyway. If you already have the keys, it sounds far easier to go with that option.

Alternatively, what's the format of the files? Log files? JSON? Are these in such a non standard format you couldn't leverage an existing python wrapper package?

edit: sorry for the blahblahblah, I know you came here just looking for a quick regex answer about conditionals, but...this is the Python thread, so you can kind of expect to get Pythonish answers.

Oysters Autobio fucked around with this message at 15:31 on Mar 5, 2024

Tacos Al Pastor
Jun 20, 2003

^^ yes the format was a .log file. What I did was look for the specific text I wanted and then filtered those timestamps into a list. Since I needed to do this twice (once for upload and again for download), I decided to have two separate lists to read from. And yeah, I was trying to see if I could extract only two specific strings from a line of text (which I learned can be done using groups).

The Fool posted:

iterate the lines, if it doesn't match it returns None, if it does it returns group(0) as a string, append to your list

Thanks for this. This is what I ended up doing after playing around with it for a while.

DELETE CASCADE posted:

yes but i want to embed all of my logic in the regex don't you see?

haha yeah this is what I wanted to do. I came to realize that regex and grouping only take you so far.

Tacos Al Pastor fucked around with this message at 16:40 on Mar 5, 2024

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
What's the right way to ship a Python package on PyPI that needs boost headers? I know that I can build a hell of a lot of binary wheels with cibuildwheel, but an sdist is likely to be the best option for portability and long-term usability.

In R-land, CRAN has a lot of header-only packages available for common C++ dependencies, which means that my R package doesn't need to include Boost, I just depend on BH: https://cran.r-project.org/web/packages/BH/index.html. In conda, there's a whole lot of headers packaged. On PyPI, either I can't find where someone else has packaged Boost, or there's just not a culture of doing this.

I'd really like the sdist to have boost delivered from the same place as the binary wheels, so I'd use conda-packaged boost for distributing the package on conda, and I was hoping to use a PyPI packaged boost rather than system boost, boost from source, boost from conda, or boost from anywhere else.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
I'm trying to add a simple caching mechanism to a couple of different client libraries, and was wondering if there's an easier way to do this. For context: I'm working on a system where I'm making calls whose results are generally safe to cache for a day or more, and when running it several times in a row it can be extremely slow, so I wanted to add some local caching. I did this already, but I did it after making the calls, so I'm caching the results directly - this works, but I was thinking that caching directly at the client might be the simplest answer and produce the most straightforward code. The data returned is all easily serializable.

So the clients are all something along the lines of this example - they're all very straightforward and contain a number of methods that make calls to a given service.

Python code:
class SvcClient:

    def api_one(self, args) -> ApiResult:
        ...
    
    def api_two(self, args) -> ApiResult2:
        ...
I'd like to basically extend these classes along the lines of this:

Python code:
class CachedClient:
    def __init__(self, client: SvcClient):
        self._client = client
        self.cache = {}

    def api_one(self, args, clear_cache: bool = False) -> ApiResult:
        if clear_cache or args not in self.cache:
            self.cache[args] = self._client.api_one(args)
        return self.cache[args]
Obviously barebones and I'd want to have a little bit more, but essentially I want to make a class that's the same as an underlying class, except modifying each of the methods a little bit. I can obviously do this by hand, but is there any way to do it programmatically, preferably one that would inherit the docstrings of the base class? Mostly just trying to avoid having to do a ton of copy/pasting.

QuarkJets
Sep 8, 2008

Are you able to use functools.lru_cache? You can manually clear the cache of the decorated function with cache_clear(), you could use sched to schedule a function to run after 24h that just clears the cache and then reschedules itself in another 24h (or whatever interval you need). Or if you have your own event loop already just put the cache clearing in there
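A rough sketch of that idea (fetch_result is a stand-in for whatever the expensive call actually is):

Python code:
import functools
import sched
import time

@functools.lru_cache(maxsize=None)
def fetch_result(arg):
    ...  # the expensive call goes here

scheduler = sched.scheduler(time.monotonic, time.sleep)

def clear_and_reschedule():
    fetch_result.cache_clear()
    scheduler.enter(24 * 60 * 60, 1, clear_and_reschedule)

scheduler.enter(24 * 60 * 60, 1, clear_and_reschedule)
# scheduler.run() blocks, so in a real tool you'd run it in a thread
# or fold the cache clearing into whatever event loop you already have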

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

Are you able to use functools.lru_cache? You can manually clear the cache of the decorated function with cache_clear(), you could use sched to schedule a function to run after 24h that just clears the cache and then reschedules itself in another 24h (or whatever interval you need). Or if you have your own event loop already just put the cache clearing in there

I realize I was unclear - I'm not trying to modify the existing libraries themselves (they're not mine) or necessarily fork them if I can avoid it; I'm just trying to wrap the client.

QuarkJets
Sep 8, 2008

Falcon2001 posted:

I realize I was unclear - I'm not trying to modify the existing libraries themselves (they're not mine) or necessarily fork them if I can avoid it; I'm just trying to wrap the client.

You could decorate your wrapper with lru_cache

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
I'm not sure if it's a priority or not, but while this is definitely doable automatically (you can access the functions defined on a class by iterating the target class's dict, as dir() does, and then wrap those functions), it will generally prevent static type checkers from working on it, since static type checkers tend to make it a priority not to actually execute your code. To make it work with static type checkers you'd have to copy/paste at least the signature lines and explicitly super-call.
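For what it's worth, a sketch of the dynamic version; update_wrapper carries the docstrings over, but as noted, a type checker will see none of this, and the cache-key scheme here is just an assumption:

Python code:
import functools
import inspect

def make_cached_client(client, cache):
    """Return an object whose public methods mirror `client`, with results cached."""
    class _Cached:
        pass
    wrapped = _Cached()
    for name, method in inspect.getmembers(client, callable):
        if name.startswith("_"):
            continue
        def wrapper(*args, _method=method, _name=name, **kwargs):
            key = (_name, args, tuple(sorted(kwargs.items())))  # assumes hashable args
            if key not in cache:
                cache[key] = _method(*args, **kwargs)
            return cache[key]
        functools.update_wrapper(wrapper, method)  # copies __doc__ etc. from the original
        setattr(wrapped, name, wrapper)
    return wrapped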

Zoracle Zed
Jul 10, 2001
Anyone got opinions about API docs generation? I kinda hate Sphinx for being insanely over-configurable, and writing rst feels like unnecessary friction. Sphinx with numpydoc is manageable but I still feel like there's got to be something better.

QuarkJets
Sep 8, 2008

Hmm it looks like applying lru_cache to methods prevents instances of that class from ever getting garbage collected, that's a bummer.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

QuarkJets posted:

Hmm it looks like applying lru_cache to methods prevents instances of that class from ever getting garbage collected, that's a bummer.
What about after using method.cache_clear()?

In any case, that certainly explains weird bugs I've run into when using lru_cache.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Yeah, and I'm also talking about a persistent offline local cache, so lru_cache doesn't really fit anyway. I think maybe the approach I took in the first place, which is to just call the client and then cache the results by entity name, might be simpler than reworking the entire client. I'll just slap a retrieved-at timestamp on there and then set a cache expiry duration that's checked on launch or something like that. This is a proof of concept anyway so I've got room to tinker.

Edit: I wish that there was something simpler for drop-in serialization and deserialization of dataclasses, especially for any situation where you have a display element or something like that - for example, a CSV where the headers should be "Experiment Name" instead of experiment_name. There's also the other stuff like bools and datetimes/etc, or nested dataclasses. Pydantic/etc seem to include a lot more overhead, but maybe I'll check out marshmallow.
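For the CSV-header case specifically, one lightweight sketch is to hang the display name off dataclasses.field metadata; the "display" key here is just a convention made up for the example, not anything standard:

Python code:
import csv
import dataclasses
from dataclasses import dataclass, field

@dataclass
class Experiment:
    experiment_name: str = field(metadata={"display": "Experiment Name"})
    run_count: int = field(metadata={"display": "Run Count"})

def write_experiments_csv(rows, path):
    cols = dataclasses.fields(Experiment)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([c.metadata.get("display", c.name) for c in cols])
        for row in rows:
            writer.writerow([getattr(row, c.name) for c in cols])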

Falcon2001 fucked around with this message at 04:57 on Mar 11, 2024

QuarkJets
Sep 8, 2008

Zugzwang posted:

What about after using method.cache_clear()?

In any case, that certainly explains weird bugs I've run into when using lru_cache.

Here's the blog post I read: https://rednafi.com/python/lru_cache_on_methods/

It sounds like cache_clear() makes the object eligible for garbage collection. But also, the cache is not shared between different instances of the class unless the method is a class or static method; I would have guessed that's the case but it's nice to know for sure.
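The usual workaround, if it matters, is to build the lru_cache per instance in __init__ so the cache lives and dies with the object; roughly:

Python code:
import functools

class SvcClient:
    def __init__(self):
        # the cache is attached to this instance, not the class, so it doesn't
        # pin the instance in memory and isn't shared between instances
        self.api_one = functools.lru_cache(maxsize=None)(self._api_one_uncached)

    def _api_one_uncached(self, args):
        ...  # the real (expensive) call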

Oysters Autobio
Mar 13, 2017

Zoracle Zed posted:

Anyone got opinions about API docs generation? I kinda hate sphinx for being insanely over-configurable and writing rst feels like unnecessary friction. Sphinx with numpydoc is managable but I still feel like there's got to be something better.

FastAPI swagger page generation is pretty cool but it's not like a drop-in or anything to my knowledge (ie you have to build the API with it from the beginning).

I've been exploring python docs stuff and have found that mkdocs and mkdocs material have a big ecosystem of plugins and the like, all built on markdown instead of rst. Might have some API docs gen options.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Rookie question here:

What is the easiest way to convert/print a Jupyter notebook to PDF that will look good? I need to submit it for a class.

M. Night Skymall
Mar 22, 2012

Hughmoris posted:

Rookie question here:

What is the easiest way to convert/print a Jupyter notebook to PDF that will look good? I need to submit it for a class.

https://nbconvert.readthedocs.io/en/latest/usage.html probably.

Oysters Autobio
Mar 13, 2017
Just export to HTML unless they're rigidly demanding it in PDF. I think nbconvert can also embed assets (images etc.) into the HTML itself, so it works basically the same as a PDF file.

(I know this is an unwinnable battle and yet another example to flag when silicon valley ballhuggers go on and on about how the free market only creates the best standards...)

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Hughmoris posted:

Rookie question here:

What is the easiest way to convert/print a Jupyter notebook to PDF that will look good? I need to submit it for a class.

The other answers are probably fine, but if you're doing this for a class I'd suggest asking the teacher / TA if they have a preferred method, since you're ultimately trying to make them happy, and depending on your school, many professors could just fail your submission outright for failing to meet some arcane formatting requirement.

If they're expecting you to use Jupyter, then this should be a pretty trivial question; if they're not, then "what do you need out of my answers" is something they should be able to answer anyway, and then it's just a matter of working backwards from there.

Edit:

Oysters Autobio posted:

Yeah sorry, don't take my advice as reason to buck against whatever stupid arbitrary thing you're asked to do.

nbconvert is exactly what you're looking for exporting pdf or other formats.

I wouldn't have said anything if it wasn't for school, because you're right and in most environments it wouldn't matter (or the dumb reasons would be something it's your job to work through). I've just had enough professors who have very weird requirements to advise caution. Knew a guy who got an assignment failed for forgetting to set his Word font to Times New Roman instead of Calibri or whatever the default was at the time and the professor just straight up gave him 0 credit because "it was in the syllabus!". I don't think he failed the whole course and so just got a B instead of an A or whatever, but yeah...academia be weird sometimes.

Falcon2001 fucked around with this message at 01:19 on Mar 12, 2024

Oysters Autobio
Mar 13, 2017
Yeah sorry, don't take my advice as reason to buck against whatever stupid arbitrary thing you're asked to do.

nbconvert is exactly what you're looking for exporting pdf or other formats.

Jose Cuervo
Aug 25, 2004
I use PyCharm community edition as my IDE for developing a flask application. When I run my Flask application through PyCharm, it sets up what I believe is a local (local to my computer) development server which then allows me to go to http://127.0.0.1:5000 and access the website from there.

My question is, can anyone else on the network I am on also navigate to http://127.0.0.1:5000 and see the website? Or am I the only one who can see the website because the server is on my computer?

12 rats tied together
Sep 7, 2006

it depends but short answer probably not. if you wanted it to be accessible on that network you'd want to run it bound to an IP address in that network.

if pycharm, specifically, binds to 127.0.0.1, it's not routable, since that address means "local computer" and it's basically not possible for a packet from another machine to arrive at your computer with that destination address.

if pycharm binds to 0.0.0.0 you can still access it through 127.0.0.1, but it would also be accessible on other addresses, which means it could theoretically be routed to you, which means it could theoretically be accessed by another device in that network. (if nothing blocked it first, like a firewall)
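Concretely, with Flask's built-in dev server:

Python code:
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # loopback only: reachable from this machine, not from the rest of the network
    app.run(host="127.0.0.1", port=5000)

    # every interface: other machines on the network could reach it (firewalls permitting)
    # app.run(host="0.0.0.0", port=5000)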

Jose Cuervo
Aug 25, 2004

12 rats tied together posted:

it depends but short answer probably not. if you wanted it to be accessible on that network you'd want to run it bound to an IP address in that network.

if pycharm, specifically, binds to 127.0.0.1, it's not routable, since that means "local computer" and it's ~not possible for a packet to arrive at your computer with that destination address.

if pycharm binds to 0.0.0.0 you can still access it through 127.0.0.1, but it would also be accessible on other addresses, which means it could theoretically be routed to you, which means it could theoretically be accessed by another device in that network. (if nothing blocked it first, like a firewall)

Do you know how I would check if pycharm specifically binds to 127.0.0.1? (I don't want anyone to be able to access the website while I am working on it).

susan b buffering
Nov 14, 2016

Having not used Flask with PyCharm, I'd check the run configuration. You can at the very least probably set up a more specific binding in there if it isn't already what you want.

The Fool
Oct 16, 2003


flask binds to 127.0.0.1 by default, you can change that by using the --host flag

Oysters Autobio
Mar 13, 2017
Just getting into learning flask and my only real question is how people generally start on their HTML templates. Are folks just handwriting these? Or am I missing something here with flask?

Basically I'm looking for any good resources out there for HTML templates to use for scaffolding your initial templates. Tried searching for HTML templates but generally could only find paid products for entire websites, whereas I just want sort of like an HTML component library that has existing HTML. i.e. here's a generic dashboard page html, here's a web form, here's a basic website with top navbar etc.

Bonus points if it already has Jinja too but even if these were just plain HTML it would be awesome.

Oysters Autobio fucked around with this message at 00:13 on Mar 20, 2024

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Oysters Autobio posted:

Just getting into learning flask and my only real question is how people generally start on their HTML templates. Are folks just handwriting these? Or am I missing something here with flask?

Basically I'm looking for any good resources out there for HTML templates to use for scaffolding your initial templates. Tried searching for HTML templates but generally could only find paid products for entire websites, whereas I just want sort of like an HTML component library that has existing HTML. i.e. here's a generic dashboard page html, here's a web form, here's a basic website with top navbar etc.

Bonus points if it already has Jinja too but even if these were just plain HTML it would be awesome.

edit: additionally, has anyone used Flask-AppBuilder before? The built-in auth and admin panels are mighty appealing but I'm worried about locking into a heavily opinionated framework (which is why I chose Flask over, say, Django)

You want Bootstrap or Foundation if you want some of the most common HTML libraries around. They both have whole page examples, but really it's just about designing with their grid system so that stuff works on a phone but doesn't look stupid on a desktop, which is a tricky balance.

They are just HTML/CSS/JS and you're going to be templating the dynamic parts of the site yourself, but there's a reason that Bootstrap has been the most popular option to make a new site for more than a decade now.
