Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Is this what you're looking for? You only wanted one element at a time from each of the later lists (list2, list3) added to each of the lists in list1, right? Not very pretty, but it seems to have gotten to what you're hinting at:
code:
from itertools import product

def add_element_to_lists(list_of_lists, list_of_elements):
    output_list = []
    for working_list, element in product(list_of_lists, list_of_elements):
        temp_list = working_list.copy()
        temp_list.append(element)
        output_list.append(temp_list)
    return output_list

for element_list in [list2, list3]:
    list1 = add_element_to_lists(list1, element_list)
Results in
code:
[['A', 'B', 'C', 'J', 'M'],
['A', 'B', 'C', 'J', 'N'],
...
['G', 'H', 'I', 'L', 'Q']]

Zugzwang fucked around with this message at 03:08 on Sep 24, 2020


Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Mirconium posted:

Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!

Do the type hints improve performance substantially?
Type hinting alone doesn’t improve performance. It does help developers know what is supposed to go where and what’s supposed to come out of a function. It’s also not enforced at all; you can hint that a function takes an int argument and returns a string, but write code that accepts a list and returns a dict.

On the other hand, if you write performance-intensive functions in Cython, you can potentially get huge performance gains by doing nothing special except declaring types.
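To make the "not enforced at all" point concrete, here is a minimal sketch. The function name describe and its arguments are made up for illustration. A static checker such as mypy would flag this code, but the interpreter itself runs it without complaint.

```python
def describe(count: int) -> str:
    # The hints claim "int in, str out", but nothing checks them at runtime.
    return {"received": count}


# A list goes in where an int is hinted, and a dict comes back
# where a str is hinted. No error is raised.
result = describe([1, 2, 3])
print(type(result).__name__)
```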

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Qtamo posted:

Pandas question: how can I split a column filled with strings into multiple rows by character count? I've got a dataframe that needs to get exported into an xls to be used as an import into an ancient system that limits cell character count to 100 characters. Since the strings are sentences, I'd prefer to split by the whitespace right before hitting 100 characters, but I haven't found a solution. It doesn't need to be efficient, the dataframe is only ~200 rows and probably somewhere around 1000 rows or less after it's been split.

Basic idea:
code:
Initial table
Name			String
Jimmy			300 characters
John			200 characters
Jane			100 characters
code:
Result
Name			String
Jimmy			100 characters
			100 characters
			100 characters
John			100 characters
			100 characters
Jane			100 characters
One simple way would be to use df.iterrows() and fill a list (let's call it row_list) with one dictionary per output row, each with "Name" and "String" keys.

At the first entry for Jimmy, split the string into a list of however many 100-char strings, and append a dictionary with {"Name": "Jimmy", "String": [first 100-char string]} to row_list.

Then loop over the remaining strings, appending a dictionary to row_list with {"String": [next 100-char string]} until the strings are exhausted.

Finally, make a df with DataFrame(row_list, columns=['Name', 'String']).

Should work fine. Looping over dfs isn't efficient but as you said, it's only a few hundred lines so whatever.

Edit: like this.
code:
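from pandas import DataFrame

row_list = []
for _, row in df.iterrows():
    # Cut the string into consecutive 100-character chunks
    split_string = [row['String'][i: i+100] for i in range(0, len(row['String']), 100)]
    for i, chunk in enumerate(split_string):
        # Only the first chunk carries the name; continuation rows leave it blank
        name = row['Name'] if i == 0 else ''
        row_list.append({'Name': name, 'String': chunk})
df2 = DataFrame(row_list, columns=['Name', 'String'])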
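The loop above cuts at exactly 100 characters, which can split a word in half. If breaking at whitespace matters, one possible variation is to let the standard library's textwrap.wrap pick the break points. This is only a sketch: split_rows and the sample data are hypothetical, and it works on plain (name, text) pairs so it runs without pandas. The resulting list has the same shape as row_list, so it could be handed to DataFrame(row_list, columns=['Name', 'String']) the same way.

```python
import textwrap


def split_rows(rows, width=100):
    """Split each (name, text) pair into chunks of at most `width` characters,
    breaking at whitespace where possible."""
    row_list = []
    for name, text in rows:
        chunks = textwrap.wrap(text, width=width)
        for i, chunk in enumerate(chunks):
            # Only the first chunk carries the name, as in the loop above
            row_list.append({"Name": name if i == 0 else "", "String": chunk})
    return row_list


# Made-up sample data: one long string and one short one
sample = [("Jimmy", "word " * 50), ("Jane", "short sentence")]
rows = split_rows(sample)
```

Caveats: textwrap.wrap collapses runs of whitespace, still breaks a single word that is longer than the width, and returns no chunks at all for an empty string, so an empty cell would be dropped.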

Zugzwang fucked around with this message at 16:59 on Nov 11, 2020

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Qtamo posted:

Thanks for this. I'd read the warning in the pandas docs about not modifying something I'm iterating over and for some reason it didn't occur to me to just throw the stuff into a new dataframe, so I avoided iterrows altogether :doh:
Glad to help! You could also just reassign df to the new DataFrame, i.e. use df = DataFrame(args) at the end. That's not the same as avoiding the modification of something you're iterating over -- an actual example of that would be deleting a dictionary key while iterating through dict.items() or something.
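For reference, a small sketch of the dictionary case mentioned above, using a made-up inventory dict. Deleting a key during iteration raises RuntimeError, while building a new dict sidesteps the problem.

```python
inventory = {"apples": 0, "pears": 3, "plums": 0}

try:
    for key, count in inventory.items():
        if count == 0:
            del inventory[key]  # changes the dict's size mid-iteration
except RuntimeError as exc:
    error_message = str(exc)

# Safer: leave the original alone and build a filtered copy
inventory = {"apples": 0, "pears": 3, "plums": 0}
cleaned = {key: count for key, count in inventory.items() if count != 0}
```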

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
My computer sure did some interesting stuff when I tried to create a ~5 GB string from a list and write it to a text file all in one go. Only made that mistake once.
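One way to avoid building the giant string in the first place is to write the items out one at a time. A minimal sketch, assuming the data is an iterable of strings; write_lines and the demo data are hypothetical.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write an iterable of strings to `path` one line at a time,
    so the full text never has to exist in memory as a single string."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line)
            handle.write("\n")


# Small demo with a generator standing in for the huge list
items = (f"record {i}" for i in range(1000))
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    write_lines(target, items)
    with open(target, encoding="utf-8") as handle:
        line_count = sum(1 for _ in handle)
```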

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Do any of y'all have experience distributing packages with Cython? I've been Googling about this as much as I can and have also tried reverse-engineering the Cythonizing components of packages such as pandas, all to no avail.

Right now my folder structure is:
code:
setup.py
setup.cfg (didn't always have this, didn't make a difference when I did)
cy_test/
    __init__.py
    py_code.py
    cy/
        __init__.py
        cy_code.c
        cy_code.pyx
py_code.py and cy_code.pyx both contain boring functions that either print 'hello' or return 5. They work, and the Cython stuff builds fine if I do it locally through the command line, but nothing except __init__.py makes it into the installed cy/ directory when I do a local pip install. The py_code.py-related stuff installs and imports fine.

setup.py currently looks like this:
code:
from setuptools import Extension, setup, find_packages
from Cython.Build import cythonize
import os

extensions = [
	Extension(
		name='cy_test.cy',
		sources=['cy_test\\cy\\cy_code.pyx'],
		language='c',
	)
]

setup(
	name='cy_test',
	version='1.0',
	description='Cython test',
	packages=find_packages(),
	ext_modules=cythonize(extensions, compiler_directives = {"language_level": 3, "embedsignature": True}),
	install_requires=[
		'cython',
	],
	zip_safe=False
)
I've tried including and not including the cy_code.c file in sources, doesn't make a difference.

I noticed that other Cython-using packages have a variety of helper files like MANIFEST.in in the main directory, but none of the helper files seem to mention Cython, so ¯\_(ツ)_/¯

I've also tried this with the built .pyd extension in the cy/ folder, and the imports within __init__.py just fail.

At this point, I've spent way more time on this than I care to think about. Help me Python thread Kenobi, you're my only hope.

Zugzwang fucked around with this message at 04:47 on Dec 31, 2020

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
The file variable is calling read() and decode() methods. I’m not familiar with tablib either, but it sure looks like that line is converting the file into text and then using that to construct a Dataset.

If the method you tried didn’t work to construct the DataFrame, you should still inspect the contents of the file variable, since I’m not sure where else the data would be coming from. pandas’s read_csv accepts a path or buffer with a read() method, so have you tried just chopping off read() and decode() from file and passing that into read_csv?

Zugzwang fucked around with this message at 03:43 on Nov 8, 2022

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

duck monster posted:

Zed Shaw is a weird dude. Saw him ranting at someone on twitter the other day about why we shouldn't use django for webapps, when C++ is available and much faster.

Yeah dude we went through that phase in the 1990s, it didn't work out well. I don't get why this guy is considered a high expert without understanding for most businesses the cost of hosting is a tiny fraction of the costs of development. Django might be a little crusty in its old age, but it's a hella productive environment, especially now the continuous cavalcade of suffering that python 2's unicode handling brought is largely a thing of the past.
Maybe his idea of learning Python "the hard way" is to first learn C++, try to use it for a web app, then realize that your substantial time investment would've gone better if you'd just gone with Python in the first place?

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I did not know about cached_property. That looks awesome and I can’t wait to simplify some of my code with it.
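For anyone else who hasn't met it: functools.cached_property (Python 3.8+) computes a property once per instance and then reuses the stored value. A minimal sketch with a made-up Report class; the compute_calls counter exists only to show how often the body runs.

```python
from functools import cached_property


class Report:
    def __init__(self, values):
        self.values = values
        self.compute_calls = 0  # illustration only

    @cached_property
    def total(self):
        # Runs on first access; later accesses reuse the cached result
        self.compute_calls += 1
        return sum(self.values)


report = Report([1, 2, 3])
first = report.total
second = report.total
```

Note that the cached value does not refresh if self.values changes later; deleting the attribute (del report.total) clears it.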

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
+1 for ATBS. It’s not just a good intro to the language. It immediately shows you how to do truly useful things with it. (It’s also free!)

Al’s follow-up book to that, Beyond the Basic Stuff With Python, is also great. It’s about how to put together nontrivial, maintainable programs once you know the ins and outs of the language.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I just install everything into base and use only the packages that win the ensuing melee.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Just eyeballing it, looks like the part I highlighted in this line is your issue:

profilePic = profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') if profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') else ""

The problem is that if it doesn’t find profilePic, you are still trying to call the get_attribute method on it. Which will crash. Is your error message something like “NoneType has no method get_attribute”?

Anyway, cut that method call and see if it works.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
You could also get an arbitrary number of floats by having the user specify something like a space-delimited string. For ex:
code:
numbers = [float(val) for val in input().split()]
And then you could enter “30.0 50.0 10.0 100.0 65.0” when prompted. Or however many numbers you want, separated by a space.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Speaking of comprehensions, is there a Pythonic way to populate a list with elements from an iterator until the list hits a certain size? Like, the naive implementation (with the desired list size of 20 in this example) is:
code:
good_stuff = []
for element in stuff_of_potential_interest:
    if len(good_stuff) >= 20:
        break
    if meets_filtering_criteria(element):
        good_stuff.append(element)
But I have some functions that have lots of layers of nesting and filtering and I’m trying to flatten them as much as possible. Having all these if statements with break gets old, and while loops (e.g., while len(good_stuff) < 20…) don’t help the nesting issue much.

I tried looking into some lesser-used functions in itertools but didn’t see anything that jumped out as being appropriate. :shrug:

Zugzwang fucked around with this message at 05:16 on Feb 15, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

i vomit kittens posted:

Is there a reason islice wouldn't work or did you just miss it when going through itertools?
Hm. Maybe, if I build the filtering criteria into the iterator and turn it into a generator expression instead of a for/while loop. Thanks, will give that a shot. I’m trying to learn how more of these functional programming iteration constructs are used, and it still feels like dark magic.
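A sketch of what that could look like, assuming the filter can live inside a generator expression. The names come from the earlier loop; the filter body and the input range are placeholders.

```python
from itertools import islice


def meets_filtering_criteria(element):
    # Hypothetical stand-in for the real filter
    return element % 3 == 0


stuff_of_potential_interest = range(1000)

# The generator expression filters lazily; islice stops pulling from it
# after 20 matches, so the rest of the input is never examined.
matches = (
    element
    for element in stuff_of_potential_interest
    if meets_filtering_criteria(element)
)
good_stuff = list(islice(matches, 20))
```

If fewer than 20 elements pass the filter, the list is simply shorter, which matches what the for/break version does.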

Zugzwang fucked around with this message at 06:10 on Feb 15, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Thanks for the iterator input, goons. That was exactly what I needed. Simpler, less verbose/nested code and with better results :buddy:

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Do you mean you want to make a list containing grade #n from every list?

That can just be
code:
nth_grades = [grades[n] for grades in student_grades.values()]
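A quick worked example with made-up data, assuming student_grades maps each name to a list of grades and n is a zero-based index:

```python
# Hypothetical sample data
student_grades = {
    "Ada": [90, 85, 77],
    "Grace": [88, 92, 95],
    "Linus": [70, 65, 80],
}
n = 1  # second grade in each list

nth_grades = [grades[n] for grades in student_grades.values()]
print(nth_grades)  # [85, 92, 65]
```

If any student has fewer than n + 1 grades, this raises IndexError.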

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

duck monster posted:

I've been using python since the 1990s.

And the thing still finds new ways to delight me in the "how did I somehow not know about this amazing thing?" way.

code:
>>> a = ['hey','there','dude']
>>> b = [1,2,3]
>>> zip
<class 'zip'>
>>> zip(a,b)
<zip object at 0x100ae5c00>
>>> dict(zip(a,b))
{'hey': 1, 'there': 2, 'dude': 3}
>>> list(zip(a,b))
[('hey', 1), ('there', 2), ('dude', 3)]
zip() is great. Python’s built-ins do so many cool things that aren’t always obvious.

Like, I recently discovered (through Trey Hunner’s newsletter) a way of avoiding a common for/break pattern. Let’s say you want to assign foo to the first item in an iterable that meets some criterion (such as the first even number), and if you don’t find it, then assign it to None. The verbose implementation is:
code:
numbers = [1, 3, 5, 8]
foo = None
for num in numbers:
    if num % 2 == 0:
        foo = num
        break
foo will be 8 here. And if there were no evens in the list, it’d stay as None. But the code is…not elegant to say the least. You could instead do this as a one-liner:
code:
foo = next((num for num in numbers if num % 2 == 0), None)
Basically next() will run through your generator expression until a value is yielded. If one is, it gets assigned to foo. If not, foo gets assigned to the default argument you supplied, None.

Seems like Python always has a way of simplifying really ugly code.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

FISHMANPET posted:

A friend pointed out to me that I don't need the default value in next, especially when the next step in my code (loading the URL I found) would fail if the default value happened, so I just took that out. It'll cause a StopIteration exception which I'm not handling, but what it really means is something is messed up with the project I'm reading from and I'll need to fix my code regardless.
Yeah, I guess the one to pick depends on the context of the rest of your code and how you want to handle it. "Is there an even number in this collection? If not, assign variable to None" vs "roll through all these values and raise StopIteration if you hit the end" both make sense.
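The two behaviours side by side, as a small sketch using a list with no even numbers:

```python
odds = [1, 3, 5]

# With a default: no match means the default comes back
with_default = next((num for num in odds if num % 2 == 0), None)

# Without a default: no match raises StopIteration
try:
    next(num for num in odds if num % 2 == 0)
    raised = False
except StopIteration:
    raised = True
```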

Data Graham posted:

Yeah, I hear the Kill Bill klaxon in my head whenever I see [0]

IndexError waiting to happen.
Sometimes it's impossible to avoid if you're calling certain functions that you know will return at least one unpackable value. Some NumPy functions are like this (at least, to my non-NumPy-expert knowledge). But yeah I hate it.

Zugzwang fucked around with this message at 06:17 on Mar 4, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
“Beyond the Basic Stuff with Python” and “Practices of the Python Pro” are both great books that are about how to write good Python in general.

Wes McKinney’s (creator of pandas) book is online for free too: https://wesmckinney.com/book/

Also, check out polars (the package) for data analysis. It’s a newer DataFrame library written in Rust. It isn’t a full replacement for everything pandas does, but in general, it’s comically faster.

Zugzwang fucked around with this message at 23:26 on Mar 4, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I really like Al Sweigart’s book Automate the Boring Stuff with Python. It’s “here are the basics of Python, now here are a bunch of very useful things you can do with it.” It’s available for free at https://automatetheboringstuff.com

For further education, the YouTube channel ArjanCodes is great.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
PySimpleGUI is legit. It’s basically ergonomic wrappers for the built-in Tkinter library. Not sure if it does everything you want, but it has a lot of demos and examples here: https://www.pysimplegui.org/en/latest/cookbook/

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
How reliable are modern Python-to-exe packages? I recently learned that a research tool I’ve been working on for my job might eventually need to be deployed to outside users, probably closed source. And due to IP reasons on their end, it can’t be a web app hosted on AWS or whatever because their data/results can’t be on a system outside of their company at any time.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

StumblyWumbly posted:

I've used pyInstaller, there are times when it is a massive pain but once you get it working it is fine. Things to be aware of:
- Including non-Python files takes some effort, like if you need to load in some settings from a .json that you expect to be included in the EXE
- Create the exe from a venv that only has what you need, otherwise it may be huge
- It is fairly trivial to reverse "compile"; the scripts are just zipped up with a Python executable, so don't expect to have any secrets there

CarForumPoster posted:

You should never ever expect users to have a python install. That's asking for it-works-on-my-computer hell. PyInstaller has worked well for me.
Good to hear it's working reasonably well now. Being easy to reverse-engineer would be a dealbreaker though if I do need to make it closed-source.

QuarkJets posted:

iirc Singularity containers are basically designed for this kind of situation, where you don't trust either the system owner or other users on the system. And you can build one from a docker container so you can get the best of both worlds
That might be what I need, thanks!

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Yeah, that makes sense. This is still hypothetical at this point, so I am mostly trying to figure out what paths I would take if we needed to go there.

To be frank, I’m not even sure it’s feasible for this to be closed source, and I’d rather it not be. Will ultimately be up to management at my organization though.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

duck monster posted:

Ah my naive child.

The real world is filled with these poisonous documents. Especially in the sciences and in government.

Wait till you get a load of how they do unicode.

Excel is hell, and excel is everywhere.
Just yesterday, Excel very helpfully removed the leading and trailing zeroes from numbers that were supposed to be strings. What would I do without assistance like this?

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Falcon2001 posted:

Here's my smarmy one-line take on it, thanks to the Python standard library.

Python code:
from itertools import permutations
from typing import List

def value_of_x(lst: List[int]) -> bool:
	return any([bool(y == x*2) for x,y in permutations(lst, 2)])
You could also cut the square brackets from any(). This way you get a lazy generator and don't first have to instantiate a list of all those permutations/combinations.
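Roughly like this, keeping the quoted function name. The bool() call is dropped as well, since the comparison already produces a bool.

```python
from itertools import permutations
from typing import List


def value_of_x(lst: List[int]) -> bool:
    # Generator expression: any() stops at the first True and never
    # builds a list of every pair.
    return any(y == x * 2 for x, y in permutations(lst, 2))


found = value_of_x([1, 2, 7])
not_found = value_of_x([1, 3, 7])
```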

Anyway itertools owns and I am constantly trying to learn more about how to apply its dark magic.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Seconding both ArjanCodes and mCoding. The former has a great deal of stuff on general software engineering, with examples implemented in Python. The latter is mostly but not exclusively Python (he also covers C++ stuff sometimes).

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
"ChatGPT or idiot?" will now be a question I ask myself frequently.

I recently read a blog post like that one that extolled C++ as a language known for being simple and easy to learn.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
IIRC polars doesn't have all the functions pandas does, though I'm not enough of an expert in either to go into detail.

But yeah I've almost totally switched over to polars. The differences in ergonomics, memory usage, and especially speed vs pandas are just plain unfair.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Yeah I really wish Python had something like "type hinting is purely optional, but they will be enforced if you specify them." As opposed to "the Python interpreter does not give the slightest gently caress about type hints."

Julia has the former as part of its multiple dispatch model, but sadly, adoption of Julia continues to be pretty underwhelming.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Cursed comedy option: use for i, _ in enumerate(list), then access elements via the index :getin:

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Why write something in Python if C++ will do the job??

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I don't have time to import time from time

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I personally recommend PyInstrument as a profiler because it only shows you the things where your code spends substantial time. Its HTML output ability is nice. IIRC cProfile shows you every operation/call, many of which will be irrelevant to the program's speed of execution.

Anyway, your question is pretty broad. Sometimes the issue comes from a bit of code that works fine when your dataset is small and not so fine when it isn't. I once had a small section of code that was consuming 25% of the program's runtime because it was repeatedly calling min() on a very large, growing set. min() on a set is O(n), and the code was paying for that full scan on every call. Tracking the minimum value through another means sped up that section over 500x, and that bit of code didn't even register as noteworthy in the profiler anymore. :shrug:
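A rough sketch of the idea, not the actual code from that project. It assumes values are only ever added to the set; if items can also be removed, something like a heap would be needed instead. Both function names are made up.

```python
def smallest_seen_slow(values):
    """Rescan the whole set with min() after every insertion."""
    seen = set()
    minima = []
    for value in values:
        seen.add(value)
        minima.append(min(seen))  # O(n) scan each time
    return minima


def smallest_seen_fast(values):
    """Track the running minimum alongside the set instead."""
    seen = set()
    minima = []
    current_min = None
    for value in values:
        seen.add(value)
        if current_min is None or value < current_min:
            current_min = value  # O(1) update
        minima.append(current_min)
    return minima


data = [5, 3, 8, 1, 9]
slow = smallest_seen_slow(data)
fast = smallest_seen_fast(data)
```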

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I'd only use it if I expected a specific kind of error to arise that I knew was okay to ignore.

One of the files I need to regularly read at work comes from a 3rd party and is essentially a giant LZMA2-compressed text file. Whenever I get to the end of it, it raises an EOFError due to a glitch in whatever they're using to compress it. (This error also shows up in 7-Zip; or rather, 7-Zip says "hey this file looks a bit odd" but still reads it okay.) I had to write a context manager specifically for ignoring EOFErrors in that dumb file type because otherwise my code would spend like 30 minutes rolling through it completely fine and then crash when it got to the very last line.
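For a narrow case like this, the standard library's contextlib.suppress may be enough; it is not necessarily what was written for the file described above. In this sketch, glitchy_reader is a hypothetical stand-in for a reader that yields every line fine and then raises EOFError at the very end.

```python
import contextlib


def glitchy_reader():
    # Hypothetical stand-in for the third-party file: good data,
    # then a spurious EOFError at the end.
    yield "first line"
    yield "second line"
    raise EOFError("compressed file ended unexpectedly")


def read_lines_ignoring_eof(lines):
    """Collect lines, stopping quietly if the source raises EOFError."""
    collected = []
    with contextlib.suppress(EOFError):
        for line in lines:
            collected.append(line)
    return collected


lines = read_lines_ignoring_eof(glitchy_reader())
```

Only EOFError is swallowed here; any other exception still propagates, which keeps the "specific error I know is okay to ignore" rule intact.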

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
LLMs are useful when there's a lot of data ("how do I web scrape this page in Python") but decidedly less useful when there's not. :shrug:

It doesn't help at all that they don't actually understand anything, they just mush together stuff that is usually mushed together.

Zugzwang fucked around with this message at 19:59 on Oct 6, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

ComradePyro posted:

succinct description of most reddit comments
:hmmyes:

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I am a mediocre programmer and only know how to commit sane amounts of crime


Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/
