|
I have very limited programming experience generally and even less experience with Python, so I'll apologize if this is a really stupid question, but I wasn't able to find much from googling: I've got a csv file with a little over 90,000 rows that each have a key in the first column and a value in the second. I also have a list of keys that I want to retrieve the values for. Currently, I'm using csv.reader to read the file into a dictionary and then looping through my list of keys to retrieve the value for each from the dictionary. This works, but I have a feeling that this is a really stupid/inefficient way of going about things. The other approach that comes to mind is creating a duplicate of the list of keys that I want to retrieve values for, iterating through the rows of the file checking if that row matches any of the keys I'm after, storing the value and removing the key from my duplicate list if it does match, and continuing on until the duplicate list is empty. Am I an idiot? Is either of these approaches appropriate? Is there a better solution?
Wallet fucked around with this message at 17:43 on Jan 8, 2018 |
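A minimal sketch of the dictionary approach described above, assuming a two-column file with the key first and the value second. The sample rows and the names load_lookup and values_for are made up for illustration.

```python
import csv
import io

# Stand-in for the real 90,000-row file; these keys and values are made up.
SAMPLE_CSV = "apple,3\nbanana,7\ncherry,5\n"


def load_lookup(csv_file):
    """Read a two-column CSV (key, value) into a dictionary."""
    # Skip blank or short rows instead of failing on them.
    return {row[0]: row[1] for row in csv.reader(csv_file) if len(row) >= 2}


def values_for(keys, lookup):
    """Return {key: value} for every requested key that exists in the lookup."""
    return {key: lookup[key] for key in keys if key in lookup}


lookup = load_lookup(io.StringIO(SAMPLE_CSV))
wanted = ["cherry", "apple", "durian"]  # "durian" is not in the sample data
print(values_for(wanted, lookup))  # {'cherry': '5', 'apple': '3'}
```

With a real file, the same function could be fed an open file object, e.g. open("data.csv", newline="") where "data.csv" is a placeholder name. Building the dictionary is one pass over the file and each lookup afterwards is a constant-time operation on average, so for 90,000 rows this is a reasonable way to do it.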
# ¿ Jan 8, 2018 17:40 |
|
Thanks for the responses, guys. I may have to try pandas, but I'll probably stick with the current implementation given that it's apparently reasonable.
|
# ¿ Jan 8, 2018 20:07 |
|
QuarkJets posted:The best approach may depend on what you want to do with the keys and values. Do you want to iterate over every key/value pair? This is a good question, as I realized that I was only really considering the function that was finding the values for a given list of keys while the reason was probably important. It turns out that the only reason the function was ever being called was to figure out which key in a list of keys had the highest numerical value in a given column in the file, so now I'm just creating a list of keys sorted by the relevant value and then iterating through that list until any of the desired keys is found, at which point the rest don't matter. No one will be surprised to learn that this is faster.
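A rough sketch of the sort-once-then-scan idea, assuming the relevant column has already been read into a dictionary of numbers. The data and the name highest_of are hypothetical.

```python
# Made-up data standing in for the file: key -> numeric value in the relevant column.
scores = {"a": 12.0, "b": 48.5, "c": 7.25, "d": 30.0}

# Sort once, highest value first; the ranked list can be reused for every query.
keys_by_value = sorted(scores, key=scores.get, reverse=True)


def highest_of(candidates, ranked_keys):
    """Return the first ranked key that is among the candidates, or None."""
    wanted = set(candidates)  # set membership checks are O(1) on average
    for key in ranked_keys:
        if key in wanted:
            return key  # everything after this has a lower value
    return None


print(highest_of(["c", "d", "a"], keys_by_value))  # d
```

The sort is paid for once, and each query stops at the first hit instead of looking up every candidate.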
|
# ¿ Jan 9, 2018 17:26 |
|
Edit: Turns out the issue wasn't what I thought it was.
Wallet fucked around with this message at 17:09 on Jan 13, 2018 |
# ¿ Jan 13, 2018 14:06 |
|
I have another potentially stupid question that I can hopefully explain in something resembling a comprehensible fashion: Let's say I have a paragraph of text with hard line breaks in a list, with each line as an item: code:
code:
code:
code:
|
# ¿ Jan 26, 2018 14:43 |
|
Thanks for the responses guys, that's more or less what I was thinking but I wasn't sure if I was missing a super clever approach. I will have to look into zip, however, as I wasn't aware of it and it seems extremely useful.
|
# ¿ Jan 26, 2018 18:07 |
|
Is there any reason you're storing the data you're reading in as strings instead of integers?
|
# ¿ Jan 28, 2018 22:02 |
|
If I have a list of numbers, like so: [0, 14, 18, 20, 36, 41, 62, 70, 72] And I want to extract all longest possible lists from it where no two consecutive items are more than, say, 5 apart: [[14, 18, 20], [36, 41], [70, 72]] It's not hard to construct a loop to do this with a few conditionals, but is there an elegant python solution?
|
# ¿ Feb 8, 2018 17:32 |
|
baka kaba posted:I think you'll have to post your code to know if there's a more elegant way, but I don't think Python really encourages anything besides the ol' for loops in this case baka kaba posted:
Python code:
Dr Subterfuge posted:Unless that first term is always supposed to be ignored, it would seem that your output list should start with a [0]. Wallet fucked around with this message at 00:23 on Feb 9, 2018 |
# ¿ Feb 9, 2018 00:15 |
|
Eela6 posted:
Now using a modified version of this that self-filters (the numbers are indexes and can't be negative, though that's a good catch). I had no idea yield or generators were a thing. I did a little reading which left me slightly puzzled, but stepping through it made it a lot clearer what's going on. That was really interesting, and will probably be useful in a lot of other contexts. Thanks!
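For anyone else who hasn't met generators, here is a minimal sketch of the grouping idea written with yield. This is not the quoted code, just an illustration under the assumption that the input is sorted in ascending order; the name close_groups is made up.

```python
def close_groups(numbers, max_gap=5):
    """Yield runs of consecutive items whose neighbours differ by at most max_gap."""
    group = []
    for n in numbers:
        if group and n - group[-1] > max_gap:
            yield group  # hand back the finished run, then pause here
            group = []
        group.append(n)
    if group:
        yield group  # the last run is still pending when the loop ends


data = [0, 14, 18, 20, 36, 41, 62, 70, 72]

gen = close_groups(data)
print(next(gen))  # [0]
print(next(gen))  # [14, 18, 20]

# Dropping single-item runs as a separate step over a fresh generator.
print([g for g in close_groups(data) if len(g) > 1])
# [[14, 18, 20], [36, 41], [70, 72]]
```

Each call to next() runs the function body only as far as the next yield, which is what stepping through it in a debugger shows.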
|
# ¿ Feb 9, 2018 01:41 |
|
Your approach is much more readable, at least to me.
baka kaba posted:You can do a length check before you yield (so you only get lists with 2 or more items), but there's two problems - one is that you have to put that code in there twice (you're yielding in two different places), and it also ties that implementation detail into the grouping function. It doesn't just split your list into groups where each item is close to the last, it also does another job of removing lists below a certain size. Personally I'd argue those are two different things (you might not even want the filtering all the time, or you might want to change the size limit), so it makes sense to do that separately, running it on the results of the grouping function
I take your point about them being different functions, though in this case there isn't really a context where you'd want length 1 groups. You can do the filtering without duplicating the code with a little tweaking (only one extra line of code, I think?): Python code:
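One possible way to get the length check into a single place, as a sketch rather than the exact code from this post: append a sentinel to the input so the final group is flushed inside the loop, which leaves only one yield. The name filtered_groups and the min_len parameter are assumptions, and the input is assumed to be sorted.

```python
from itertools import chain


def filtered_groups(numbers, max_gap=5, min_len=2):
    """Yield runs of close-together items, skipping runs shorter than min_len."""
    group = []
    # The trailing None is a sentinel that forces the last run to be flushed.
    for n in chain(numbers, [None]):
        if group and (n is None or n - group[-1] > max_gap):
            if len(group) >= min_len:  # the only place the filter lives
                yield group
            group = []
        if n is not None:
            group.append(n)


data = [0, 14, 18, 20, 36, 41, 62, 70, 72]
print(list(filtered_groups(data)))  # [[14, 18, 20], [36, 41], [70, 72]]
```

The trade-off is the one raised in the quote: grouping and filtering are now tied together, although min_len=1 switches the filtering off.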
baka kaba posted:That doesn't necessarily matter, but you can see how you can set up a big chain that yielded items have to pass through, getting tweaked or dropped or whatever, creating a machine that produces single, finished items on demand. And that can be good for working with very large amounts of data, or something slow where you want to drop things as early as possible if you're not going to be using them later in the pipeline Yeah, there's some combinatorics stuff I was messing with the other day that this approach will probably make possible, where before it just ate all of my memory and poo poo its pants.
|
# ¿ Feb 9, 2018 03:39 |
|
JVNO posted:In list logic, .remove will remove the first item in a given list that matches the query... You can use del to remove an item from a list by index (del list[0]), which might help, depending on what you're actually trying to do.
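A quick illustration of the difference, using a made-up list:

```python
items = ["a", "b", "a", "c"]

items.remove("a")   # removes the first item equal to "a"
print(items)        # ['b', 'a', 'c']

del items[0]        # removes whatever sits at index 0
print(items)        # ['a', 'c']

last = items.pop()  # removes by index too (last by default) and returns the item
print(last, items)  # c ['a']
```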
|
# ¿ Mar 2, 2018 00:55 |
|
I have what might be a dumb question which I've tried to explain comprehensibly with questionable success, but hopefully someone can point me in the right direction. I'm dealing with anywhere from a hundred to a few thousand objects in a list (or not in a list, but the order of the objects is relevant), each of which have ~five attributes which I'll call object.a through object.e for simplicity's sake. What I need to do is find certain patterns of objects with certain attributes: for example, I might need any two objects that do not have the same value for object.a, do have the same value for object.b, and are not separated by any objects that have one of a particular set of values for object.d. There are 30 or so patterns I need to match, many of which don't have a fixed upper bound for how many objects can fit in different positions within the pattern, and many of which require matching the values of objects to each other or the number of objects fulfilling one condition to the number fulfilling another condition (ex: any number of consecutive objects with the same .a value followed by the same number of objects with the same .b value). I could write an individual function to handle each pattern individually, but I'd rather not for obvious reasons, and I can imagine performance becoming a nightmare pretty quickly. Basically, I need the functionality of regex but for objects and across multiple attributes. What's the most reasonable way to approach this?
|
# ¿ Mar 3, 2018 21:08 |
|
Seventh Arrow posted:Wouldn't it be possible to do a regex with an if/elif/else setup?
That's the first approach that came to mind, but it seems pretty cumbersome.
QuarkJets posted:What are those attributes? Strings? Floats? Other classes?
Strings mostly, a few are booleans.
JVNO posted:I've completed most of the coding for the L8T trials... But I'm still dealing with a few bugs.
Given that it doesn't need to be truly random, it seems like it would be easier to create a function that finds all valid indices where a given form can be inserted and then have it pick a random one, although you would create configurations that would be impossible to complete some percentage of the time. Edit: Like this, but less poo poo/lazy, probably with some if statements for find_spot managing to find no valid positions, and maybe even some randomness in the order of insertion (it's been a long day, but this seems to work correctly): Python code:
Wallet fucked around with this message at 01:17 on Mar 4, 2018 |
# ¿ Mar 4, 2018 00:18 |
|
JVNO posted:Edit: Welp, that's a story as old as time. Spend days working on a piece of code only to have a much simpler solution presented after you finally figure it out. That version works and is a hell of a lot better than my code. Hope you don't mind me yanking that for my experiments? Go for it, happy to help. Just mind that I think it's theoretically possible for it to get itself into a state where it can't finish, which you might want to account for.
|
# ¿ Mar 4, 2018 04:08 |
|
JVNO posted:I ran 100 000 iterations of your list generation with no errors, so I'll take my chances Fair enough; I wasn't sure if the distribution was always the same or not, and I was also too lazy to test it 100,000 times.
|
# ¿ Mar 4, 2018 13:20 |
|
Seventh Arrow posted:No, I'm using the u/p that they sent me in an email - so it's possible that the info is incorrect. Wouldn't an invalid username or password bring up a different kind of error, though? Presumably 401, yes.
|
# ¿ Mar 31, 2018 17:31 |
|
unpacked robinhood posted:Is there a simple common method to correct typos? I don't know if there's a common method, but pyenchant provides Python bindings for Enchant and works pretty well (although the author of pyenchant has very recently stopped actively maintaining the project).
|
# ¿ Apr 8, 2018 00:01 |
|
unpacked robinhood posted:I'm giving it a try but I'm missing something. Is this a correct usage to spellcheck against a word list: You probably want something like this: Python code:
Wallet fucked around with this message at 14:18 on Apr 8, 2018 |
# ¿ Apr 8, 2018 14:15 |
|
unpacked robinhood posted:Thanks ! At a glance it seems way faster too. It's fairly performant from my experience; I've been using it in a recent project to deal with checking fairly large texts against a 130,000 entry pwl dictionary and it generally takes longer to load the word list than it does to check the words.
|
# ¿ Apr 8, 2018 23:06 |
|
Yeah, I'm pretty sure you're going to need to convert at least linebreaks for formatting. json2html might do what you need fairly easily. I'm not sure if you actually need it to be functional JSON or if you're just trying to make it readable.
Wallet fucked around with this message at 12:54 on Apr 12, 2018 |
# ¿ Apr 12, 2018 12:51 |
|
<pre> wraps around preformatted text, which is displayed in fixed-width and preserves consecutive spaces and linebreaks. It's probably a good solution.
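A small sketch of that approach, assuming the goal is just readable output rather than functional JSON in the page. The function name json_to_pre and the sample data are made up.

```python
import html
import json


def json_to_pre(data):
    """Pretty-print data as JSON and wrap it in an escaped <pre> block."""
    pretty = json.dumps(data, indent=2, sort_keys=True)
    # Escape &, < and > so any markup inside the values is displayed as text.
    return "<pre>" + html.escape(pretty, quote=False) + "</pre>"


print(json_to_pre({"name": "example", "tags": ["<b>", "x & y"]}))
```

The indentation and line breaks produced by json.dumps survive because the browser leaves whitespace inside pre alone.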
|
# ¿ Apr 12, 2018 16:06 |
|
Dominoes posted:Pycharm's also an outstanding IDE for web languages (JS/TS/HTML/CSS etc), and with the official plugin, Rust. Really? I assumed that wouldn't be the case as JetBrains has a separate JS IDE.
|
# ¿ Jul 2, 2018 00:08 |
|
du -hast posted:I would imagine this is too obscure for anyone to help, but I am having some trouble. I'm not sure I 100% follow what you're trying to do, but can you not get what you're after using an xpath like code:
Wallet fucked around with this message at 17:17 on Aug 26, 2018 |
# ¿ Aug 26, 2018 17:08 |
|
CarForumPoster posted:I have a dumb question, at some point in developing stuff does it become more natural to read and work with json? Like are there some of you who can deal with json as intuitively as you would data in a table/dataframe?
I find JSON rather convenient and straightforward to work with, but it's definitely not the format I would choose if being able to conveniently read the data is important. When people try to get too cute with it and use it to store data that is irregular or deeply nested it quickly turns into a massive ball-ache.
cinci zoo sniper posted:Despite having JSON sources in each work project, however, and being comfortable working with it, I still loathe the format immensely.
What makes you loathe it, out of interest?
|
# ¿ Nov 22, 2018 21:55 |
|
cinci zoo sniper posted:My use case is consumption of complex documents, and I'm not a web developer, so it's lack of quite literally everything that makes XML "bloated" - schema standard, querying standard, metadata, custom data types, namespaces, comments. Hell, even CDATA although it's often a container for war crimes against common sense. JSON does not offer a single decisive advantage that I can think of. That makes sense. I feel like the decisive advantage of JSON is that it's a quick, straightforward way to dump/store/load simple data. I'm currently dealing with the mess left by a bunch of idiots who decided to use Mongo for a project that desperately needed a relational database, so I am entirely sympathetic to the hell people create when they try to use tools that are too simple for the job.
|
# ¿ Nov 23, 2018 15:21 |
|
Methanar posted:Mongo and javascript are really cool. Basically this, over and over again forever. The best way to store a date? Clearly a string, except when you feel like using an ISODate. Want to record whether something is supposed to be on or off? Use a string and set the value to "on" or "off", obviously. Want a bunch of strings all bundled together, possibly in order? Use an object, and make sure that all of its properties have names that include apostrophes and dashes, then use them to store your strings. Want to record the ID of a related record and what type of relationship the current record has to it? Don't just use a plain old string; instead, use the ID of the related record as the field name and then store the kind of relationship in that field. What could go wrong?
Wallet fucked around with this message at 01:39 on Nov 24, 2018 |
# ¿ Nov 24, 2018 01:31 |
|
QuarkJets posted:This discussion is reminding me that I hate lambda and how it always makes code look super ugly. Does anyone else hate lambda? Yes. It's yucky.
|
# ¿ Nov 29, 2018 01:45 |
|
Bundy posted:Seconding this, why use something as awful as pickle when json exists? This is what I was thinking as I read that, but then I thought maybe they know something I don't. A simple dictionary is exactly the kind of case where json is super convenient, particularly if you want it to be easy to use with whatever else.
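For a simple dictionary the round trip is about as short as it gets. A minimal sketch with made-up data:

```python
import json

settings = {"name": "example", "retries": 3, "tags": ["a", "b"]}

text = json.dumps(settings, indent=2)  # plain text that other tools can read
restored = json.loads(text)

print(restored == settings)  # True
```

json.dump and json.load do the same thing with an open file. The usual caveats apply: dictionary keys come back as strings, and tuples come back as lists, so it only round-trips cleanly for data that fits JSON's types.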
|
# ¿ Apr 29, 2019 13:52 |
|
shrike82 posted:As an aside, do you guys learn mainly from written stuff (books, articles etc.)? I find the push towards all kinds of reference information being transformed into video content completely insufferable. Not only do people talk way, way slower than you can read, it also makes it impossible to skim for the bit of information you actually want. For other kinds of content I prefer video (I'd rather watch conference talks than read them, and there have been some awesome ones posted in this thread), but if I want technical information video can gently caress right off. As far as actual books, I find them much more useful for learning about theory/architecture/etc. rather than highly specific detail, simply because of how quickly that stuff goes out of date. To be fair, I don't think I've ever actually followed an online tutorial all the way through in my life no matter what form it was provided in.
Wallet fucked around with this message at 13:33 on May 3, 2019 |
# ¿ May 3, 2019 13:27 |
|
shrike82 posted:Preprocessing texts in spacy shouldn't be that compute intensive so consider just storing the raw texts and computing on demand especially if users are given the option to upload new texts. I imagine this is the way you'd want to go because spaCy is taking text and adding a whole lot of extra data to it by its very nature. Depending on what you're up to you could also split the difference and store the original text and whatever the relevant processed/aggregated results are but not the entire marked up data from spaCy.
|
# ¿ May 31, 2019 13:47 |
|
baka kaba posted:You probably need to convert each name and your input letter to lowercase (or whatever, just so there's no mismatch) because 'e' won't match 'E'
The list of vowels posted has both capital and lowercase (for whatever reason) so I assume that's not the issue. I'm also not sure I really follow what the question is, exactly, since Mycroft Holmes seems to be talking about two different questions at the same time.
Mycroft Holmes posted:so i need to find each instance of the values in list vowels in list a and then increment a counter.
There's no reason to use a range here, because you want to iterate through all items in a list and for x in list will do that right out of the box. Based on the questions you posted before it, this seems to be a lesson about loops, and so I assume they want you to use nested loops to do something like this: Python code:
Python code:
Python code:
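A sketch of what the nested-loop version could look like, assuming the list holds names and the goal is a total vowel count. The data is made up and this is not necessarily what the assignment expects.

```python
vowels = ["a", "e", "i", "o", "u", "A", "E", "I", "O", "U"]
names = ["Eve", "Bob", "Alice"]  # made-up data

count = 0
for name in names:            # outer loop: each name in the list
    for letter in name:       # middle loop: each character of that name
        for vowel in vowels:  # inner loop: compare against each vowel
            if letter == vowel:
                count += 1
print(count)  # 6

# The innermost loop can be replaced by a membership test with "in".
count_in = 0
for name in names:
    for letter in name:
        if letter in vowels:
            count_in += 1
print(count_in)  # 6
```

Neither version needs range(), because iterating over a list or a string directly already visits every item.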
Wallet fucked around with this message at 23:22 on Jun 25, 2019 |
# ¿ Jun 25, 2019 23:20 |
|
The Fool posted:I believe he was counting just 'E' or just 'e', and I assume he needed the count of both. Entirely possible.
|
# ¿ Jun 25, 2019 23:38 |
|
a foolish pianist posted:I really prefer a control variable set ahead of time to a while True: ... break() There are more or less infinite ways to do this but if you just put it in a function you can hide it somewhere and never look at your dirty while True loop: Python code:
Wallet fucked around with this message at 00:03 on Aug 30, 2019 |
# ¿ Aug 29, 2019 22:26 |
|
necrotic posted:Given the "problem" requires repeating the question unless "Y" is given, and having different responses for "N" vs any other character, that doesn't seem like a great approach... I was being lazy (and still am), but you get the idea: Python code:
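Something along these lines, as a sketch: the while True loop lives inside a function that only returns once it gets a "Y", with separate responses for "N" and for anything else. The messages, the case-insensitive matching, and the injectable ask and say parameters (there so it can run without a keyboard) are all assumptions.

```python
def confirm(prompt, ask=input, say=print):
    """Keep asking until the answer is Y; respond differently to N and to other input."""
    while True:
        answer = ask(prompt).strip().upper()
        if answer == "Y":
            return True
        if answer == "N":
            say("Okay, asking again.")
        else:
            say("Please answer Y or N.")


# Scripted answers in place of real keyboard input.
answers = iter(["x", "n", "y"])
messages = []
result = confirm("Continue? (Y/N) ", ask=lambda _: next(answers), say=messages.append)
print(result, messages)
# True ['Please answer Y or N.', 'Okay, asking again.']
```

Called as confirm("Continue? (Y/N) ") it falls back to input and print, and the caller never sees the loop.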
Wallet fucked around with this message at 13:44 on Aug 30, 2019 |
# ¿ Aug 30, 2019 13:24 |
|
bromplicated posted:I wrote it in a way where it defines a new function for each food category. Is there a way I could write one function to handle all the different food types? Solumin gave a great overview of why you might want to use classes and what you might use them for. I don't know exactly what your script did, but to give you a simple version of putting different categories of food into a fridge and then listing everything that was added back at the end: Python code:
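A minimal sketch of that idea, with one method handling every category instead of one function per food type. The class name Fridge, its methods, and the sample items are hypothetical.

```python
from collections import defaultdict


class Fridge:
    """Keeps track of food items grouped by category."""

    def __init__(self):
        self.contents = defaultdict(list)  # category -> list of item names

    def add(self, category, item):
        """Put an item into the fridge under the given category."""
        self.contents[category].append(item)

    def listing(self):
        """Return one 'category: item, item' line per category, in insertion order."""
        return [f"{category}: {', '.join(items)}"
                for category, items in self.contents.items()]


fridge = Fridge()
fridge.add("fruit", "apple")
fridge.add("dairy", "milk")
fridge.add("fruit", "grapes")
print("\n".join(fridge.listing()))
# fruit: apple, grapes
# dairy: milk
```

Adding a new category needs no new code, since the category is just data passed to add.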
Wallet fucked around with this message at 01:03 on Sep 5, 2019 |
# ¿ Sep 5, 2019 00:39 |
|
cinci zoo sniper posted:Viewing this through work environment prism, I prefer native > cli > web, for performance/power efficiency and consistent applicability of accessibility features. In that context my preference for web is stronger than it is personally (I don't really give a poo poo for personal use), because web poo poo (at least now that Flash is well and truly dead) has largely been designed to accommodate presenting reflowable content on devices of different sizes with different capabilities, while other things haven't. If performance is a serious concern then native becomes a lot more attractive, but for me it often isn't.
|
# ¿ Jan 9, 2020 22:38 |
|
Private Speech posted:I don't have that much web experience but from what I've seen you basically end up making different designs for smartphones, tablets, etc., and then switching between them depending on the screen size, which doesn't seem dramatically easier. And you still need a library (or plural) for that. Partially, but that's because you simply don't want content to be displayed the same way on a vertically oriented phone screen as you do on a big old computer monitor. CSS handles checking viewport sizes and switching styles/layouts just fine out of the box. A system like Bootstrap isn't set up (out of the box) for making completely different designs for different types of devices, it just has breakpoints that allow you to modify the sizes/layouts of elements at different screen sizes very easily.
|
# ¿ Jan 12, 2020 14:43 |
|
KICK BAMA KICK posted:Only thing I would add is, unless I'm missing something the list isn't really serving any purpose -- just start with a set(), put the numbers the user enters into it with the add method instead of append (the different method names reflect that lists are ordered while sets are not, but they do the same thing), and then at the end check whether the set contains five values. Really you would combine these two improvements so you're setting N once and both using that as your range and also checking the length of your set against it.
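Combining the two might look roughly like this, as a sketch: N is set once and drives both how many entries are read and the final length check. The name all_unique is made up, and the entries are passed in as a list here rather than read with input().

```python
N = 5  # how many numbers to collect; used for both the loop and the check


def all_unique(entries, n=N):
    """Return True if the first n entries exist and are all different."""
    seen = set()
    for value in entries[:n]:
        seen.add(value)  # adding a duplicate leaves the set unchanged
    return len(seen) == n


print(all_unique([3, 8, 1, 9, 4]))  # True
print(all_unique([3, 8, 3, 9, 4]))  # False
```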
|
# ¿ Feb 4, 2020 14:27 |
|
Rocko Bonaparte posted:Edit: Does anybody else get the impression that their Google search results improve after posting a question about their problem? Yes, but mostly because once I've formulated it into a question I wouldn't be embarrassed to ask other humans, I have a better sense of the specific thing I actually want an answer to.
|
# ¿ Feb 4, 2020 23:38 |