Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Lib and let die
Aug 26, 2004

Hey y'all

I've got some hosed up data I'm working with and I fear if I keep at this on my own much longer the repeated stress of cranial impact is going to eventually shatter my desk to pieces.

I (currently) quote co-delivery service contracts for a living. This involves me going into our vendor's website and putting together a quote using their pricing tool. Once I've got the vendor quote in-hand, I then have to take the guts of that quote and put it into a quote template that my company uses, and then present that to the customer. I can get the vendor quote as an xml file (some kind of weird-rear end xml file that needs some sort of remote auth key for me to even try and do anything in python with it, so that's out), as a live html page, a csv file, or an xlsx document. Since it's all just tables anyway, I usually pull down either the HTML version or the XLSX version.

Easy-fuckin-peasy, copy paste from one workbook to the other, right?

loving wrong.

Look at this poo poo. Look at this absolute horseshit we get from our vendor:



20 columns of data, spread out across sixty-loving-six columns - with most of them loving hidden. What's in them, you ask?

Absofuckinglutely nothing. They're blank. Not a loving thing in them, Just resized down to 0.0001 pixels wide, and merged and centered across the range.



Of course, when I try and paste this data into a clean sheet, it's an abhorrent mess of empty cells that don't even line up properly (see row 1 as compared to the remaining rows) :



I can, fairly efficiently, put in the manual work to unfuck it - punch row 1 over into column b, multi-select empty rows and columns, remove, blah blah blah. This is making me insane. It's a bug-gently caress-nutty time sink and it drags our entire team down. I don't want to do it manually anymore.

So far:

I've tried macros, which have helped to automate some of the process - the number of blank columns is always consistent so those are easy to account for in a macro. Unfortunately, the variable number of line items (and tables full of line items - there can be multiple sites in a quote, each with their own table of line items) make removing the rows an imprecise science at best.

I've tried a little bit of python but I'm still very new to it, I thought about using a for loop but the inconsistent number of blank rows/columns that need to be removed is a bit beyond my level of understanding right now.

Ultimately, I want the most simple, plain table of data you could ask for.



What do you think would be the best way to approach something like this, save finding whomever designed the format for these quotes and holding their head underwater until that one last bubble "bloop"s its way to the surface?

Adbot
ADBOT LOVES YOU

Lib and let die
Aug 26, 2004

Harminoff posted:

Have you tried anything with Power Query yet?

I adore Power Query. I use it when I don't feel like spinning up wamp and doing things with SQL - though I'll admit the way the raw data comes out had me thinking that Power Query would just...poo poo itself trying to do anything with it. I may give it a go


DRINK ME posted:

I get to deal with this sort of stuff for work every now and then and congrats - that’s some of the worst data I’ve seen.

My first step would be to try all the export file types and see if any of them are a bit cleaner once pulled into Excel, but I’m kind of doubtful with what you have here.

Next step would be open the file, run the below to remove line breaks, carriage returns and then see how you’re looking, I think this might fix some of the main issues.

code:
Bad_char1 = chr(7)
Bad_char2 = chr(10)
Bad_char3 = chr(13)
Good_char = “ “

Cells.replace what:=Bad_char1, replacement:=Good_char, lookat:=xlpart, searchorder:=xlbyrows, matchcase:=false, searchformat:=false, replaceformat:=false

Cells.replace what:=Bad_char2, replacement:=Good_char, lookat:=xlpart, searchorder:=xlbyrows, matchcase:=false, searchformat:=false, replaceformat:=false

Cells.replace what:=Bad_char3, replacement:=Good_char, lookat:=xlpart, searchorder:=xlbyrows, matchcase:=false, searchformat:=false, replaceformat:=false
I manually rewrote this on iPhone while looking at the work computer so it might have an error but should be pretty close.

If you still need to do more work, the best general advise I can offer:
*Save your code as you work in Personal Macro Workbook so it’s available whenever you need it without finding the file you saved it in.
*Record a macro of you cleaning it up, then you can convert this code to repeatable actions. Like if you always delete columns B, C, E. You can have this in your cleanup macro.

This is all MCode, yeah? Or is this VBA? I've really only used Power Query to emulate SQL (that's my real fuckin' jam) for purposes of like, customer inventories and the like so I'm not familiar really with the code - of course with data this lovely, I don't even want to begin trying to load it into a db until it makes something resembling sense. I do have a personal macro that removes the columns I don't need (I really am only concerned with 4 of them - Service Description, Service Code, Svc Quantity, and Total List Price) as well as the empties so, I have the start of something.

I actually brought the CSV version of the export over to the python thread a while back and I got essentially the same reaction to that: 100% :wtc: so, yeah. Fun. It looks a bit like this:

code:
Something,,,,,,somethingelse,,,somethingmore,,,,,,,evensomethingmore,,,,,,,,,,,,,,,,,,,,,anotherthing
I'm gonna give this code a shot in PowerQuery and see where it gets me - thanks a ton!

Lib and let die
Aug 26, 2004

Fun! I've never done VBA before. I was thinking of grabbing a Udemy class on it, I think I will next time they have a sale. You talked me into it.

Lib and let die
Aug 26, 2004

Wiggly Wayne DDS posted:

do you have a rough example file? if i had to deal with that using only excel first thought would be to make a parser sheet pointing at the downloaded file as a named range then try match(1,index(column_name)) for each uniquely named column. i gather the data is structurally under the column but messes up when you're copy/pasting. if you threw together a naive parser for the first 100 rows it'd give you a more set table to copy out of and filter out the empty rows

running off to vba when you could abuse formulas seems rather common, what is their csv format fuckup that excel is having issues with it anyway?

I definitely wanted to over-engineer a solution.

If you open the csv in Excel like the vendor intends, it actually displays almost exactly the way I wanted to Rube Goldberg it to.



Ain't that something. I was dead set on using it like this:



Still probably going to take a VBA course, another handy skill to have I'm sure

Lib and let die fucked around with this message at 22:51 on Dec 16, 2020

Lib and let die
Aug 26, 2004

Is there a way to get Excel to stop doing..."intelligent" things like trying to determine the format type of data you're doing a paste values on?

Specific example: I'm pasting 4 values from a webpage into an excel sheet. Product name, version, quantity, and a 4th field that's useless that I end up deleting but have to copy over initially so the formatting stays right.

The version field can contain a 7, an 8, or an "8/7" because the license works with both versions of the software. Excel thinks I'm trying to paste in 7-Aug. Ok, so format cells, general, no, that doesn't work, format cells, text, and...Excel converts it to a loving unicode date so I've got to go back in and manually enter "8/7"

Yes I can find/replace after the fact, but, I don't want to.

E: specifically a "now and for ever" solution, not something I have to pre-configure every time I make a new work book

Lib and let die fucked around with this message at 20:30 on Dec 30, 2020

Lib and let die
Aug 26, 2004

Big Bad Beetleborg posted:

A bunch of genes were renamed earlier this year because Excel kept reading them as dates. It was agreed to be easier to change a what a whole branch of science referred to things as than get Excel to stop doing that.

Well it's nice to know this isn't a problem relegated to morons like me. Really highlights the extremes of the love/hate relationship I have with Excel.

Lib and let die
Aug 26, 2004

me your dad posted:

Thank you both - I haven't yet had a chance to try these solutions. I grossly underestimated the amount of data per file. It's more like 45k to 55k records across six files. I considered today that our email platform provider (Salesforce) may be able to assist us with this from their back end of things. Before I try going forward with the unwieldy process of managing so much data in Excel, I am going to see what they come back with. I opened a case with their support team earlier.

I know this is the Excel thread, but have you considered/are you familiar with using SQL? Having both SQL and Excel familiarity this is something I might try and figure out with PowerQuery, even. The Excel gurus here probably have a better idea how PowerQuery could pull it off, but if you wanted to approach it from SQL side, I'd probably create a table from each file and join each table on mailer1.emailaddy=mailer2.emailaddy and then count either all of the null values or all of the non-null values for each returned row.

Lib and let die
Aug 26, 2004

Whybird posted:

Wouldn't it be better to create a single table with emailaddy and datereceived columns, and then just do
code:

Select Emailaddy, Count (*) 
from table 
group by emailaddy

Probably, but there's nothing I enjoy more than over engineering a solution.

Lib and let die
Aug 26, 2004

It's been a while, but I had a workbook with a power query in it that would update based on the value of a specific cell in the workbook (I'd enter a contract number, refresh data, and the query would refresh to the external data source of [contract number from cell].xlsx and display it in a format that was easier for me to copy and paste into the renewal quote). Perhaps something like that would work, where your variable would be the field where your value picker is?

Lib and let die
Aug 26, 2004

Kibayasu posted:

Is there some way to remove trailing spaces from cell values without using the TRIM function? I'm going to be sent a monthly table and one of the columns I'm going to be using as a look up reference seems like it has to have 10 characters in it and it fills those characters with spaces if there wasn't 10 text characters entered. The problem is I'm generating the same references in a summary table on my end but without the trailing spaces. If all the references were the same length I could find/replace the number of spaces but they won't be. It wouldn't exactly be hard either to just insert my own column into the table, use TRIM from top to bottom, and reference the new column instead so I'm mostly just wondering if there's some easy method I don't know about.

Are there spaces IN the data or just at the end? If there are only trailing spaces you can use text to cells under the data ribbon and set the delimiter character to space.

Lib and let die
Aug 26, 2004

I don't know that this is the best thread for it, but since Excel is so often used for working with csv files...is there a better tool? Or a specific set of options I can turn on to have Excel not do any of its allegedly helpful things with say, date and time stamps?

I'm talking 500,000+ rows of data, so a notepad++/general IDE is out because I need the cell-based display, but good lord Excel really needs a very basic no-frills raw csv mode or something.

Lib and let die
Aug 26, 2004

Wandering Orange posted:

Before pasting CSV data or otherwise importing, select the entire sheet and change the number formatting to Text?

And then when I re-open the file to work with it, I have to convert it all to text again else it saves dates in whatever hosed up format it's decided is best. I suppose I could macrofy it, but then I have to have a workbook with my macros in it open when I'm working with the files too...It's absolutely infuriating. I'm close to giving up on Excel entirely and seeing if Google Sheets handles them any better - I've tried about everything I can think of with Excel (hence turning to my fellow goons for advice)

Lib and let die
Aug 26, 2004

It's just raw UTF-8 CSV files, opening them from explorer. My little machine with 16gb or ram would choke to death copying....

Lemme check real quick....225x24000 out of a csv file and pasting values...back into a csv file.

If I had my way, we'd just be getting full SQL dumps to load and work with but....

Yes, that's 225 columns of data. Exceed Beyond is....silly.

Lib and let die
Aug 26, 2004

Wandering Orange posted:

Oh god you're dealing with that sort of stuff. Yeah I would be using SSIS/SQL or even Access if I had to as Excel is not going to be friendly.

It's both shocking and completely unsurprising that the companies that do business with Nonprofits will do anything and everything the can to prevent customers from leaving their jank-rear end systems, up to and including making the export process as difficult as possible.

Lib and let die
Aug 26, 2004

I don't understand lambdas in python, and I don't understand them in excel. why create a function that's just getting thrown away???????

Lib and let die
Aug 26, 2004

Harminoff posted:

Is anyone watching Excel Esports on ESPN?

https://www.youtube.com/watch?v=x1RVNGDSdw4

there's no way this is a thing

there's no way i'm going to watch this

why am i watching this

Lib and let die
Aug 26, 2004

Regex and Xlookup?

https://www.youtube.com/watch?v=kCzSiOHj_lw

Lib and let die
Aug 26, 2004

This is super weird - I can't get XLOOKUP to work at all ('Unknown function', but if I just do XLOOKUP() it tells me it's expecting between 3 and 6 parameters) but LOOKUP seems to function like an XLOOKUP does? Am I missing something?

e:

Lib and let die fucked around with this message at 22:43 on Aug 25, 2022

Lib and let die
Aug 26, 2004

Strong Sauce posted:

https://workspaceupdates.googleblog.com/2022/08/named-functions-google-sheets.html

Rapid Release and Scheduled Release domains: Gradual rollout (up to 15 days for feature visibility) starting on August 24, 2022

:argh:

Lib and let die
Aug 26, 2004

Jack the Lad posted:

Is there a way to write the name of a named range as text in a cell, then pass the contents of that cell to a formula that parses it as the named range?

For example I want to put BobsFaveFoods in A1 and then use =SUMPRODUCT(COUNTIF(B2:B10,A1)) instead of =SUMPRODUCT(COUNTIF(B2:B10,BobsFaveFoods)) so that I can have this across a bunch of rows/tabs and change which named range I'm referencing on the fly if possible.

Really stumped on this but probably just being stupid.

INDIRECT is probably the function you want, it can pass the value of a cell along as part of a formula

Lib and let die
Aug 26, 2004

Jack the Lad posted:

Is there a way to write the name of a named range as text in a cell, then pass the contents of that cell to a formula that parses it as the named range?

For example I want to put BobsFaveFoods in A1 and then use =SUMPRODUCT(COUNTIF(B2:B10,A1)) instead of =SUMPRODUCT(COUNTIF(B2:B10,BobsFaveFoods)) so that I can have this across a bunch of rows/tabs and change which named range I'm referencing on the fly if possible.

Really stumped on this but probably just being stupid.

a little more info on the INDIRECT function:

I set up a basic sheet and A1 to BobsFaveFoods and set the named range to B2:B10, the original formula functions as intended, but it also works as such:
code:
=SUMPRODUCT(COUNTIF(B:B,INDIRECT($A$1)))


You can even get really fancy and create formulas from multiple indirect cell references like this:


e: in your case you might want to use something like
code:
=SUMPRODUCT(COUNTIF(INDIRECT($A$1),INDIRECT($A$1)))

Lib and let die fucked around with this message at 03:02 on Nov 8, 2022

Lib and let die
Aug 26, 2004

kumba posted:

word to the wise: indirect is a volatile function which means it has to recalculate every time any cell in the entire workbook is changed

indirect works just fine for small things, but as soon as your workbook starts growing it's time to look for a better solution

Yeah INDIRECT is one of those functions where if you find yourself using it a lot, you may want to consider using like...an actual database.

I work with a data conversion workbook that uses INDIRECT in a lot of places and it's really slick in the sense that holy poo poo a google sheets can do that but I also have to force some pages to update manually by deleting a row, hitting undo, and then sheets will recalculate everything (I don't know if F9 would behave the same way in Excel, in Sheets there doesn't seem to be a manual "refresh all my formulas NOW" option)

Lib and let die
Aug 26, 2004

Here's an incredibly dumb question:

Can I resize all my columns by just the length of the column header? I seem to recall doing that somehow in the past but I can't figure it out now.

grumble grumble loving salesforce export files

Lib and let die
Aug 26, 2004

esquilax posted:

Highlight the cells you want to fit the columns to.

Home->Cells->Format->Autofit Column Width
Alt->h->o->i is the alt key shortcut

I think it messes up if you use word wrap.

oh thank gently caress, i knew it was something incredibly simple but i couldn't even mangle a decent google search to find it lmao

Lib and let die
Aug 26, 2004

Small Excel Questions Not Worth Being Answered by a Human

Adbot
ADBOT LOVES YOU

Lib and let die
Aug 26, 2004


Does native python support in Excel going to be considered some sort of PED for this type of stuff?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply