|
Is there enough interest here for a dedicated Data Visualization thread? EDIT: Yes, and here it is. PM me if you want your message edited or removed.

Stuff that isn't Software

Data Stories is the best (and possibly only) dedicated data vis podcast.

visualising data is an insanely dense collection of project links with interesting commentary. Check out their yearly reviews: January to June 2011, July to December 2011, January to June 2012, July to December 2012, January to June 2013, July to December 2013, January to June 2014.

Data-vis-jobs is a relatively small Google group that attracts great job/freelance/fellowship opportunities.

The work of designer/statistician Edward Tufte is regularly hyped. I can recommend his first book.

ItBurns posted:I'll second Tufte's books, not that they need it. I have the one in the OP next to me right now. If nothing else they're beautifully illustrated and you can put them on your coffee table.

MrMoo posted:I like to think Reuters Graphics has awesome charting for all its TV, magazine and digital work, but I'm not aware of any of it, and certainly working at Reuters we only have access to F/OSS solutions like D3.

Software

D3.js is a powerful web framework for interactive visualizations using JavaScript/HTML5/CSS. It covers almost all of jQuery's functionality, though the jQuery thread suggests learning D3 second and warns that D3 has more browser restrictions.

This book is a very approachable D3/web design guide hosted for free. It was so good I bought the Kindle version.

bl.ocksplorer.org is a great search engine for D3 functions that crawls the site where people host their D3 projects.

These two videos give a good intro to D3. The first is from a design perspective. The second goes into detail on the point of D3 and is by a guy who absolutely has to be a Goon.

MrMoo posted:The spiffiest software for data visualisation is Tableau, with hosted and deployed models. 
Lumpy posted:Bokeh: http://bokeh.pydata.org is a data vis library that at least one goon works on.

ItBurns posted:I'm using a couple of things not mentioned. The first is Highcharts/Highstock. It seems to be super common, as I've noticed it on a ton of websites in lieu of static images of simpler bar/line charts. In some ways it's a less flexible D3, but it's still highly customizable, and the built-in types cover a lot of ground with new stuff being added pretty regularly.

Open Questions

Ahz posted:I'm working on integrating data viz into my app and was thinking about the current climate for client-side processing vs. server side.

ItBurns posted:I'm most curious about what people are doing to generate/serve data. Being an R guy, I'm doing 99% of my analytics in R and producing JSON/CSV files at regular intervals. It's not real-time, and it's a real pain for anything where the user has a lot of freedom.

Plugs

Post your website or portfolio and show off to other Goons.

Analytic Engine fucked around with this message at 05:35 on Aug 20, 2014 |
# ? Aug 13, 2014 02:16 |
|
|
|
I guess we'll find out!
|
# ? Aug 13, 2014 16:48 |
|
I'd be interested in some good resources.
|
# ? Aug 14, 2014 06:28 |
|
Ahz posted:I'd be interested in some good resources. Edit: I added stuff to the OP. Analytic Engine fucked around with this message at 07:09 on Aug 14, 2014 |
# ? Aug 14, 2014 07:06 |
|
I'm working on integrating data viz into my app and was thinking about the current climate for client-side processing vs. server-side. I can order and group my data as needed on the server side. But I'm wondering if it's worth it to save some processing on my end and push it to the client, if most clients these days can handle the calculations: averaging datasets of maybe 10,000 rows by 5-10 columns. But then I'm also increasing my transfer, so I just don't know. Also, are some types of visualizations more/better suited to mobile vs. desktop?
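For a sense of scale, grouping and averaging a table of that size is cheap in plain JavaScript on the client. A rough sketch, with made-up column names for illustration:

```javascript
// Group rows by a key column and average a numeric column, client-side.
// Hypothetical row shape: { region: "north", value: 42 }.
function groupAverage(rows, keyCol, valCol) {
  const sums = new Map();
  for (const row of rows) {
    const k = row[keyCol];
    const acc = sums.get(k) || { total: 0, n: 0 };
    acc.total += row[valCol];
    acc.n += 1;
    sums.set(k, acc);
  }
  const out = {};
  for (const [k, acc] of sums) out[k] = acc.total / acc.n;
  return out;
}

// 10,000 synthetic rows run in well under a millisecond on a modern machine.
const rows = [];
for (let i = 0; i < 10000; i++) {
  rows.push({ region: i % 2 ? "north" : "south", value: i % 10 });
}
const averages = groupAverage(rows, "region", "value");
```

So for this row count the transfer cost is usually the bigger worry, not the client-side math.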
|
# ? Aug 14, 2014 07:38 |
|
Ahz posted:I'm working on integrating data viz into my app and was thinking about the current climate for client-side processing vs. server side. These are great questions, and I don't know the answers; hopefully more Goons will check out the thread. I've found interactivity to be harder on mobile, since there's no hovering and finger gestures already navigate the browser. I had a fiddly context-sensitive menu that was unusable on mobile, so I added a separate set of big touchable buttons.
|
# ? Aug 14, 2014 23:23 |
|
The spiffiest software for data visualisation is Tableau, with hosted and deployed models. For some lame reason a competitor, Datawatch, exists; there is a video attempting to explain its advantages, generally around RAD or "exploratory analytics", which is hilarious as the former CTO announced they don't support that. I like to think Reuters Graphics has awesome charting for all its TV, magazine and digital work, but I'm not aware of any of it, and certainly working at Reuters we only have access to F/OSS solutions like D3. Reuters, Bloomberg, and The Times all have job openings for data visualisation experts to munge JavaScript, HTML, and CSS for politics coverage, markets, and anything else.

MrMoo fucked around with this message at 00:10 on Aug 16, 2014 |
# ? Aug 16, 2014 00:08 |
|
Bokeh: http://bokeh.pydata.org is a data vis library that at least one goon works on. Probably OP worthy.
|
# ? Aug 16, 2014 01:09 |
|
I love visualizations. However, my knowledge doesn't extend beyond using Python (pandas, seaborn, etc.), and I'm not very good at it. Would love to learn more in this thread.
|
# ? Aug 16, 2014 07:09 |
|
I'm using a couple of things not mentioned. The first is Highcharts/Highstock. It seems to be super common, as I've noticed it on a ton of websites in lieu of static images of simpler bar/line charts. In some ways it's a less flexible D3, but it's still highly customizable, and the built-in types cover a lot of ground with new stuff being added pretty regularly. http://www.highcharts.com/ Example: http://www.gw2spidy.com/item/46741

Some other options for people who are accustomed to R might be Shiny or ggVis. Neither is very mature, but they're good for simple stuff if you aren't a web designer, as they take the guesswork out of making things presentable. ggVis is really similar to ggPlot if you've used that package. http://shiny.rstudio.com/ http://ggvis.rstudio.com/

Leaflet is a mapping library similar to what you might get from Google Maps, or exactly what you'd get on Craigslist (because they use it). Setting it up is really simple, but the actual business of mapping data is probably where it's going to get tricky for some people if you aren't familiar with projections. On the other hand, if you've ever said 'gently caress shape files forever' then this is going to be a breath of fresh air. http://leafletjs.com/

I'm most curious about what people are doing to generate/serve data. Being an R guy, I'm doing 99% of my analytics in R and producing JSON/CSV files at regular intervals. It's not real-time, and it's a real pain for anything where the user has a lot of freedom.

Edit: I'll second Tufte's books, not that they need it. I have the one in the OP next to me right now. If nothing else they're beautifully illustrated and you can put them on your coffee table.

ItBurns fucked around with this message at 21:44 on Aug 16, 2014 |
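The handoff described above (analytics job dumps CSV, the page turns it into the JSON a chart library expects) can be sketched in a few lines of plain JavaScript. The column names and the naive no-quoted-fields parser are illustrative only:

```javascript
// Convert a CSV dump (as an analytics job might produce) into chart-ready
// records. Naive split-based parser: no quoted fields, sketch only.
function csvToRecords(csv) {
  const lines = csv.trim().split("\n");
  const header = lines[0].split(",");
  return lines.slice(1).map(line => {
    const cells = line.split(",");
    const rec = {};
    header.forEach((h, i) => {
      const num = Number(cells[i]);
      rec[h] = Number.isNaN(num) ? cells[i] : num; // keep strings as strings
    });
    return rec;
  });
}

const csv = "month,faults\n2014-01,12\n2014-02,7";
const records = csvToRecords(csv);
const json = JSON.stringify(records);
```

In practice you'd serve the JSON from the server rather than re-parsing CSV on every page load, but the shape of the data is the same either way.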
# ? Aug 16, 2014 21:30 |
|
Good idea for a thread!

quote:Post your website or portfolio and show off to other Goons

I've got nothing public to show, but I just threw this together as a tech demo for management last week. It's an example of call data for water-related faults logged over an 18-month period, to show how we might visualise non-financial data. The slider at the bottom changes the selected month, and mouse-over changes the info up top. Tools are pure JS plus the Telerik/Kendo UI slider and treemap components, which were fairly easy to get going.
|
# ? Aug 18, 2014 06:04 |
|
This is only half as impressive without sound, but I've been developing an annotation/basic audio editing tool and I use d3 for drawing waveforms. (click for gfy) This sits a little outside of true data visualization, but having the capability to zoom, scroll, and mute the waveform in a meaningful visual way is super helpful.
|
# ? Aug 18, 2014 15:34 |
|
I should make a note about Highcharts, and in fact many JS chart libraries. We've had to abandon using them at work (a government science department in Australia) because of the license. Highcharts is licensed under a "Creative Commons non-commercial" license, which can't be used with the GPL or in fact most share-alike licenses, because it restricts people from selling the end result, and the GPL forbids adding restrictions like that. If you're using GPL-type libraries, don't combine them with Highcharts or other non-free libraries. Instead stick with either GPL or permissively MIT/BSD-licensed libraries.

edit: LeafletJS is awesome. We do some really seriously complicated GIS stuff here, like hundreds of layers with thousands of points and polygons being shat out of an Oracle backend, and it handles it flawlessly. It's really good stuff.

duck monster fucked around with this message at 04:47 on Aug 19, 2014 |
# ? Aug 19, 2014 04:45 |
|
visualising data is an insanely dense collection of project links with interesting commentary. Check out their yearly reviews of the data vis world: January to June 2011, July to December 2011, January to June 2012, July to December 2012, January to June 2013, July to December 2013, January to June 2014.
|
# ? Aug 20, 2014 05:33 |
|
I like the history of tree maps; I never knew they were so complicated. The first problem when looking at a tree map is that the squares for equal values are not always the same size.
|
# ? Aug 20, 2014 18:53 |
|
YO MAMA HEAD posted:This is only half as impressive without sound, but I've been developing an annotation/basic audio editing tool and I use d3 for drawing waveforms. This is actually interesting to me, as I do a fair amount of work tooling around with the webAudioAPI, but I'd never thought about combining D3.js and FFT data before, so there goes my next free weekend.
|
# ? Aug 21, 2014 11:15 |
|
Web Audio unfortunately wasn't quite viable when I started the project. Amplitude measurements are made on the server by SoX and stored in the database, audio playback is done with SoundJS, and d3.js manages the waveform. It's all kind of fake, but it works pretty convincingly: muting is done client-side by temporarily killing the SoundJS playback volume and setting the d3 bars to 0, and then gets sent to Gearman on the server for a re-render in SoX (becoming a "true" muted waveform).
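The amplitude-measurement step (done server-side with SoX in the setup above) boils down to reducing raw samples to one peak per drawn bar. A plain-JavaScript sketch of that reduction, with toy sample values:

```javascript
// Reduce raw audio samples to one peak amplitude per waveform bar.
// `samples` are floats in [-1, 1]; `bars` is how many rectangles to draw.
function peaksPerBar(samples, bars) {
  const bucket = Math.ceil(samples.length / bars);
  const peaks = [];
  for (let i = 0; i < samples.length; i += bucket) {
    let peak = 0;
    for (let j = i; j < Math.min(i + bucket, samples.length); j++) {
      peak = Math.max(peak, Math.abs(samples[j])); // loudest sample in bucket
    }
    peaks.push(peak);
  }
  return peaks;
}

const samples = [0.1, -0.5, 0.3, 0.2, -0.9, 0.4, 0.0, 0.7];
const peaks = peaksPerBar(samples, 4); // two samples per bar
```

Each value in `peaks` maps directly to the height of one d3 rectangle, which is why muting can be faked by just zeroing the bars.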
|
# ? Aug 21, 2014 17:29 |
|
mortarr posted:I've got nothing public to show, but I just threw this together as a tech demo for management last week. It's an example of call data for water related faults logged over an 18-month period, to show how we might visualise non-financial data. The slider at the bottom changes the selected month, mouse-over changes the info up top. MrMoo posted:I like history of tree maps, never knew they were so complicated. The first problem when looking at a tree map is that the squares for equal values are not always the same size.
|
# ? Aug 22, 2014 00:56 |
|
After starting my career in Business Intelligence and moving into programming, hopefully I can be of some help in this thread!

Qlikview and Tableau are the current front-runners for amateur data analysis. MicroStrategy is lagging behind now, which is unfortunate but not surprising. Qlikview is a bit more complex in what it can do, including having an internal scripting language that can be run at execution time to load in data from databases, scrape websites, etc. to update your dashboard. Tableau is better for on-demand stuff where your data has already been cleaned and you'd like something better than Excel for displaying it. The public version is nice; it allows you to link and embed your dashboards online. Very exciting stuff there.

If you're familiar with web development, I would say it might be better to use Qlikview and Tableau for prototyping, and then roll your own dashboards. Anytime you get to the point where you'd like to display up-to-date data to the public, unfortunately their license costs get very prohibitive for the average person. (If you're running an organization and have access to a large budget, then go ahead!) Highcharts and D3 are great for this. I haven't played with http://www.chartjs.org/ yet, but it might be something to check out if you want something simple and don't want to deal with Highcharts licensing.

ItBurns posted:I'm most curious about what people are doing to generate/serve data. Being an R guy I'm doing 99% of my analytics in R and producing JSON/CSV files at regular intervals. It's not real-time and it's a real pain for anything where the user has a lot of freedom.

I have the most experience using Python and SQL to generate data, and then transforming it into JSON for Highcharts. There was a task that ran every hour that aggregated data in a stats database I built. Each page in the website would request a certain filter of that aggregated data and then populate the charts with it.

Strangely enough, for all the stats work I do, I'm very naive about R, but I wonder if Shiny might be good for what you're trying to do?
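The serving half of that pipeline can be sketched in plain JavaScript: the page asks for one filter of the pre-aggregated data and reshapes it into the [x, y] pairs most chart libraries (Highcharts included) accept. The field names here are hypothetical:

```javascript
// Pre-aggregated stats, as an hourly job might have written them.
const aggregated = [
  { page: "home", hour: 0, hits: 120 },
  { page: "home", hour: 1, hits: 95 },
  { page: "about", hour: 0, hits: 30 },
];

// Filter to one page and reshape into [x, y] pairs for the chart.
function seriesFor(records, page) {
  return records
    .filter(r => r.page === page)
    .map(r => [r.hour, r.hits]);
}

const homeSeries = seriesFor(aggregated, "home");
```

The aggregation itself happens upstream on a schedule; the client only ever does this cheap filter-and-reshape step.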
|
# ? Aug 25, 2014 19:43 |
|
Nybble posted:I have the most experience using Python and SQL to generate data, and then transforming it into JSON for Highcharts. There was a task that ran every hour that aggregated data in a stats database I built. Each page in the website would request a certain filter of that aggregated data, and then populate the charts with it. Strangely enough for all the stats work I do, I'm very naive about R, but I wonder if Shiny might be good for what you're trying to do? This isn't all that different from what I have now, sans Shiny. The main issue with that approach is that I end up generating a lot of stuff that nobody will ever use, and that it's restricted to being x-hours old.
|
# ? Aug 26, 2014 02:27 |
|
nth-ing the Tableau love. We just started using it a couple months ago and it's been awesome for rapid development. What we use for most of our day-to-day stuff, though, is Splunk. The biggest downside is that it's expensive (the license is per gig of data indexed per day), but if you can afford it and have the resources to do dev work against it, it's extremely powerful. It might not be as flashy as some of the other solutions out there, but when anybody in your organization can run ad hoc queries against your indexed data, suddenly you have a lot more perspectives about what's important and what should be readily available, which turns into new dashboard ideas. Edit: I've been thinking about having our Splunk instance process data and dump it somewhere that I can pick it up with Tableau for modeling, but I haven't had time to start looking into it. It'd probably work OK, though, since I know it can output CSVs from scheduled jobs, and I imagine there's probably some way to have it write to a DB. Kreeblah fucked around with this message at 06:44 on Aug 26, 2014 |
# ? Aug 26, 2014 06:35 |
|
Yeah, I just tried Tableau for the first time today with the data I used in this example I posted earlier: That took me something like a day of data-wrangling and writing JavaScript, but gently caress me, it only took about five minutes to get roughly the same result in Tableau. Dunno if I'd give Tableau to all our users, but for the more clued-up ones and for prototyping it looks pretty sweet. Are there any chart-type add-ins you can get for it? I'd like something that could generate a streamgraph rather than just stacked area.
|
# ? Aug 26, 2014 09:48 |
|
Tableau has won over a few of the guys at work, so I guess that's another vote for it. I was really hoping Bokeh or something like it in the Python world would be able to do interactive, linked-axis graphs (that is to say, be able to drag on an 'index' graph along the bottom to select an area on a main plot, like Google's finance display in Flash), but I haven't seen a solid example of it. I would imagine D3 could, but that's JavaScript and I do Python, though if it's what we need then maybe I'll have to jump across. Anyone have any advice? Nam Taf fucked around with this message at 09:24 on Aug 27, 2014 |
# ? Aug 26, 2014 10:58 |
|
Kreeblah posted:What we use for most of our day-to-day stuff, though, is Splunk. The biggest downside is that it's expensive (the license is per gig of data indexed per day), but if you can afford it and have the resources to do dev work against it, it's extremely powerful.
|
# ? Aug 26, 2014 14:31 |
|
Kreeblah posted:nth-ing the Tableau love. We just started using it a couple months ago and it's been awesome for rapid development. Splunk is awesome. Sumo Logic is basically hosted Splunk (not as powerful, but affordable) and does some pretty visualizations. Grafana is a nice front-end for Graphite. Very pretty and very easy to use. Datadog is a hosted service for StatsD data, graphs, and some other things. I've been liking it.
|
# ? Aug 29, 2014 05:27 |
|
Tableau is pretty excellent, but the one major knock against it is that it's not great for working with data in context. You can build beautiful dashboards and visualizations, but the second you need to integrate it into a web page it becomes an exercise in futility, especially if the data is sensitive and/or you're dealing with some flavor of Tableau that's non-public. That said, we run our own Tableau server at work and I manage it. I'm not the best with Tableau, but I'm passable so I can lend my expertise if anyone needs it.
|
# ? Sep 2, 2014 13:34 |
|
Here's some amazing looking data visualization software one of the clients I work for may be implementing: http://www.datazen.com/ Pretty much priced for enterprise only though.
|
# ? Sep 3, 2014 17:06 |
|
Knyteguy posted:Here's some amazing looking data visualization software one of the clients I work for may be implementing: http://www.datazen.com/ Could you post a trip report if they end up going with it? The Win8 requirement sucks, but those dashboards look really nice.
|
# ? Sep 3, 2014 22:22 |
|
Kreeblah posted:Could you post a trip report if they end up going with it? The Win8 requirement sucks, but those dashboards look really nice. You bet. I'd say it's about a 90% chance they'll go with it.
|
# ? Sep 4, 2014 03:32 |
|
Ahz posted:I'm working on integrating data viz into my app and was thinking about the current climate for client-side processing vs. server side.

I'm by no means an expert, but I've been using DC.js (D3 + Crossfilter). My current working prototype sanitizes data across monthly spreadsheets for the year to date, outputting a ~15,000-record, 7-column JSON file (just over 2 MB). I then pull that down to the client, which does all the analysis. Currently that involves computing a half-dozen dimensions and producing interactive graphs. I very roughly benchmarked it at around a million assorted map-reduce calls. I've been really surprised by how quickly rendering completes. Testing on desktop and on an iPhone 5 over 4G shows no noticeable processing overhead once the initial JSON file has been downloaded.

I expect my app to grow by roughly an order of magnitude by the end, to somewhere in the order of 100k records. At this point I'm convinced that I'm far more likely to introduce memory leaks or use an inefficient algorithm before I reach the limits of the JavaScript libraries I'm using. For comparison, the linked Crossfilter example uses a 230k-record, 5.3 MB dataset. It's amazing what you can do with JavaScript these days...
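Crossfilter's core trick, which DC.js builds on, is that every chart is a filtered view over the same record set, and each chart's group ignores its own dimension's filter. A toy re-implementation in plain JavaScript (this is not the crossfilter API, just the idea):

```javascript
// Toy linked filtering: when computing one chart's group, apply every
// active filter EXCEPT the one on that chart's own dimension.
function groupBy(records, key, filters, excludeKey) {
  const active = Object.entries(filters).filter(([k]) => k !== excludeKey);
  const counts = {};
  for (const r of records) {
    if (active.every(([, pred]) => pred(r))) {
      counts[r[key]] = (counts[r[key]] || 0) + 1;
    }
  }
  return counts;
}

const records = [
  { type: "call", month: "jan" },
  { type: "call", month: "feb" },
  { type: "email", month: "jan" },
];
// User has brushed the type chart down to calls.
const filters = { type: r => r.type === "call" };
const byMonth = groupBy(records, "month", filters, "month"); // sees the filter
const byType = groupBy(records, "type", filters, "type");    // ignores its own
```

The real library precomputes sorted indexes so these scans stay fast at the 100k-record scale mentioned above.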
|
# ? Sep 6, 2014 03:06 |
|
Maybe this is the wrong thread, but I want to visualize some data. The visualization isn't the problem (yet); the problem is modeling the data. I want to visualize accelerometer data as positions (this data can be cleaned and improved with on-device magnetometer, barometer, and gyroscope data) by integrating twice and doing more math. Normally this wouldn't be remotely accurate, because the small percentage errors at each tick cause huge errors when integrated twice. However, I know several facts about the movement I am trying to capture that might help me in modeling it. For example, the approximate distance and target location will probably be known ahead of time, and once moved to a location the device will return near to the starting point. The starting point is also (0, 0, 0). Is anyone familiar with work like this, or could anyone point me to resources?
|
# ? Oct 13, 2014 04:55 |
|
A Kalman filter, perhaps?
|
# ? Oct 13, 2014 05:12 |
|
ohgodwhat posted:A Kalman filter, perhaps? Kalman came up, and I have started doing the research. A normal 2D free-body model might be a good start, with an extra dimension for z added, and there are some good resources for that.
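For a concrete starting point, the scalar version of the filter is only a few lines. This sketch smooths a noisy constant reading; the same predict/update structure generalizes to the position-velocity state vector the accelerometer problem needs. The variance values here are arbitrary tuning knobs, not recommendations:

```javascript
// Minimal scalar Kalman filter: estimate a constant value from noisy
// measurements. Real IMU fusion needs a state vector (position, velocity,
// orientation), but each step has this same predict/update shape.
function kalmanFilter(measurements, processVar, measVar) {
  let x = measurements[0]; // state estimate
  let p = 1.0;             // estimate variance
  const estimates = [x];
  for (let i = 1; i < measurements.length; i++) {
    p += processVar;                 // predict: uncertainty grows
    const k = p / (p + measVar);     // Kalman gain: trust in the measurement
    x += k * (measurements[i] - x);  // update: move toward the measurement
    p *= (1 - k);                    // uncertainty shrinks after the update
    estimates.push(x);
  }
  return estimates;
}

// Noisy readings of a true value of 10.
const est = kalmanFilter([10.4, 9.7, 10.2, 9.9, 10.1], 1e-4, 0.25);
```

The known constraints mentioned above (known start, return-to-start) enter the full version as extra measurement updates, which is exactly what makes double integration tolerable.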
|
# ? Oct 13, 2014 05:45 |
|
I'm an idiot who usually uses code or code-esque things to make charts, graphs, plots, and the like. However, I want to like easier-to-use tools for (at least!) less complex visualizations of data; tools like Tableau. My stupid question is: mechanically, how do you use this thing? I mean, I must be dumb if I can't figure this out, but to be fair (to other people), I haven't really watched all of the videos and whatnot they've made (though I distinctly recall watching at least one or two). I've opened up and poked at a few of the sample workbooks, but I guess I'm ideally looking for a HOWTO-type set of instructions as text. I have data in a variety of formats, but the de facto format I can almost always distill things down to is some sort of CSV/TSV, or fairly basic SQLite databases. For whatever it's worth, this is me using Tableau Desktop 8.2.

EDIT: Oh, I should perhaps add, while this isn't the main thing I'm looking for, I'm also very interested in the R integration available in Tableau (8.1 and later, including the 8.2 version I'm tinkering with), if anyone else here has used that at all.

minidracula fucked around with this message at 05:47 on Oct 14, 2014 |
# ? Oct 14, 2014 05:35 |
|
minidracula posted:I'm an idiot who usually uses code or code-esque things to make charts, graphs, plots, and the like, however, I want to like easier-to-use tools for (at least!) less complex visualizations of data; tools like Tableau. My stupid question is: mechanically, how do you use this thing?

In my experience, the R integration was fairly disappointing. One of my old coworkers used it to make a fairly interesting dashboard, but it seemed like a lot more trouble than it's worth. If I were to try to do what she did, I think I would just start from the R direction and use Shiny.

Where Tableau really shone in my organization was the integration with Active Directory. We had a fairly complex set of security requirements for who could see what data, and by using row-level permissions and the USERNAME (or whatever) calculated field as a filter, it was a fast and cheap way to implement it in the visualizations.

Here is the basic workflow, based on my experience:

1. Connect to your data. I used CSV files for rapid stuff or connected to SQL Server if it needed to be live. If you're going to be using the connection a lot or sharing it amongst people, publish it to the Tableau Server (if you have access to one) and extract it - you'll get huge performance gains, although the extracts can take forever to refresh if you try to put too much data in them.

2. Start laying out individual worksheets. These are the individual displays that make up the components of a dashboard - line charts, histograms, whatever. See below for more on this.

3. Once I have my worksheets created, start assembling them into dashboards. These are what you'll publish and what your users will actually interact with. I usually wait until this point to start tying together the worksheets with action filters or anything that needs to operate across worksheets/dashboards.

4. Publish to the server - if you have the ability to have an admin create separate 'sites', it's super useful to have a testing server and a live one.

Creating individual worksheets: I honestly think the best way to learn this is just to watch videos and mess around until you get results. But basically, the interface consists of shelves. You can place fields from your data (or calculations) onto these shelves. Once on there, they inform Tableau about how to display the data. So if you had a categorical variable on the column shelf, it would divide your worksheet into columns based on those variables. Similarly, if you placed it instead on the color shelf, it would change the color. I find it really helpful to think about the underlying SQL queries that Tableau is running (you can actually find these in a log file if you really want) - the values you're displaying are the SELECT and various aggregations, and the structural shelves are what the data is being grouped by.

If people would find it useful, I will post my strategy for developing table calculations when I have time to type it up. I think that Tableau is pretty awesome and intuitive until you start using TCs, and then the learning curve goes vertical and your life sucks. I was in a situation where we had limited ability to change the structure of the data we were visualizing, so I spent way too much time wrangling data into the proper shape using TCs, so I think I'm fairly good at it. And having a basic template for setting them up is enormously helpful, in my opinion.
|
# ? Oct 14, 2014 06:20 |
|
Flash Gordon posted:If people would find it useful, I will post my strategy for developing table calculations when I have time to type it up. I think that Tableau is pretty awesome and intuitive until you start using TCs and then the learning curve goes vertical and your life sucks. I was in a situation where we had limited ability to change the structure of the data we were visualizing so I spent way too much time wrangling data into the proper shape using TCs so I think I'm fairly good at it. And having a basic template for setting them up is enormously helpful, in my opinion. I dunno about anybody else, but I'd really appreciate this. I ran into a roadblock I couldn't figure out trying to use it with our JIRA DB (namely, for a particular project, there are two custom datetime fields that I need to use to calculate a duration; the duration calculation itself would be doable, but the normalized form of the DB made figuring out how to define my data source and calculated fields to get the datetime data in the first place kinda brain-breaking). I think I'm just going to write an app to extract data from their API and turn it into a TDE, but it'd be nice to have a better idea of what the alternatives would be.
|
# ? Oct 14, 2014 07:57 |
|
Loving all this stats and graphing stuff; I've had good times with Highcharts in the past, and would love to use D3 at work, if only IE8 weren't such a pressing requirement.

Not stats-related, but a new system requirement crossed my desk for a workflow builder, where you start at Point A and basically work through a series of defined tasks until you hit Point X. The guy doing the UI obviously wasn't feeling too confident about what we could do, because what he drew up was basically an HTML table with vertical/horizontal arrows between cells. You click a cell to edit it, and you pick the "parent" activity from a drop-down list contained within that cell's settings. Bleh.

A bit of Googling and a few hours (days) later, I was able to put out a prototype using PlumbJs within AngularJs. Now we can drag-and-drop node types from a palette into a workflow canvas, and drag-and-drop connections between nodes to establish a workflow between them. I basically ripped off Unreal Engine 4's Blueprint and I'm loving the results. As far as visualising the execution path goes, it's miles ahead of what we could have ended up with.
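Under the hood, a canvas like that stores a directed graph of nodes and connections, and a basic sanity check is that every task is reachable from the start node. A small plain-JavaScript sketch (the node names are made up):

```javascript
// Check which workflow nodes are reachable from the start node by
// following outgoing connections (a simple breadth-first search).
function reachableFrom(start, edges) {
  const adj = {};
  for (const [from, to] of edges) (adj[from] = adj[from] || []).push(to);
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length) {
    const node = queue.shift();
    for (const next of adj[node] || []) {
      if (!seen.has(next)) { seen.add(next); queue.push(next); }
    }
  }
  return seen;
}

// Hypothetical workflow: A -> B -> X, with a branch B -> C -> X.
const edges = [["A", "B"], ["B", "X"], ["B", "C"], ["C", "X"]];
const reachable = reachableFrom("A", edges);
```

Any node missing from the returned set is an orphan the UI can flag before the workflow is saved.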
|
# ? Oct 14, 2014 11:10 |
|
Yay, cool! This is what I want to do my upcoming project in. I have a few questions on doing dataviz:

How do I know that a problem I'm thinking of can have dataviz applied to it? Like, how can I pick up on something possibly being well-explained using data visualization? I don't know if that makes much sense - basically, how can I "detect" that a dataviz application would be helpful?

Where does everybody get their data? I've found data.gov, and that seems to be the best for public, governmental data, but I've had trouble finding other sources for stuff like Twitter or video games.

Speaking of Twitter, how does everyone do their data collection on it? One of my project ideas involved analyzing #GamerGate tweets and comparing the word usage of the trolls vs. their targets. However, I've had trouble getting a large enough data set to sample - 1500 tweets max.

Where else can I go to read more about dataviz practices/how to get better at it?
|
# ? Oct 14, 2014 14:21 |
|
I saw a link to http://www.reddit.com/r/dataisbeautiful/ today, and there are some nice visualizations there.
|
# ? Oct 14, 2014 19:11 |
|
|
|
Possibly bizarre question that might be better posted elsewhere, but: does anyone have any recommendations for graphing/plotting/visualization libraries for Fortran that aren't DSLIN? Ideally something free for commercial use as well (which DSLIN isn't), e.g. MIT or BSD licensed. Hrm, now that I think of it, maybe PLplot is worth using for this... written in C, but with a Fortran interface (among others)... Still, willing to entertain other suggestions.
|
# ? Oct 14, 2014 21:07 |