-
• #27
What are you trying to do?
-
• #28
Make pretty pictures.
I have a dataset of around 600k individual words, and their frequency of use in two corpora. I'm comparing the two (one corpus is 'correct', the other is not). I wanted to show how much corruption there was in the wrong corpus by visualizing the entirety of it (a box) with histograms on the bottom-left and top-right showing false positives and false negatives eating into the 'clean' data.
I could just say the data is x, y% are false positive, z% are false negatives. But it's just not the same, is it?
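For what it's worth, the bare percentages could be worked out with something like the rough sketch below. It assumes each corpus is already reduced to a word-to-frequency mapping, and it assumes a definition the post doesn't spell out: a "false positive" is a surplus count in the wrong corpus, a "false negative" is a count the wrong corpus is missing. The function name `corruption_summary` and the toy words are made up for illustration.

```python
from collections import Counter


def corruption_summary(correct_counts, noisy_counts):
    """Summarise how far a noisy corpus drifts from a trusted one.

    Both arguments map word -> frequency. The meanings given to
    "false positive" and "false negative" here are assumptions.
    """
    correct = Counter(correct_counts)
    noisy = Counter(noisy_counts)

    # Counter subtraction keeps only the positive differences.
    false_pos = noisy - correct  # counts the noisy corpus has in excess
    false_neg = correct - noisy  # counts the noisy corpus is missing

    noisy_total = sum(noisy.values())
    correct_total = sum(correct.values())
    return {
        "false_positive_pct": 100 * sum(false_pos.values()) / noisy_total if noisy_total else 0.0,
        "false_negative_pct": 100 * sum(false_neg.values()) / correct_total if correct_total else 0.0,
        "false_positives": dict(false_pos),
        "false_negatives": dict(false_neg),
    }


# Made-up toy data, not taken from the real corpus.
keyed = {"the": 10, "love": 4, "said": 6}
ocr = {"the": 9, "loue": 3, "love": 1, "said": 6, "tbe": 1}
summary = corruption_summary(keyed, ocr)
print(summary["false_positive_pct"], summary["false_negative_pct"])
```

The per-word dictionaries it returns are the sort of thing you would then feed into the histograms.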
-
• #29
Could you make a box whisker plot work for you?
I dunno. It's quite large.
-
• #30
Probably... But is it pretty? I'm giving a talk to humanists and generally box plots scare/confuse them.
-
• #31
What about a pyramid chart?
Vertical axis in the middle. Usually used for population but you could show differences.
-
• #32
What would you suggest would be the best tool / language to learn in order to improve one's skills in data acquisition / manipulation and reporting?
-
• #33
Where do you want to work?
-
• #34
Python and R are robust enough to not limit you. Python is probably more used in the acquisition and reporting stages; R more so in the manipulation (although I'm seeing more and more familiar R-based visualizations in papers). But, as Chalfie is insinuating, it depends on what you're doing.
Python is much more useful generally.
-
• #35
R and Python are the two go-to languages.
I barely know any. I work, badly, in Excel because I only do simple stuff (but I try to make it look nice, use good principles).
For biology and statistics, I'd say R.
Epidemiology and public health was Stata, now R.
If you're looking to handle big stuff then you'll need to loop, list, iterate, Markov, learn, etc.
If you want to do pretty pics, just get your head around tidyverse in R.
-
• #37
Python, at a cursory glance, looks as impenetrable as any other programming language.
-
• #38
At some point you may have to learn a language.
If not, Tableau, Power BI, Excel.
-
• #39
How's progress, did you get anywhere?
It took me a couple of hours to get facet charts...
-
• #40
I tried to post the results. I'll try again later. They're not pretty but they're "correct". I think more time cleaning them up would be worth it.
-
• #41
Do you mind me asking what the corpus is? I might attempt something similar at some point so this looks really interesting.
Although, I can't really help with the visualising, I'd probably want to try it in Power BI or Splunk or something but I don't even know how we'd structure the data yet.
-
• #42
I've done a bit of Qlik Sense before, might revisit that for visualisation
-
• #43
Happy to chat about it. Corpus is duplicates of certain eighteenth-century texts, OCR'd and keyed, but this is a subset of a larger corpus which is only OCR'd. We use the texts for "proper" historical work, but we're trying to use digital methods to do so. Stuff I'm doing now is trying to say something about the quality of the OCR'd corpus, and what methods may work best/worst with texts with OCR errors.
-
• #44
My company use Power BI, which I can therefore have a licence for, but I use a Mac by preference. Is there a cloud-based variant that I can use, or am I into running Parallels (something that I've never had any success in making work previously)?
-
• #45
Sort of. You can do basic editing of existing Power BI files in the cloud but for building data models and creating the initial visualisations, you will need a Windows desktop.
-
• #46
Bugger. Who would like to come to my flat and create a functioning Windows desktop in a window on my MBP? I can make tea, proffer home-made crumpets?
Alternatively, I have a Power BI file that I would be happy* to send to one of you data-viz chaps to work your magic on/export to .csv?
*Overjoyed, in fact
-
• #47
I'm on slacker time at work. Happy to help.
-
• #48
I have some data that I'd like to make a little more impactful than just Excel charts.
What can I do to make this more intuitive? Could I combine these in a way that would make sense, or would it be too busy/confusing?
2 Attachments
-
• #49
Little racecars on the far right of each line?
-
• #50
Pyramid chart.
I did it. I had to use geom_rect, extract coordinates from the dataframe for each feature, sum them (for the x axis), and then draw them individually.
And it's ugly.
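For anyone wondering what the rectangle approach amounts to, here is a rough sketch of the coordinate step, in Python rather than R and not the actual code used. It assumes each row is a label with a left-hand and a right-hand value, and it returns `xmin`/`xmax`/`ymin`/`ymax` boxes of the kind `geom_rect` expects. The function name `pyramid_rects` and the `bar_height` parameter are made up for illustration.

```python
def pyramid_rects(rows, bar_height=0.8):
    """Turn (label, left_value, right_value) rows into rectangle corners.

    Bars on the left grow from 0 towards negative x, bars on the right
    grow from 0 towards positive x, so the vertical axis sits in the middle.
    """
    rects = []
    for i, (label, left, right) in enumerate(rows):
        # Each row gets a horizontal band centred on its index.
        ymin = i - bar_height / 2
        ymax = i + bar_height / 2
        rects.append({"label": label, "side": "left",
                      "xmin": -left, "xmax": 0, "ymin": ymin, "ymax": ymax})
        rects.append({"label": label, "side": "right",
                      "xmin": 0, "xmax": right, "ymin": ymin, "ymax": ymax})
    return rects


# Made-up numbers purely to show the shape of the output.
for rect in pyramid_rects([("a", 3, 5), ("b", 2, 1)]):
    print(rect)
```

Each dictionary is one box to draw; making it less ugly is then mostly a matter of colours, labels and spacing.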