A single comment by @arrowplum and its replies.
  • I need to teach myself a bit of programming for some research I'm doing. It involves using the Oxford Dictionary of National Biography (ODNB) to first create a list of people who share one particular attribute, then find links (again, within the ODNB) between those people by looking for references to each name on the list in each person's biographical entry.

    I think this is pretty simple stuff, and it could be done by hand if I enjoyed tedious-as-fuck work. But I don't, so I'm going to allow myself the time it would take to do it by hand to learn how to ask computers to do it.

    So. Advice?

    Python with the "natural language toolkit" (I vaguely have some idea of what that means) seems to be potentially what I want. Someone else mentioned R (apparently quite useful in social science research generally).

    Assuming one or the other, can anyone advise me on how to go about doing (i.e., learning) this? Particularly good DIY teaching methods, for example.

    kthxbai!

  • I think this might not quite be the right thread (but I don't know the right programming thread).

    I haven't ever read the ODNB so I don't know what the data looks like, but as I understand the task, you want to:

    1. scan the entries of people for a particular attribute
      1. is the attribute you are looking for in a regular place? Or are you searching the text itself?
    2. build a list of those entries.
    3. for each of those entries scan their text for a name on your list
    4. build a graph of the relationships between the people.
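
    Steps 1 to 3 above could be sketched roughly like this in Python. Everything about the data is an assumption on my part: I'm pretending each entry is a dict with a set of attributes and the biography text, and the names and texts are made up. The helpers `select_by_attribute` and `find_mentions` are hypothetical names, not anything from a library.

```python
# Hypothetical sample data. The real ODNB format is unknown to me, so each
# entry is assumed to hold a set of attributes and the biography text.
entries = {
    "Alice Archer": {
        "attributes": {"botanist"},
        "text": "Alice Archer studied under Bob Baker in Leeds.",
    },
    "Bob Baker": {
        "attributes": {"botanist", "physician"},
        "text": "Bob Baker practised medicine and collected plants.",
    },
    "Carol Clark": {
        "attributes": {"botanist"},
        "text": "Carol Clark corresponded with Alice Archer and Bob Baker.",
    },
    "Dan Drake": {
        "attributes": {"architect"},
        "text": "Dan Drake once met Alice Archer.",
    },
}


def select_by_attribute(entries, attribute):
    """Steps 1 and 2: list the names of entries that carry the attribute."""
    return [
        name
        for name, entry in entries.items()
        if attribute in entry["attributes"]
    ]


def find_mentions(entries, names):
    """Step 3: for each listed person, find the other listed names
    that appear verbatim in that person's entry text."""
    mentions = {}
    for name in names:
        text = entries[name]["text"]
        mentions[name] = [
            other for other in names if other != name and other in text
        ]
    return mentions


names = select_by_attribute(entries, "botanist")
mentions = find_mentions(entries, names)

for name, found in mentions.items():
    print(name, "mentions", found)
```

    Note that this only catches exact full-name matches. If an entry says just "Baker" or "Dr Baker", a plain substring check will miss it, so real data would probably need some extra handling for name variants.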

    I don't think you want the natural language bit (if the above is correct). That would be more for parsing the sentences and trying to capture their meaning. Depending on the size of the data you may want a database of some sort, but I bet you could do all of this in memory instead.

    I think for this sort of task any language with good string handling is what you would want. Perl and Python both fit that bill. So does JavaScript, for that matter. I don't think you want to mess with R (in fact a lot of people use Python first to get their data set up and then use R to process it).

    If you have never touched programming at all it might be a little hard for you. Going through and generating the first list is pretty straightforward. Finding the links between entries would be fairly straightforward too. If you just want each entry to have a list of related entries that should be OK, but you do need to understand how to create objects and arrays at the very least.
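
    To give a feel for the "objects and arrays" part, here is a minimal sketch in Python, where the equivalents are dicts and lists. The `mentions` data is made up, and `build_graph` is a hypothetical helper name. It assumes a link counts in both directions: if A's entry mentions B, then B is also related to A.

```python
# Hypothetical result of the scanning step: who is mentioned in whose entry.
mentions = {
    "Alice Archer": ["Bob Baker"],
    "Bob Baker": [],
    "Carol Clark": ["Alice Archer", "Bob Baker"],
}


def build_graph(mentions):
    """Turn one-way mentions into a two-way 'related entries' list per person."""
    graph = {name: set() for name in mentions}
    for source, targets in mentions.items():
        for target in targets:
            graph[source].add(target)
            # Record the reverse direction as well (an assumption about
            # what should count as a link between two people).
            graph.setdefault(target, set()).add(source)
    # Sorted lists are easier to read and compare than sets.
    return {name: sorted(related) for name, related in graph.items()}


graph = build_graph(mentions)

for name, related in graph.items():
    print(name, "is related to", related)
```

    With a few hundred or even a few thousand entries, a dict like this fits comfortably in memory, so no database is needed for this step.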

    I think JavaScript might be your best bet. There are a ton of intro tutorials and you could make something quick and dirty that works.
