bdaca's Introduction

Course Materials Big Data and Automated Content Analysis

This repo contains the course materials for the course Big Data and Automated Content Analysis, which I teach in the Research Master program of the Graduate School of Communication Science, University of Amsterdam. It contains several elements:

The book

The folder 'bdaca-book' contains the LaTeX source of the tutorial 'Doing Computational Social Science with Python: An Introduction'. You can cite it as: Trilling, D. (2016). Doing computational social science with Python: An introduction. Social Science Research Network. doi:XXXXXX

bdaca's People

Contributors

annekroon, damian0604

bdaca's Issues

change statsmodels example

in the book and the .ipynb

best practice:

import statsmodels.api as sm
import statsmodels.formula.api as smf

mod = smf.ols(formula='Lottery ~ Literacy + Wealth * Region', data=df)
res = mod.fit()
print(res.summary())

new literature

New book on Data Science with Python:
[https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/README.md]

IPython version

make sure that people check whether they have IPython>5

if not:
sudo apt-get remove ipython3
sudo pip3 install ipython
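
The check itself could be sketched in Python as below. This is only an illustration: the helper names are made up for this example, and reading "IPython>5" as "major version 5 or newer" is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: "IPython>5" is read as "major version 5 or newer".
MINIMUM_MAJOR = 5


def major_version(version_string):
    """Return the leading integer of a version string such as '7.34.0'."""
    return int(version_string.split(".")[0])


def installed_ipython_version():
    """Return the installed IPython version string, or None if it is missing."""
    try:
        return version("ipython")
    except PackageNotFoundError:
        return None


found = installed_ipython_version()
if found is None:
    print("IPython is not installed")
elif major_version(found) < MINIMUM_MAJOR:
    print("IPython", found, "is too old; reinstall it with pip3")
else:
    print("IPython", found, "is recent enough")
```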

alternative for /text() in XPATH

When the results of an XPath contain a line or paragraph break, /text() might not work properly, because it treats each part as a separate element.
Fix:
leave out the /text() in the XPath itself and use the .text_content() method later on:

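# tree is assumed to be a page parsed with lxml.html
reviews = tree.xpath('//div/div/div[2]/div[*]/div[2]/p[1]')
print(len(reviews), "reviews scraped. Showing the first 60 characters of each:")
for i, review in enumerate(reviews):
    # text_content() joins the text of the element and of all its children
    print("Review", i, ":", review.text_content()[:60])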

Add this as an alternative solution to the XPath chapter
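
To see why the break matters, here is a small sketch that runs without lxml. It uses the standard library's xml.etree.ElementTree, where .text returns only the text before the first child element and itertext() plays roughly the role of lxml's text_content(). The review snippet is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical review paragraph with a line break in the middle
snippet = "<div><p>Great phone,<br />but the battery is weak.</p></div>"
root = ET.fromstring(snippet)
p = root.find("p")

# Only the text before the <br /> element
first_text_node = p.text

# All text inside <p>, including what follows the <br />
full_text = "".join(p.itertext())

print(first_text_node)
print(full_text)
```

With lxml.html, review.text_content() gives the equivalent of full_text in one call.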

confusion bash/python

  • use different colors for bash and python code
  • make a short overview of the commands that have to be run in bash

changing keyboard layout in lubuntu

explain what to do if the wrong setting was chosen by accident

sudo dpkg-reconfigure keyboard-configuration

Model: Generic 105-key PC (intl.)
Country of origin for the keyboard: English (US)
Layout: English (US)
default
no compose key
X server: no

language (Dutch) --> does not matter

add tweepy-example (streaming)

github-link

first:
sudo pip3 install tweepy

and then the minimal code:
(but you probably do not want to print the tweets but write them to a file)

# Written for tweepy 3.x: tweepy 4.0 removed StreamListener and merged it into Stream
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."


class StdOutListener(StreamListener):
    """ A listener handles tweets that are received from the stream.
    This is a basic listener that just prints received tweets to stdout.
    """
    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)

listener = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth, listener)
stream.filter(track=['basketball'])
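
Writing to a file instead of printing could look roughly like this. It is a sketch that runs without tweepy: append_raw_tweet and read_tweets are hypothetical helper names, the file name is arbitrary, and it assumes every item handed to on_data is one JSON string.

```python
import json
import os
import tempfile


def append_raw_tweet(data, path):
    """Append one raw tweet (a JSON string) to path, one tweet per line."""
    with open(path, mode="a", encoding="utf-8") as fo:
        fo.write(data.strip() + "\n")


def read_tweets(path):
    """Read the file back as a list of dicts, skipping empty lines."""
    with open(path, encoding="utf-8") as fi:
        return [json.loads(line) for line in fi if line.strip()]


# Demo with two made-up tweets in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
append_raw_tweet('{"id": 1, "text": "first"}\r\n', demo_path)
append_raw_tweet('{"id": 2, "text": "second"}', demo_path)
tweets = read_tweets(demo_path)
print(len(tweets), "tweets stored")
```

In the listener, on_data would then call append_raw_tweet(data, 'tweets.jsonl') instead of print(data).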

SML example p. 84

When the data are read, the labels are integers: 1 and -1.
When precision is calculated, the code uses the strings '1' and '-1'.
--> This cannot work; change either the first to strings or the second to ints.
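
A small sketch of the problem, with made-up labels and a hand-written precision function (not the code from the book): comparing integer labels with the string '1' never matches, so no prediction counts as positive.

```python
def precision(actual, predicted, positive):
    """Share of items predicted as `positive` that really are `positive`."""
    predicted_positive = [a for a, p in zip(actual, predicted) if p == positive]
    if not predicted_positive:
        # Nothing was predicted as positive (or the label type never matches)
        return 0.0
    true_positive = [a for a in predicted_positive if a == positive]
    return len(true_positive) / len(predicted_positive)


# Hypothetical labels, stored as integers as in the data file
actual = [1, -1, 1, -1]
predicted = [1, 1, 1, -1]

# Integer data, string label: 1 == '1' is False, so nothing matches
mismatched = precision(actual, predicted, positive='1')

# Fix 1: use an integer label
as_ints = precision(actual, predicted, positive=1)

# Fix 2: convert the data to strings
as_strings = precision([str(a) for a in actual],
                       [str(p) for p in predicted],
                       positive='1')

print(mismatched, as_ints, as_strings)
```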

nltk POS tagger

we list two taggers, but one of them no longer seems to exist. check this

p. 46 command color

head -5 sanders.tsv has the color of python, should be the terminal color

best (from your computer),
theo :)

add info on mounting host system

To mount a directory from your "real" computer, the so-called host, in the file system of the VM, you can do the following:
Select the folder you want to share in VirtualBox (see screenshot). Select permanent, but NOT auto-mounting (screenshot).
Remember the name of the share (in my example, Desktop).

Within the VM, now create a folder where you want the content of your host folder to appear. For example, this could be:

mkdir /home/damian/myrealcomputer

Now, use the following line to mount the share (in my example, it is called Desktop) to that folder:

sudo mount -t vboxsf -o uid=$UID,gid=$(id -g) Desktop /home/damian/myrealcomputer/
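
If the mount line has to be generated for other share names or users, for instance in a setup script, it could be built in Python as sketched below. The function name and the uid/gid value 1000 are assumptions for illustration; the command is only printed, not executed.

```python
import shlex


def vboxsf_mount_command(share, mount_point, uid, gid):
    """Build the argument list for mounting a VirtualBox share."""
    return ["sudo", "mount", "-t", "vboxsf",
            "-o", f"uid={uid},gid={gid}",
            share, mount_point]


# Hypothetical uid/gid; on the VM, `id -u` and `id -g` show the real values
cmd = vboxsf_mount_command("Desktop", "/home/damian/myrealcomputer/", 1000, 1000)
command_line = shlex.join(cmd)
print(command_line)
```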


whatsapp-exercise

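see also e-mail.
solution:

import csv

# Read the exported chat and drop the trailing newline of every line
with open('_chat.txt') as fi:
    data = [line.rstrip('\n') for line in fi]

# Assumes each line starts with a 17-character timestamp and a 2-character separator
timestamps = [e[:17] for e in data]
text_unparsed = [e[19:] for e in data]
# Everything before the first colon is the name, the rest is the message
namen = [e.split(':')[0] for e in text_unparsed]
text = [":".join(e.split(':')[1:]) for e in text_unparsed]

output = zip(timestamps, namen, text)
# newline='' keeps the csv module from writing extra blank lines on some systems
with open('output.csv', mode='w', newline='') as fo:
    writer = csv.writer(fo)
    writer.writerows(output)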
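
To check the slicing on a single line, here is a small sketch. The sample line and its timestamp format are assumptions; real exports may look different, and messages that span several lines would need extra handling.

```python
def parse_line(line):
    """Split one chat line into (timestamp, name, message)."""
    timestamp = line[:17]   # first 17 characters
    rest = line[19:]        # skip the ': ' after the timestamp
    # Split at the first colon only, so colons inside the message survive
    name, _, message = rest.partition(':')
    return timestamp, name, message.strip()


# Hypothetical line in the assumed export format
sample = "12-03-16 14:05:33: Alice: See you at 10:30?"
parsed = parse_line(sample)
print(parsed)
```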
