r-functions

This is a collection of R functions I wrote to speed up my daily workflow. That is why they do not share a common field of application, as the functions in a package would.

collapseGraph

This function takes an igraph graph object and collapses it by a given node attribute (byAtt). The agg option lets you reduce selected node attributes with functions such as sum, mean, median, min, or max for numeric values, or paste, unique, etc. for strings.

domain

Tries to identify the domain of a given set of URLs. The result is not just the first part of a URL but the "real" domain: it returns derstandard.at for http://m.derstandard.at/xxx/yyy/zzz, not m.derstandard.at. To do this, a list of registered top-level domains (tlds.txt) is used. Appending entries to this list lets you identify subdomains of services. For example, adding wordpress.com will return supersambo.wordpress.com for http://supersambo.wordpress.com/p=123.
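
The lookup described above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the repository's R code: the name `registered_domain` and the small `SUFFIXES` set (a stand-in for tlds.txt) are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for tlds.txt: a small set of registered suffixes.
SUFFIXES = {"at", "com", "co.uk"}


def registered_domain(url, suffixes=SUFFIXES):
    """Return the longest matching suffix plus one label to its left."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return host  # no known suffix: fall back to the full host name


print(registered_domain("http://m.derstandard.at/xxx/yyy/zzz"))
# Adding a service domain to the list keeps its subdomains apart.
print(registered_domain("http://supersambo.wordpress.com/p=123",
                        SUFFIXES | {"wordpress.com"}))
```

Matching the longest suffix first is what makes the wordpress.com trick work: once the service domain is in the list, it is treated like a top-level domain and one more label is kept.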

dupurls

Short for duplicated URLs. Tries to identify different URLs that lead to the same content by searching for different patterns. For a vector of URLs it returns a vector of IDs of the same length. If two URLs get the same ID, the function has identified them as leading to the same content. This works quite well, but the results have to be checked manually. It is also very slow when it comes to tens of thousands of URLs.

overview

Tries to identify overview pages in a vector of URLs based on certain patterns.

w_clean

Simply counts the occurrences of words in a string and returns a data frame.

elAtt

A function for igraph objects that matches node attributes with the edge list columns. It returns a data.frame or a new igraph object with the corresponding additional edge attributes.

bridgeness

This function computes a bridgeness score for graphs with a given community partition. It is based on a suggestion by Tamás Nepusz, one of the creators of igraph. It takes:

* g: igraph object
* att: the name of the vertex attribute containing the community membership
* mode: the neighborhood mode c("in","out","all"), which is passed directly to igraph's neighborhood function
* calc: the type of calculation to be done c("membscore","expent"), where membscore (membership score) is meant to be the first option he suggested and expent (exponentiated entropy of the membership score vector) hopefully implements his second suggestion

It returns a vector of length length(V(g)) with the respective bridgeness score for each vertex.
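
To make the expent option more concrete, here is a minimal Python sketch of one possible reading, not the repository's R code. It assumes that the membership score vector of a vertex is the share of its neighbors falling into each community; that definition, and the names `membership_scores` and `expent_score`, are assumptions for illustration only.

```python
import math
from collections import Counter


def membership_scores(neighbors, community):
    """Share of a vertex's neighbors that falls into each community (assumed definition)."""
    counts = Counter(community[n] for n in neighbors)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}


def expent_score(neighbors, community):
    """Exponentiated Shannon entropy of the membership score vector."""
    scores = membership_scores(neighbors, community)
    entropy = -sum(p * math.log(p) for p in scores.values())
    return math.exp(entropy)


community = {"b": 1, "c": 1, "d": 2, "e": 2}
print(expent_score(["b", "c"], community))            # neighbors in one community
print(expent_score(["b", "c", "d", "e"], community))  # neighbors split evenly over two
```

Under this reading the score can be interpreted as the effective number of communities a vertex touches: it is 1 when all neighbors share a community and grows as the neighbors spread evenly over more communities.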

citGraph

Converts an unweighted directed graph into a weighted undirected graph by calculating bibliographic coupling or co-citation for each pair of nodes. This is similar to a one-mode projection of a bipartite graph, but in this case the input graph is one-mode as well. Input is given in the form of an edge list and a node list, just as for igraph's graph.data.frame() function.
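
The two weighting schemes can be illustrated with a short Python sketch. It is not the repository's R code; the function name `pair_weights` and its `mode` argument are hypothetical. Bibliographic coupling counts the targets two sources both cite, while co-citation counts the sources that cite both of two targets.

```python
from collections import defaultdict
from itertools import combinations


def pair_weights(edges, mode="coupling"):
    """Count shared neighbors for each node pair in a directed edge list.

    mode="coupling":   two sources are linked by every target they both cite.
    mode="cocitation": two targets are linked by every source citing both.
    """
    groups = defaultdict(set)
    for source, target in edges:
        if mode == "coupling":
            groups[target].add(source)  # all sources citing this target
        else:
            groups[source].add(target)  # all targets cited by this source
    weights = defaultdict(int)
    for members in groups.values():
        # Every pair inside a group shares exactly one more neighbor.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")]
print(pair_weights(edges, "coupling"))
print(pair_weights(edges, "cocitation"))
```

The resulting dictionary maps each unordered node pair to its weight, which is the shape of a weighted undirected edge list.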

writer

writer is just a shortcut for writing data frames to CSV. It takes a data frame and saves it under its own name. It saves to exports/ if that directory is available (that's my normal workflow) and otherwise directly to the working directory. The options are row.names=FALSE, sep=";".

most

most was originally made to return the element of a vector with the most occurrences. It is now a little more complex and lets you address all elements in a sorted data frame. The default output is the element ("which") with the most (nr=1) occurrences, but it is also possible to address the element's name, its number of occurrences, and its fraction of the whole vector: most(x,nr=1, c("which", "freq", "perc"))
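
The behavior described above can be sketched in Python as follows. This is an illustration only, not the repository's R code; the parameter name `what` is an assumption, since the original call passes the selector by position.

```python
from collections import Counter


def most(x, nr=1, what="which"):
    """Return the nr-th most frequent element of x, its count, or its share."""
    ranked = Counter(x).most_common()  # (element, count) pairs, highest count first
    value, freq = ranked[nr - 1]
    if what == "which":
        return value
    if what == "freq":
        return freq
    if what == "perc":
        return freq / len(x)  # fraction of the whole vector
    raise ValueError("what must be 'which', 'freq' or 'perc'")


x = ["a", "b", "a", "c", "a", "b"]
print(most(x))                # most frequent element
print(most(x, nr=2))          # second most frequent element
print(most(x, what="perc"))   # share of the most frequent element
```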

maintext

maintext is basically an interface to the Full-Text RSS service from FiveFilters.org installed on a private server. Here the function is used to retrieve the main text of pages that contain articles. It is not intended to work as an RSS feed creator. It takes:

* inputURL: URL from which the text is to be retrieved
* host: base URL of the specific installation
* parsed: whether the result should be a parsed HTML object or a character string

The required API key should be saved in a text file, and the location of that file has to be specified. It returns the title, text, author, date, and the hyperlinks found in the text.

geo_function

geo_function is an interface to Yahoo's Placemaker API.

panoramio_function

panoramio_function is an interface to the Panoramio API. It lets you get photos of places by their longitude and latitude bounds (min, max). It returns a data.frame with URLs and other info. The API is very simple and allows 100 000 calls a day.

colorbright_function

colorbright_function lets you change the brightness of a hex color code by a provided factor (higher than 1 = brighter, smaller than 1 = darker). Note that this function is not very sophisticated: it manipulates the RGB integers and caps them so they do not exceed 255.
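
The channel scaling can be sketched in a few lines of Python. This is a rough illustration of the approach, not the repository's R code; the name `colorbright` and the rounding choice are assumptions.

```python
def colorbright(hex_color, factor):
    """Scale each RGB channel by factor and cap the result at 255."""
    digits = hex_color.lstrip("#")
    # Split "RRGGBB" into three integers between 0 and 255.
    channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    scaled = [min(255, round(c * factor)) for c in channels]
    return "#{:02X}{:02X}{:02X}".format(*scaled)


print(colorbright("#336699", 2))  # brighter; the blue channel is capped at 255
print(colorbright("#336699", 1))  # unchanged
```

Because each channel is capped independently, strong brightening shifts the hue once one channel hits 255, which is one reason the approach is not very sophisticated.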
