Comments (12)
from qframe.
I am on go version go1.11 linux/amd64. That's pretty new, isn't it?
from qframe.
Ah yeah, you are right. I thought I was running 1.11 on my local machine, but I have now upgraded the server to match and the install works.
I am trying to figure out how to save the QFrame to MySQL. That hasn't been tested, right? So maybe I should switch to Postgres for better compatibility?
from qframe.
I just did a quick test against MySQL to confirm, and everything is still working. Note that I would recommend MariaDB over MySQL if that is an option for you. If you could share a little about your use case, I could suggest a solution that might fit.
To test, I started a MySQL container with Docker:
docker run --rm -ti -e MYSQL_ROOT_PASSWORD=qframe_test --net host mysql:8
and then ran a modified version of the SQLite example from the README:
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
	"github.com/tobgu/qframe"
	qsql "github.com/tobgu/qframe/config/sql"
)

func main() {
	db, _ := sql.Open("mysql", "root:qframe_test@/mysql")
	db.Exec(`
		CREATE TABLE test (
			COL1 INT,
			COL2 REAL,
			COL3 TEXT,
			COL4 BOOL
		);`)
	qf := qframe.New(map[string]interface{}{
		"COL1": []int{1, 2, 3},
		"COL2": []float64{1.1, 2.2, 3.3},
		"COL3": []string{"one", "two", "three"},
		"COL4": []bool{true, true, true},
	})
	fmt.Println(qf)
	tx, _ := db.Begin()
	qf.ToSQL(
		tx,
		qsql.Table("test"),
		qsql.MySQL(),
	)
	newQf := qframe.ReadSQL(
		tx,
		qsql.Query("SELECT * FROM test"),
		qsql.Coerce(
			qsql.CoercePair{Column: "COL4", Type: qsql.Int64ToBool},
		),
		qsql.MySQL(),
	)
	fmt.Println(newQf)
	fmt.Println(newQf.Equals(qf))
	tx.Commit() // save your changes to the db
}
Output:
COL1(i) COL2(f) COL3(s) COL4(b)
------- ------- ------- -------
      1     1.1     one    true
      2     2.2     two    true
      3     3.3   three    true

Dims = 4 x 3
COL1(i) COL2(f) COL3(s) COL4(b)
------- ------- ------- -------
      1     1.1     one    true
      2     2.2     two    true
      3     3.3   three    true

Dims = 4 x 3
true
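The example above discards every returned error to stay short. A minimal sketch of what checking the first of them might look like, using only the standard library, is shown here. The `openDB` helper and the driver name "nodriver" are hypothetical; the driver is deliberately left unregistered so the failure path runs without a database.

```go
package main

import (
	"database/sql"
	"fmt"
)

// openDB is a hypothetical helper: it wraps sql.Open so the caller gets a
// descriptive error instead of silently continuing with a nil *sql.DB.
func openDB(driver, dsn string) (*sql.DB, error) {
	db, err := sql.Open(driver, dsn)
	if err != nil {
		return nil, fmt.Errorf("open %s database: %v", driver, err)
	}
	return db, nil
}

func main() {
	// "nodriver" is not a registered driver, so sql.Open fails right away.
	if _, err := openDB("nodriver", "root:qframe_test@/mysql"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With a real driver imported, sql.Open typically succeeds even when the server is unreachable, because the connection is established lazily; calling db.Ping() is the usual way to surface connection problems early. The same pattern applies to db.Exec, db.Begin and tx.Commit in the example above.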
from qframe.
Man, you are absolutely awesome, thanks for doing that.
My use case is as below - I take a bunch of data from the Twitter API, calculate some stats and then output it to a QFrame. I'll save that to a DB, which will then be rendered in some BI dashboards.
I am 100% confident my code is not very efficient, but I am just getting started with Golang, so if you have any comments, I would be very appreciative.
package main

import "fmt"
import "strings"
import "net/url"
import "github.com/tobgu/qframe"
import "math"
import "github.com/tobgu/qframe/config/groupby"

func getPosts(profileName string, post_channel chan string) {
	api := connect()
	v, _ := url.ParseQuery("screen_name="+profileName+"&count=5&include_rts=False&tweet_mode=extended")
	searchResult, _ := api.GetUserTimeline(v)
	id_list := []string{}
	date_list := []string{}
	fav_list := []int{}
	rt_list := []int{}
	age_list := []float64{}
	hour_list := []string{}
	day_list := []string{}
	handle_list := []string{}
	interaction_list := []int{}
	is_max := []string{}
	is_max_fav := []string{}
	random_id := []int{}
	media := []int{}
	for _, value := range searchResult {
		id := value.IdStr
		CreatedAt := value.CreatedAt
		FavoriteCount := value.FavoriteCount
		RetweetCount := value.RetweetCount
		interactionCount := FavoriteCount+RetweetCount
		Posted := CalcAge(CreatedAt)
		CreatedDate := CleanDate(CreatedAt)
		hour := extract_hour(CreatedAt)
		day := strings.Split(CreatedAt, " ")[0]
		rounded := math.Floor(Posted*100)/100 //rounds number
		id_list = append(id_list, id)
		date_list = append(date_list, CreatedDate)
		fav_list = append(fav_list, FavoriteCount)
		rt_list = append(rt_list, RetweetCount)
		age_list = append(age_list, rounded)
		hour_list = append(hour_list, hour)
		day_list = append(day_list, day)
		handle_list = append(handle_list, profileName)
		interaction_list = append(interaction_list, interactionCount)
		random_id = append(random_id, 1)
		media = append(media, len(value.ExtendedEntities.Media))
	}
	total_retweets := sumRetweets(rt_list)
	total_favorites := sumFavorites(fav_list)
	max_rt := maxRetweets(rt_list)
	max_fav := maxFav(fav_list)
	fmt.Println(total_retweets)
	fmt.Println(total_favorites)
	fmt.Println(max_rt)
	//loop back through and assess whether that tweet has the highest retweet count
	for _, value := range searchResult {
		RetweetCount := value.RetweetCount
		if RetweetCount == max_rt {
			is_max = append(is_max, "YES")
		} else {
			is_max = append(is_max, "NO")
		}
	}
	//loop back through and assess whether that tweet has the highest favorite count
	for _, value := range searchResult {
		FavoriteCount := value.FavoriteCount
		if FavoriteCount == max_fav {
			is_max_fav = append(is_max_fav, "YES")
		} else {
			is_max_fav = append(is_max_fav, "NO")
		}
	}
	//Store output to a map
	f := qframe.New(map[string]interface{}{
		"media_included": media,
		"TweetID": id_list,
		"random_id": random_id,
		"Handle": handle_list,
		"CreatedAt": date_list,
		"Age": age_list,
		"FavoriteCount": fav_list,
		"FavMax": is_max_fav,
		"RetweetCount": rt_list,
		"RetweetMax": is_max,
		"interactionCount": interaction_list,
		"hour": hour_list,
		"day": day_list,
	})
	fmt.Println(f)
	datesum := func(xx []int) int {
		result := 0
		for _, x := range xx {
			result += x
		}
		return result
	}
	//aggregate for each date, sum favorites, retweets and total interactions
	g := f.GroupBy(groupby.Columns("CreatedAt")).Aggregate(qframe.Aggregation{Fn: datesum, Column: "interactionCount"},
		qframe.Aggregation{Fn: datesum, Column: "RetweetCount"},
		qframe.Aggregation{Fn: datesum, Column: "FavoriteCount"},
		qframe.Aggregation{Fn: datesum, Column: "random_id"})
	fmt.Println(g)
	post_channel <- "Detailed Profile Complete"
}
from qframe.
I've reworked the code you posted above a bit to (as I see it) improve readability and make better use of QFrame, along with a few other improvements. I'm not 100% sure it compiles and works, since some of the code required to run it is missing, but it should still give some hints about potential improvements. I've also added a number of comments to explain my reasoning; they are all prefixed with a dash (-).
package main

// - Single import statement with ordered list of imports
import (
	"fmt"
	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/groupby"
	"math"
	"net/url"
)

// - Use camel case variable and function names, start with lower case if not intended to be public.
// - In general avoid suffixing or prefixing variable names with the type name (list in this example, which
//   is really a slice). That may be used in dynamic langs such as Python but is given by the static type in Go.
// - I've removed a bunch of temporary/intermediate variables that I did not think contributed to clarity but
//   only to line count. This is a matter of taste though.
// - Always run "go fmt" on your code to make the formatting consistent. Also consider adding something like
//   https://github.com/golangci/golangci-lint to your builds as that will produce results that you can learn from.

func intMax(x, y int) int {
	if x > y {
		return x
	}
	return y
}

func getPosts(profileName string, done chan string) {
	api := connect()
	// - Always check for errors in production code.
	v, _ := url.ParseQuery("screen_name=" + profileName + "&count=5&include_rts=False&tweet_mode=extended")
	searchResult, _ := api.GetUserTimeline(v)

	// - Consider replacing the below declarations with something like:
	//   ids := make([]string, 0, len(searchResult))
	//   That will preallocate the underlying array of the slice with the correct capacity
	ids := []string{}
	dates := []string{}
	favs := []int{}
	rts := []int{}
	ages := []float64{}
	hours := []string{}
	days := []string{}
	handles := []string{}
	interactions := []int{}
	randomId := []int{}
	media := []int{}
	totalRetweets, totalFavourites, maxRt, maxFav := 0, 0, 0, 0
	for _, value := range searchResult {
		ids = append(ids, value.IdStr)
		dates = append(dates, cleanDate(value.CreatedAt))
		favs = append(favs, value.FavoriteCount)
		rts = append(rts, value.RetweetCount)
		roundedAge := math.Floor(calcAge(value.CreatedAt)*100) / 100
		ages = append(ages, roundedAge)
		hours = append(hours, extractHour(value.CreatedAt))
		days = append(days, extractDay(value.CreatedAt))
		handles = append(handles, profileName)
		interactions = append(interactions, value.FavoriteCount+value.RetweetCount)
		randomId = append(randomId, 1)
		media = append(media, len(value.ExtendedEntities.Media))
		totalFavourites += value.FavoriteCount
		totalRetweets += value.RetweetCount
		maxFav = intMax(maxFav, value.FavoriteCount)
		maxRt = intMax(maxRt, value.RetweetCount)
	}
	fmt.Println(totalFavourites)
	fmt.Println(totalRetweets)
	fmt.Println(maxFav)
	fmt.Println(maxRt)

	// Load data into qframe
	f := qframe.New(map[string]interface{}{
		"mediaIncluded":    media,
		"tweetID":          ids,
		"randomId":         randomId,
		"handle":           handles,
		"createdAt":        dates,
		"age":              ages,
		"favoriteCount":    favs,
		"retweetCount":     rts,
		"interactionCount": interactions,
		"hour":             hours,
		"day":              days,
	})

	// - Replace explicit loops with "in-frame" operations to tag max favourites and retweets
	isEqual := func(expectedVal int) func(int) string {
		return func(val int) string {
			if expectedVal == val {
				return "YES"
			}
			return "NO"
		}
	}
	f = f.Apply(
		qframe.Instruction{Fn: isEqual(maxFav), DstCol: "favouriteMax", SrcCol1: "favoriteCount"},
		qframe.Instruction{Fn: isEqual(maxRt), DstCol: "retweetMax", SrcCol1: "retweetCount"})
	fmt.Println(f)

	// aggregate for each date, sum favorites, retweets and total interactions
	// - Replace custom sum function with QFrame built in "sum"
	g := f.GroupBy(groupby.Columns("createdAt")).Aggregate(
		qframe.Aggregation{Fn: "sum", Column: "interactionCount"},
		qframe.Aggregation{Fn: "sum", Column: "retweetCount"},
		qframe.Aggregation{Fn: "sum", Column: "favoriteCount"},
		qframe.Aggregation{Fn: "sum", Column: "randomId"})
	fmt.Println(g)
	done <- "Detailed Profile Complete"
}
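The preallocation suggested in the code comments above can be sketched in isolation as follows. This is only an illustration under simple assumptions: `collectIDs` is a hypothetical helper, and the input slice stands in for the tweet IDs from the search result.

```go
package main

import "fmt"

// collectIDs is a hypothetical helper that copies src into a new slice whose
// backing array is allocated once, up front, instead of growing on append.
func collectIDs(src []string) []string {
	ids := make([]string, 0, len(src)) // length 0, capacity len(src)
	for _, s := range src {
		ids = append(ids, s) // stays within the preallocated capacity
	}
	return ids
}

func main() {
	ids := collectIDs([]string{"a", "b", "c"})
	fmt.Println(len(ids), cap(ids)) // prints "3 3"
}
```

For a handful of tweets the difference is negligible; the pattern starts to pay off when the number of rows is large and known in advance.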
from qframe.
Closing this as it seems resolved.
from qframe.