Comments (9)
For N-Hot, the input should be a Seq[String], so say if we assign Sun, Mon, Tue, ... Sat to 0-6, we'll get [0, 1, 0, 1, 0, 0, 0] for Seq("Mon", "Wed").
What's the desired input/output you want?
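To make the N-Hot example above concrete, here is a minimal sketch in plain Scala. It does not use featran's own encoder; the names `NHotExample`, `days` and `nHot` are hypothetical and only illustrate the mapping from a Seq[String] row to a 0/1 vector.

```scala
object NHotExample {
  // Vocabulary in a fixed order; the position of each day is its index in the output vector.
  val days: Vector[String] = Vector("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

  // Set position i to 1 when the i-th vocabulary entry is present in the input row.
  // Hypothetical helper for illustration, not part of featran.
  def nHot(vocabulary: Seq[String], row: Seq[String]): Seq[Int] = {
    val present = row.toSet
    vocabulary.map(v => if (present.contains(v)) 1 else 0)
  }

  def main(args: Array[String]): Unit =
    println(nHot(days, Seq("Mon", "Wed")).mkString("[", ", ", "]"))
}
```

Values in the row that are not in the vocabulary are simply ignored in this sketch.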
from featran.
Indeed. The thing is, I want to aggregate on the counts of the individual strings in this Seq[String].
I want a top-k n-hot encoding of topics. Say I have an input of Seq("News", "Politics", "Underwater Basket Weaving", "The Year 564 in North America") out of 100 topics, so a plain n-hot encoding would look like [1, 1, 1, 1, 0x96] (four ones followed by 96 zeros). But say k=10 in a top-k n-hot encoding, and only Politics and News are among the 10 most frequent topics overall; then the top-10 n-hot encoding would be [1, 1, 0x8]. So I want to aggregate over strings, not over Seq[String].
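The top-k n-hot idea described here could be sketched in plain Scala as follows. This is an illustration under assumptions, not featran code: the object and function names are hypothetical, counts are exact (a plain Map rather than a sketch data structure), and ties in frequency are broken alphabetically just to keep the result deterministic.

```scala
object TopKNHotExample {
  // Count how many times each topic occurs across all rows.
  def topicCounts(rows: Seq[Seq[String]]): Map[String, Long] =
    rows.flatten.groupBy(identity).map { case (topic, occurrences) =>
      topic -> occurrences.size.toLong
    }

  // Keep the k most frequent topics; ties are broken alphabetically (an assumption).
  def topK(counts: Map[String, Long], k: Int): Vector[String] =
    counts.toVector.sortBy { case (topic, count) => (-count, topic) }.take(k).map(_._1)

  // N-hot encode one row against the top-k vocabulary; topics outside it are dropped.
  def encode(vocabulary: Seq[String], row: Seq[String]): Seq[Int] = {
    val present = row.toSet
    vocabulary.map(t => if (present.contains(t)) 1 else 0)
  }
}
```

With this shape, a row containing a rare topic such as "Underwater Basket Weaving" still produces a vector of length k, with ones only for the topics that made it into the top k.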
from featran.
"I want to aggregate over strings not Seq[String]"
This part I don't understand. Does each input row have only 1 string topic? And is Seq("News", "Politics", "Underwater Basket Weaving", "The Year 564 in North America") 4 input rows, not a single row with 4 topics? In that case the output would be 4 rows:
- [1, 0x9]
- [0, 1, 0x8]
- [0, 0, 1, 0x7]
- [0, 0, 0, 1, 0x6]
from featran.
It is a single row with 4 topics. I want to count the total number of occurrences of each topic, e.g. Politics: 1034, News: 533, Underwater Basket Weaving: 5, and then filter the input, which is of type Seq[String], based on these aggregated counts.
from featran.
If that's not the case, and the Seq(...) is a single input row with multiple topics, then the 1st type param A of Transformer should be Seq[String]. You got that part in extends Transformer[Seq[String].
And the aggregator should use the same type A:
@transient override lazy val aggregator: Aggregator[Seq[String], SketchMap[String, Long], SortedMap[String, Int]]
Note Seq[String] instead of String for the 1st type param. We'll have to implement the aggregator differently though: instead of using SketchMap directly, provide a custom prepare function from Seq[String] to SketchMap[String, Long]. Does that make sense? I can come up with a snippet in a bit.
from featran.
Hmm, I somehow got it into my head that the first generic type in the SketchMap has to be of type A, so that is not the case?
from featran.
No, A is your input type, so Seq[String]. B is the intermediate summable type, so the same SketchMap[String, Long]. The only difference should be Seq(a, b, c, ...) => SketchMap(a -> 1, b -> 1, ...) instead of a => SketchMap(a -> 1). Make sense?
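The prepare / sum / present split described above might look roughly like this. It is only a sketch under stated assumptions: a plain Map[String, Long] stands in for SketchMap[String, Long] so the example has no Algebird dependency, all names are hypothetical, each topic is counted once per row, and the final SortedMap[String, Int] is assumed to map each kept topic to its output index.

```scala
import scala.collection.immutable.SortedMap

object SeqAggregatorSketch {
  // Stand-in for the intermediate summable type B (SketchMap[String, Long] in the thread).
  type Counts = Map[String, Long]

  // prepare for a single-topic row: a => Map(a -> 1)
  def prepareOne(topic: String): Counts = Map(topic -> 1L)

  // prepare for a multi-topic row: Seq(a, b, c, ...) => Map(a -> 1, b -> 1, ...)
  // Assumption: a topic repeated within one row is counted once.
  def prepareSeq(topics: Seq[String]): Counts =
    topics.distinct.map(_ -> 1L).toMap

  // Semigroup-style merge of two intermediate values: add counts key by key.
  def plus(x: Counts, y: Counts): Counts =
    y.foldLeft(x) { case (acc, (topic, n)) =>
      acc.updated(topic, acc.getOrElse(topic, 0L) + n)
    }

  // present: keep the k most frequent topics and assign each an index in sorted order.
  def present(counts: Counts, k: Int): SortedMap[String, Int] = {
    val top = counts.toVector.sortBy { case (t, n) => (-n, t) }.take(k).map(_._1).sorted
    SortedMap(top.zipWithIndex: _*)
  }

  // Full pipeline over all rows: prepare each row, sum the results, then present.
  def aggregate(rows: Seq[Seq[String]], k: Int): SortedMap[String, Int] =
    present(rows.map(prepareSeq).foldLeft(Map.empty[String, Long])(plus), k)
}
```

Only the prepare step changes between the String and Seq[String] cases; the merge and present steps operate on the intermediate type and stay the same.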
from featran.
Yep, thanks for the clarification!
from featran.
BTW feel free to submit a PR if you get it to work. Could be a nice addition.
from featran.