screen's Introduction

Instructions

  • Build this docker image
  • Run bash as the command with an interactive tty to get into the image:
docker run --rm -it ${whatever-you-named-the-image} /bin/bash
  • The data is in the directory /root/data on said image
  • Create a Pull Request with your code for review
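
The build and run steps above might be scripted roughly as follows. This is only a sketch: the image tag screen-exercise and the helper names build_image and enter_image are placeholders rather than names from the repository, and it assumes the Dockerfile sits in the directory you pass in (the current directory by default).

```shell
# Hypothetical image tag; the instructions leave the name up to you.
IMAGE_NAME="${IMAGE_NAME:-screen-exercise}"

# build_image [DIR]: build the image from the Dockerfile in DIR (default: current directory).
build_image() {
  docker build -t "$IMAGE_NAME" "${1:-.}"
}

# enter_image: start a throwaway container with an interactive tty and drop into bash.
enter_image() {
  docker run --rm -it "$IMAGE_NAME" /bin/bash
}
```

After sourcing the snippet, running build_image and then enter_image should leave you at a shell prompt inside the container, where the data lives under /root/data.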

You're free to use whatever language you want, as long as you include instructions on how to run your code. (Bonus points if you modify the Dockerfile instead.)

Note that you do not have to use a Big Data stack like Hadoop or Spark. If you do use those, provide a docker-swarm or kubernetes configuration file (or files) in your Pull Request that will set up the cluster; otherwise we won't be able to run the code.

Questions

what's the average number of fields across all the .csv files?

output should be a simple number

sample output

5

shell command to generate the average number of fields

file_cnt=`find . -name '*.csv'|wc -l`;for file in `find . -name '*.csv'`;do cat $file|perl -pe 's/\r(?!\n)/\r\n/g'|head -n 1;done|perl -wnlp -e 's/\t/,/g;'|perl -pe 's/,/\n/g' |sort |uniq -c |awk '{$1=$1};1'|sed 's/ /|/1'|awk -F $'|' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$','|tr -d '\r'|awk -F "\"*,\"*" '{print $2}'|awk '{s+=$1} END {print s}'|{ bc | tr -d '\n' ; echo ",$file_cnt"; }|awk -F $',' ' { printf("%.0f\n", $1/$2) } '

Note: The above command must be executed in the /root/data directory.
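
For readers who find the one-liner hard to follow, the same calculation can be sketched more compactly with find and awk. The function name avg_fields is hypothetical. Like the one-liner, the sketch assumes that fields are separated by commas or tabs and that no quoted field contains a delimiter. Unlike the one-liner, it leaves empty files out of the average instead of counting them.

```shell
# avg_fields [DIR]: print the average number of header fields over all .csv files under DIR.
# The first awk prints the field count of each file header; the second averages those counts.
avg_fields() {
  find "${1:-.}" -type f -name '*.csv' -exec awk '
    FNR == 1 {
      line = $0
      sub(/\r.*$/, "", line)     # drop a trailing CR (CRLF) or everything after a lone CR
      gsub(/\t/, ",", line)      # treat tabs as delimiters too
      print split(line, parts, ",")
    }
  ' {} + |
    awk '{ total += $1 } END { if (NR) printf "%.0f\n", total / NR; else print 0 }'
}
```

Calling avg_fields /root/data inside the container prints a single rounded number, matching the format of the sample output.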

create a csv file that shows the word count of every value of every dataset (dataset being a .csv file)

output should be a csv file that has a header row with fields value and count and one entry for every value found:

sample output

value,count
some value,435
another value,234
word,45
...

shell command to generate word count of CSV files excluding header

echo "value,count" > wordcount.dat ; for file in `find . -name '*.csv'`;do cat $file|perl -pe 's/\r(?!\n)/\r\n/g'|tail -n +2;done|perl -wnlp -e 's/\t/,/g;'|perl -pe 's/,/\n/g' |sort |uniq -c |awk '{$1=$1};1'|sed 's/ /|/1'|awk -F $'|' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$',' >> wordcount.dat

Note: Output is stored in wordcount.dat in the /root/data directory. Run vi wordcount.dat or cat wordcount.dat on the command line to view it.
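
A shorter way to produce the same kind of output is sketched below. The function name word_count is hypothetical, and the sketch makes several assumptions: fields are separated by commas or tabs, no quoted field contains a delimiter, and files use LF or CRLF line endings (files with lone CR line endings, which the one-liner converts, are not handled here).

```shell
# word_count [DIR]: print a value,count CSV (header included) for every data value under DIR.
# The first awk skips each header row and emits one value per line.
# The second awk tallies identical values; sort keeps the output order stable.
word_count() {
  echo "value,count"
  find "${1:-.}" -type f -name '*.csv' -exec awk '
    FNR > 1 {
      gsub(/\r/, "")             # strip CR from CRLF line endings
      gsub(/\t/, ",")            # treat tabs as delimiters too
      n = split($0, parts, ",")
      for (i = 1; i <= n; i++) print parts[i]
    }
  ' {} + |
    awk '{ count[$0]++ } END { for (v in count) print v "," count[v] }' |
    LC_ALL=C sort
}
```

The function writes to standard output, so redirect it to a file. Pick a location outside the data directory, or a name that does not end in .csv (the original command uses wordcount.dat), so that find does not pick up the output file as input.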

what's the total number of rows for all the .csv files?

output should be a simple number

sample output

1000000000

shell command to generate total number rows in all CSV files excluding header

for file in `find . -name '*.csv'`;do cat $file|perl -pe 's/\r(?!\n)/\r\n/g'|tail -n +2;done|wc -l
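
The same count can be sketched with find and awk alone. The function name count_rows is hypothetical, and the sketch assumes LF or CRLF line endings. One difference from the command above: awk also counts a final row that lacks a trailing newline, which wc -l does not.

```shell
# count_rows [DIR]: print the total number of data rows (header excluded) in all .csv files under DIR.
# The first awk counts non-header lines; the second sums the partial counts in case
# find has to split a long file list across several awk invocations.
count_rows() {
  find "${1:-.}" -type f -name '*.csv' -exec awk '
    FNR > 1 { rows++ }
    END { printf "%.0f\n", rows }
  ' {} + |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Calling count_rows /root/data inside the container prints a single number, as in the sample output.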

screen's People

Contributors

andres-lowrie, svairavelu
