Looking for Big Data Project Presentation.

DSA 5620: Group Presentations in AI Applications

Each group should pick one of the topics below to investigate, work on, and prepare a 10-15 minute presentation to their class in the last couple of weeks of classes. Each group must pick a topic from one of the three categories below.

Topics – 3 Categories:

● Category 1: Implement a MapReduce application to perform one of the following:
   a) Matrix multiplication.
   b) Relational algebra Selection and Projection (set-based "no duplicates" version, and bag-based "with duplicates" version).
   c) Relational algebra Union, Intersection, and Difference (set- and bag-based if applicable).
   d) Relational algebra Natural Join operation.

   Notes:
   1. Please refer to Chapter 2 of Mining Massive Datasets, which can be freely accessed on the book website and outlines the processing required of mappers and reducers for each of the operations above. http://www.mmds.org/
   2. Don't make any assumptions about the number of input files or their filenames. The entries from both matrices could appear in any order in the file(s). Of course, this requires storing additional information in the data files, such as the matrix name and the indices of each entry, in addition to the values of the entries. For example: let A and B be the two matrices given below, and suppose we would like to find their product C = AB. A is a 2×2 matrix while B is a 2×1 matrix (vector).

      Matrix A            Matrix B
           0    1              0
      0   25    9         0   44
      1   31   17         1   13

      The entries from both matrices can be stored in one or more files. The data files can show entries from either matrix in any order. Each line represents one entry from either matrix. For example, one possible content of the input data files:

      A, 0, 0, 25
      B, 1, 0, 13
      A, 1, 1, 17
      A, 0, 1, 9
      A, 1, 0, 31
      B, 0, 0, 44

   3. The note above also holds for relations in relational algebra (tables in SQL). For example, for natural join, rows/tuples from operand tables can appear in the same file or in different files, in one, two, or more files.
      Of course, this requires the table name to be stored in the data files.
   4. The shape of the input matrices and the schema of the input relations (tables) must be passed to the mappers and reducers as additional input upon job submission. Please consider using the job object to pass this additional input to the mappers and reducers, using:
         job.getConfiguration().set()       // in the driver code
         context.getConfiguration().get()   // in the setup() method of the mapper and reducer

● Category 2: Hadoop ecosystem: Kafka, Flume, HBase, Storm, etc.

● Category 3: Other big data solutions: Snowflake, Elasticsearch, Amazon Redshift, etc.

Instructions:
1. Each group must pick a project from one of the three categories above: a MapReduce application, a tool from the Hadoop ecosystem, or a big data platform.
2. If you choose MapReduce:
   (a) You have to submit the source code files as well.
   (b) In your talk you have to cover the code and how it works, and do a sample run in front of the class. Please prepare the necessary input data files to test your code and confirm that it generates the correct output.
3. If you choose a tool/framework from the Hadoop ecosystem, you have to cover:
   (a) The main components/daemons of the tool: what exactly needs to be running to use it.
   (b) What is it used for? What kind of processing? Alternative tools that serve the same purpose, if any.
   (c) A simple practical example of how to use the tool: code, scripting language, commands, etc.
   (d) Your presentation should be enough for anyone to learn the basics of the tool and start using it for simple processing.
4. If you choose a big data platform:
   (a) Same as above: a, b, c, and d.
5. Scoring:
   ◦ 50 points: overall quality of slides, presentation, and talk.
   ◦ 50 points: code and demo, system components (if applicable), daemons, MapReduce code and how it works, etc.
6. At least two group members should give the presentation, using the same laptop/machine.
   Hopefully we can squeeze each talk into 10 to 15 minutes, to give time for all of the groups.
7. Expected time: each group should have the code (if any) and slides ready within 2 to 3 weeks.
8. Final note: please don't worry much about your score; focus on exploring and learning something new. It should be an exciting experience for the whole class, including myself.
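
For Category 1(a), it may help to see the mapper and reducer logic on the 2×2 by 2×1 example from note 2 before writing any Hadoop code. The block below is only a sketch in plain Java with no Hadoop dependency. It simulates a one-step map, shuffle, and reduce in memory, in the spirit of the approach outlined in Chapter 2 of Mining Massive Datasets. The class name MatMulSketch and the methods mapLine, reduceKey, and multiply are hypothetical names chosen for illustration, and the sketch assumes integer entries. The shapes rowsA and colsB are passed as plain arguments here; in a real job they would be read from the job configuration, as note 4 suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MatMulSketch {

    /** Map step: parse one line "name, row, col, value" and emit (key, value) pairs. */
    static List<String[]> mapLine(String line, int rowsA, int colsB) {
        String[] f = line.split(",");
        String name = f[0].trim();
        int row = Integer.parseInt(f[1].trim());
        int col = Integer.parseInt(f[2].trim());
        String val = f[3].trim();
        List<String[]> out = new ArrayList<>();
        if (name.equals("A")) {
            // A[i][j] contributes to every C[i][k]; tag the value with j.
            for (int k = 0; k < colsB; k++) {
                out.add(new String[] {row + "," + k, "A," + col + "," + val});
            }
        } else {
            // B[j][k] contributes to every C[i][k]; tag the value with j.
            for (int i = 0; i < rowsA; i++) {
                out.add(new String[] {i + "," + col, "B," + row + "," + val});
            }
        }
        return out;
    }

    /** Reduce step: for one key (i,k), match A and B values on j and sum the products. */
    static long reduceKey(List<String> values) {
        Map<Integer, Long> a = new HashMap<>();
        Map<Integer, Long> b = new HashMap<>();
        for (String v : values) {
            String[] f = v.split(",");
            int j = Integer.parseInt(f[1]);
            long x = Long.parseLong(f[2]);
            (f[0].equals("A") ? a : b).put(j, x);
        }
        long sum = 0;
        for (Map.Entry<Integer, Long> e : a.entrySet()) {
            // An entry with no partner is treated as multiplied by zero.
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0L);
        }
        return sum;
    }

    /** Driver: map every line, group values by key (the shuffle), then reduce each group. */
    static Map<String, Long> multiply(List<String> lines, int rowsA, int colsB) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            for (String[] kv : mapLine(line, rowsA, colsB)) {
                groups.computeIfAbsent(kv[0], key -> new ArrayList<>()).add(kv[1]);
            }
        }
        Map<String, Long> c = new TreeMap<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            c.put(g.getKey(), reduceKey(g.getValue()));
        }
        return c;
    }

    public static void main(String[] args) {
        // The six input lines from note 2, in the same arbitrary order.
        List<String> lines = List.of(
            "A, 0, 0, 25", "B, 1, 0, 13", "A, 1, 1, 17",
            "A, 0, 1, 9", "A, 1, 0, 31", "B, 0, 0, 44");
        // rowsA = 2 and colsB = 1 are the shapes from the example.
        System.out.println(multiply(lines, 2, 1));
    }
}
```

On the example input this yields C(0,0) = 25·44 + 9·13 = 1217 and C(1,0) = 31·44 + 17·13 = 1585. Because every emitted value carries the matrix name and the shared index j, the result does not depend on the order of the input lines or on how they are split across files.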
