I got really drunk yesterday because my friends came over, so there is not much new to learn today.
So let's just complete the task given yesterday and see whether it works.
Meet friends at coffee shops rather than pubs.
See the problem: Today's Problem
Response Needed
For each file:
File: file1.txt
Lines: 120
Words: 945
Unique Words: 348
Top 5 Words:
1. the – 52
2. java – 41
3. and – 35
4. you – 33
5. code – 31
At the end of the report:
=== Benchmark: Analysis Completed ===
Time: 1,238,912 ns
Memory Used: 1,045,672 bytes
Files Processed: 5
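The benchmark footer above could be produced along these lines. This is only a minimal sketch: the class name `BenchmarkSketch` and the `run` helper are hypothetical, and the memory figure is a rough heap delta taken from `Runtime`, not an exact measurement (the garbage collector can run in between).

```java
import java.text.NumberFormat;
import java.util.Locale;

class BenchmarkSketch {

    // Runs the task once and returns the report footer as a string.
    static String run(Runnable task, int filesProcessed) {
        Runtime runtime = Runtime.getRuntime();
        long memoryBefore = runtime.totalMemory() - runtime.freeMemory();
        long start = System.nanoTime();

        task.run();

        long elapsedNs = System.nanoTime() - start;
        long memoryAfter = runtime.totalMemory() - runtime.freeMemory();
        // The delta can be negative if the GC ran during the task; clamp to 0.
        long memoryUsed = Math.max(0, memoryAfter - memoryBefore);

        // Locale.US gives the comma grouping used in the expected output.
        NumberFormat fmt = NumberFormat.getIntegerInstance(Locale.US);
        return "=== Benchmark: Analysis Completed ===\n"
                + "Time: " + fmt.format(elapsedNs) + " ns\n"
                + "Memory Used: " + fmt.format(memoryUsed) + " bytes\n"
                + "Files Processed: " + filesProcessed + "\n";
    }

    public static void main(String[] args) {
        String footer = run(() -> {
            // Stand-in workload; the real one would be the file analysis.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i);
        }, 5);
        System.out.print(footer);
    }
}
```

A single run like this includes JVM warm-up, so the numbers are only indicative.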
Solving Part 1: File Analyser
fileAnalyser.java
package day2;

import java.io.File;

public class fileAnalyser {

    public static String getCurrentWorkingDirectory() {
        return System.getProperty("user.dir");
    }

    // Returns the entries of the given directory, or an empty array when the
    // path does not exist or is not a directory (listFiles() returns null then).
    public static File[] getAllFilesFromAnAddress(String fileAddress) {
        File[] listOfFiles = new File(fileAddress).listFiles();
        return listOfFiles == null ? new File[0] : listOfFiles;
    }

    public static boolean checkFileExtension(File file, String extension) {
        return file.getName().endsWith(extension);
    }

    public static void main(String[] args) {
        String fileAddress = getCurrentWorkingDirectory() + "/day2/files";
        File[] files = getAllFilesFromAnAddress(fileAddress);
        fileStats.IfileStats[] fileStatsArr = new fileStats.IfileStats[files.length];
        int i = 0;
        for (File file : files) {
            if (!checkFileExtension(file, ".txt")) continue;
            fileStatsArr[i++] = fileStats.getFileState(file);
        }
        // Only the first i slots are filled; skipped files leave null slots
        // at the end, so do not iterate over the whole array.
        for (int j = 0; j < i; j++) {
            System.out.print(fileStatsArr[j]);
        }
    }
}
fileStats.java
Important things to know here:
- StringBuilder
- PriorityQueue
- How sorting works
- How to iterate over objects in Java
- As I am mostly experienced in Node.js, I have been using map, reduce and filter most of the time
- There are still some issues in it, but it works okay
- Tomorrow I guess I'll have to continue it
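The first few items in the list can be tried out in isolation. The sketch below is a small standalone illustration, not the code from this post: the class `TopWordsSketch` and its `topN` helper are made-up names. It fills a `PriorityQueue` from a map's `entrySet()`, orders it with a comparator (highest count first, ties broken alphabetically, which is an added assumption), and builds the output with a `StringBuilder`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopWordsSketch {

    // Returns up to n "word - count" strings, highest count first.
    static List<String> topN(Map<String, Integer> counts, int n) {
        // The comparator puts the largest count at the head of the queue;
        // equal counts fall back to alphabetical order of the word.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
                (a, b) -> {
                    int byCount = b.getValue().compareTo(a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                });
        heap.addAll(counts.entrySet());

        List<String> result = new ArrayList<>();
        while (result.size() < n && !heap.isEmpty()) {
            Map.Entry<String, Integer> entry = heap.poll();
            result.add(entry.getKey() + " - " + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Count words with Map.merge, roughly what reduce would do in Node.js.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : "the java and the code the java".split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }

        StringBuilder out = new StringBuilder();
        int rank = 1;
        for (String line : topN(counts, 3)) {
            out.append(rank++).append(". ").append(line).append("\n");
        }
        System.out.print(out);
    }
}
```

Iterating a `PriorityQueue` with a for-each loop does not visit elements in priority order; only `poll()` does, which is why the loop above polls.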
package day2;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class fileStats {

    public static class IfileStats {
        int numberOfLines;
        int numberOfWords;
        String longestWord;
        String mostFrequentWord;
        int mostFrequentWordCount;
        int numberOfUniqueWords;
        String fileName;
        HashMap<String, Integer> wordFrequency;
        PriorityQueue<Map.Entry<String, Integer>> topWords;

        public IfileStats() {
            this.numberOfLines = 0;
            this.numberOfWords = 0;
            this.numberOfUniqueWords = 0;
            this.longestWord = "";
            this.mostFrequentWord = "";
            this.mostFrequentWordCount = 0;
            this.fileName = "";
            this.wordFrequency = new HashMap<>();
            // Max-heap: the entry with the highest count is polled first.
            this.topWords = new PriorityQueue<>((a, b) -> b.getValue().compareTo(a.getValue()));
        }

        @Override
        public String toString() {
            StringBuilder output = new StringBuilder();
            output.append("File: ").append(this.fileName).append("\n");
            output.append("Lines: ").append(this.numberOfLines).append("\n");
            output.append("Words: ").append(this.numberOfWords).append("\n");
            output.append("Unique Words: ").append(this.numberOfUniqueWords).append("\n");
            output.append("Top 5 Words:\n");
            // Poll from a copy so that calling toString() twice gives the same result.
            PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(topWords);
            for (int i = 0; i < 5 && !queue.isEmpty(); i++) {
                Map.Entry<String, Integer> entry = queue.poll();
                // Word first, then count, as in the expected report.
                output.append(i + 1).append(". ").append(entry.getKey())
                        .append(" - ").append(entry.getValue()).append("\n");
            }
            return output.toString();
        }
    }

    public static IfileStats getFileState(File file) {
        IfileStats stats = new IfileStats();
        stats.fileName = file.getName();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stats.numberOfLines++;
                // Lower-case the line and split on runs of non-alphanumeric
                // characters (whitespace and punctuation) instead of a single space.
                for (String word : line.toLowerCase().split("[^a-zA-Z0-9]+")) {
                    if (word.isEmpty()) continue; // a leading separator yields an empty token
                    int wordCount = stats.wordFrequency.getOrDefault(word, 0) + 1;
                    int wordLength = word.length();
                    // Author's note: this is not working as expected. Words shorter than
                    // 3 characters are skipped here but still counted in the Top 5 list.
                    if (wordLength > 2 && wordCount > stats.mostFrequentWordCount) {
                        stats.mostFrequentWordCount = wordCount;
                        stats.mostFrequentWord = word;
                    }
                    if (wordLength > stats.longestWord.length()) {
                        stats.longestWord = word;
                    }
                    stats.wordFrequency.put(word, wordCount);
                    stats.numberOfWords++;
                }
            }
            stats.numberOfUniqueWords = stats.wordFrequency.size();
            stats.wordFrequency.forEach((w, c) -> stats.topWords.add(Map.entry(w, c)));
        } catch (IOException error) {
            // Report the problem and return whatever was collected so far.
            System.out.println("Error: " + error);
        }
        return stats;
    }
}
This was all done on day 2: not a very interesting day
Things learned: Maps, PriorityQueue, sorting
Day's efficiency: 2/10
Tomorrow I should
- Complete the benchmark problem for this
- Run the file analyser in parallel
- Learn about Java Streams and do this again with streams
- Maybe also bring the bank's CSV analyser in here (just saying → that could be fun)
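As a starting point for the streams item, here is a rough sketch of the same word counting written with the Stream API. The class `StreamWordCountSketch` and both helper names are hypothetical, and the input is an in-memory stream of lines; for real files, `java.nio.file.Files.lines(path)` could supply the stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamWordCountSketch {

    // Splits every line into lower-case words and counts each word.
    static Map<String, Long> wordFrequency(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-zA-Z0-9]+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns up to n "word - count" strings, highest count first,
    // ties broken alphabetically (an assumption for stable output).
    static List<String> topWords(Map<String, Long> frequency, int n) {
        return frequency.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(e -> e.getKey() + " - " + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> frequency =
                wordFrequency(Stream.of("The code, the Java", "java and THE code"));
        topWords(frequency, 5).forEach(System.out::println);
    }
}
```

Here `flatMap`, `filter` and `collect` play roughly the roles that map, filter and reduce do in Node.js.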