In this blog, we will learn how to implement a text recognizer using the camera and Firebase ML Kit.
With the updated Firebase release, developers get powerful image-processing capabilities in an easy and resource-friendly package.
You can now use image processing and machine learning techniques in your application with very little effort using the Firebase ML Kit.
ML Kit is a mobile SDK that brings Google’s machine learning expertise to Android and iOS apps in a powerful yet easy-to-use package. Whether you’re new or experienced in machine learning, you can implement the functionality you need in just a few lines of code. There’s no need to have deep knowledge of neural networks or model optimization to get started. On the other hand, if you are an experienced ML developer, ML Kit provides convenient APIs that help you use your custom TensorFlow Lite models in your mobile apps.
— Firebase Developer Guide
Currently, the Firebase ML Kit provides five options:
- Recognize Text
- Face Detection
- Barcode Scanning
- Label Images
- Recognize Landmarks
In this blog, we will focus on recognizing text using the camera of an Android device.
Here is a quick preview of what you will be able to do after reading this blog:
Things you need to get started :
- Android Studio 3.1 (or a newer version).
- A Google account to link Firebase.
With these, you are good to go.
Before you start coding:
- Open the build.gradle of your root Android project and check these lines:
```groovy
dependencies {
    // ...
    classpath 'com.android.tools.build:gradle:3.1.2'
    classpath 'com.google.gms:google-services:3.2.0'
}
```
These versions are the minimum required for the ML Kit to function smoothly.
- Open the build.gradle of your app module and add this dependency:
```groovy
dependencies {
    // ...
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}
```
Also make sure the Google services plugin is applied at the bottom of the same file (apply plugin: 'com.google.gms.google-services'), as in any standard Firebase setup.
- Open your AndroidManifest.xml and add these lines:
```xml
<!-- ... -->
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />
<!-- ... -->
<application ...>
    ...
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="text" />
    <!-- To use multiple models: android:value="label,text" -->
</application>
```
After these initial steps, you are good to start writing code for the text recognizer.
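Before wiring up the full camera pipeline, it helps to see the core API in isolation. Here is a minimal sketch, assuming you already have a still image as a Bitmap, of running on-device text recognition with firebase-ml-vision 16.0.0; the class and method names (QuickOcrSketch, recognizeText) are mine, chosen just for illustration.

```java
import android.graphics.Bitmap;
import android.support.annotation.NonNull;
import android.util.Log;

import com.google.android.gms.tasks.OnFailureListener;
import com.google.android.gms.tasks.OnSuccessListener;
import com.google.firebase.ml.vision.FirebaseVision;
import com.google.firebase.ml.vision.common.FirebaseVisionImage;
import com.google.firebase.ml.vision.text.FirebaseVisionText;
import com.google.firebase.ml.vision.text.FirebaseVisionTextDetector;

// Hypothetical helper, not part of the final app: runs text recognition
// once on a still image instead of a live camera frame.
public class QuickOcrSketch {

    private static final String TAG = "QuickOcrSketch";

    public static void recognizeText(Bitmap bitmap) {
        // Wrap the bitmap so ML Kit can consume it.
        FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
        FirebaseVisionTextDetector detector =
                FirebaseVision.getInstance().getVisionTextDetector();

        // Detection is asynchronous; results arrive on the listeners.
        detector.detectInImage(image)
                .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                    @Override
                    public void onSuccess(FirebaseVisionText result) {
                        // Each block is a paragraph-like group of recognized text.
                        for (FirebaseVisionText.Block block : result.getBlocks()) {
                            Log.d(TAG, block.getText());
                        }
                    }
                })
                .addOnFailureListener(new OnFailureListener() {
                    @Override
                    public void onFailure(@NonNull Exception e) {
                        Log.e(TAG, "Text recognition failed", e);
                    }
                });
    }
}
```

The same detector and detectInImage() call are what the camera-based pipeline below feeds with live frames.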
APPROACH
- You need to create the custom view classes CameraSourcePreview and GraphicOverlay to interact with the camera hardware and manage the content drawn over the camera preview on the device screen.
For quicker integration, you can check the official sample project and use its classes from here.
Classes you will need to include in your project: CameraSource.java, CameraSourcePreview.java, FrameMetadata.java, GraphicOverlay.java, VisionImageProcessor.java, VisionProcessorBase.java, TextRecognitionProcessor.java.
- After you have successfully added these classes to your project, you just need to create an activity/fragment to manage these classes and the results from Firebase.
- The results will be received as an object of the FirebaseVisionText class, and you will need to parse them as per your use case (see the parsing sketch right after this list).
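To make the parsing step concrete, here is a minimal sketch of walking the FirebaseVisionText hierarchy (blocks → lines → elements) and flattening it into a list of strings. The helper name extractElements is mine; it mirrors what updateSpinnerFromTextResults() does in the full activity below.

```java
import java.util.ArrayList;
import java.util.List;

import com.google.firebase.ml.vision.text.FirebaseVisionText;

public final class TextResultParser {

    // Flattens a FirebaseVisionText result into word-level strings.
    public static List<String> extractElements(FirebaseVisionText textResults) {
        List<String> words = new ArrayList<>();
        for (FirebaseVisionText.Block block : textResults.getBlocks()) {         // paragraph-level groups
            for (FirebaseVisionText.Line line : block.getLines()) {              // lines within a block
                for (FirebaseVisionText.Element element : line.getElements()) {  // roughly word-level tokens
                    words.add(element.getText());
                }
            }
        }
        return words;
    }
}
```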
CODE
I have named my activity LauncherActivity.
Layout XML file (activity_launcher.xml):
```xml
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/fireTopLayout"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="#000"
    android:keepScreenOn="true"
    android:orientation="vertical">

    <com.webkul.mobikul.mlkitdemo.customviews.CameraSourcePreview
        android:id="@+id/Preview"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:layout_alignParentLeft="true"
        android:layout_alignParentStart="true"
        android:layout_alignParentTop="true">

        <com.webkul.mobikul.mlkitdemo.customviews.GraphicOverlay
            android:id="@+id/Overlay"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:layout_alignParentBottom="true"
            android:layout_alignParentLeft="true"
            android:layout_alignParentStart="true"
            android:layout_alignParentTop="true" />

    </com.webkul.mobikul.mlkitdemo.customviews.CameraSourcePreview>

    <RelativeLayout
        android:id="@+id/control"
        android:layout_width="match_parent"
        android:layout_height="60dp"
        android:layout_alignParentBottom="true"
        android:layout_alignParentLeft="true"
        android:layout_alignParentStart="true"
        android:layout_toEndOf="@id/Preview"
        android:layout_toRightOf="@id/Preview">

        <TextView
            android:id="@+id/resultsMessageTv"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_centerHorizontal="true"
            android:layout_centerVertical="true"
            android:drawableTint="@android:color/white"
            android:padding="6dp"
            android:text="@string/results_found"
            android:textColor="@android:color/white" />

    </RelativeLayout>

    <LinearLayout
        android:id="@+id/resultsContainer"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_above="@id/control"
        android:gravity="bottom">

        <android.support.v7.widget.RecyclerView
            android:id="@+id/results_spinner"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            tools:listitem="@layout/camera_simple_spinner_item" />

    </LinearLayout>

</RelativeLayout>
```
Java class file (LauncherActivity.java):
```java
public class LauncherActivity extends AppCompatActivity {

    private static final String TAG = "LauncherActivity";
    private static final int PERMISSION_REQUESTS = 1; // To handle the runtime permissions

    private CameraSourcePreview preview;    // To handle the camera
    private GraphicOverlay graphicOverlay;  // To draw over the camera screen
    private CameraSource cameraSource = null; // To handle the camera
    private RecyclerView resultSpinner;     // To display the results received from Firebase ML Kit
    private List<String> displayList;       // To manage the adapter of the results received
    private ResultAdapter displayAdapter;   // Adapter bound with the result recycler view --> contains a simple TextView with background
    private TextView resultNumberTv;        // To display the number of results
    private LinearLayout resultContainer;   // Just another layout to maintain the symmetry

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_launcher);

        // Getting views from the XML
        resultNumberTv = (TextView) findViewById(R.id.resultsMessageTv);
        resultContainer = (LinearLayout) findViewById(R.id.resultsContainer);
        resultSpinner = (RecyclerView) findViewById(R.id.results_spinner);
        preview = (CameraSourcePreview) findViewById(R.id.Preview);
        graphicOverlay = (GraphicOverlay) findViewById(R.id.Overlay);

        // Initializing views
        displayList = new ArrayList<>();
        resultSpinner.setLayoutManager(new LinearLayoutManager(LauncherActivity.this, LinearLayoutManager.VERTICAL, false));
        displayAdapter = new ResultAdapter(LauncherActivity.this, displayList);
        resultSpinner.setAdapter(displayAdapter);
        resultContainer.getLayoutParams().height = (int) (Resources.getSystem().getDisplayMetrics().heightPixels * 0.65);
        resultNumberTv.setText(getString(R.string.x_results_found, displayList.size()));

        if (preview == null) {
            Log.d(TAG, "Preview is null");
        }
        if (graphicOverlay == null) {
            Log.d(TAG, "graphicOverlay is null");
        }

        if (allPermissionsGranted()) {
            createCameraSource();
        } else {
            getRuntimePermissions();
        }
    }

    @Override
    protected void onResume() {
        super.onResume();
        startCameraSource();
    }

    @Override
    protected void onPause() {
        super.onPause();
        preview.stop();
    }

    @Override
    public void onDestroy() {
        super.onDestroy();
        if (cameraSource != null) {
            cameraSource.release();
        }
    }

    // Actual code to start the camera
    private void startCameraSource() {
        if (cameraSource != null) {
            try {
                if (preview == null) {
                    Log.d(TAG, "startCameraSource resume: Preview is null");
                }
                if (graphicOverlay == null) {
                    Log.d(TAG, "startCameraSource resume: graphicOverlay is null");
                }
                preview.start(cameraSource, graphicOverlay);
            } catch (IOException e) {
                Log.d(TAG, "startCameraSource: Unable to start camera source. " + e.getMessage());
                cameraSource.release();
                cameraSource = null;
            }
        }
    }

    // Function to check if all permissions are granted by the user
    private boolean allPermissionsGranted() {
        for (String permission : getRequiredPermissions()) {
            if (!isPermissionGranted(this, permission)) {
                return false;
            }
        }
        return true;
    }

    // List of permissions required by the application to run
    private String[] getRequiredPermissions() {
        return new String[]{android.Manifest.permission.CAMERA,
                android.Manifest.permission.INTERNET,
                Manifest.permission.WRITE_EXTERNAL_STORAGE};
    }

    // Checking a runtime permission value
    private static boolean isPermissionGranted(Context context, String permission) {
        if (ContextCompat.checkSelfPermission(context, permission) == PackageManager.PERMISSION_GRANTED) {
            Log.d(TAG, "isPermissionGranted: Permission granted --> " + permission);
            return true;
        }
        Log.d(TAG, "isPermissionGranted: Permission NOT granted --> " + permission);
        return false;
    }

    // Getting runtime permissions
    private void getRuntimePermissions() {
        List<String> allNeededPermissions = new ArrayList<>();
        for (String permission : getRequiredPermissions()) {
            if (!isPermissionGranted(this, permission)) {
                allNeededPermissions.add(permission);
            }
        }
        if (!allNeededPermissions.isEmpty()) {
            ActivityCompat.requestPermissions(
                    this, allNeededPermissions.toArray(new String[0]), PERMISSION_REQUESTS);
        }
    }

    // Function to create a camera source and retain it
    private void createCameraSource() {
        // If there's no existing cameraSource, create one.
        if (cameraSource == null) {
            cameraSource = new CameraSource(this, graphicOverlay);
        }
        try {
            cameraSource.setMachineLearningFrameProcessor(new TextRecognitionProcessor(this));
        } catch (Exception e) {
            Log.d(TAG, "createCameraSource: can not create camera source: " + e.getCause());
            e.printStackTrace();
        }
    }

    // Updating and displaying the results received from the Firebase text recognition API
    public void updateSpinnerFromTextResults(FirebaseVisionText textResults) {
        List<FirebaseVisionText.Block> blocks = textResults.getBlocks();
        for (FirebaseVisionText.Block eachBlock : blocks) {
            for (FirebaseVisionText.Line eachLine : eachBlock.getLines()) {
                for (FirebaseVisionText.Element eachElement : eachLine.getElements()) {
                    if (!displayList.contains(eachElement.getText()) && displayList.size() <= 9) {
                        displayList.add(eachElement.getText());
                    }
                }
            }
        }
        resultNumberTv.setText(getString(R.string.x_results_found, displayList.size()));
        displayAdapter.notifyDataSetChanged();
    }
}
```
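The layout and activity above reference two string resources that the post does not show. The resource names and the integer format argument are dictated by the code; the exact wording below is an assumption:

```xml
<!-- res/values/strings.xml: assumed definitions, names taken from the code above -->
<string name="results_found">Results found</string>
<string name="x_results_found">%1$d results found</string>
```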
TextRecognitionProcessor.java:
```java
public class TextRecognitionProcessor extends VisionProcessorBase<FirebaseVisionText> {

    private static final String TAG = "TextRecognitionProcessr";

    private final FirebaseVisionTextDetector detector;
    private final LauncherActivity activityInstance;

    public TextRecognitionProcessor(LauncherActivity activity) {
        detector = FirebaseVision.getInstance().getVisionTextDetector();
        activityInstance = activity;
    }

    @Override
    public void stop() {
        try {
            detector.close();
        } catch (IOException e) {
            Log.e(TAG, "Exception thrown while trying to close Text Detector: " + e);
        }
    }

    @Override
    protected Task<FirebaseVisionText> detectInImage(FirebaseVisionImage image) {
        return detector.detectInImage(image);
    }

    @Override
    protected void onSuccess(
            @NonNull FirebaseVisionText results,
            @NonNull FrameMetadata frameMetadata,
            @NonNull GraphicOverlay graphicOverlay) {
        graphicOverlay.clear();
        activityInstance.updateSpinnerFromTextResults(results);
    }

    @Override
    protected void onFailure(@NonNull Exception e) {
        Log.w(TAG, "Text detection failed." + e);
    }
}
```
ResultAdapter.java:
```java
public class ResultAdapter extends RecyclerView.Adapter<ResultAdapter.ViewHolder> {

    private Context context;
    private List<String> labelList;

    public ResultAdapter(Context context, List<String> labelList) {
        this.context = context;
        this.labelList = labelList;
    }

    @NonNull
    @Override
    public ViewHolder onCreateViewHolder(@NonNull ViewGroup parent, int viewType) {
        View view = LayoutInflater.from(context).inflate(R.layout.camera_result_item, parent, false);
        return new ViewHolder(view);
    }

    @Override
    public void onBindViewHolder(@NonNull ViewHolder holder, final int position) {
        ((TextView) holder.itemView.findViewById(R.id.label_tv)).setText(labelList.get(position));
    }

    @Override
    public int getItemCount() {
        return labelList.size();
    }

    public class ViewHolder extends RecyclerView.ViewHolder {
        public ViewHolder(View itemView) {
            super(itemView);
        }
    }
}
```
camera_result_item.xml:
```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:orientation="vertical">

    <TextView
        android:id="@+id/label_tv"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:minHeight="48dp"
        tools:text="label" />

    <View
        android:layout_width="match_parent"
        android:layout_height="1dp"
        android:background="#a8ffffff" />

</LinearLayout>
```
Sources:
https://firebase.google.com/docs/ml-kit/android/recognize-text
https://github.com/firebase/quickstart-android/tree/master/mlkit/
Keep coding and Keep Sharing 🙂