How to Validate File Type Using Magic Bytes and MIME Type in JavaScript
Checking File Types by Reading File Content
File uploads are a common feature in web applications, from accepting an image for a profile picture to uploading documents for processing. Mishandling uploaded files can expose your application to anything from simple application errors to serious security vulnerabilities. Verifying that an uploaded file really is what it claims to be reduces the risk of, say, a profile picture that actually contains an executable script or other malicious content.
This article will walk you through how to implement a file validation system in JavaScript with magic bytes.
What are magic bytes?
Magic bytes are a short sequence of bytes at the beginning of a file. They are also known as the file signature and identify the file's format; most binary formats have a distinctive one.
The diagram above represents the contents of a simple PNG file in hexadecimal. The highlighted bytes (89 50 4E 47 0D 0A 1A 0A) make up the magic bytes for this file.
Using the FileReader API, we read the first few bytes of the uploaded file and check whether they match these bytes. For this tutorial, we will accept PNG, JPEG, and WebP as valid file formats.
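To make the idea concrete, here is a minimal sketch that prints the leading bytes of a byte array in hexadecimal, the same notation used for file signatures. The helper name toHex is an assumption made for this illustration and is not part of any browser API.

```javascript
// Hypothetical helper: format bytes as uppercase, space-separated hex pairs.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// The first eight bytes of every PNG file.
const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(toHex(pngHeader)); // 89 50 4E 47 0D 0A 1A 0A
```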
Preliminary check
Before we jump into reading the raw file content, it is prudent to start where the upload begins: the input element. To steer the user towards files with the .png, .jpg, .jpeg, or .webp extensions, we list these extensions in the accept attribute of the input element. Note that accept only filters the file picker; users can still override it, which is why the checks that follow are needed.
<input
  type="file"
  id="fileInput"
  accept=".png,.jpeg,.jpg,.webp"
  onchange="validateFile(event)"
/>
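Because the accept attribute only filters the file picker, it can help to repeat the extension check in script. The sketch below is one way to do that; hasAllowedExtension and allowedExtensions are names assumed for this example, and the check is only a first filter, not proof of the file's content.

```javascript
// Assumed list, mirroring the accept attribute of the input element.
const allowedExtensions = [".png", ".jpeg", ".jpg", ".webp"];

// Hypothetical helper: true when the file name ends with an allowed extension.
function hasAllowedExtension(fileName) {
  const lowerName = fileName.toLowerCase();
  return allowedExtensions.some((ext) => lowerName.endsWith(ext));
}

console.log(hasAllowedExtension("avatar.PNG")); // true
console.log(hasAllowedExtension("script.js"));  // false
```

In the browser, the helper would be called with file.name for the selected file.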
The magic bytes array
We store the magic bytes of the valid file types in an object, keyed by MIME subtype. Find more magic bytes here
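const signatures = {
  jpeg: [0xFF, 0xD8, 0xFF],
  png: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A],
  // "RIFF" only: in a WebP file the "WEBP" marker sits at offset 8,
  // after a 4-byte size field, so it is not part of the leading bytes.
  webp: [0x52, 0x49, 0x46, 0x46]
};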
Note that .jpg and .jpeg both have the image/jpeg MIME type.
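Some formats do not keep their whole signature in the first bytes. A WebP file starts with the ASCII bytes RIFF, followed by a 4-byte size field, and only then the ASCII bytes WEBP at offset 8. Below is a minimal sketch of an offset-aware check, assuming a hypothetical segment layout ({ offset, bytes }) and a hypothetical helper matchesSegments, neither of which is part of the original code:

```javascript
// Hypothetical layout: each segment is a run of bytes expected at a given offset.
const webpSegments = [
  { offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] }, // "RIFF"
  { offset: 8, bytes: [0x57, 0x45, 0x42, 0x50] }, // "WEBP"
];

// Returns true only if every segment matches the file bytes at its offset.
function matchesSegments(fileBytes, segments) {
  return segments.every(({ offset, bytes }) =>
    bytes.every((byte, i) => fileBytes[offset + i] === byte)
  );
}

// First 12 bytes of a WebP file; bytes 4-7 hold the size and vary per file.
const sample = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0x24, 0x00, 0x00, 0x00, 0x57, 0x45, 0x42, 0x50,
]);
console.log(matchesSegments(sample, webpSegments)); // true
```

With this layout, the first 12 bytes of the file would need to be read rather than only the length of a single leading signature.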
Obtain the MIME type of the selected file
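MIME (Multipurpose Internet Mail Extensions) types describe the format of a file and are written as type/subtype (e.g. image/png). Each of the image formats we accept has its own magic bytes.
Knowing the MIME type tells us which signature to match against. Keep in mind that browsers typically infer file.type from the file extension rather than from the file's content, so the MIME type alone does not prove what the file contains.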
function validateFile(e) {
  const file = e.target.files[0];
  if (file) {
    // file.type looks like "image/png"; keep only the subtype
    const mimeType = file.type.split('/')[1];
    // Must match the keys of the signatures object
    const allowedTypes = ["png", "jpeg", "webp"];
    if (allowedTypes.includes(mimeType)) {
      return readFileContent(file, mimeType);
    } else {
      console.log("Invalid file type");
      return false;
    }
  }
  return false;
}
When a file is selected, the change event triggers the validateFile function, which looks for the file's MIME subtype in the array of allowed types. If it is found, we call the readFileContent function to read the content of the file.
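Splitting on the slash accepts any MIME type whose subtype happens to match, for example application/png. One alternative, sketched here under the assumption that only the three image types are wanted, is to look up the full MIME type instead. The names signatureKeyByMimeType and signatureKeyFor are hypothetical.

```javascript
// Assumed mapping from full MIME type to the key used in the signatures object.
const signatureKeyByMimeType = new Map([
  ["image/jpeg", "jpeg"],
  ["image/png", "png"],
  ["image/webp", "webp"],
]);

// Hypothetical helper: returns the signature key, or null for anything else
// (including the empty string browsers report for unknown types).
function signatureKeyFor(mimeType) {
  return signatureKeyByMimeType.get(mimeType) ?? null;
}

console.log(signatureKeyFor("image/jpeg"));      // jpeg
console.log(signatureKeyFor("application/png")); // null
```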
Reading the file content
Using FileReader we can read the contents of a file or blob.
function readFileContent(file, mimeType) {
  const fileReader = new FileReader();
  fileReader.onload = (e) => {
    const unit8ByteArray = new Uint8Array(e.target.result);
    if (isSignatureMatch(unit8ByteArray, signatures[mimeType])) {
      console.log("File content is valid");
      return true;
    } else {
      console.log("Invalid file content");
      return false;
    }
  };
  fileReader.onerror = function() {
    console.log("Error reading file");
  };
  // Read only as many bytes as the signature needs
  fileReader.readAsArrayBuffer(file.slice(0, signatures[mimeType].length));
}
readFileContent accepts two arguments: file, which is the selected file, and mimeType, which tells the function which signature to match against.
A new FileReader instance is created, and its readAsArrayBuffer method starts reading the file content. To avoid reading more than necessary, we only read the needed length using the slice operation.
When the read operation completes successfully, the load event is triggered, and the result property holds the bytes as an ArrayBuffer. Since an ArrayBuffer cannot be read or manipulated directly, we wrap it in a Uint8Array typed array.
The unit8ByteArray, alongside the signature for the given mimeType, is then passed into the isSignatureMatch function.
The error event of the FileReader interface is fired if the read fails due to an error.
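The relationship between an ArrayBuffer and a Uint8Array can be seen without any file at all. The following sketch builds a small buffer by hand (the byte values are the start of the PNG signature, chosen only for illustration) and shows that the bytes are reachable through the typed array view, not through the buffer itself.

```javascript
// An ArrayBuffer is a raw block of memory with no element access of its own.
const buffer = new ArrayBuffer(4);
// A Uint8Array is a view over that memory, one element per byte.
const view = new Uint8Array(buffer);
view.set([0x89, 0x50, 0x4e, 0x47]);

console.log(buffer[0]);         // undefined
console.log(view[0]);           // 137 (0x89)
console.log(buffer.byteLength); // 4
```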
Checking for a match
We simply loop through the signature and compare it with the file's bytes, one byte at a time.
function isSignatureMatch(unit8ByteArray, signature) {
  for (let i = 0; i < signature.length; i++) {
    if (unit8ByteArray[i] !== signature[i]) {
      return false;
    }
  }
  return true;
}
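As a quick check of the comparison logic, the sketch below runs the same loop against hand-written byte arrays instead of a real upload. The sample arrays are made up for the illustration.

```javascript
function isSignatureMatch(unit8ByteArray, signature) {
  for (let i = 0; i < signature.length; i++) {
    if (unit8ByteArray[i] !== signature[i]) {
      return false;
    }
  }
  return true;
}

const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Bytes that start like a PNG file.
const looksLikePng = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
// Bytes that start like a JPEG file, e.g. a JPEG renamed to .png.
const looksLikeJpeg = new Uint8Array([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46]);
// A file shorter than the signature can never match.
const tooShort = new Uint8Array([0x89, 0x50]);

console.log(isSignatureMatch(looksLikePng, pngSignature));  // true
console.log(isSignatureMatch(looksLikeJpeg, pngSignature)); // false
console.log(isSignatureMatch(tooShort, pngSignature));      // false
```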
Why we need a promise
The validateFile function returns undefined. This is because file reading is asynchronous: readFileContent returns before the load event fires, and the values returned inside the onload handler never reach the caller. This can be handled by wrapping the FileReader in a promise.
function readFileContent(file, type) {
  return new Promise((resolve, reject) => {
    const fileReader = new FileReader();
    fileReader.onload = (e) => {
      const unit8ByteArray = new Uint8Array(e.target.result);
      if (isSignatureMatch(unit8ByteArray, signatures[type])) {
        resolve({
          success: true,
          message: "file is valid"
        });
      } else {
        // Error expects a string message, not an object
        reject(new Error("file content is compromised"));
      }
    };
    fileReader.onerror = function() {
      reject(new Error("Error reading file"));
    };
    fileReader.readAsArrayBuffer(file.slice(0, signatures[type].length));
  });
}
Now the function returns a promise that resolves with a result object or rejects with an Error describing what went wrong.
The result of validateFile can then be awaited using async/await.
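A minimal sketch of awaiting validateFile with async/await follows. To keep it short, it reads the bytes with Blob.prototype.arrayBuffer(), which already returns a promise, instead of wrapping FileReader; that is a substitution made for this example, not the approach used above. The shape of the returned object for the failure cases is also an assumption, and the signatures are trimmed to two formats for brevity.

```javascript
const signatures = {
  jpeg: [0xff, 0xd8, 0xff],
  png: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
};

function isSignatureMatch(bytes, signature) {
  return signature.every((byte, i) => bytes[i] === byte);
}

// Promise-based read using Blob.prototype.arrayBuffer() (swapped in for FileReader).
async function readFileContent(file, type) {
  const buffer = await file.slice(0, signatures[type].length).arrayBuffer();
  if (!isSignatureMatch(new Uint8Array(buffer), signatures[type])) {
    throw new Error("file content is compromised");
  }
  return { success: true, message: "file is valid" };
}

// async/await version of validateFile: always resolves with a result object.
async function validateFile(e) {
  const file = e.target.files[0];
  if (!file) {
    return { success: false, message: "no file selected" };
  }
  const type = file.type.split("/")[1];
  if (!Object.keys(signatures).includes(type)) {
    return { success: false, message: "Invalid file type" };
  }
  try {
    return await readFileContent(file, type);
  } catch (error) {
    return { success: false, message: error.message };
  }
}

// In the browser (assuming the fileInput element from earlier):
// document.getElementById("fileInput").addEventListener("change", async (e) => {
//   const result = await validateFile(e);
//   console.log(result.message);
// });
```

Because every path resolves with an object of the same shape, the caller can branch on result.success without its own try/catch.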
This function can be further expanded to cater for more MIME types to suit the needs of the application.
For robust file validation, proper file handling on the server side should never be neglected. Client-side validation provides immediate feedback to users, preventing unnecessary server requests for disallowed file types. However, server-side validation is essential as it acts as a safety net, double-checking the file type to prevent tampering or bypassing of client-side restrictions.
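On the server, the same byte comparison can be applied without trusting the MIME type or extension sent by the client. The sketch below assumes a JavaScript runtime such as Node.js and that the upload is already available as a Uint8Array (a Node.js Buffer is one). detectImageType, startsWith, and knownSignatures are hypothetical names, and how the bytes are obtained depends on the server framework, which is left out here.

```javascript
// Leading bytes per format. For WebP only the "RIFF" prefix is listed here;
// the "WEBP" marker at offset 8 is checked separately below.
const knownSignatures = {
  jpeg: [0xff, 0xd8, 0xff],
  png: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  webp: [0x52, 0x49, 0x46, 0x46],
};
const WEBP_MARKER = [0x57, 0x45, 0x42, 0x50]; // "WEBP" at offset 8

// True when the bytes at the given offset equal the signature.
function startsWith(bytes, signature, offset = 0) {
  return signature.every((byte, i) => bytes[offset + i] === byte);
}

// Hypothetical helper: returns "jpeg", "png", "webp", or null for unknown content.
function detectImageType(bytes) {
  for (const [type, signature] of Object.entries(knownSignatures)) {
    if (!startsWith(bytes, signature)) continue;
    if (type === "webp" && !startsWith(bytes, WEBP_MARKER, 8)) continue;
    return type;
  }
  return null;
}

const upload = new Uint8Array([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10]);
console.log(detectImageType(upload)); // jpeg
```

Detecting the type from the bytes, rather than confirming a type claimed by the client, means a renamed file is classified by what it actually contains.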